Machine-machine vision—a parallel visual culture developed by machines, for machines—is testing the boundaries of what we consider thought.
In his 1987 essay “Can Thought Go On Without A Body?” the philosopher Jean-François Lyotard saw thought as inseparable from the body; the body is the hardware in which the software (thought) exists. Hardware, however, can become faulty, whilst software can be updated and transferred to a different host. I will examine here whether the field of machine-machine vision represented by Generative Adversarial Networks (GANs) offers a challenge to Lyotard’s claim that thought and body are inseparable, and in doing so will discuss the limits of this type of artificial intelligence.
The theorist and town planner Paul Virilio has characterized our era as one which is marked by the “acceleration of reality.” According to Virilio, this acceleration has automated not just production, but decision-making powers. The question of whether thought can be separated from the body is not a new one, but in the light of Virilio’s diagnosis of the present conditions of automation it has become a pressing one.
The history of technological development conspires towards the displacement of the human body. In The Vision Machine (1994), Virilio sketches a history of sight that has aimed to render the eye inert: since the creation of artificial lenses, “the eye’s motility is transformed into fixity.” He concludes that “society is sinking into the darkness of a voluntary blindness.” This is not physical blindness, but rather the inability to perceive and to think. The field of vision has been subject to the “acceleration of reality” and leads Virilio in The Administration of Fear (2012) to claim that “we have reached the limits of instantaneity, the limits of human thought.” Note that this is not the end of thought, but the end of human thought. The implication is that thought carries on and can be extricated from the human body.
What does the successful separation of thought from our bodies look like? To answer this question, I will look at the possibilities of machine-machine vision, specifically examining how images are created by GANs. This technology operates within the field of machine-machine vision, an area of technology described by Virilio, where “images are created by the machine for the machine.”
This type of image creation is a burgeoning field of study, particularly as machine-machine vision technology proliferates in the form of surveillance cameras and drones. By investigating GANs and machine-machine vision generally, I aim to draw out their effects on human vision and to test whether they provide evidence for the claim that “the limits of human thought” have been reached, and whether thought could, therefore, be separated from the body.
The academic Jill Walker Rettberg defines machine-machine vision as a process of “registration, analysis, and representation of visual information by machines.” She goes on to state that “pre-digital technologies such as cameras gave us a taste of how machines perceive the world, but today’s algorithms go even further in pushing us to see machines as perceiving beings with agency.” Similarly, the artist Trevor Paglen states in Invisible Images (Your Pictures Are Looking at You) that the images these technologies create and circulate have made a “landscape of invisible images” where “humans are rarely in the loop.”
Paglen claims that digital images are machine readable, whereas undeveloped film is not; the difference being that the digital image does not need “to be turned into human-readable form in order for a machine to do something with it” and therefore can circulate indiscriminately, allowing for the “automation of vision on an enormous scale.” Paglen says that this technology is a process of “activations and operations” rather than being representational and, like Rettberg, gives machine-machine vision agency.
Do GANs represent a perceiving being? This type of machine learning can be categorized within the field of artificial intelligence and has been discussed widely since the release of “Generative Adversarial Nets” (2014) by Ian Goodfellow et al. The technology is generally described as unsupervised machine learning and is designed to generate new data that is indiscernible from, yet essentially different from, the training data. A GAN consists of two neural networks, known as the Discriminator (D) and the Generator (G), that are trained on a dataset by being put in a game against one another, where G attempts to trick D into mistaking the generated data for training data.
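The game between D and G can be sketched in a few lines of Python. This is a toy under stated assumptions, not Goodfellow's implementation: the "dataset" is a stream of single numbers rather than images, G and D are each reduced to two parameters instead of neural networks, and the names `train_toy_gan`, `real_mean`, and `real_std` are hypothetical. G is updated to raise log D(G(z)), the heuristic variant of the generator objective that the 2014 paper suggests for practice, rather than the strict minimax form.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function, mapping any real number into [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, real_std=0.5, seed=0):
    """Play the D-versus-G game on one-dimensional data (hypothetical toy setup)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a*z + b turns random input z into a fake sample
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w*x + c), its belief that x is real

    for _ in range(steps):
        x_real = rng.gauss(real_mean, real_std)  # one sample of training data
        z = rng.gauss(0.0, 1.0)                  # random input fed to G
        x_fake = a * z + b                       # one generated sample

        # D's move: gradient ascent on log D(x_real) + log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # G's move: gradient ascent on log D(x_fake), i.e. try to make D say "real".
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - d_fake) * w              # derivative of log D(x) at x_fake
        a += lr * grad_x * z
        b += lr * grad_x

    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print("generator parameters:", a, b)
    print("discriminator parameters:", w, c)
```

Nothing in the loop refers to seeing or perceiving: each "move" is an arithmetic adjustment of a few numbers in response to the other network's output. Such games are known to oscillate rather than settle, so the sketch makes no claim about the values the parameters reach.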
The data given to the GANs can be, Goodfellow states, that which represents “images, audio waveforms containing speech, and symbols in natural language corpora.” In Goodfellow’s 2014 paper the GANs are trained on image datasets, and many others have followed this example. Images are used there as a simple and quick illustration of how well the GANs have generated new data. The evolution of the process of generating new data is given in a table of images (see Fig. 1) that leads to the final desired outcome: a new image that is almost indistinguishable from those in the datasets that were given. The images prior to the final column show the evolution of how the technology was trained.
These images could all be classed as instances of machine-machine vision, images “created by the machine for the machine.” The final column shows the outcome of the process and could serve as an example of what Rettberg describes, where “machine vision becomes primary, with humans only being notified when the machine identifies something it has perceived as being valuable or dangerous.” It is here that GANs can be seen as part of the acceleration of reality, with the process of decision-making being deferred to the autonomy of the machine.
The acceleration of reality is also created with the circulation of images that are both seen and unseen by humans. Paglen states that it is these unseen invisible images that increasingly dictate our surroundings, rendering visual culture opaque and unaccountable. Images that could be classed as machine-machine vision make up, alongside other visual technologies, the “landscape of invisible images” and are found in industrial operations, “law enforcement, and ‘smart’ cities” and:
“extend far into what we’d otherwise, and somewhat naively, think of as human-to-human visual culture. I’m referring here to the trillions of images that humans share on digital platforms, ones that at first glance seem to be made by humans for other humans.”
Invisible images for Paglen predominate in digital culture and can all be weaponized depending on where they are circulated. In regard to Facebook, Paglen states that when a photo is uploaded the user is “feeding an array of immensely powerful artificial intelligence systems information about how to identify people.” There have been calls to keep informed about this new mode of visuality and update our methods of seeing.
A website dedicated to this task is thispersondoesnotexist.com. The website shows a face of a person that we are told has never existed and was made by a type of GAN (see Fig. 2). When the user refreshes the page, they are given another completely different face, again of someone that has never existed. The website is intended to be a pedagogical tool. The designer of the website states that “this site helps to instantly educate the public on an important subject in artificial intelligence called GANs.”
If you look closely enough, these portraits bear traces of being made by GANs through the remnants of the unseen side of machine-machine vision. These traces are typically found on the periphery of the image. In Fig. 2, a woman’s face easily passes as human at first glance, but closer inspection reveals that it was created by GANs. The shoulders, mainly at the top, have incongruous shapes that break up the form of a typical body. The dichroic colors of the woman’s hairband melt away, extending into the hair; the earlobe at the bottom left is iridescent and shows the head merging into the background. While the background of the portrait could conceivably be out of focus, the ruptures around the edge of the figure give the impression that the background is bleeding into the face, via the hair, left ear, and shoulders. Here the unseen remnants of machine-machine vision persist and act as noise over the rest of the data.
This image is a product of machine-machine vision and is also part of the invisible landscape of images; with its unseen remnants of the technology, it creates a seemingly ahistorical aesthetic chiming with Paglen’s claim that “increasingly autonomous ways of seeing bear little resemblance to the visual culture of the past.” The emphasis here is shifting towards the need to clear the ground, to create a new visual language to receive these images. Yet is there not plausible recourse to historic and current notions of visuality, rather than a schismatic break from them?
Going back to the images created on thispersondoesnotexist.com, some of the images mirror art historical tropes. The portrait in Fig. 3, for example, plays on the trope of the mise en abyme, which translates literally as “to put into abyss.” The inability of the GANs to render something that adequately resembles anthropomorphic space is shown in the reflection in the sunglasses in Fig. 3. The reflection presents a purple-green landscape that seems to suggest originary beginnings and dystopic futures.
The reflection in the glasses shows the remnants of the unseen side of the GANs. The effect describes a situating within something else, and in the canon of Western art is a self-reflexive acknowledgement of the wider concerns within a work. The effect is aimed at alluding to the central antagonism of a work and, to simplify, tends to highlight a struggle for power or a location of authority.
For example, in Van Eyck’s The Arnolfini Portrait (1434), which depicts a newly married couple holding hands standing on either side of a convex mirror, the reflection shows that the scene was being witnessed by others. It was in the reflection of the mirror that art historian Erwin Panofsky recognized evidence of a form of marriage contract. Velázquez’s Las Meninas (1656) depicts a scene from Philip IV’s court. The mirror at the center of the painting provides the reflection of Philip IV and his wife, and their presence in the painting, as art historian Svetlana Alpers says, “is an oblique affair” that demonstrates that “order is produced by acts of representation.” In both instances, the reflection demonstrates the subtle manifestations of hegemony: The Arnolfini Portrait shows the strength of the emergent bourgeois nuclear family, and Las Meninas the looming project of empire.
What is reflected here in the glasses, it would be tempting to say, is the growing omnipotence of artificial intelligence and its ability to become alpha and omega through creating a world that is hostile and devoid of humanity. However, what does the alien landscape reflected represent on a more practical level? It is an error. That is, it reveals that the image is a faked human face and provides a doorway into discerning the provenance of the image.
To go further, this reflection is in fact an inside look into the game between the D and G, where G is endlessly attempting to trick D into mistaking new data for training data; the image inside the glasses is then the unseen remnant of the GANs, the attempt at producing new data. This type of image is outside the classification of training data or new data, and would come under the category of noise or distortion. The New Penguin Dictionary of Computing states that noise is defined as a random signal that disturbs “any wanted signal” and can be “caused by static, temperature, power supply, electric fields, the stars, or the sun.” Noise more broadly occurs when unwanted information interferes with that which is being presented, and distortion refers to an undesired change or loss of information.
GANs images are representations of data and hold within them instances of noise, like that which is reflected in the glasses. While these images are representations of data, they are also just that: images. As art historian John Berger says, an image can be defined as:
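The two terms can be told apart with a small Python sketch. It is an illustration only and has no connection to how GANs produce their artifacts: the sine wave standing in for the “wanted signal,” the Gaussian noise, and the clipping used as distortion are all assumptions chosen for simplicity, and the function names are hypothetical.

```python
import math
import random


def clean_signal(n=100):
    """The wanted signal: one period of a sine wave sampled at n points."""
    return [math.sin(2.0 * math.pi * i / n) for i in range(n)]


def add_noise(signal, scale=0.2, seed=0):
    """Noise: a random signal laid over the wanted one."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, scale) for s in signal]


def distort(signal, limit=0.5):
    """Distortion: clipping, which loses every value beyond +/- limit."""
    return [max(-limit, min(limit, s)) for s in signal]


if __name__ == "__main__":
    signal = clean_signal()
    print("clean peak:  ", max(signal))
    print("noisy peak:  ", max(add_noise(signal)))
    print("clipped peak:", max(distort(signal)))
```

Noise adds something that was never in the signal; distortion removes or alters something that was. The wanted signal is defined here by a choice made in advance: the code itself has no way of knowing which values matter.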
“a sight which has been recreated or reproduced, it is an appearance, or a set of appearances, which has been detached from the place and time in which it first made its appearance and preserved.”
Crucially, he then says:
“Every image embodies a way of seeing. Even a photograph. For photographs are not, as is often assumed, a mechanical record. Every time we look at a photograph, we are aware, however slightly, of the photographer selecting that sight from an infinity of other possible sights.”
What matters is that which is selected. What has been selected in the GANs images comes through the structure of the website, which shows the viewer what is “valuable.” The selection denotes for Berger a way of seeing that carries with it forms of knowledge and thought. By selecting the data to become an image, the website reveals the unseen process of machine-machine vision as a process of sharing data, and reveals it also as a process of circulating noise and distortion. The exclusion of the term “data” from the discussion of images created and circulated by machine-machine vision technologies allows agency and intentionality to be attributed to the process that they are programmed to carry out. Obscuring data from the discussion of machine-machine vision allows it to be seen solely as a process of automating vision. However, this is also the process of automating data, the majority of which is noise that has to be selected to become something else. The importance lies in the process of selection rather than circulation, and selection is down to the organization, group, or person that selects.
This is not to say that machine vision technologies are not worrying, as Paglen brings up useful examples of the power of machine vision to negatively impact people’s lives. However, it is not their power which is disturbing, but our sacrifice to it. Lewis Mumford, writing in 1932, says that this sacrifice is not contingent, but instead historically continuous:
“By renouncing a large part of his humanity, man could achieve godhood; he dawned on this second chaos and created the machine in his own image; the image of power, but power ripped loose from his flesh and isolated from his humanity.”
To create technological power, we sacrificed the belief in our own power, both individual and collective; this is the power not just to hold those in power to account, but to intervene within legislative and democratic structures. Paglen’s argument is that this type of intervention is beyond us, as big business and the state have control of technology and there is:
“no technical ‘fix’ for the exacerbation of the political and economic inequalities that invisible visual culture is primed to encourage. To mediate against the optimizations and predations of a machinic landscape, one must create deliberate inefficiencies and spheres of life removed from market and political predations–‘safe houses’ in the invisible digital sphere.”
The potential nefarious uses of machine-machine vision and their ability to move from the representational to the material, in that they can “intervene in everyday life,” means that the danger should be confronted. In the environment of drones, facial recognition software, and license plate readers, this intervention seems positively dystopic.
Paglen states that this intervention from machine-machine vision technologies “serves powerful government and corporate interests at the expense of vulnerable populations and civic life.” Here, he attaches these images to a material outcome which, surprisingly or not, has been reproduced historically ad infinitum. What is interesting is that he claims that human production has been separated from the body: “visual culture...has become detached from human eyes and has largely become invisible.”
In The Administration of Fear, the interviewer asks: “[Is] the loss of place joined by the loss of the body?” Virilio answers in the affirmative and says that “people are required to transfer their power of decision to automatic responses.” Virilio’s belief that the body has been displaced by technology enables the possibility of separating thought from the body and allows thought to be shifted on to technology. This displacement conjures the fantasy of the outside, in the form of digital “safe houses” or otherwise, negating the body politic and positing an imaginary purified network away from the human beneficiaries of hegemonic forces.
It would seem initially that GANs have separated thought from the body. Their power of image creation outpaces anything a human could do. Combined with their ability to completely avoid human detection as a forgery, and their possibility for augmentation, GANs can be very powerful.
To avoid over-deterministic readings of machine-machine vision’s power, it is helpful to ground the images they create, and the data they circulate, within wider political and economic considerations. In the examples looked at, the images, when seen in their originary form as representations of data, have qualities of noise and distortion and reflect and feed into wider considerations of misinformation and dissimulation.
Certain formulations of machine-machine vision do not present a radical loss of human autonomy, instead providing some of the noise in which human responsibility and collective organization are hidden. Ultimately the belief that the “limits of human thought” have been reached gives over human criticality to those with technological means, in all the various guises that takes.
This should not then become a search for deceleration and reduced technological interaction. It should be an awareness of the political and economic conditions in which advanced technology, like machine-machine vision, exists, and how it is interacted with. Certain strands of discussion on technology present it as radically outside humanity, as a question put to the limits of thought, not because it is or can be separated from human form, but because its human structures are either hard to discern or seen as too difficult to overcome.
Cover image: Mario Klingemann, "Interstitial Space," 2019. Courtesy of the artist / Sotheby’s
Thanks to Jude Wells for introducing the topic of machine learning to me, explaining some of its aspects, and for the conversations we had around it.