Is Fine Art the Next Frontier of AI?
From museum masterpieces to Bach-like compositions, these computer-generated creations put the ‘art’
“Can machines think?”
“Are there imaginable digital computers which would do well in the imitation game?”
In most applications of AI, a model is created to imitate the judgment of humans and implement it at scale, be it autonomous vehicles, text summarization, image recognition, or product recommendation.
By the nature of imitation, a computer is only able to replicate what humans have done, based on previous data. This doesn’t leave room for genuine creativity, which relies on innovation, not imitation.
But more recently, computer-generated creations have started to push the boundaries between imitation and innovation across various mediums.
The question arises: can a computer be creative? Can it be taught to innovate on its own and generate original outputs? And can it do this in a way that makes it indistinguishable from human creativity?
Here are a few developments at the intersection of art and AI that can help us answer those questions.
1. Edmond de Belamy
In October 2018, Christie’s Auction House in New York sold a computer-generated portrait of Edmond de Belamy, created in the style of 19th-century European portraiture.
The piece sold for $432,500, more than 40 times its original estimate.
The painting (or, as art aficionados may prefer, print) is part of a collection of portraits of the fictional Belamy family, created by the French collective Obvious, which aims to explore the interface of AI with art.
The portrait of Edmond de Belamy himself appears unfinished, blurry and featureless. Almost as eye-catching is the mathematical formula in the bottom right corner, in place of a signature.
This formula is the loss function used by the Generative Adversarial Network (GAN) that created the portrait, which raises interesting questions about the authorship of such pieces of art. Are they truly the work of the mathematical formula, or of the human who originally developed it?
GANs are a deep learning framework containing two competing (hence the name “adversarial”) neural networks, with the aim of creating new datasets that statistically mimic the original training data.
The first, known as the discriminator, is fed a training set of data (in this case images) and aims to learn to discriminate this data from synthetically generated data. To create the Belamy family, Obvious trained the discriminator on 15,000 portraits produced between the 14th and 20th centuries.
The second, the generator, creates an output, trying to fool the discriminator into incorrectly identifying it as part of the original data. As such, the final output is newly created data, similar enough to the original that the discriminator cannot tell it has been synthetically created.
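To make the tug-of-war concrete, here is a minimal sketch of the standard GAN objective, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The function name and the probability values are made up for illustration; this is not Obvious's code, just the textbook formula evaluated on a few numbers.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Value V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator probabilities ("this is real") on real images
    d_fake: discriminator probabilities ("this is real") on generated images
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A confident, correct discriminator scores real images high and fakes low.
confident = gan_value([0.9, 0.95], [0.1, 0.05])

# A fully fooled discriminator can only guess: 0.5 for everything.
fooled = gan_value([0.5, 0.5], [0.5, 0.5])

print(confident, fooled)
```

The generator "wins" as the value falls toward the fooled case, where the discriminator can no longer separate synthetic portraits from real ones.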
Edmond de Belamy may be proof of at least one thing: that people are willing to pay for fine art developed by AI.
But the question remains whether Obvious successfully imitated human creativity. Considering that the purpose of a GAN is to replicate its training data, it might be a stretch to argue that its outputs are truly innovative.
2. AICAN
On 13th February 2019, a four-week exhibit, Faceless Portraits Transcending Time, opened at the HG Contemporary art gallery in Chelsea, New York, showing prints of artwork produced entirely by AICAN, an algorithm designed and written by Ahmed Elgammal, Director of the Art & AI Lab at Rutgers University. According to Elgammal,
AICAN [is] a program that could be thought of as a nearly autonomous artist that has learned existing styles and aesthetics and can generate innovative images of its own.
Faceless Portraits Transcending Time Exhibition, 2019. Image provided by Ahmed Elgammal
Instead of GANs, AICAN uses what Elgammal has called a “creative adversarial network” (CAN). These diverge from GANs by adding an element that penalizes the model for work that too closely matches a given established style.
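One way to picture that penalty is as a "style ambiguity" term: a style classifier looks at the generated image, and the generator is penalized when the classifier can confidently assign it to one established style. The sketch below is a simplified version of that idea (cross-entropy against a uniform distribution over styles), not the exact loss from the CAN paper; the function name and the probabilities are hypothetical.

```python
import numpy as np

def style_ambiguity_penalty(style_probs):
    """Cross-entropy between a uniform target over K styles and the
    style classifier's predicted probabilities for one generated image.
    Smallest when the classifier cannot tell which style the image is."""
    p = np.asarray(style_probs, dtype=float)
    k = p.size
    return float(-np.sum(np.log(p)) / k)

# The classifier is unsure which of four styles this image belongs to.
ambiguous = style_ambiguity_penalty([0.25, 0.25, 0.25, 0.25])

# The classifier is almost certain the image belongs to the first style.
derivative = style_ambiguity_penalty([0.97, 0.01, 0.01, 0.01])

print(ambiguous, derivative)
```

The image that looks like a textbook example of one style receives the larger penalty, which nudges the generator away from simply reproducing known styles.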
Psychologist Colin Martindale hypothesized that artists try to increase the appeal of their work by diverging from existing artistic styles. CANs do just that: the penalty lets the model introduce novelty, so that AICAN can diverge from existing styles.
AICAN is trained on over 80,000 images of Western art from the last five centuries but does not focus on a specific artistic style. As well as the images themselves, the algorithm is also fed the names of the pieces, so that the output is an image along with a title, all created by AICAN.
More often than not, these pieces are more abstract, which Elgammal believes is because AICAN uses the most recent trends in art history, such as abstract art, to understand how best to diverge from existing styles.
Selection of images created by AICAN. Image provided by Ahmed Elgammal
The paper introducing CANs describes two experiments that tested whether people could distinguish human-made images from computer-generated ones.
Each experiment received 10 distinct responses. Participants incorrectly labeled the CAN images as human-made 53% of the time in the first experiment and 75% of the time in the second, compared with 35% and 65% for the GAN images.
CANs may be more successful than GANs at imitating humans. Perhaps we can finally argue that CANs succeed where GANs failed. They don’t just try to replicate a dataset—the penalty term might actually allow them to innovate.
3. Musical intelligence
In 1981, David Cope, a music professor at the University of California, Santa Cruz, began what he called “Experiments in Musical Intelligence” (EMI, pronounced “Emmy”).
According to Cope, he began these experiments as the result of composer’s block; he wanted a program that understood his overall style of music and could provide him with the next note or measure. However, he found that he had very little information about his own style and instead,
I began creating computer programs which composed complete works in the styles of various classical composers, about which I felt I knew something more concrete.
So, Cope began writing EMI in Lisp, a functional programming language created in the late 1950s. He developed it on three key principles:
- Deconstruction — analyzing the music and separating it into parts
- Signatures — identifying commonalities for a given composer and retaining the parts that signify their style
- Compatibility — recombining the pieces into a new piece
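The three principles above can be sketched in a few lines of Python. This is a toy illustration only, not Cope's Lisp implementation: melodies are plain lists of note names, a "signature" is assumed to be any three-note fragment that recurs in at least two pieces, and two fragments are assumed to be "compatible" when one starts on the note the other ended on. All function names and the example melodies are invented.

```python
from collections import Counter
import random

def deconstruct(piece, size=3):
    """Split a melody (list of note names) into overlapping fragments."""
    return [tuple(piece[i:i + size]) for i in range(len(piece) - size + 1)]

def signatures(pieces, size=3):
    """Keep fragments that appear in at least two different pieces."""
    counts = Counter()
    for piece in pieces:
        counts.update(set(deconstruct(piece, size)))
    return sorted(frag for frag, n in counts.items() if n >= 2)

def recombine(fragments, length, rng):
    """Chain fragments so each one starts on the note the last one ended on."""
    melody = list(rng.choice(fragments))
    while len(melody) < length:
        options = [f for f in fragments if f[0] == melody[-1]]
        if not options:
            break  # no compatible fragment left, stop early
        melody.extend(rng.choice(options)[1:])
    return melody

pieces = [
    ["C", "D", "E", "C", "D", "E", "G"],
    ["E", "C", "D", "E", "F", "G"],
    ["G", "E", "C", "D", "E", "C"],
]
motifs = signatures(pieces)
new_melody = recombine(motifs, length=12, rng=random.Random(0))
print(motifs)
print(new_melody)
```

Even in this toy version, the output can only ever contain fragments that already exist in the source pieces, which hints at why such a system imitates well but struggles to innovate.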
After seven years of work, Cope finally finished a version of EMI that imitated the style of Johann Sebastian Bach; in a single day, it composed 5,000 works in Bach’s style. Cope selected a few of these, which were performed in Santa Cruz without the audience being told that they were not authentic works of Bach.
After praising the wonderful performance, the audience was told that these were created by a computer, and a significant proportion of the audience, and the wider music community, reacted with anger.
In particular, Professor Steve Larson from the University of Oregon proposed a challenge to Cope. In October 1997, Larson’s wife, the pianist Winifred Kerner, performed three pieces of music in front of hundreds of students in the University of Oregon’s concert hall. One was composed by Bach, one by Larson and one by EMI.
At the end of the concert, the audience was asked to guess which piece was by which composer. To Larson’s dismay, the audience thought EMI’s piece was composed by Bach, Bach’s piece by Larson and Larson’s piece by EMI.
This is possibly one of the most successful examples of a computer imitating human creativity. (Have a listen to some of the pieces and you will be hard-pressed to notice any difference between EMI and a human composer.) However, what makes EMI great at imitation is also what makes it bad at innovation. Just like a GAN, it imitates to the detriment of innovation.
4. POEMPORTRAITS
In 2016, artist and designer Es Devlin met with Hans-Ulrich Obrist, Artistic Director of the Serpentine Galleries in London, to discuss what original and creative ideas they could come up with for the Serpentine Gala in 2017. Devlin decided to collaborate with Google Arts & Culture Lab and Ross Goodwin to create POEMPORTRAITS.
POEMPORTRAITS asks users to donate a word, then uses the word to write a poem. This poem is then overlaid onto a selfie taken by the user.
According to Devlin,
“the resulting poems can be surprisingly poignant, and at other times nonsensical.”
These poems are then added to an ever-growing collective poem, containing all POEMPORTRAITS’ generated poems.
My poem portrait created after donating the word ‘fluorescent’.
I tried it myself, donating the word ‘fluorescent’. You can see my POEMPORTRAIT above.
Before he collaborated with Google and Devlin, Goodwin had been experimenting with text generation. His code is available on GitHub and includes two pre-trained LSTM (Long Short-Term Memory) models for poem generation, which were used as a base for POEMPORTRAITS.
An LSTM is a type of recurrent neural network (RNN) that learns which information about earlier words to carry forward through a text, so that the model can keep track of associations between words that are far apart.
For example, in the sentence “The car was great, so I decided to buy it,” the model can learn that the word ‘it’ refers to the word ‘car’. This is a step beyond earlier models, which only considered relations between words within a given distance of each other.
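To show how that carrying-forward works mechanically, here is a minimal sketch of a single LSTM step in NumPy. It follows the standard LSTM equations, but the weights are hand-set rather than trained, and the dimensions and the "memory of car" values are made up purely to demonstrate the gates; it is not Goodwin's model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input vector (D,); h_prev, c_prev: previous hidden and cell state (H,)
    W: weights (4H, D + H); b: biases (4H,), ordered as
       [forget gate, input gate, output gate, candidate].
    """
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])          # how much old memory to keep
    i = sigmoid(z[H:2 * H])      # how much new information to write
    o = sigmoid(z[2 * H:3 * H])  # how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate new information
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

D, H = 3, 2
W = np.zeros((4 * H, D + H))
b = np.zeros(4 * H)
b[0:H] = 20.0        # forget gate held open: keep everything
b[H:2 * H] = -20.0   # input gate held shut: write nothing new

c = np.array([0.7, -0.3])  # stands in for a memory of an earlier word, e.g. 'car'
h = np.zeros(H)
for _ in range(10):        # ten later words pass through the cell
    h, c = lstm_step(np.ones(D), h, c, W, b)
print(c)
```

With the gates set this way, the cell state is still essentially unchanged after ten steps. A trained LSTM learns when to open and close these gates itself, which is how a reference like ‘it’ can stay linked to ‘car’.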
The ongoing collective poem from POEMPORTRAITS, a concatenation of all the poems created using users’ word donations
For POEMPORTRAITS, the LSTM model was trained on over 25 million words, written by 19th-century poets, to build a statistical model that essentially predicts the next word given a word or set of words. Hence, the donated word acts as a seed to which words are added, producing verse in the style of 19th-century poetry.
Unfortunately, there have not been any experiments on humans to measure how effectively POEMPORTRAITS imitates human poets.
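The seed-then-predict loop can be illustrated without a neural network at all. The sketch below swaps the LSTM for a much simpler bigram model (counting which word follows which) and uses a two-line invented corpus instead of 25 million words of poetry, so it only demonstrates the generation loop, not the quality of the real system. All names here are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, seed, max_words=8):
    """Start from the seed and repeatedly append the likeliest next word."""
    line = [seed]
    while len(line) < max_words and follows[line[-1]]:
        candidates = follows[line[-1]]
        # Most frequent successor; ties are broken alphabetically.
        nxt = min(candidates, key=lambda w: (-candidates[w], w))
        line.append(nxt)
    return " ".join(line)

# Invented stand-in for the training poetry.
corpus = (
    "the fluorescent moon is over the silent sea "
    "the silent sea is over the silent shore"
)
model = train_bigrams(corpus)
poem = generate(model, "fluorescent", max_words=6)
print(poem)
```

A bigram model only ever looks one word back, which is exactly the limitation the LSTM's longer memory is meant to overcome.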
It is clear that these are not just a random string of words, but follow (at least loosely) a set of language rules learned by the LSTM models. However, one can argue that poetry (and the same argument can be made for painting and music) is the culmination of human emotion.
5. Interactive graphics
A group of researchers from NVIDIA released a paper in 2018 detailing Video-to-Video Synthesis, a process whereby a model generates a new video based on a training video or set of training videos.
As well as making their work publicly available in their GitHub repo, the researchers showcased a physical, interactive prototype at the NeurIPS conference in Montreal, Canada. This prototype was a simple driving simulator, set in a world where the graphics had been generated entirely by a machine learning model.
To build this prototype they first took training data from an open-source dataset created for the training of autonomous vehicles. This dataset was then segmented into different objects (trees, cars, road, etc.) and a GAN was trained on these segments so that it could generate its own versions of these objects.
Using a standard game engine, Unreal Engine 4, they created a framework for their graphical world. Then, the GAN generated objects for each category of item in real time, as needed.
GANs are used to produce object images on top of a 3D framework. Image from GitHub repo
In some sense, this may seem similar to any other computer-generated image created by a GAN (or CAN). We saw two examples of these earlier in this article.
However, the researchers realized that regenerating the entire world for each frame led to inconsistencies. Although a tree would appear in the same position in each frame, the image of the tree itself would change as it was being regenerated by the model.
To solve this, the researchers added a short-term memory to the model, ensuring that the objects remained somewhat consistent between frames.
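The effect of that memory can be imitated with a toy experiment. In the sketch below, a "texture" is just a vector of random pixel values, and memory is approximated by blending each new frame with the previous one. This blend is a stand-in chosen for simplicity; the NVIDIA model conditions a neural network on earlier frames rather than averaging pixels, and the function names and parameters here are invented.

```python
import numpy as np

def render_frames(n_frames, n_pixels, memory, seed=0):
    """Generate one 'tree texture' per frame.

    memory=0.0 regenerates the texture from scratch each frame;
    memory close to 1.0 keeps most of the previous frame.
    """
    rng = np.random.default_rng(seed)
    frames = [rng.random(n_pixels)]
    for _ in range(n_frames - 1):
        fresh = rng.random(n_pixels)
        frames.append(memory * frames[-1] + (1.0 - memory) * fresh)
    return np.array(frames)

def flicker(frames):
    """Mean absolute change between consecutive frames."""
    return float(np.mean(np.abs(np.diff(frames, axis=0))))

independent = flicker(render_frames(30, 256, memory=0.0))
with_memory = flicker(render_frames(30, 256, memory=0.8))
print(independent, with_memory)
```

The frames generated with memory change far less from one to the next, which is the kind of consistency the researchers were after for a tree that should look like the same tree in every frame.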
Unlike our previous examples, video games may have a slightly different goal. The models don’t have to innovate in the same way an artist innovates when they create a new piece, and, generally speaking, there doesn’t need to be any emotion behind the output.
Instead, gamers will want models to depict a realistic-looking world for them to play in. In this case, however, the model was extremely computationally expensive and the demo only ran at 25 frames per second. In addition, despite being rendered in 2K, the images display the characteristic blurriness of GAN-generated images.
Unfortunately, according to Bryan Catanzaro, NVIDIA’s Vice President of Applied Deep Learning, it will likely be decades before AI-produced graphics are used in consumer games.
AI is starting to contribute to all areas of the art world, as we can see from the examples above. However, the question remains as to whether these innovations are truly—well, innovative.
Are these models effective imitators?
We saw in several cases, including AICAN and EMI, that computers can generate outputs that fool humans. However, especially for painting, this may be limited to particular styles.
The outputs of generative models (GANs and CANs) generally do not create solid and well-defined lines, meaning images are often blurry. This can be effective for certain styles (say, abstract art) but not for others (say, portraiture).
Are these models innovating?
Innovation is a key characteristic of humans, but it is often hard to define. We clearly saw how CANs tried to add innovation by adapting GANs to penalize unoriginality, but one can still argue that the output is a culmination of whatever training data the model was fed.
On the other hand, are humans’ ideas not the culmination of past human experiences, our own training data so to speak?
Finally, does art require human emotion?
One thing is for certain: none of the pieces in the examples above were generated with any emotional intelligence. In mediums such as poetry and art, the story and emotion behind a piece, instilled by the author, are often what make it resonate with others.
Without this emotional intelligence by the author, can a piece of art be truly appreciated by its audience?
Perhaps the real question is, does it matter?
In a world as subjective as the art world, perhaps computers don’t have to definitively imitate or innovate, but can find their own unique place alongside humans.
Data Science Consultant, NLP enthusiast, Physics graduate https://medium.com/@jonnyndavis