
MachineRay: Using AI to Create Abstract Art

How I trained a GAN using public domain paintings



Robert A. Gonsalves


I have been exploring the latest techniques in Artificial Intelligence (AI) and Machine Learning (ML) to create abstract art. During my investigation, I learned that three things are needed to create abstract paintings: (A) source images, (B) an ML model, and (C) a lot of time to train the model on a high-end GPU. Before I discuss my work, let’s take a look at some prior research.

Background

Artificial Neural Networks

Warren McCulloch and Walter Pitts created a computational model for Neural Networks (NNs) back in 1943 [1]. Their work led to research into both biological processing in the brain and the use of NNs for AI. Richard Nagyfi discusses the differences between Artificial Neural Networks (ANNs) and biological brains in this post. He describes an apt analogy that I will summarize here: ANNs are to brains as planes are to birds. Although the development of these technologies was inspired by biology, the actual implementations are very different!

Both ANNs and biological brains learn from external stimuli to understand things and predict outcomes. One of the key differences is that ANNs work with floating-point numbers and not just binary firing of neurons. With ANNs it’s numbers in and numbers out.

The diagram below shows the structure of a typical ANN. The inputs on the left are the numerical values that contain the incoming stimuli. The input layer is connected to one or more hidden layers that contain the memory of prior learning. The output layer, in this case just one number, is connected to each of the nodes in the hidden layer.

Each of the internal arrows represents numerical weights that are used as multipliers to modify the numbers in the layers as they get processed in the network from left to right. The system is trained with a dataset of input values and expected output values. The weights are initially set to random values. For the training process, the system runs through the training set multiple times, adjusting the weights to achieve the expected outputs. Eventually, the system will not only predict the outputs correctly from the training set, but it will also be able to predict outputs for unseen input values. This is the essence of Machine Learning (ML). The intelligence is in the weights. A more detailed discussion of the training process for ANNs can be found in Conor McDonald’s post, here.
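To make the mechanics concrete, here is a minimal NumPy sketch of one training step for a tiny network with one hidden layer. The sizes, data, and learning rate are arbitrary illustrations, not the networks used later in this post:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 3 inputs -> 4 hidden nodes -> 1 output.
W1 = rng.normal(size=(3, 4))  # input-to-hidden weights, random at first
W2 = rng.normal(size=(4, 1))  # hidden-to-output weights

def forward(x):
    h = np.tanh(x @ W1)       # hidden-layer activations
    return h @ W2, h          # output value, plus hidden state for training

x = np.array([0.5, -1.0, 2.0])  # one training input
target = np.array([1.0])        # its expected output

y, h = forward(x)
error = y - target              # how far off the prediction is

# One gradient-descent step: nudge every weight to reduce the error.
lr = 0.01
grad_h = (error @ W2.T) * (1 - h**2)  # backpropagate through the tanh
W2 -= lr * np.outer(h, error)
W1 -= lr * np.outer(x, grad_h)

print("prediction before:", y, "after:", forward(x)[0])
```

Run over a whole training set many times, this simple loop is the heart of every training procedure described below.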

Generative Adversarial Networks

In 2014, Ian Goodfellow and seven coauthors at the Université de Montréal presented a paper on Generative Adversarial Networks (GANs) [2]. They came up with a way to train two ANNs that effectively compete with each other to create content like photos, songs, prose, and, yes, paintings. The first ANN is called the Generator and the second is called the Discriminator. The Generator tries to create realistic output, in this case a color painting. The Discriminator tries to tell real paintings from the training set apart from fake paintings produced by the Generator. Here's what the GAN architecture looks like.

Generative Adversarial Network, Diagram by Author

Random noise is fed into the Generator, which uses its trained weights to generate output, in this case a color image. The Discriminator is trained by alternating between processing real paintings, with an expected output of 1, and fake paintings, with an expected output of -1. After each painting is scored, the Discriminator sends back feedback, in the form of gradients, about why the painting does not look real, and the Generator adjusts its weights with this new knowledge to try to do better the next time. The two networks in the GAN are effectively trained together in an adversarial fashion. The Generator gets better at passing off a fake image as real, and the Discriminator gets better at determining which input is real and which is fake. Eventually, the Generator gets pretty good at generating realistic-looking images. You can read more about GANs, and the math they use, in Shweta Goyal's post here.
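The adversarial loop itself is short. Here is a heavily simplified PyTorch sketch, with toy fully-connected networks and 1/0 labels standing in for the real/fake targets; StyleGAN2 uses a more sophisticated loss and architecture:

```python
import torch
import torch.nn as nn

# Toy Generator: 16 random numbers in -> a flattened 8x8 "image" out.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.Tanh())
# Toy Discriminator: an image in -> a single realness score (a logit) out.
D = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real_batch = torch.rand(32, 64)  # stand-in for a batch of real paintings

for step in range(100):
    # Train the Discriminator: push real toward 1 and fake toward 0.
    fake = G(torch.randn(32, 16)).detach()  # don't backprop into G here
    d_loss = (loss_fn(D(real_batch), torch.ones(32, 1)) +
              loss_fn(D(fake), torch.zeros(32, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the Generator: try to make D score its fakes as real.
    g_loss = loss_fn(D(G(torch.randn(32, 16))), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```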

Improved GANs for Large Images

Although the basic GAN described above works well with small images (e.g., 64x64 pixels), there are issues with larger images (e.g., 1024x1024 pixels). The basic GAN architecture has difficulty converging on good results for large images due to the unstructured nature of the pixels: it can't see the forest for the trees. Researchers at NVIDIA developed a series of improved methods that allow for the training of GANs with larger images. The first is called "Progressive Growing of GANs" [3].

The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality. — Tero Karras et al., NVIDIA

The team at NVIDIA continued their work on using GANs to generate large, realistic images, naming their architecture StyleGAN [4]. They started with their Progressive Growing of GANs as a base model and added a Style Mapping Network, which injects style information at various resolutions into the Generator Network.

StyleGAN Components, Diagram by Author

The team further improved the image creation results with StyleGAN2, allowing the GAN to efficiently create high-quality images with fewer unwanted artifacts [5]. You can read more about these developments in Akria’s post, “From GAN basic to StyleGAN2”.

Prior Work to Create Art with GANs

Researchers have been looking to use GANs to create art since the GAN was introduced in 2014. A system called ArtGAN was described in 2017 by Wei Ren Tan et al. of Shinshu University in Nagano, Japan [6]. Their paper proposes to extend GANs…

… to synthetically generate more challenging and complex images such as artwork that have abstract characteristics. This is in contrast to most of the current solutions that focused on generating natural images such as room interiors, birds, flowers and faces. — Wei Ren Tan et al., Shinshu University

A broader survey of using GANs to create art was conducted by Drew Flaherty for his master's thesis at the Queensland University of Technology in Brisbane, Australia [7]. He experimented with various GANs, including basic GANs, CycleGAN [8], BigGAN [9], Pix2Pix, and StyleGAN. Of everything he tried, he liked StyleGAN the best.

The best visual result from the research came from StyleGAN. … Visual quality of the outputs were relatively high considering the model was only partially trained, with progressive improvements from earlier iterations showing more defined lines, textures and forms, sharper detail, and more developed compositions overall. — Drew Flaherty, Queensland University of Technology

For his experiments, Flaherty used a large library of artwork gleaned from various sources, including WikiArt.org, the Google Art Project, Saatchi Art, and Tumblr blogs. He noted that not all of the source images are in the public domain, but he discusses the doctrine of fair use and its implications for ML and AI.

MachineRay

Overview

For my experiment, named MachineRay, I gathered images of abstract paintings from WikiArt.org, processed them, and fed them into StyleGAN2 at a size of 1024x1024 pixels. I trained the GAN for three weeks on a GPU using Google Colab. I then processed the output images by adjusting the aspect ratio and running them through another ANN for a super-resolution resize. The resultant images are 4096 pixels wide or tall, depending on the aspect ratio. Here's a diagram of the components.

MachineRay Component Diagram, Diagram by Author

Gathering Source Images

To gather the source images, I wrote a Python script to scrape abstract paintings from WikiArt.org. I filtered for paintings labeled with the "Abstract" genre that are also marked as being in the public domain: images published before 1925, or created by artists who died before 1950. The top artists represented in the set are Wassily Kandinsky, Theo van Doesburg, Paul Klee, Kazimir Malevich, Janos Mattis-Teutsch, Giacomo Balla, and Piet Mondrian. A snippet of the Python code is below, and the full source file is here.
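This is a simplified sketch of the idea using requests and BeautifulSoup; the listing URL is WikiArt's genre page, but the tag structure it parses is a simplification of what the site actually serves, and the full script is the authoritative version:

```python
import os
import requests
from bs4 import BeautifulSoup

# WikiArt's genre listing page; the full script also walks the paginated
# results and checks each painting's copyright status before downloading.
LIST_URL = "https://www.wikiart.org/en/paintings-by-genre/abstract"
OUT_DIR = "images"
os.makedirs(OUT_DIR, exist_ok=True)

html = requests.get(LIST_URL, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

for img in soup.find_all("img"):
    src = img.get("src", "")
    if not src.startswith("http"):
        continue
    name = src.split("/")[-1].split("!")[0]  # drop WikiArt's size suffix
    with open(os.path.join(OUT_DIR, name), "wb") as f:
        f.write(requests.get(src, timeout=30).content)
```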

I gathered about 900 images, but I removed images that had representational components or ones that were too small, cutting the number down to 850. Here is a random sampling of the source images.

Random Sample of Abstract Paintings, Images from WikiArt.org in the Public Domain

Removing Frames

As you can see above, some of the paintings retain their wooden frames in the images, but some of them have the frames cropped out. For example, you can see the frame in Arthur Dove’s Storm Clouds. To make the source images consistent, and to allow the GAN to focus on the content of the paintings, I automatically removed the frames using a Python script. A snippet is below, and the full script is here.

The code opens each image and looks for regions around the edges whose color differs from most of the painting. Once the frame edges are found, the image is cropped to omit the frame.
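Here is a simplified sketch of that logic using NumPy and Pillow; the threshold and the maximum trim fraction are illustrative values, not the exact numbers from the full script:

```python
import numpy as np
from PIL import Image

def remove_frame(path, threshold=40, max_trim=0.25):
    """Crop away a border whose color differs from the painting's interior."""
    img = np.asarray(Image.open(path).convert("RGB")).astype(float)
    h, w, _ = img.shape
    # Use the center of the image as a reference for the painting's color.
    center = img[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
    ref = center.reshape(-1, 3).mean(axis=0)

    def differs(line):  # does this row/column look like frame, not painting?
        return np.abs(line.reshape(-1, 3).mean(axis=0) - ref).max() > threshold

    top, bottom, left, right = 0, h, 0, w
    while top < h * max_trim and differs(img[top]):
        top += 1
    while bottom > h * (1 - max_trim) and differs(img[bottom - 1]):
        bottom -= 1
    while left < w * max_trim and differs(img[:, left]):
        left += 1
    while right > w * (1 - max_trim) and differs(img[:, right - 1]):
        right -= 1
    return Image.fromarray(img[top:bottom, left:right].astype(np.uint8))
```

Here are some pictures of source paintings before and after the frame removal.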

Automatically Cropped Paintings, Images from WikiArt.org in the Public Domain

Image Augmentation

Although 850 images may seem like a lot, it's not really enough to properly train a GAN. Without enough variety in the images, the GAN may overfit, which yields poor results, or, worse yet, fall into the dreaded state of "mode collapse," which yields nearly identical images.

StyleGAN2 has a built-in feature to randomly mirror the source images left-to-right, which effectively doubles the number of sample images to 1,700. This is better, but still not great. I used a technique called Image Augmentation to increase the number of images by a factor of 7, to 11,900 images. Below is a code snippet for the Image Augmentation I used. The full source file is here.
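In outline, the augmentation looks like this. It is a simplified Pillow sketch; the rotation, scale, and color-jitter ranges are illustrative stand-ins for the values in the full script:

```python
import random
from PIL import Image, ImageEnhance

SIZE = 1024          # images are squared to 1024x1024 before augmenting
VARIATIONS = 6       # six variants plus the original = a factor of 7

def augment(img):
    # Small random rotation, then scale up slightly and crop back to
    # SIZE x SIZE so no blank corners from the rotation survive.
    img = img.rotate(random.uniform(-5, 5), resample=Image.BICUBIC)
    scale = random.uniform(1.05, 1.2)
    big = img.resize((int(SIZE * scale), int(SIZE * scale)), Image.LANCZOS)
    x = random.randint(0, big.width - SIZE)
    y = random.randint(0, big.height - SIZE)
    img = big.crop((x, y, x + SIZE, y + SIZE))
    # Mild color correction: jitter brightness and saturation a little.
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.9, 1.1))
    img = ImageEnhance.Color(img).enhance(random.uniform(0.9, 1.1))
    return img

src = Image.open("painting.png").convert("RGB").resize((SIZE, SIZE))
for i in range(VARIATIONS):
    augment(src).save(f"painting_aug_{i}.png")
```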

The augmentation uses random rotation, scaling, cropping, and mild color correction to create more variety in the image samples. Note that I resize the images to 1024 by 1024 before applying the Image Augmentation. I will discuss the aspect ratio further down in this post. Here are some examples of Image Augmentation. The original is on the left, and there are six additional variations to the right.

Examples of Augmented Paintings, Source Images from WikiArt.org in the Public Domain

Training the GAN

I ran the training using Google Colab Pro. Using that service, I could run for up to 24 hours at a time on a high-end GPU, an NVIDIA Tesla P100 with 16 GB of memory. I also used Google Drive to retain the work in progress between runs. It took about 13 days to train the GAN, sending 5 million source images through the system.
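Each Colab session looked roughly like this. The flags shown are the standard ones from NVIDIA's stylegan2 run_training.py; my exact settings differed, and resuming from the latest snapshot on Drive requires a small edit to the stock training script:

```python
# Mount Google Drive so training state survives the 24-hour session limit.
from google.colab import drive
drive.mount('/content/drive')

# Pack the augmented images into StyleGAN2's TFRecord format (done once).
!python dataset_tool.py create_from_images \
    /content/drive/MyDrive/datasets/machineray ./images_1024

# Train. Showing the GAN 5 million images corresponds to total-kimg=5000.
!python run_training.py --num-gpus=1 --config=config-f \
    --data-dir=/content/drive/MyDrive/datasets --dataset=machineray \
    --mirror-augment=true --total-kimg=5000
```

Here is a random sample of the results.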

Sample Output from MachineRay, Image by Author

You can see from the sample of 28 images above that MachineRay produced paintings in a variety of styles, although there are some visual commonalities among them. There are hints of the styles of the source images, but no exact copies.

Adjusting the Aspect Ratio

Although the original source images had various aspect ratios, ranging from a thin portrait shape to a wide landscape shape, I made them all exactly square to help with the training of the GAN. To get a variety of aspect ratios back in the output images, I imposed a new aspect ratio prior to the upscaling. Instead of choosing a purely random aspect ratio, I created a function that chooses one based on the statistical distribution of aspect ratios in the source images. Here's what the distribution looks like.

Aspect Ratio Distribution, Images from WikiArt.org in the Public Domain

The graph above plots the aspect ratios of all 850 source images, ranging from about 0.5, a thin 1:2 ratio, to about 2.0, a wide 2:1 ratio. The chart shows four of the source images to indicate where they fall horizontally. Here's my Python code, which maps a random index into the 850 source ratios to an aspect ratio based on their distribution.
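Here it is in simplified form. The file of precomputed ratios is an assumption, standing in for however the full script stores them; picking a uniformly random index samples the empirical distribution directly, since common ratios fill more slots in the list:

```python
import random
from PIL import Image

# One aspect ratio (width / height) per source painting, precomputed
# by the scraping step; the filename here is a stand-in.
ratios = [float(line) for line in open("aspect_ratios.txt")]

def random_aspect_ratio():
    # A random index from 0..849 draws from the empirical distribution.
    return ratios[random.randrange(len(ratios))]

# Reshape a square 1024x1024 MachineRay output to the sampled ratio.
img = Image.open("machineray_out.png")
r = random_aspect_ratio()
if r >= 1.0:   # wider than tall
    img = img.resize((1024, int(1024 / r)), Image.LANCZOS)
else:          # taller than wide
    img = img.resize((int(1024 * r), 1024), Image.LANCZOS)
img.save("machineray_ratio.png")
```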

I adjusted the MachineRay output from above to have varying aspect ratios in the pictures below. You can see that the images seem a bit more natural and less homogeneous with just this small change.

Sample Output from MachineRay with Varying Aspect Ratios, Image by Author

Super Resolution Resizing

The images generated by MachineRay have a maximum height or width of 1024 pixels, which is fine for viewing on a computer but not for printing. At 300 DPI, a 1024-pixel image prints at only about 3.5 inches. It could be resized up, but it would look very soft printed at 12 inches. There is a technique called Image Super-Resolution (ISR) that uses ANNs to resize images while maintaining crisp features. For more information on Super-Resolution, check out Bharath Raj's post here.

There is a nice open-source ISR system with pre-trained models available from Idealo, a German company. Their GAN model does a 4x resize using a network trained on photographs. I found that adding a little bit of random noise to the image prior to the ISR creates a painterly effect. Here is the Python code I used to post-process the images.
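In simplified form, using Idealo's ISR package (pip install ISR); the noise level is an illustrative value:

```python
import numpy as np
from PIL import Image
from ISR.models import RRDN  # Idealo's image-super-resolution package

img = np.array(Image.open("machineray_ratio.png"))

# A little Gaussian noise before upscaling; the GAN-trained resizer turns
# it into fine texture that reads like brushstrokes.
noisy = np.clip(img + np.random.normal(0, 4, img.shape), 0, 255).astype(np.uint8)

# The "gans" weights perform a 4x super-resolution resize.
model = RRDN(weights="gans")
big = model.predict(noisy)
Image.fromarray(big).save("machineray_4k.png")
```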

You can see the results of adding noise and Image Super-Resolution resizing here. Note that the texture detail looks a bit like brushstrokes.

Sample Image After Added Noise and ISR, Image by Author
Close-up to Show Detail, Image by Author

Check out the gallery in Appendix A to see high-resolution output samples from MachineRay.

Next Steps

Additional work might include running the GAN at sizes greater than 1024x1024. Porting the code to run on Tensor Processing Units (TPUs) instead of GPUs would make training faster. Also, the ISR GAN from Idealo could be trained on paintings instead of photos, which may add a more realistic painterly effect to the images.

Acknowledgments

I would like to thank Jennifer Lim and Oliver Strimpel for their help and feedback on this project.

Source Code

All source code for this project is available on GitHub. A Google Colab for generating images is available here. The sources are released under the CC BY-NC-SA (Attribution-NonCommercial-ShareAlike) license.

References

[1] W. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, 5(4):115–133, Dec. 1943

[2] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks", Jun. 2014

[3] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive Growing of GANs for Improved Quality, Stability, and Variation", arXiv:1710.10196, Oct. 2017

[4] T. Karras, S. Laine, and T. Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks", CVPR 2019, Mar. 2019

[5] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and Improving the Image Quality of StyleGAN”, Mar. 2020

[6] W. R. Tan, C. S. Chan, H. E. Aguirre, and K. Tanaka, "ArtGAN: Artwork Synthesis with Conditional Categorical GANs", Apr. 2017

[7] D. Flaherty, “Artistic Approaches to Machine Learning”, Queensland University of Technology, Masters Thesis, 2020

[8] J. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks", Nov. 2018

[9] A. Brock, J. Donahue, and K. Simonyan, “Large Scale GAN Training for High Fidelity Natural Image Synthesis”, Feb. 2019

Appendix A — Gallery of MachineRay Results
