
Like all machine learning models, a GAN needs a high-quality, plentiful dataset to achieve good results.
We load images into 3 channels for RGB, then resize them to 128×128. I found this resolution a good compromise between low computing times and image quality. When images are loaded in, each pixel value is a float from 0 to 255. Since our generator’s activation function will be tanh, the pixel values must be mapped to the range -1 to 1 to match.
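This normalization step is a one-liner: dividing by 127.5 and subtracting 1 maps [0, 255] onto [-1, 1]. A minimal sketch, assuming the images have already been decoded into a NumPy float array:

```python
import numpy as np

def normalize_images(images: np.ndarray) -> np.ndarray:
    """Map pixel values from [0, 255] to [-1, 1] to match the tanh output."""
    return images / 127.5 - 1.0

# Example: one 128x128 RGB image with every pixel at mid-grey (127.5)
batch = np.full((1, 128, 128, 3), 127.5, dtype=np.float32)
normalized = normalize_images(batch)  # every value maps to 0.0
```

The inverse mapping, `(images + 1.0) * 127.5`, is what you would use when saving the generator's outputs back to viewable images.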
The dataset was a combination of images found using this image downloading script and another eye dataset, giving around 1000 images of close-up eyes. That may have been enough, but I decided to augment the data by horizontally flipping each image, for a total of 2000 images.
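The flip augmentation can be done in one line of NumPy. A sketch, assuming the images are stacked in an array of shape (N, 128, 128, 3):

```python
import numpy as np

def augment_with_flips(images: np.ndarray) -> np.ndarray:
    """Double the dataset by appending a horizontally flipped copy of each image."""
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([images, flipped], axis=0)

images = np.random.rand(1000, 128, 128, 3).astype(np.float32)
augmented = augment_with_flips(images)  # shape (2000, 128, 128, 3)
```

Horizontal flips are safe here because a mirrored eye is still a plausible eye; vertical flips would not be.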
After loading and augmenting the images, we create a TensorFlow dataset on line 23, shuffle the images, and separate them into batches. I trained the model with a batch size of 64, as that was the maximum my GPU could handle. The final images in the dataset look like this.
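The dataset construction might look like the following sketch, where `augmented` stands in for the real augmented image array and `BUFFER_SIZE` is set to at least the dataset size so the shuffle covers every image:

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 64
BUFFER_SIZE = 2000  # at least the dataset size for a full shuffle

# Stand-in for the real augmented image array, already scaled to [-1, 1]
augmented = np.random.uniform(-1, 1, (200, 128, 128, 3)).astype(np.float32)

dataset = (
    tf.data.Dataset.from_tensor_slices(augmented)
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
)

first_batch = next(iter(dataset))
```

Shuffling before batching matters: it ensures each epoch presents the images to the discriminator in a fresh order.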
The discriminator
The discriminator is just a simple CNN. A sigmoid activation function is added to keep the output between 0 and 1; an output of 1 means the discriminator is 100% confident that the input image is a real eye from the dataset.
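A sketch of such a discriminator is below. The exact filter counts and dropout rate are my assumptions, not necessarily the ones used in the post; the essential parts are the strided convolutions that downsample the 128×128 input and the final sigmoid unit:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator() -> tf.keras.Model:
    """A simple CNN mapping a 128x128 RGB image to a single real/fake probability."""
    return tf.keras.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(64, 5, strides=2, padding="same"),   # 64x64
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, 5, strides=2, padding="same"),  # 32x32
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # 1 = "real eye", 0 = "generated"
    ])

discriminator = build_discriminator()
score = discriminator(tf.zeros((1, 128, 128, 3)))
```

LeakyReLU and dropout are the conventional choices in DCGAN discriminators: the former avoids dead gradients for the generator, the latter keeps the discriminator from overpowering it too quickly.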
The generator
The generator is a little more complicated.
It acts almost like a reverse CNN. It takes in a tensor of random noise and applies filters to upscale it to a 128×128 image. As it trains, the weights of the filters will improve to create better images.
Thus the generator’s input is a random tensor of size 100.
The tensor of shape 100 is connected to a dense layer of size 16*16*256, so that the layer can be reshaped to (16, 16, 256). In other words, the random noise is converted to a 16×16 image with 256 channels. The Conv2DTranspose layers then reduce the channels while increasing the spatial size until, finally, the generator outputs a 128×128 image with 3 channels.
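The architecture described above can be sketched as follows. The filter counts in the intermediate layers are my assumptions; the shape progression (100 → 16×16×256 → 32×32 → 64×64 → 128×128×3) follows the text, with each Conv2DTranspose at stride 2 doubling the spatial size:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator() -> tf.keras.Model:
    """Upscale a 100-dim noise vector to a 128x128x3 image in [-1, 1]."""
    return tf.keras.Sequential([
        layers.Input(shape=(100,)),
        layers.Dense(16 * 16 * 256, use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((16, 16, 256)),  # a 16x16 "image" with 256 channels
        layers.Conv2DTranspose(128, 5, strides=2, padding="same", use_bias=False),  # 32x32
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", use_bias=False),   # 64x64
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"), # 128x128x3
    ])

generator = build_generator()
image = generator(tf.random.normal((1, 100)))
```

The final tanh activation is what forces the output into [-1, 1], matching the normalization applied to the training images.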
Loss
The loss function is a crucial part of a GAN, as of any neural network. For GANs, we use two separate loss functions, one for the discriminator and one for the generator, to optimize both.
Generator Loss
The generator is being optimized to create outputs that the discriminator will classify as 1, i.e. images that resemble the training data. To optimize for that, we use binary cross-entropy loss, with y_true set to one and y_pred set to the discriminator's output on the generator's images at the current training step. This captures how close the generated images are to real ones.
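In code, this is a single binary cross-entropy call against a tensor of ones. A sketch, where `fake_output` is assumed to be the discriminator's scores on a batch of generated images:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(fake_output: tf.Tensor) -> tf.Tensor:
    """Compare the discriminator's scores on generated images against the label 1."""
    return bce(tf.ones_like(fake_output), fake_output)

# The loss is low when the discriminator is fooled (scores near 1)...
fooled = generator_loss(tf.constant([[0.99]]))
# ...and high when it confidently spots the fakes (scores near 0).
caught = generator_loss(tf.constant([[0.01]]))
```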
Discriminator Loss
Unlike the generator, the discriminator is being optimized on two things each step: its accuracy at matching the real training data to a label of 1, and the generated outputs to a label of 0. Therefore, the discriminator loss is the sum of two separate losses. Like the generator loss, binary cross-entropy is used.
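That sum of two binary cross-entropy terms looks like this, where `real_output` and `fake_output` are assumed to be the discriminator's scores on real and generated batches respectively:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output: tf.Tensor, fake_output: tf.Tensor) -> tf.Tensor:
    """Sum of two BCE terms: real images scored against 1, fakes against 0."""
    real_loss = bce(tf.ones_like(real_output), real_output)
    fake_loss = bce(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

# A discriminator that classifies both batches correctly gets a low loss...
accurate = discriminator_loss(tf.constant([[0.99]]), tf.constant([[0.01]]))
# ...while one that gets both wrong gets a high loss.
confused = discriminator_loss(tf.constant([[0.01]]), tf.constant([[0.99]]))
```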
The main training loop runs the model's training. In each training step, noise is generated for the generator. The generator's output images are fed into the discriminator, and likewise a batch of real eyes is fed into the discriminator. The discriminator's two sets of predictions are put into the loss functions, then the gradients are calculated. Finally, the Adam optimizer applies the gradients to the two models. After each epoch, an image is generated with examples of what the generator produces.
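A sketch of one such training step, under the assumption that both losses are recomputed inline with `tf.GradientTape` (the learning rate of 1e-4 is my assumption, not from the post):

```python
import tensorflow as tf

NOISE_DIM = 100
bce = tf.keras.losses.BinaryCrossentropy()
gen_optimizer = tf.keras.optimizers.Adam(1e-4)
disc_optimizer = tf.keras.optimizers.Adam(1e-4)

def train_step(generator, discriminator, real_images):
    """One GAN update: both models score a batch, then each applies its gradients."""
    noise = tf.random.normal([tf.shape(real_images)[0], NOISE_DIM])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)
        gen_loss = bce(tf.ones_like(fake_output), fake_output)
        disc_loss = (bce(tf.ones_like(real_output), real_output)
                     + bce(tf.zeros_like(fake_output), fake_output))
    # Each model's gradients come from its own tape and loss
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    gen_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    disc_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss
```

Using two separate tapes and optimizers keeps the updates independent: the generator's step never directly changes the discriminator's weights, and vice versa.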
400 epochs
Running the GAN for 400 epochs gives the results below. The generator has learned the skin colour from the dataset and has a general idea of what eye sockets and pupils look like. However, the generated images are still blurry and have many artifacts.
