By Vamshi Kumar Bogoju
Now that we have laid the foundation of computer vision and explained a bit about basic semantic segmentation, this article aims to provide a very high-level introduction to Generative Adversarial Networks (GANs) with just a taste of mathematical buzzwords and deep theory. If you’ve read or heard of any articles related to fake faces, we have GANs to thank for that!
Here, we will:
– outline the two models that make up the GAN framework
– introduce the associated loss function
– highlight a few advantages and drawbacks
– list some notable applications.
In Machine Learning (more specifically, Deep Learning), Generative Adversarial Networks, or GANs, are neural networks that consist of two competing models: a Discriminative model and a Generative model. The two models differ in two key ways, and these differences are what let them play complementary roles:
- Learning Method: Generative models are typically trained with Unsupervised Learning (without labeled data), whereas discriminative models use Supervised Learning.
- Class Discrimination: Discriminative models learn a boundary between the classes in the data, while Generative models learn what the data looks like so they can produce new data samples from it.
At a high level, the generative model creates new data that should resemble the real training data, and the discriminative model evaluates both real and generated data and tries to tell them apart.
Now let’s take a deeper look into how GANs came about and how they work.
The first GAN paper was published by Ian Goodfellow et al. in 2014; among other experiments, the authors used GANs to generate new images resembling the MNIST dataset.
As mentioned previously, a GAN consists of two parts: a Generator (G) and a Discriminator (D). Together, they are used to generate new data. The following analogy shows how they interact.
The relationship between G and D can be thought of as that between a currency counterfeiter and a forensic currency expert.
G acts as the counterfeiter, producing fake currency that is as close as possible to the real thing without getting caught. D is the expert who tries to identify the fake currency. The two compete until G produces fake currency that D can no longer tell apart from real currency.
During training, the two models play a constant two-player game in which each tries to outsmart the other.
G turns random noise into new data samples. These generated samples, along with real data from the domain, are passed to D, which classifies each sample as real or fake. As training goes on, G learns to create data that looks more like the real data, while D gets better at distinguishing real data from fake.
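The loop described above can be sketched with a deliberately tiny toy model. This is a minimal pure-Python sketch under stated assumptions, not the setup from the original paper: real data is assumed to be one-dimensional and drawn from a normal distribution with mean 4, the generator is a hypothetical linear function G(z) = a*z + b, and the discriminator is a hypothetical logistic-regression model D(x) = sigmoid(w*x + c). One call to `train_step` performs one discriminator update followed by one generator update. For the generator it uses the non-saturating loss -log D(G(z)), a variant suggested in the original GAN paper, rather than minimizing log(1 - D(G(z))) directly.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(z, a, b):
    # Hypothetical linear generator: maps noise z to the sample a*z + b.
    return a * z + b


def discriminator(x, w, c):
    # Hypothetical logistic discriminator: estimated probability that x is real.
    return sigmoid(w * x + c)


def d_loss(real, fake, w, c):
    # Negated discriminator objective (binary cross-entropy with real=1, fake=0).
    return -(sum(math.log(discriminator(x, w, c)) for x in real) / len(real)
             + sum(math.log(1.0 - discriminator(x, w, c)) for x in fake) / len(fake))


def g_loss(noise, a, b, w, c):
    # Non-saturating generator loss: -E[log D(G(z))].
    return -sum(math.log(discriminator(generator(z, a, b), w, c)) for z in noise) / len(noise)


def train_step(params, rng, batch=64, lr=0.01):
    a, b, w, c = params
    real = [rng.gauss(4.0, 0.5) for _ in range(batch)]   # assumed real data ~ N(4, 0.5)
    noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # noise z ~ N(0, 1)
    fake = [generator(z, a, b) for z in noise]

    # 1) Discriminator step: gradient descent on d_loss with respect to (w, c).
    gw = gc = 0.0
    for x in real:
        err = discriminator(x, w, c) - 1.0  # d(-log D)/d(logit) = D - 1
        gw += err * x / batch
        gc += err / batch
    for x in fake:
        err = discriminator(x, w, c)        # d(-log(1 - D))/d(logit) = D
        gw += err * x / batch
        gc += err / batch
    w, c = w - lr * gw, c - lr * gc

    # 2) Generator step: gradient descent on g_loss with respect to (a, b), D frozen.
    ga = gb = 0.0
    for z in noise:
        x = generator(z, a, b)
        dx = (discriminator(x, w, c) - 1.0) * w  # d(-log D(x))/dx
        ga += dx * z / batch
        gb += dx / batch
    a, b = a - lr * ga, b - lr * gb
    return (a, b, w, c), real, noise


if __name__ == "__main__":
    rng = random.Random(0)
    params = (1.0, 0.0, 0.0, 0.0)  # initial a, b, w, c
    for _ in range(200):
        params, _, _ = train_step(params, rng)
    print("a=%.3f b=%.3f w=%.3f c=%.3f" % params)
```

In early steps the generator's offset `b` moves toward the real mean of 4; run for many steps, a game this simple may oscillate rather than settle, which is one of the pitfalls GANs are known for.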
The Loss Function
The loss function is critical to understanding the dynamics of how the “game” is played.
The variables:
Pz = distribution of the input noise z
Pg = distribution of the data produced by the generator
Pr = distribution of the real data
During training, the discriminator should classify real samples correctly, which it does by maximizing $\mathbb{E}_{x \sim P_r(x)}[\log D(x)]$. For generated samples $G(z)$ with $z \sim P_z(z)$, however, it should push the probability $D(G(z))$ toward zero, which it does by maximizing $\mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]$.
In parallel, the generator learns to make the discriminator assign a high probability to generated samples by minimizing $\mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]$.
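G and D therefore play a minimax game over the value function V(D, G):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_r(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$

In theory, the game reaches its global optimum when the generator’s distribution matches the real one ($P_g = P_r$), at which point the discriminator can do no better than outputting 1/2 for every sample.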
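To make the objective concrete, here is a minimal pure-Python sketch that estimates V(D, G) from a mini-batch of discriminator outputs. The function names are hypothetical, and the probabilities in the usage below are made-up inputs rather than outputs of a trained model. It also includes the optimal discriminator for a fixed generator, D*(x) = P_r(x) / (P_r(x) + P_g(x)), derived in the original GAN paper; when P_g = P_r this equals 1/2, and plugging D = 1/2 into V gives -log 4.

```python
import math


def value_estimate(d_real, d_fake):
    """Mini-batch estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples x ~ P_r
    d_fake: discriminator outputs D(G(z)) on generated samples, z ~ P_z
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    # D maximizes V, so the loss it minimizes is -V (binary cross-entropy).
    return -value_estimate(d_real, d_fake)


def generator_loss(d_fake):
    # G minimizes the only term of V that depends on it: E[log(1 - D(G(z)))].
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


def optimal_discriminator(p_r, p_g):
    # For a fixed G, V is maximized pointwise by D*(x) = P_r(x) / (P_r(x) + P_g(x)).
    return p_r / (p_r + p_g)


# A confident discriminator versus one that is maximally unsure (D = 1/2).
print(value_estimate([0.9, 0.8], [0.1, 0.2]))
print(value_estimate([0.5, 0.5], [0.5, 0.5]), -math.log(4))
```

Framing D’s side as minimizing -V is the usual way to hand the objective to a gradient-descent optimizer, since most frameworks only minimize.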
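GANs are a powerful approach, but they are also hard to train: training is often slow and unstable. Common pitfalls include:
- Non-convergence: the generator and discriminator may keep oscillating instead of settling at an equilibrium (the optimum of the game).
- Vanishing gradients: if the discriminator becomes too good, it rejects generated samples so confidently that the generator receives almost no useful gradient and can never catch up.
- Mode collapse: the generator learns to produce only a small, limited variety of outputs (the few that happen to fool the discriminator) instead of covering the full range of the real data.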
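The vanishing-gradient problem can be seen with a few lines of arithmetic. Write the discriminator’s output on a generated sample as D(G(z)) = sigmoid(s), where s is its pre-activation logit. The hypothetical functions below compute the derivative of each generator loss with respect to s; the gradient with respect to the generator’s parameters is this value times the same chain-rule factor ds/dθ for both losses, so comparing them is fair. The minimax loss log(1 - D(G(z))) gives -sigmoid(s), which shrinks toward zero exactly when the discriminator confidently rejects the sample. The non-saturating alternative -log D(G(z)), suggested in the original GAN paper as a remedy, gives sigmoid(s) - 1, which stays close to -1 in that regime.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def saturating_grad(logit):
    # d/ds of log(1 - sigmoid(s)): the minimax generator loss.
    return -sigmoid(logit)


def non_saturating_grad(logit):
    # d/ds of -log(sigmoid(s)): the non-saturating generator loss.
    return sigmoid(logit) - 1.0


# More negative logits mean D is more confident the sample is fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    d = sigmoid(logit)
    print(f"D(G(z))={d:.5f}  saturating={saturating_grad(logit):+.5f}  "
          f"non-saturating={non_saturating_grad(logit):+.5f}")
```

When D is unsure (logit 0), both losses give the same gradient; as D grows confident, only the non-saturating version still gives the generator a strong learning signal.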
Now that we’ve covered the overall concept behind GANs, which are only one example of Generative Modeling (producing new data that resembles existing data), here are a few applications of GANs we’d like to share:
- Deep Fakes
- Image to Image Translation
- Text to Image Synthesis
- Music Generation
Deep Fakes
A deep fake is media in which a person is replaced with someone else’s likeness. Because GANs can produce data that is almost indistinguishable from real data, they are used to create fake images of people. GANs are also used to help detect and remove deep fakes.
Image to Image Translation
GANs can be trained to translate images from one domain to another. For example, a model can translate a video of a horse into one of a zebra, where the zebra replaces the horse in the same footage.
Text to Image Synthesis
GANs are used to convert natural language text descriptions to images using c
At DIG labs, GANs get us excited. We are applying the latest computer vision and machine learning models to analyze 💩 and provide health-related tips and recommendations.