Facebook’s AI research director Yann LeCun called GANs “the most interesting idea in the last 10 years in ML.” Generative Adversarial Networks are a powerful class of neural networks used for unsupervised learning.
This article is the first in a series of three, in which we will develop, step by step, an intuitive (part 1), theoretical (part 2) and practical (part 3) understanding of the most groundbreaking invention in computer vision and deep learning in the past 10 years: Generative Adversarial Networks.
In this first article, we will focus on getting comfortable with the concept of GANs, assuming that the reader has some familiarity with the basics of deep learning and related concepts such as optimization, gradients and backpropagation.
The Generative Adversarial Network is a promising idea, introduced by Ian Goodfellow et al. in 2014, that uses adversarial training to learn a generative model capable of producing completely unseen images.
A GAN comprises two neural networks: a generator and a discriminator.
A generator takes a random noise signal as input and generates an image from it.
Input: The noise signal is a random vector, usually sampled from a uniform or normal distribution. It is also called a latent vector.
Function: The generator must learn the probability distribution of the real image data and use it to transform the noise signal into a realistic image. Because this image is created from noise rather than taken from the real data, it is a fake image.
Output: A fake image.
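The generator's input can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper name `sample_latent`, the latent size of 100, and the specific ranges (standard normal, or uniform on [-1, 1]) are assumptions chosen for the example, not values prescribed by this article.

```python
import random


def sample_latent(dim, distribution="normal", rng=None):
    """Draw one latent vector z with `dim` entries (hypothetical helper)."""
    rng = rng or random.Random()
    if distribution == "normal":
        # Assumed for illustration: mean 0, standard deviation 1.
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]
    if distribution == "uniform":
        # Assumed for illustration: uniform on [-1, 1].
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    raise ValueError(f"unknown distribution: {distribution!r}")


# A seeded random generator makes the sample reproducible.
z = sample_latent(100, rng=random.Random(0))
print(len(z))  # 100
```

Each call yields a different z, and the generator network would map each one to a different fake image.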
A discriminator is a regular neural network classifier whose aim is to discriminate between the real and fake images that are fed to it as inputs.
Input: A set of real images and another set of similarly sized fake images produced by the generator.
Function: The discriminator must learn features that distinguish a real image from a fake one, outputting a number close to 1 for a real image and close to 0 for a fake image. This number is the probability the discriminator assigns to the image being real.
Output: A probability value.
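To make the discriminator's output concrete, here is a minimal sketch of the simplest classifier of this kind: one linear layer followed by a sigmoid, which squashes any score into a probability between 0 and 1. A real discriminator would have many layers; `toy_discriminator`, its weights, and the four-pixel "image" are hypothetical and only illustrate the shape of the computation.

```python
import math


def sigmoid(a):
    """Numerically stable logistic function; maps any real score into [0, 1]."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def toy_discriminator(image, weights, bias):
    """Return the probability that `image` is real (hypothetical one-layer model)."""
    score = sum(w * p for w, p in zip(weights, image)) + bias
    return sigmoid(score)


# With all-zero parameters the model has learned nothing yet,
# so it is maximally unsure about every input.
image = [0.2, -0.1, 0.4, 0.9]
print(toy_discriminator(image, weights=[0.0] * 4, bias=0.0))  # 0.5
```

Training would adjust the weights and bias so that this value moves towards 1 for real images and towards 0 for fake ones.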
If one thinks a little deeper about this structure, it is clear that the discriminator acts as an adversary to the generator, hence the name ‘Generative Adversarial Network’.
We can say that the discriminator is performing well if it correctly classifies real and fake images, and that the generator is performing well if it generates images realistic enough that the discriminator cannot detect them as fake.
To understand this better, we will walk through an interesting example from the original paper.
“The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.”
The example here compares the generative model and the adversarial (discriminative) neural net to a team of counterfeiters and the police, respectively.
The counterfeiters are trying to generate fake currency.
The police are trying to correctly distinguish between fake and real currency.
The counterfeiters generate fake currency but they will not do a great job initially.
The police will not be able to detect the fake currency immediately but will soon learn some patterns to differentiate the fake from the real.
The counterfeiters will try to learn from their mistakes. They will try to understand those patterns which the police are using to differentiate the fake currency notes from real and improve upon them. They will now do a better job of generating real-like fake currency.
This time the police will struggle a bit, but will soon learn new patterns to differentiate the fake from the real.
The counterfeiters will again try to learn the new patterns that are getting their currency caught by the police, and then fix them to match real currency.
This process of counterfeiters and police both learning and improving in turn will go on until the police can no longer detect the differences between real and fake currency. This means that after a series of learning steps, the counterfeiters will become skilled at creating real-looking currency.
If you have understood this, then congratulations, because this is exactly how a GAN works. The discriminator and generator are trained alternately, over many rounds, until they reach an equilibrium state where the generator has learned the distribution of the real input data and the discriminator cannot differentiate between real and fake images.
Now that the intuition is clear, let’s look at how a GAN, as a pair of neural networks, creates this magic. We will use the same notation as the original GAN paper.
The noise input z to the generator is a random sample taken from a distribution p_z(z), which is usually a normal distribution with some mean and variance.
The generator is a multilayer perceptron representing a differentiable function G with trainable parameters θ_g. It transforms any input noise z into a sample G(z), and the distribution of these samples is the generator distribution p_g. From the latent vector z, the function G tries to learn representative features that will help it create an image. For example, in the case of a currency note, these could be the font, the style of a number, or the position of text.
The discriminator is also a multilayer perceptron, representing a differentiable function D with trainable parameters θ_d, that learns to differentiate an input drawn from the generator distribution p_g from an input x drawn from the real data distribution p_data. Its output is a scalar: the probability that D assigns to the input being real.
Initially, the networks know nothing about real or fake images, so the discriminator assigns essentially random probability values to all inputs, real or fake.
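In this notation, the original paper expresses the competition between the two networks as a single value function V(D, G), which D tries to maximize and G tries to minimize:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards D for assigning a high probability to real inputs. The second term rewards D for assigning a low probability to generated inputs, and it is this term that G pushes in the opposite direction.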
As the first part of the training, D is trained over a batch of inputs from both distributions, p_data and p_g. The loss from all the inputs is added together and backpropagated through the discriminator so that it can learn better weights and correctly classify real and fake inputs. This is repeated for k steps.
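As a rough sketch of what the discriminator's loss looks like for one batch, the snippet below computes the standard binary cross-entropy from the probabilities D assigned to real and fake inputs. The function name `discriminator_loss`, the small `eps` guard against log(0), and the example probabilities are assumptions for illustration; a real implementation would use a deep learning framework that also computes the gradients.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for D over one batch (hypothetical helper).

    d_real: probabilities D assigned to real inputs (target 1).
    d_fake: probabilities D assigned to generated inputs (target 0).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    # D wants both terms to be large, so the loss is their negated sum.
    return -(real_term + fake_term)


# An undecided D (0.5 everywhere) versus a D that classifies confidently.
undecided = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
confident = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
print(confident < undecided)  # True
```

The loss falls as D pushes its outputs towards 1 on real inputs and towards 0 on fake ones, which is the behaviour the training step is meant to reinforce.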
As the second part of the training, G is trained over a batch of inputs from the distribution p_g and the loss is backpropagated to the generator. It is important to note that the generator's loss is the negative of the discriminator's loss on the fake images, because the generator, unlike the discriminator, wants the fake images to be assigned a value of 1 rather than 0. Also, while the generator is being trained, the discriminator's parameters are frozen (made non-trainable), but the gradient still backpropagates through the discriminator, which means the discriminator in effect helps the generator learn what mistakes it was making. This is like the police disclosing to the counterfeiters the patterns they use to tell real currency from fake.
This two-part training process is repeated until the optimal state p_g = p_data is achieved.
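The generator's objective can be sketched in the same spirit. The first function below is the minimax form, which is the fake-image term of the discriminator's objective with the sign flipped. The second is the alternative the original paper suggests for practice, maximizing log D(G(z)), because the minimax form gives weak gradients early in training when D rejects fakes confidently. The function names, the `eps` guard, and the example probabilities are assumptions for illustration.

```python
import math


def generator_loss_minimax(d_fake, eps=1e-12):
    """Minimax form: G minimizes the mean of log(1 - D(G(z)))."""
    return sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)


def generator_loss_non_saturating(d_fake, eps=1e-12):
    """Alternative form: G minimizes the mean of -log D(G(z))."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


# Probabilities D assigned to two batches of generated images.
caught = [0.1, 0.2]  # D is confident these are fake
fooled = [0.9, 0.8]  # D believes these are real

# Both forms give a lower loss when the fakes fool D.
print(generator_loss_minimax(fooled) < generator_loss_minimax(caught))  # True
print(generator_loss_non_saturating(fooled) < generator_loss_non_saturating(caught))  # True
```

Either way, only the generator's parameters would be updated in this step; D is used purely to score the fakes.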
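The alternating schedule itself can be outlined as below. This is only a skeleton: `d_step` and `g_step` are hypothetical callbacks standing in for one discriminator update and one generator update, and `k` is the number of discriminator updates per generator update. The loop here runs for a fixed number of iterations, since p_g = p_data is an ideal that cannot be checked for directly during training.

```python
def train_gan(num_iterations, k, d_step, g_step):
    """Alternate k discriminator updates with one generator update."""
    for _ in range(num_iterations):
        for _ in range(k):
            d_step()  # update D on a batch of real and fake inputs
        g_step()      # update G while D's parameters stay frozen


# Usage with stand-in callbacks that only record the order of updates.
order = []
train_gan(
    num_iterations=2,
    k=3,
    d_step=lambda: order.append("D"),
    g_step=lambda: order.append("G"),
)
print("".join(order))  # DDDGDDDG
```

Keeping the two updates as separate callbacks mirrors the description above: each network is improved in turn while the other is held fixed.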
GANs have shown immense potential in the field of computer vision because of their varied applications. As intelligent machines require a lot of training data, this framework for generating realistic images can help to create training data for many other tasks.
I hope this article was interesting and helpful in understanding GANs.
In part 2, we will discuss in more detail the theoretical analysis and the difficulties in training GANs.
In part 3, we will discuss some GAN applications and the code implementation of a popular GAN framework, DCGAN (Deep Convolutional GAN).
Data Science enthusiast with a deep interest in AI and a desire to share her learnings in the simplest manner. Has prior software development experience in building Windows applications and blockchain applications.