GANPyTorch


Generative Adversarial Networks

The main goal of a Generative Adversarial Network (GAN) is to generate new images that are similar (but not identical) to those in the training dataset.

A GAN consists of two neural networks that are trained against each other:

  • Generator takes a random vector and should generate an image from it.
  • Discriminator is a network that should distinguish between original images (from the training dataset) and images produced by the generator.
[274]
[275]

Generator

The role of the generator is to take a random vector of some size (similar to the latent vector in autoencoders) and generate the target image. It is very similar to the generative part of an autoencoder.

In our example, we will use linear neural networks and the MNIST dataset.

[276]

A few tricks used in the generator (a sketch follows the list):

  • Instead of ReLU, we use LeakyReLU, i.e. a ReLU that is not exactly 0 for negative x, but rather a linear function with a very small slope.
  • We use BatchNorm1d in order to stabilize training.
  • The activation function on the last layer is Tanh, so the output is in the range [-1, 1].
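
A minimal sketch of such a linear generator, assuming a latent vector of size 100 and flattened 28x28 MNIST images (layer sizes are illustrative, not the exact cell contents):

```python
import torch.nn as nn

latent_size = 100        # assumed size of the random input vector
image_size = 28 * 28     # flattened MNIST image

# A possible linear generator following the tricks above:
# LeakyReLU activations, BatchNorm1d between layers, Tanh on the output.
generator = nn.Sequential(
    nn.Linear(latent_size, 256),
    nn.LeakyReLU(0.2),
    nn.BatchNorm1d(256),
    nn.Linear(256, 512),
    nn.LeakyReLU(0.2),
    nn.BatchNorm1d(512),
    nn.Linear(512, image_size),
    nn.Tanh(),           # output values in [-1, 1]
)
```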

Discriminator

The discriminator is a classical image classification network. In our first example, we will also use a linear classifier.
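
A minimal sketch of such a linear discriminator, reusing the image_size from the generator sketch above (layer sizes are illustrative):

```python
import torch.nn as nn

image_size = 28 * 28     # flattened MNIST image, same as for the generator

# Assumed linear discriminator: flattened image in, single "real" score out.
discriminator = nn.Sequential(
    nn.Linear(image_size, 512),
    nn.LeakyReLU(0.2),
    nn.Linear(512, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),        # probability that the input is a real image
)
```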

[277]

Loading dataset

We will use the MNIST dataset.
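
A possible way to load it, normalized to [-1, 1] to match the Tanh output of the generator (the root path and batch size are assumptions):

```python
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),                 # images in [0, 1]
    transforms.Normalize((0.5,), (0.5,)),  # rescale to [-1, 1]
])

train_set = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)
```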

[278]
[279]
[280]
[281]

Network training

At each training step, we have two phases (a training-loop sketch follows the list):

  • Generator training. We generate a minibatch of random noise vectors (training happens in minibatches, so we use 100 vectors at a time) and "true" labels (a vector of shape (bs, 1) filled with 1.0), then compute the generator loss between the output of the frozen discriminator on the generated images and the true labels.

  • Discriminator training. The discriminator loss consists of two parts: the loss between the discriminator output on generated images and "fake" labels (a vector of shape (bs, 1) filled with 0.0), and the loss between the discriminator output on real images and the true labels (a vector of shape (bs, 1) filled with 1.0). The resulting loss is (first_part_loss + second_part_loss) / 2.
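
A sketch of one pass over the data implementing these two phases, continuing the sketches above (generator, discriminator, dataloader and latent_size are assumed to be defined; the optimizer choice and learning rate are assumptions, not the exact cell contents):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
optim_gen = torch.optim.Adam(generator.parameters(), lr=2e-4)
optim_disc = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for real_images, _ in dataloader:
    bs = real_images.size(0)
    real_images = real_images.view(bs, -1)   # flatten 28x28 images for the linear nets
    true_labels = torch.ones(bs, 1)          # 1.0 = "real"
    fake_labels = torch.zeros(bs, 1)         # 0.0 = "fake"
    noise = torch.randn(bs, latent_size)

    # Phase 1: generator training. The discriminator is used but not updated.
    optim_gen.zero_grad()
    gen_loss = bce(discriminator(generator(noise)), true_labels)
    gen_loss.backward()
    optim_gen.step()

    # Phase 2: discriminator training, averaging the fake-image and real-image losses.
    optim_disc.zero_grad()
    fake_loss = bce(discriminator(generator(noise).detach()), fake_labels)
    real_loss = bce(discriminator(real_images), true_labels)
    disc_loss = (fake_loss + real_loss) / 2
    disc_loss.backward()
    optim_disc.step()
```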

[282]
[283]
[284]
[285]
  0%|          | 0/100 [00:00<?, ?it/s]
 10%|█         | 10/100 [01:32<13:49,  9.21s/it, generator loss:=1.04, discriminator loss:=0.547]
 20%|██        | 20/100 [03:05<12:18,  9.23s/it, generator loss:=0.935, discriminator loss:=0.596]
 30%|███       | 30/100 [04:38<10:47,  9.24s/it, generator loss:=0.9, discriminator loss:=0.617]
 40%|████      | 40/100 [06:11<09:16,  9.28s/it, generator loss:=0.851, discriminator loss:=0.634]
 50%|█████     | 50/100 [07:45<07:49,  9.38s/it, generator loss:=0.839, discriminator loss:=0.638]
 60%|██████    | 60/100 [09:19<06:21,  9.54s/it, generator loss:=0.836, discriminator loss:=0.637]
 70%|███████   | 70/100 [10:52<04:38,  9.27s/it, generator loss:=0.844, discriminator loss:=0.636]
 80%|████████  | 80/100 [12:27<03:05,  9.30s/it, generator loss:=0.853, discriminator loss:=0.634]
 90%|█████████ | 90/100 [13:59<01:31,  9.19s/it, generator loss:=0.854, discriminator loss:=0.633]
 99%|█████████▉| 99/100 [15:23<00:09,  9.29s/it, generator loss:=0.866, discriminator loss:=0.629]
100%|██████████| 100/100 [15:32<00:00,  9.33s/it, generator loss:=0.865, discriminator loss:=0.629]

DCGAN

Deep Convolutional GAN (DCGAN) is the fairly obvious idea of using convolutional layers for both the generator and the discriminator. The main difference here is the use of the ConvTranspose2d layer in the generator.

Image from this tutorial
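
A possible convolutional generator for 28x28 MNIST images, upsampling the noise vector with ConvTranspose2d layers (channel counts and kernel sizes are illustrative, not the exact cell contents):

```python
import torch.nn as nn

latent_size = 100

# Assumed DCGAN-style generator: the noise is treated as a (latent_size, 1, 1)
# tensor and progressively upsampled to a 1x28x28 image.
dc_generator = nn.Sequential(
    nn.ConvTranspose2d(latent_size, 128, kernel_size=7, stride=1, padding=0),  # -> 7x7
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),           # -> 14x14
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),             # -> 28x28
    nn.Tanh(),
)
```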

[286]
[287]

Weights are initialized as described in the DCGAN paper: from a zero-centered normal distribution with standard deviation 0.02.
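
A common implementation of this initialization (the function name is illustrative; it is applied to the networks with .apply):

```python
import torch.nn as nn

def weights_init(m):
    # DCGAN paper: conv weights from N(0, 0.02);
    # batch-norm scale from N(1, 0.02), bias set to 0.
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0.0)

# Applied to the convolutional generator sketched above
# (and similarly to the convolutional discriminator).
dc_generator.apply(weights_init)
```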

[288]
[289]
[290]
[291]
[292]
[293]
[294]
  0%|          | 0/50 [00:00<?, ?it/s]
 10%|█         | 5/50 [00:53<08:03, 10.74s/it, generator loss:=2.38, discriminator loss:=0.256]
 20%|██        | 10/50 [01:47<07:03, 10.58s/it, generator loss:=2.12, discriminator loss:=0.254]
 30%|███       | 15/50 [02:40<06:12, 10.66s/it, generator loss:=2.2, discriminator loss:=0.255]
 40%|████      | 20/50 [03:33<05:17, 10.57s/it, generator loss:=2.21, discriminator loss:=0.266]
 50%|█████     | 25/50 [04:27<04:24, 10.60s/it, generator loss:=2.34, discriminator loss:=0.229]
 60%|██████    | 30/50 [05:20<03:32, 10.65s/it, generator loss:=2.42, discriminator loss:=0.225]
 70%|███████   | 35/50 [06:13<02:39, 10.60s/it, generator loss:=2.55, discriminator loss:=0.216]
 80%|████████  | 40/50 [07:07<01:46, 10.62s/it, generator loss:=2.62, discriminator loss:=0.207]
 90%|█████████ | 45/50 [08:00<00:52, 10.55s/it, generator loss:=2.8, discriminator loss:=0.178]
 98%|█████████▊| 49/50 [08:43<00:10, 10.65s/it, generator loss:=2.77, discriminator loss:=0.2]
100%|██████████| 50/50 [08:54<00:00, 10.68s/it, generator loss:=2.79, discriminator loss:=0.184]
[304]

Task: Try generating more complex color images with DCGAN - for example, take one class from the CIFAR-10 dataset (a possible starting point for loading the data is sketched below).
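
A sketch of how a single CIFAR-10 class could be selected for training (the chosen class index 5, "dog", and the batch size are arbitrary illustrations):

```python
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # 3 color channels -> [-1, 1]
])

cifar = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)

# Keep only the images of one class (index 5 corresponds to "dog").
dog_indices = [i for i, label in enumerate(cifar.targets) if label == 5]
dog_set = torch.utils.data.Subset(cifar, dog_indices)
dataloader = torch.utils.data.DataLoader(dog_set, batch_size=100, shuffle=True)
```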

Training on Paintings

Paintings created by human artists are good candidates for GAN training.

(Photo from Art of Artificial collection)