Style Transfer
The example below is inspired by the original TensorFlow tutorial, as well as by this blog post. Another good example of style transfer, using the CNTK framework, is available here. Here is the original paper on artistic style transfer.
Main ideas behind style transfer are the following:
- Starting from white noise, we try to optimize the current image to minimize some loss function
- The loss function consists of three components:
  - content loss - shows how close the current image is to the content image
  - style loss - shows how close the current image is to the style image
  - total variation loss (we will not consider it in our example at first) - makes sure that the resulting image is smooth, i.e. it penalizes the mean squared error between neighbouring pixels of the image
Those loss functions have to be designed in a clever way, so that for example style loss corresponds to styles of the images being similar, and not the actual content. For that, we will compare some deeper feature layers of a CNN which looks at the image.
Let's start by loading a couple of images:
Let's load those images and resize them to a common size. We will also initialize the resulting image img_result as a random array.
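The preparation step can be sketched as follows. This is a NumPy-only sketch with placeholder random "images" and an assumed working resolution of 256; in the notebook the images come from the downloaded files and are resized with an image library:

```python
import numpy as np

def resize_nn(img, size):
    """Resize an H x W x 3 image to size x size with nearest-neighbour sampling."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

# Placeholder "loaded" images; in the notebook these come from the downloaded files.
img_content = np.random.rand(300, 400, 3).astype(np.float32)
img_style = np.random.rand(512, 512, 3).astype(np.float32)

size = 256  # assumed working resolution
img_content = resize_nn(img_content, size)
img_style = resize_nn(img_style, size)

# The result image is initialized with random noise.
img_result = np.random.uniform(0.0, 1.0, (size, size, 3)).astype(np.float32)
```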
To calculate style loss and content loss, we need to work in the feature space extracted by a CNN. We can use different CNN architectures, but for simplicity in our case we will choose VGG-16, pre-trained on ImageNet.
Let's have a look at the model architecture:
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, None, None, 3)] 0
block1_conv1 (Conv2D) (None, None, None, 64) 1792
block1_conv2 (Conv2D) (None, None, None, 64) 36928
block1_pool (MaxPooling2D) (None, None, None, 64) 0
block2_conv1 (Conv2D) (None, None, None, 128) 73856
block2_conv2 (Conv2D) (None, None, None, 128) 147584
block2_pool (MaxPooling2D) (None, None, None, 128) 0
block3_conv1 (Conv2D) (None, None, None, 256) 295168
block3_conv2 (Conv2D) (None, None, None, 256) 590080
block3_conv3 (Conv2D) (None, None, None, 256) 590080
block3_pool (MaxPooling2D) (None, None, None, 256) 0
block4_conv1 (Conv2D) (None, None, None, 512) 1180160
block4_conv2 (Conv2D) (None, None, None, 512) 2359808
block4_conv3 (Conv2D) (None, None, None, 512) 2359808
block4_pool (MaxPooling2D) (None, None, None, 512) 0
block5_conv1 (Conv2D) (None, None, None, 512) 2359808
block5_conv2 (Conv2D) (None, None, None, 512) 2359808
block5_conv3 (Conv2D) (None, None, None, 512) 2359808
block5_pool (MaxPooling2D) (None, None, None, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
Let's define a function that will allow us to extract intermediate features from VGG network:
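Such an extractor can be sketched with tf.keras as below. To keep the sketch self-contained it uses weights=None and a small fixed input size; the notebook loads weights='imagenet' instead, and the chosen layer names are just an example:

```python
import tensorflow as tf

# Untrained VGG-16 backbone; the notebook uses weights='imagenet' instead.
vgg = tf.keras.applications.VGG16(weights=None, include_top=False,
                                  input_shape=(64, 64, 3))
vgg.trainable = False

def feature_extractor(layer_names):
    """Return a model mapping an image batch to the listed VGG layer outputs."""
    outputs = [vgg.get_layer(name).output for name in layer_names]
    return tf.keras.Model(inputs=vgg.input, outputs=outputs)

extractor = feature_extractor(['block1_conv2', 'block4_conv2'])
feats = extractor(tf.zeros((1, 64, 64, 3)))
```

Each entry of feats is the feature map of the corresponding layer; deeper layers have smaller spatial size and more channels.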
Content Loss
Content loss shows how close our current image is to the original content image. It compares intermediate feature maps of the CNN and computes the squared error. Content loss at layer $l$ is defined as

$$\mathcal{L}_{content}^{l} = \frac{1}{2}\sum_{i,j}\left(F_{ij}^{l} - P_{ij}^{l}\right)^2$$

where $F^{l}$ and $P^{l}$ are the feature maps of the current image and the content image at layer $l$.
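In code, the content loss is just a squared error over the feature maps. A NumPy sketch (implementations often use the mean instead of the sum, which only changes a normalization constant):

```python
import numpy as np

def content_loss(features, target_features):
    """Mean squared error between feature maps of the current and content image."""
    return 0.5 * np.mean((features - target_features) ** 2)

# Tiny example: features of the current image vs. the content image.
f = np.ones((1, 8, 8, 512), dtype=np.float32)
p = np.zeros_like(f)
```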
Now we will implement the main trick of style transfer - optimization. We will start with random image, and then use TensorFlow optimizer to adjust this image to minimize content loss.
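The optimization step can be sketched as follows. This is a toy stand-in: a single fixed random convolution plays the role of the VGG feature layer so the sketch runs without pretrained weights; the notebook optimizes through the real extractor:

```python
import tensorflow as tf

tf.random.set_seed(0)

# Fixed random conv layer standing in for a VGG feature layer.
fake_features = tf.keras.layers.Conv2D(8, 3, padding='same', trainable=False)
target = tf.random.uniform((1, 32, 32, 3))
target_feat = fake_features(target)

# The image itself is the trainable variable.
img = tf.Variable(tf.random.uniform((1, 32, 32, 3)))
opt = tf.keras.optimizers.Adam(learning_rate=0.05)

def step():
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((fake_features(img) - target_feat) ** 2)
    grads = tape.gradient(loss, img)
    opt.apply_gradients([(grads, img)])
    return float(loss)

loss_before = step()
for _ in range(50):
    loss_after = step()
```

The key point is that the gradient is taken with respect to the image pixels, not the network weights.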
Important: in our case, all computations are performed using TensorFlow, which is GPU-aware, so this code runs much more efficiently on a GPU.
Exercise: Try experimenting with different layers in the network and see what happens. You can also try optimizing for several layers together, but you would have to change the code for content_loss a bit.
Style Loss
Style loss is the main idea behind style transfer. We compare not the raw features, but their Gram matrices, defined as

$$G_{ij}^{l} = \sum_{k} F_{ik}^{l} F_{jk}^{l}$$

where $F^{l}$ is the feature map at layer $l$, with $k$ running over spatial positions.
The Gram matrix is similar to a correlation matrix: it shows how the activations of different filters co-occur. Style loss is computed as a sum of per-layer losses over several layers, often with weighting coefficients.
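A NumPy sketch of the Gram matrix and the resulting per-layer style loss (the normalization by the number of spatial positions is one common convention):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of an H x W x C feature map: C x C channel correlations."""
    h, w, c = features.shape
    f = features.reshape(h * w, c)
    return f.T @ f / (h * w)

def style_loss(features, style_features):
    """Squared error between the Gram matrices of two feature maps."""
    g1 = gram_matrix(features)
    g2 = gram_matrix(style_features)
    return np.mean((g1 - g2) ** 2)

a = np.random.rand(8, 8, 4).astype(np.float32)
b = np.random.rand(8, 8, 4).astype(np.float32)
```

Note that the Gram matrix discards all spatial information: two images with the same filter statistics but different layouts have the same Gram matrix, which is exactly why it captures style rather than content.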
The total loss function for style transfer is a weighted sum of the content loss and the style loss.
Putting it all together
We will define total_loss function that will calculate combined loss, and run the optimization:
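The combination can be sketched as below; the weight values here are hypothetical placeholders, and the notebook tunes its own trade-off between content and style:

```python
# Hypothetical trade-off weights; larger style_weight means a more stylized result.
content_weight = 1.0
style_weight = 100.0

def total_loss(c_loss, s_loss):
    """Weighted sum of content loss and style loss."""
    return content_weight * c_loss + style_weight * s_loss
```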
The code below runs the actual optimization of the loss. Keep in mind that even on a GPU the optimization takes a significant amount of time. You can run the cell below several times to improve the result.
Add variation loss
Variation loss allows us to make the image less noisy by minimizing the differences between neighbouring pixels.
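The idea can be sketched in NumPy as the sum of squared differences between horizontally and vertically adjacent pixels (TensorFlow also ships a ready-made tf.image.total_variation op):

```python
import numpy as np

def variation_loss(img):
    """Sum of squared differences between neighbouring pixels of an H x W x C image."""
    dx = img[:, 1:, :] - img[:, :-1, :]   # horizontal neighbours
    dy = img[1:, :, :] - img[:-1, :, :]   # vertical neighbours
    return float(np.sum(dx ** 2) + np.sum(dy ** 2))
```

A noisy image has large pixel-to-pixel jumps and hence a large variation loss, so adding this term to the total loss pushes the optimizer toward smoother results.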
We will also start the optimization from the original content image, which lets us keep more content detail without complicating the content loss function. We will add some noise, though.