Recently I moved into a new apartment and had no decoration. I didn't want to just hang random posters from Amazon; I wanted to create something myself. The only issue is, I am not the most artistic person out there.
So I started thinking about how to use my machine learning knowledge to decorate my apartment, which led me to Neural Style Transfer. As a data scientist and general art lover, this is the best mix of both worlds.
Neural Style Transfer is a deep learning technique that consists in capturing the content of one image and combining it with the style of another image.
But how does Neural Style Transfer work? In this post, we are going to look into the underlying mechanism of Neural Style Transfer (NST).
The principle of neural style transfer is to define two distance functions: one that describes how different the content of two images is, Lcontent, and one that describes how different their style is, Lstyle. Then, given three images (a desired style image, a desired content image, and the input image, initialized with the content image), we transform the input image to minimize its content distance to the content image and its style distance to the style image.
In summary, we’ll take the base input image, a content image that we want to match, and the style image that we want to match. We’ll transform the base input image by minimizing the content and style distances (losses) with backpropagation, creating an image that matches the content of the content image and the style of the style image.
Different Layers of a Convolutional Neural Network
At Layer 1, using 32 filters, the network may capture simple patterns, say an edge or a horizontal line, which may not mean much to us but is of immense importance to the network. As we move deeper to Layer 2, which has 64 filters, the network starts to capture more and more complex features; it might be the face of a dog or the wheel of a car. This capturing of different simple and complex features is called feature representation.
The important thing to note here is that CNNs do not know what the image is, but they learn to encode what a particular image represents. This encoding nature of Convolutional Neural Networks can help us in Neural Style Transfer. Let's dive a bit deeper.
The VGG19 network is commonly used for Neural Style Transfer. VGG19 is a convolutional neural network, 19 layers deep, trained on more than a million images from the ImageNet database. Because of this training, it is able to detect high-level features in an image.
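As a minimal sketch (assuming TensorFlow 2.x), the pre-trained network can be loaded straight from tf.keras.applications. Here weights=None is used only to keep the example lightweight; in practice you would pass weights="imagenet" to get the pre-trained filters described above:

```python
import tensorflow as tf

# Load VGG19 without its classification head. Use weights="imagenet" in
# practice; weights=None here just skips the download for this sketch.
vgg = tf.keras.applications.VGG19(include_top=False, weights=None)
vgg.trainable = False  # in NST we optimize the image, never the network

# List the convolutional layers we can tap for content/style features
conv_layers = [layer.name for layer in vgg.layers if "conv" in layer.name]
print(conv_layers[:3])  # ['block1_conv1', 'block1_conv2', 'block2_conv1']
```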
Now, this 'encoding nature' of CNNs is the key to Neural Style Transfer. First, we initialize a noisy image, which is going to be our output image (G). We then calculate how similar this image is to the content and style images at particular layers in the network (the VGG network). Since we want our output image (G) to have the content of the content image (C) and the style of the style image (S), we calculate the loss of the generated image (G) w.r.t. the respective content (C) and style (S) images.
With the above intuition, let's define the content loss and style loss for the randomly generated noisy image.
In order to access the intermediate layers corresponding to our style and content feature maps, we get the corresponding outputs by using the Keras Functional API to define our model with the desired output activations.
With the Functional API, defining a model simply involves specifying the inputs and outputs: model = Model(inputs, outputs).
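As a sketch, assuming TensorFlow 2.x and the layer choices from the original NST paper (block5_conv2 for content, the first convolution of each block for style), the intermediate-layer extractor can be built like this (weights=None keeps the example lightweight; in practice you would load weights="imagenet"):

```python
import tensorflow as tf

# Assumed layer choices, following the original NST paper
content_layers = ["block5_conv2"]
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]

vgg = tf.keras.applications.VGG19(include_top=False, weights=None)
vgg.trainable = False

# Functional API: a new model whose outputs are intermediate activations
outputs = [vgg.get_layer(name).output for name in style_layers + content_layers]
feature_extractor = tf.keras.Model(inputs=vgg.input, outputs=outputs)

# One forward pass now returns all six feature maps at once
image = tf.random.uniform((1, 224, 224, 3))
features = feature_extractor(image)
print(len(features))  # 6
```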
Content Loss:
Calculating the content loss measures how similar the randomly generated noisy image (G) is to the content image (C). To calculate the content loss:
Assume we choose a hidden layer (l) in a pre-trained network (the VGG network) to compute the loss. Let P and F be the original image and the generated image, and P[l] and F[l] their feature representations at layer l. The content loss is then defined as half the sum of squared differences between these feature maps:
Lcontent = (1/2) * Σ_i,j ( F[l]_ij - P[l]_ij )²
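A minimal sketch of that loss in TensorFlow (using the mean instead of the paper's halved sum, a common implementation choice; the scale difference is simply absorbed by the loss weights):

```python
import tensorflow as tf

def content_loss(content_features, generated_features):
    # Mean squared difference between the generated feature map F[l]
    # and the content feature map P[l] at the chosen layer
    return tf.reduce_mean(tf.square(generated_features - content_features))

# Toy feature maps shaped like a late VGG19 layer
P = tf.ones((1, 7, 7, 512))   # content image features
F = tf.zeros((1, 7, 7, 512))  # generated image features
print(float(content_loss(P, F)))  # 1.0
```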
Style Loss:
Similar to the content loss, the style loss is also defined as a distance between two images; here it is the distance between the style image and the output image. Style, however, is not compared on the raw feature maps but on their Gram matrices: the Gram matrix at a layer captures the correlations between the different filter responses, which encode texture and style rather than spatial content. The style loss is the squared difference between the Gram matrices of the style image and the generated image, summed (with weights) over the chosen style layers.
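In practice the style distance is computed on Gram matrices of the feature maps (the channel-wise correlations, which capture texture independently of layout). A minimal sketch for a single layer, assuming TensorFlow 2.x:

```python
import tensorflow as tf

def gram_matrix(features):
    # features: (1, H, W, C) -> (C, C) matrix of channel correlations,
    # normalized by the number of spatial positions
    _, h, w, c = features.shape
    flat = tf.reshape(features, (h * w, c))
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def style_loss(style_features, generated_features):
    # Distance between the two Gram matrices at this layer
    return tf.reduce_mean(tf.square(gram_matrix(generated_features)
                                    - gram_matrix(style_features)))

s = tf.random.uniform((1, 32, 32, 64))  # toy style-layer activations
print(gram_matrix(s).shape)  # (64, 64)
```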
If you aren’t familiar with gradient descent/backpropagation or need a refresher, you should definitely check out this awesome resource.
To compute the gradient, we use tf.GradientTape. It records the operations during the forward pass and can then compute the gradient of our loss function with respect to the input image for the backward pass. Computing the gradients then becomes easy:
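The update step can be sketched as follows (a minimal, self-contained example assuming TensorFlow 2.x; a simple mean-squared distance to a constant target stands in for the full Lcontent + Lstyle):

```python
import tensorflow as tf

# Toy stand-in for the combined NST loss: pull the image toward a target
target = tf.ones((1, 4, 4, 3))
image = tf.Variable(tf.zeros((1, 4, 4, 3)))  # the generated image G
opt = tf.keras.optimizers.Adam(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:
        # Forward pass is recorded on the tape
        loss = tf.reduce_mean(tf.square(image - target))
    grad = tape.gradient(loss, image)     # dLoss/dG
    opt.apply_gradients([(grad, image)])  # update the image, not a network

print(float(loss))  # close to 0 after optimization
```

Note that the variable being optimized is the image itself; the network's weights stay frozen throughout.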
Et voilà!
Here are some posters that I made to decorate my room, as an example: