One of the fundamental challenges in the field of image processing and computer vision is image de-noising, where the underlying goal is to estimate the original image by suppressing noise from a noise-contaminated version of it. Image noise may be caused by different intrinsic (e.g., sensor) and extrinsic (e.g., environmental) conditions which are often impossible to avoid in practical situations.
Image de-noising plays an important role in computer vision tasks like image restoration, image segmentation, and even classification problems, where obtaining the original image content is crucial for strong performance.
In this blog, we will try to build a simple auto-encoder network in PyTorch to de-noise images.
What is an Auto-encoder?
An autoencoder is a model that takes some original data, e.g. an image, and compresses it to a smaller representation; the mapping between the original and the compressed image is learnt.
In simple terms, an autoencoder will try to compress an image, and then try to reconstruct the image from that compressed representation.
Thus the network comprises two parts: an encoder, which generates a compressed feature vector for the given image, and a decoder, which reconstructs the image from that vector.
In a Convolutional Autoencoder, the encoder portion will be made of convolutional and pooling layers and the decoder will be made of transpose convolutional layers that learn to “upsample” a compressed representation.
For the image de-noiser, we will be using a convolutional neural network, so let’s dive deeper into its structure.
The structure is made up of convolution and transpose convolution layers.
The encoder part of the network will be a typical convolutional pyramid. Each convolutional layer will be followed by a max-pooling layer to reduce the dimensions of the layers.
The decoder though might be something new to you. The decoder needs to convert from a narrow representation to a wide, reconstructed image.
Let’s take a simple example of an autoencoder model. There are two convolution layers, each followed by a max-pool layer with a kernel and stride of (2,2), and the decoder has two transpose convolution layers.
Suppose an image from the MNIST dataset is passed through this network. The image dimensions are 28x28x1 (1 is the depth, or number of color channels). The compressed representation after the second max-pool layer could be 7x7x4. This is the output of the encoder, but also the input to the decoder. We want to get a 28x28x1 image out of the decoder, so we need to work our way back up from the compressed representation. Here our final encoder layer has size 7x7x4 = 196, while the original images have size 28×28 = 784, so the encoded vector is 25% the size of the original image.
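The size arithmetic above can be checked in a couple of lines of Python:

```python
# Each (2,2) max-pool halves the spatial dimensions: 28 -> 14 -> 7.
original = 28 * 28 * 1      # 784 values in one MNIST image
compressed = 7 * 7 * 4      # 196 values in the encoder output
ratio = compressed / original
print(ratio)  # 0.25 -> the encoded vector is 25% of the input size
```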
So now that you have an understanding of the basic structure of an autoencoder, let’s see how we can use the same structure to remove noise from images.
Autoencoder to de-noise images:
Autoencoders can learn pretty well how to de-noise images, given a noisy and a non-noisy set to learn from. The idea is that, given an input set of noisy data and a target set of non-noisy image data, the encoder can learn to distill the important information from the noisy image and the decoder can learn to produce a non-noisy reconstruction.
Once trained, the network will be able to de-noise new images. For this example, I will be training and testing it on MNIST data only; however, a similar network structure can be used for de-noising more complex images as well.
Now I have talked enough, let’s try to implement this in code.
Step 1: Loading the necessary libraries and data set
As I said earlier, I am using PyTorch to implement this, so here I am importing the libraries. Then I load the MNIST dataset and transform the images to tensors.
Next, I batch up this data and use PyTorch’s DataLoader class to make the train and test loaders. As you may notice, unlike a typical classification setup, I haven’t created a validation loader, because here our main objective is simply to minimize the training loss. Validation sets are mostly useful when we are trying to predict a quantity like a class.
Step 2: Visualizing the data
Here I have visualized one of the images from the MNIST dataset.
And you can see in the output that this is a 28x28x1 tensor, where 1 is the depth of the color channel, indicating that this is a greyscale image.
Step 3: Making the model network
The encoder consists of 3 convolutional layers, each with a kernel size of 3×3 and a padding of 1. The first conv layer increases the depth from 1 to 32, and in the same fashion the depth is changed from 32 to 16 and finally from 16 to 4 in the last encoder layer, keeping the x-y dimensions the same across the conv layers. Each conv layer is followed by a pooling layer that reduces the x-y dimensions by a factor of two.
The decoder is made up of 3 transpose convolutional layers and a final convolution layer. The transposed convolutional layers increase the width and height of their input: they work almost exactly like convolutional layers, but in reverse, so a stride of 2 roughly doubles the spatial dimensions instead of halving them.
Here, I’ve put everything together, and as you can see, I have added a ReLU activation after each hidden layer except the last one. For the last layer, I have used a sigmoid activation function, which scales the output so that the greyscale pixel values lie between 0 and 1.
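Putting the description together, the network might be sketched as follows. The exact transpose-conv kernel sizes (3, then 2, then 2, each with stride 2) are my choices to make the shapes line up back to 28x28, not something fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 3x3 convs with padding 1 keep the x-y dims the same;
        # each 2x2 max-pool halves them (28 -> 14 -> 7 -> 3)
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 16, 3, padding=1)
        self.conv3 = nn.Conv2d(16, 4, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # Decoder: transpose convs upsample back up (3 -> 7 -> 14 -> 28)
        self.t_conv1 = nn.ConvTranspose2d(4, 4, 3, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv3 = nn.ConvTranspose2d(16, 32, 2, stride=2)
        # Final conv layer to get back to 1 output channel
        self.conv_out = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        # Encoder: ReLU after every hidden layer
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))  # compressed representation
        # Decoder
        x = F.relu(self.t_conv1(x))
        x = F.relu(self.t_conv2(x))
        x = F.relu(self.t_conv3(x))
        # Sigmoid on the last layer -> pixel values in [0, 1]
        return torch.sigmoid(self.conv_out(x))

model = ConvDenoiser()
out = model(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 1, 28, 28])
```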
Step 4: Training the model
First we will need to set the model hyperparameters.
Because we’re comparing pixel values in the input and output images, it is best to use a loss meant for a regression task, since regression is about comparing quantities rather than probability values. So, in this case, I’ll use Mean Squared Error (MSE) loss. I am using an Adam optimizer, which works well for this case. I am using only 20 epochs, as it’s a demo project; for your own projects, I would suggest increasing the number of epochs.
I have also defined a noise factor of 0.5. I add this amount of noise to the images and feed the noisy images to our model. The model produces reconstructed images based on the noisy input. But I want it to produce normal, un-noisy images, so when I calculate the loss, I still compare the reconstructed outputs to the original images.
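A sketch of the training loop follows. To keep the snippet self-contained and quick to run, it uses a tiny stand-in model, a synthetic batch of fake "images", and only 2 epochs; in the real script you would use the convolutional denoiser model and the MNIST train_loader from the earlier steps, with 20 epochs:

```python
import torch
import torch.nn as nn

# Stand-in model (in the real script: the convolutional denoiser)
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
)

criterion = nn.MSELoss()  # regression-style loss on pixel values
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

noise_factor = 0.5
n_epochs = 2  # 20 in the actual training run
fake_loader = [(torch.rand(20, 1, 28, 28), None)]  # stand-in train_loader

for epoch in range(n_epochs):
    train_loss = 0.0
    for images, _ in fake_loader:  # the labels are not needed
        # Add random noise, then clip back to the valid [0, 1] range
        noisy = images + noise_factor * torch.randn_like(images)
        noisy = torch.clamp(noisy, 0.0, 1.0)

        optimizer.zero_grad()
        outputs = model(noisy)
        # Compare the reconstruction to the *original* (clean) images
        loss = criterion(outputs, images)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    print(f'Epoch {epoch + 1}, training loss: {train_loss:.4f}')
```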
As you can see, the training loss decreases slowly, and with a deeper convolutional network and more training epochs, we could get an even better-performing model.
Step 5: Test the performance
Here I’m adding noise to the test images and passing them through the autoencoder.
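The test-time procedure can be sketched with the same stand-in pieces (in practice you would use the trained model and batches from the test_loader):

```python
import torch
import torch.nn as nn

# Stand-in model and test batch (use your trained model / test_loader)
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
)
images = torch.rand(10, 1, 28, 28)

noise_factor = 0.5
# Corrupt the test images with the same kind of noise used in training
noisy = torch.clamp(images + noise_factor * torch.randn_like(images),
                    0.0, 1.0)

# Pass the noisy images through the autoencoder to de-noise them
with torch.no_grad():
    denoised = model(noisy)
print(denoised.shape)  # torch.Size([10, 1, 28, 28])
```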
As you can see, the model works quite well at removing the speckle noise. However, the model can be improved further to get even better results.
Hope you liked the blog.