Handwriting Recognition using Machine Learning Techniques

PREPROCESSING-1

Preprocessing is an essential step for any dataset that involves images. To preprocess the data we apply the methods discussed above. In detail, we perform the following steps:

Remove the portion common to all the images, i.e. the computer-generated text containing words like “NOM” and “PRENOM”.
Crop each image so that it contains only the name written by a human and nothing else.
Remove noise in the form of random partial text at the sides, top or bottom of some images.
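The three cleanup steps above can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' code: it assumes dark ink on a light background, assumes the printed “NOM”/“PRENOM” label occupies a fixed strip on the left whose width (the hypothetical `printed_cols` parameter) is known, and treats any ink region touching the image border as cut-off stray text. That last heuristic would also erase a handwritten name that touches the border, so it needs checking against real samples.

```python
import numpy as np
from scipy import ndimage


def clean_and_crop(gray, printed_cols=0, ink_threshold=128):
    """Return a binary crop containing only the handwritten name.

    gray: 2-D uint8 array, dark ink on a light background.
    printed_cols: HYPOTHETICAL width in pixels of the printed "NOM"/"PRENOM"
        label; assumed to sit in a fixed strip on the left of every image.
    """
    ink = gray < ink_threshold            # True where there is ink
    ink[:, :printed_cols] = False         # step 1: blank out the printed label

    # Label 8-connected ink regions.
    labels, _ = ndimage.label(ink, structure=np.ones((3, 3)))

    # Step 3 (assumed heuristic): regions touching the border are stray partial text.
    edge = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    border_ids = np.unique(edge[edge > 0])
    ink[np.isin(labels, border_ids)] = False

    # Step 2: crop to the bounding box of the remaining ink.
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        return ink[:0, :0]                # nothing left after cleaning
    return ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```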

The better we preprocess and clean the data, the better the results we can expect from our model. Figuring out which of the images and their labels are finally usable for the machine learning model is a highly laborious task.

Sample Images after performing the basic preprocessing

The images look much better now.

PREPROCESSING-2

Our next goal is to separate each character in an image and obtain its label. This step must be completed before any machine learning algorithm can be applied.

To generate the character images, we perform the following steps:

Identifying the clusters: First we find the points in the image that are connected to each other. This breaks the image into clusters, where a cluster is a large group of connected points. An example of a cluster is given in Figure 3.
Drawing the bounding box: Once we have the clusters, we find the smallest box that encloses each cluster. We call this the bounding box and return the coordinates of its corners.
Cropping the image: Using the coordinates of the bounding box, we crop out the part of the image inside the box.
Resizing each cluster: We resize each cluster to a standard size (28×28 pixels) so that every image has the same size.
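The four segmentation steps map directly onto SciPy's connected-component tools: `scipy.ndimage.label` finds the clusters and `scipy.ndimage.find_objects` returns each cluster's bounding box as a pair of slices. The sketch below is an illustration, not the authors' implementation; the `min_pixels` noise threshold, 8-connectivity, left-to-right ordering and the simple nearest-neighbour resize (which stretches each crop to 28×28 without preserving its aspect ratio) are all assumptions.

```python
import numpy as np
from scipy import ndimage


def resize_nearest(img, size=28):
    """Nearest-neighbour resize of a 2-D array to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]


def segment_characters(binary, min_pixels=5, size=28):
    """Split a binary word image into per-character size x size crops, left to right."""
    # 1. Identify clusters: label 8-connected groups of ink pixels.
    labels, _ = ndimage.label(binary, structure=np.ones((3, 3)))
    # 2. Bounding boxes: one (row_slice, col_slice) per label.
    boxes = ndimage.find_objects(labels)
    chars = []
    for idx, box in enumerate(boxes, start=1):
        if box is None:
            continue
        # 3. Crop: keep only this cluster's pixels inside its bounding box.
        crop = labels[box] == idx
        if crop.sum() < min_pixels:       # assumed threshold: skip specks of noise
            continue
        # 4. Resize to the standard size; remember the left edge for ordering.
        chars.append((box[1].start, resize_nearest(crop.astype(np.float32), size)))
    chars.sort(key=lambda item: item[0])  # order characters left to right
    return [img for _, img in chars]
```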

The word “ROBIN”, before and after segmentation, is shown below:

A sample from the dataset labeled “Robin”

Separated Character images of the word

label = ['R', 'O', 'B', 'I', 'N']

So we have 5 letter images, each 28×28 pixels, together with their labels as listed above. We convert every word/name in the dataset into this format in the same way.

Word of Caution

This automated conversion is not always accurate. Some letters may be joined in a way that prevents them from being separated cleanly, so the number of character images can differ from the number of characters in the label. We therefore run through all the created images and labels to check for this, and discard the entire word image and its label whenever the two counts differ.
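The pairing and consistency check can be sketched as follows. The helper names are hypothetical, and the sketch assumes labels are compared in upper case, as in the `label` list for “Robin” above; any word whose segmentation yields the wrong number of character images is dropped entirely, as described in the text.

```python
import numpy as np


def pair_characters(char_images, word_label):
    """Pair each character image with one letter of the word label.

    Returns None when the counts differ, meaning the whole word is discarded.
    """
    letters = list(word_label.upper())   # assumption: labels compared in upper case
    if len(char_images) != len(letters):
        return None
    return list(zip(char_images, letters))


def build_dataset(words):
    """words: iterable of (char_images, word_label) tuples from segmentation."""
    X, y, dropped = [], [], 0
    for char_images, word_label in words:
        pairs = pair_characters(char_images, word_label)
        if pairs is None:                # count mismatch: drop image and label
            dropped += 1
            continue
        for img, letter in pairs:
            X.append(img)
            y.append(letter)
    return np.array(X), np.array(y), dropped
```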

Feature Extraction

We can now extract features from the images. We use PCA (Principal Component Analysis) and HOG (Histogram of Oriented Gradients) to obtain good features.
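A minimal sketch of both feature extractors, assuming scikit-image's `hog` and scikit-learn's `PCA`. The HOG cell and block sizes and the number of principal components are illustrative assumptions, since the original does not state them. In practice the PCA should be fitted on the training images only and then reused (via `pca.transform`) on the test images, which is why the fitted object is returned.

```python
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA


def hog_features(images):
    """HOG descriptor for each 28x28 grayscale image.

    Assumed parameters: 7x7-pixel cells give a 4x4 grid; 2x2-cell blocks give
    3x3 block positions, each with 2*2*9 = 36 values, so 324 features per image.
    """
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(7, 7),
            cells_per_block=(2, 2), block_norm="L2-Hys")
        for img in images
    ])


def pca_features(images, n_components=50):
    """Project flattened 784-pixel images onto the top principal components."""
    flat = images.reshape(len(images), -1)
    pca = PCA(n_components=n_components)  # n_components is an assumption
    return pca.fit_transform(flat), pca
```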

Fitting the Data into Models

We used three algorithms:

Support Vector Machine: An image must be flattened before an SVM can process it, so each 28×28 image becomes an array of 784 pixels, i.e. 784 features. To reduce the number of features we apply feature extraction, using Histogram of Oriented Gradients (HOG) and Principal Component Analysis (PCA). The SVM uses a radial basis function (RBF) kernel with C = 1.0 and gamma set to ‘scale’, and handles the multiclass problem with a one-vs-one scheme.
Multi Layer Perceptron: We also tried a multilayer perceptron (MLP) classifier with three hidden layers of 300, 400 and 150 units, using HOG and PCA for feature extraction. HOG gave the MLP better accuracy, both class-wise and overall. We used the MLP implementation in scikit-learn (sklearn).
Convolutional Neural Network: We trained a CNN from scratch on 20,000 images belonging to 26 character classes. The architecture is shown in Figure 4. It has three convolution layers, each followed by a max-pooling layer, and each convolution uses a 3×3 kernel. We used padding because the images are small and we did not want to lose information at their borders. The CNN learns its own features, so no external feature extraction was needed. The convolution and max-pooling layers are followed by two fully connected layers: the first uses a ReLU activation and the second a softmax, since this is a multiclass classification problem.
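The two classical models can be reproduced with scikit-learn using the hyperparameters stated above. This is a sketch, not the authors' code: `max_iter` and `random_state` for the MLP are assumptions, and the inputs are assumed to be HOG or PCA feature vectors from the feature-extraction step. Note that `SVC` always trains one-vs-one classifiers internally for multiclass problems; `decision_function_shape="ovo"` only makes its decision function report the raw one-vs-one scores.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def train_classical_models(X_train, y_train):
    """Fit the SVM and MLP on pre-extracted features (HOG or PCA)."""
    # Settings from the text: RBF kernel, C=1.0, gamma='scale', one-vs-one.
    svm = SVC(kernel="rbf", C=1.0, gamma="scale", decision_function_shape="ovo")
    svm.fit(X_train, y_train)

    # Hidden layers from the text; max_iter and random_state are assumptions.
    mlp = MLPClassifier(hidden_layer_sizes=(300, 400, 150),
                        max_iter=300, random_state=0)
    mlp.fit(X_train, y_train)
    return svm, mlp
```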

CNN Architecture
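The CNN described above can be sketched in PyTorch. The filter counts (32, 64, 128), the 128-unit hidden layer, 2×2 pooling and the ReLU after each convolution are assumptions, because the text gives only the kernel size, the use of padding and the layer types. With 3×3 kernels and `padding=1`, each convolution preserves the feature-map size, so only pooling shrinks it: 28 → 14 → 7 → 3.

```python
import torch
from torch import nn


class CharCNN(nn.Module):
    """Three conv + max-pool blocks followed by two fully connected layers.

    Filter counts, hidden size and conv activations are ASSUMED values.
    """

    def __init__(self, num_classes=26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 7 -> 3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 3 * 3, 128), nn.ReLU(),  # first FC layer, ReLU
            nn.Linear(128, num_classes),             # second FC layer, returns logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))


def predict_proba(model, x):
    """Apply the softmax over the letter classes at inference time."""
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(x), dim=1)
```

The final layer returns logits: during training `nn.CrossEntropyLoss` applies the softmax (as a log-softmax) internally, and `predict_proba` applies it explicitly at inference, which matches the softmax output layer described in the text.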

PIPELINE

Pipeline

This is the final pipeline we follow to build our models for handwriting recognition.

RESULTS

The accuracy obtained by each classifier is shown. The classical machine learning models and the deep learning model performed closely on overall character-wise accuracy, but the CNN performed much better on overall word-wise accuracy. Since word-wise accuracy was our target, deep learning is the clear winner here, although the classical models were not far behind. The closeness of the character-wise results may be due to the small size of the images, which leaves fewer features to handle.
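The gap between character-wise and word-wise accuracy arises because a word is correct only if every one of its characters is. Under the simplifying assumption that character errors are independent, a 5-letter word is read correctly with probability p^5: 0.90^5 ≈ 0.59 while 0.95^5 ≈ 0.77, so a modest character-level edge can become a large word-level one. The helper below (a hypothetical name) computes both metrics; the words in the usage comment are toy inputs, not results from this project.

```python
def char_and_word_accuracy(true_words, pred_words):
    """Return (character-wise accuracy, word-wise accuracy).

    A word counts as correct only if it has the right length and
    every one of its characters matches.
    """
    chars_total = chars_correct = words_correct = 0
    for true, pred in zip(true_words, pred_words):
        matches = [t == p for t, p in zip(true, pred)]
        chars_total += len(true)
        chars_correct += sum(matches)
        words_correct += all(matches) and len(true) == len(pred)
    return chars_correct / chars_total, words_correct / len(true_words)


# char_and_word_accuracy(["ROBIN", "ANNA"], ["ROBIN", "ANNE"]) -> (8/9, 0.5)
```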

CONCLUSION

Handling image data is much more labour-intensive than working on a regression or classification problem where the data is almost ready to be fitted into a model; it requires a lot of preprocessing before it can be used. So far, we have spent most of our time perfecting the dataset so that it can be used with different ML models, which meant handling a large number of images and preprocessing them for training. The preprocessing steps are described in detail above. Once the images were ready as 28×28 character images, we experimented with different ML models and compared their performance. We compared a classical machine learning model (SVM) with a CNN: the CNN comes out on top, while MLP and SVM perform on par.

Authors and Contribution

The project has been implemented by

Imankalyan Sarkar
Pallab Chakraborty
Sudeep Vig

They are students of IIIT Delhi MTech Batch of 2020.

Imankalyan Sarkar was responsible for implementing the neural network models CNN and MLP. Pallab Chakraborty and Sudeep Vig did the preprocessing work and implemented the SVM model.

OUR GUIDE

Professor: Dr. Tanmoy Chakraborty (http://faculty.iiitd.ac.in/~tanmoy/)

LinkedIn: https://www.linkedin.com/in/tanmoy-chakraborty-89553324/

Twitter: @Tanmoy_Chak

Facebook: https://www.facebook.com/chak.tanmoy

Our Teaching Fellow and Teaching Assistants

Teaching Fellow: Ms. Ishita Bajaj
Teaching Assistants: Pragya Srivastava, Shiv Kumar Gehlot, Chhavi Jain, Vivek Reddy, Shikha Singh, and Nirav Diwan.

We thank all of them for guiding us as we learned and built these models.

#MachineLearning2020 #IIITD
