A cartoon character is one of the people or animals in an animated film, such as Mickey Mouse or Tom & Jerry. Cartoons are an essential part of every childhood. They are certainly the most popular entertainment for children, but they are also much more than that: with the help of cartoons, kids can learn about the world around them, new emotions, life issues, and other important things. So, just for fun, the goal of the current project is to recognize cartoon characters using a deep learning algorithm.
In this article, we will look at how to use a deep neural network model to perform cartoon character recognition with OpenCV.
- Dataset
- Deep Learning Algorithm
- Experiments and Results
- What’s Next?
Dataset
The images in the dataset were collected via Google Chrome and from various sites such as Disney. The dataset currently contains 4 categories (Mickey-Mouse, Donald-Duck, Minions, and Winnie the Pooh), with a total of 2,215 images. The dataset was converted into a structured format (gathered in one place with images labeled manually), and all preprocessing (resizing the images, applying filters to remove noise, etc.) was done using OpenCV-Python.
Deep Learning Algorithm
- Model Architecture: For background on CNN models, please go through an introductory CNN article. The architecture here differs in the number of layers, the parameters, and the hyperparameter tuning, but the basics are the same.
- Calculating the number of parameters in CNNs: If you have been playing with CNNs, it is common to encounter a parameter summary like the one in the image below. Calculating the activation size is easy: it is merely the product of the width, the height, and the number of channels of that layer.
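For example (using hypothetical layer dimensions, since the summary image is not reproduced here), the activation-size product is a one-liner:

```python
def activation_size(width, height, channels):
    """Activation size of a layer is just width * height * channels."""
    return width * height * channels


# A 62 x 62 feature map with 32 channels (illustrative numbers):
print(activation_size(62, 62, 32))  # 123008
```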
First, what are parameters?
Parameters, in general, are weights that are learned during training: weight matrices that contribute to the model's predictive power and that are changed during the back-propagation process. Who governs the change? The training algorithm you choose, particularly the optimization strategy, makes them change their values.
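As a minimal illustration of that update (with made-up numbers), a single plain gradient-descent step nudges each weight against its gradient:

```python
import numpy as np

# One gradient-descent step: the optimizer moves each weight a small
# amount against the gradient of the loss (values are illustrative).
weights = np.array([0.5, -1.2, 0.3])
gradients = np.array([0.1, -0.4, 0.05])   # dLoss/dW from back-propagation
learning_rate = 0.01

weights -= learning_rate * gradients       # only the parameters change
print(weights)
```

Other optimizers (momentum, Adam, etc.) change *how* the step is computed, but the principle is the same: parameters are the values being updated.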
Now that you know what “parameters” are, let’s dive into calculating the number of parameters for the sample summary we saw above.
- Input layer: The input layer has nothing to learn; at its core, all it does is provide the input image’s shape, so there are no learnable parameters here. Thus the number of parameters = 0.
- CONV layer: This is where the CNN learns, so we certainly have weight matrices here. To calculate the learnable parameters, we multiply the filter width m, the filter height n, and the number of filters d in the previous layer, then account for all k filters in the current layer. Don’t forget the bias term for each filter. The number of parameters in a CONV layer is therefore ((m * n * d) + 1) * k, where the +1 is the bias term for each filter. In words: ((filter width * filter height * number of filters in the previous layer) + 1) * number of filters, where “number of filters” refers to the current layer.
- POOL layer: This has no learnable parameters, because all it does is compute a specific number; no backprop learning is involved. Thus the number of parameters = 0.
- Fully Connected (FC) layer: This certainly has learnable parameters; in fact, compared to the other layers, this category has the highest number of parameters. Why? Because every neuron is connected to every neuron in the previous layer. The count is the product of the number of neurons c in the current layer and the number of neurons p in the previous layer, plus, as always, one bias term per current-layer neuron. Thus the number of parameters here is (c * p) + (1 * c).
Now let’s follow these pointers and calculate the number of parameters, shall we?
- The first (input) layer has no parameters.
- Parameters in the second layer, CONV1 (filter shape = 3*3, stride = 1): ((filter width * filter height * number of filters in the previous layer) + 1) * number of filters = (((3*3*3) + 1) * 32) = 896.
- Parameters in the third layer, CONV2 (filter shape = 3*3, stride = 1): (((3*3*32) + 1) * 32) = 9248.
- The fourth layer, POOL1, has no parameters.
- Parameters in the fifth layer, CONV3 (filter shape = 3*3, stride = 1): (((3*3*32) + 1) * 64) = 18496.
- Parameters in the sixth layer, CONV4 (filter shape = 3*3, stride = 1): (((3*3*64) + 1) * 64) = 36928.
- The seventh layer, POOL2, has no parameters.
- The Softmax layer has (c * p) + (1 * c) parameters = (238144 * 4) + (1 * 4) = 952580.
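The whole walkthrough above can be double-checked with a few lines of Python; the two helper functions simply encode the CONV and FC formulas from the pointers, and the layer list mirrors the counts computed above:

```python
def conv_params(m, n, d, k):
    """((filter width * filter height * input channels) + 1 bias) * filters."""
    return ((m * n * d) + 1) * k


def dense_params(p, c):
    """(previous-layer neurons * current-layer neurons) + one bias per neuron."""
    return (p * c) + c


layers = [
    ("CONV1", conv_params(3, 3, 3, 32)),     # 896
    ("CONV2", conv_params(3, 3, 32, 32)),    # 9248
    ("CONV3", conv_params(3, 3, 32, 64)),    # 18496
    ("CONV4", conv_params(3, 3, 64, 64)),    # 36928
    ("Softmax", dense_params(238144, 4)),    # 952580
]
for name, count in layers:
    print(name, count)
```

Pooling and input layers are omitted because, as noted above, they contribute 0 parameters.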
1. CNN Model Analysis
2. Implementation of Cartoon Character Recognition for Image.
3. Implementation of Cartoon Character Recognition for Video.
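For illustration, once the trained model outputs a 4-way softmax vector for an image, decoding it to a character name can look like the sketch below (the class order here is an assumption, and `probs` stands in for a real `model.predict` result; the video case simply applies the same decoding to each frame read from `cv2.VideoCapture`):

```python
import numpy as np

# Assumed label order -- must match the order used when training the model.
CLASSES = ["Mickey-Mouse", "Donald-Duck", "Minions", "Winnie the Pooh"]


def decode_prediction(probabilities):
    """Map a 4-way softmax output to a (class name, confidence) pair."""
    idx = int(np.argmax(probabilities))
    return CLASSES[idx], float(probabilities[idx])


# Stand-in for the model's softmax output on one preprocessed image:
probs = np.array([0.05, 0.02, 0.90, 0.03])
label, score = decode_prediction(probs)
print(label, score)  # Minions 0.9
```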
In this article, the cartoon character is recognized accurately. We used 4 categories; this can be extended by training the model on more categories, and we can also create a dataset for object detection (which I plan to do in the future, just to learn object detection).
- Dataset Collection
- Calculation of CNN Parameters
- CNN Architecture
The code is available at github.com/Devashi-Choudhary/Cartoon-Character-Recognition. For any questions or doubts, feel free to contact me directly at github.com/Devashi-Choudhary.