Image classification is one of the most common applications of machine learning. It belongs to computer vision, the field that gives computers the ability to understand an image and assign it to a corresponding class. Deep neural networks are a branch of machine learning that model data with layered neural networks, using backpropagation to minimize a loss function. One of the most effective deep learning models for image classification is the Convolutional Neural Network (CNN), which consistently achieves among the highest accuracies on image classification tasks. In this post, I will compare a simple face-recognition task with 40 classes using two models: a dense neural network with a few hidden layers, and a convolutional neural network.
The data for this classification comes from my classmates in one of my classes. There are 40 people, and each person's name corresponds to a class to be predicted. The photos were taken with a webcam on an online meeting platform, and each person took 10 photos from different angles. We used a specific background image while taking the photos to keep the conditions consistent.
Here is the background image that we used.
The blue horizontal line in the image helps with positioning the eyes. After taking the photos, we crop each image to the black-lined square and resize it to 100 x 100 pixels.
In the end, we have 400 images of 100×100 pixels across 40 classes, with 10 images per class.
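The crop-and-resize step can be sketched with Pillow. This is an illustrative helper, not the exact preprocessing script; the crop box coordinates of the black-lined square would come from the specific background image:

```python
from PIL import Image


def crop_and_resize(path, box, size=(100, 100)):
    """Crop the region inside the marked square and resize it.

    `box` is a (left, upper, right, lower) pixel rectangle, e.g. the
    black-lined square from the background template (values are
    hypothetical and depend on the webcam resolution).
    """
    img = Image.open(path).convert("RGB")
    return img.crop(box).resize(size)
```

Each saved photo is then a uniform 100×100 RGB image ready for the dataset loader.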
I will use the TensorFlow Keras API to split the data into training and testing sets, build the models, and train them. After preprocessing, we split the data into 70% for training and 30% for validation and testing.
I'm using tf.keras.preprocessing.image_dataset_from_directory to create a dataset from a directory and split it 70/30. It also infers the class names from the subdirectory names, which gives us our 40 classes.
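A sketch of this loading step, assuming the photos are arranged as one subdirectory per person (the directory argument, seed, and batch size are illustrative, not taken from the original experiment):

```python
import tensorflow as tf


def load_datasets(data_dir, image_size=(100, 100), batch_size=32, seed=42):
    """Split a directory of per-class subfolders 70% train / 30% validation."""
    common = dict(
        validation_split=0.3,   # 30% held out for validation/testing
        seed=seed,              # same seed so the two subsets don't overlap
        image_size=image_size,
        batch_size=batch_size,
    )
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        data_dir, subset="training", **common)
    val_ds = tf.keras.preprocessing.image_dataset_from_directory(
        data_dir, subset="validation", **common)
    # Class names are inferred from the subdirectory names.
    return train_ds, val_ds
```

The default `label_mode="int"` produces integer labels, which is what the sparse categorical cross-entropy loss used later expects.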
The images are RGB, so each one is a 100x100x3 tensor whose pixel values range from 0 to 255. After splitting into a training dataset and a validation dataset, we apply min-max scaling so that each pixel value falls between 0 and 1.
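For 8-bit images this min-max scaling reduces to dividing by 255, which Keras provides as a Rescaling layer (in older TensorFlow versions it lives under `tf.keras.layers.experimental.preprocessing`). A minimal sketch:

```python
import numpy as np
import tensorflow as tf

# Min-max scaling for 8-bit pixels: multiply by 1/255 so values land in [0, 1].
normalize = tf.keras.layers.Rescaling(1.0 / 255)

# One 1x1 RGB "image" with the extreme and middle pixel values.
batch = np.array([[[[0.0, 127.5, 255.0]]]], dtype="float32")
scaled = normalize(batch)  # values become roughly 0.0, 0.5, 1.0
```

The layer can either be mapped over the datasets or, as in the CNN below, placed directly at the front of the model.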
In this section, two models are built: a dense neural network and a convolutional neural network. First, let's look at the dense neural network model.
Dense Neural Network / Deep Neural Network
In a dense neural network, each neuron is densely connected to every neuron in the next layer. I will use the Keras Sequential API to build this model. The model has 1 input layer, 4 hidden layers, and 1 output layer.
The input layer is a Flatten layer that turns the 100x100x3 3-dimensional tensor into a 1-dimensional tensor. The hidden layers are densely connected, with 1024 neurons in the first, 512 in the second, 256 in the third, and 128 in the last. Finally, the output layer is also a dense layer, with 40 neurons, one for each class.
For each hidden layer I use the rectified linear unit (ReLU) as the activation function, with softmax activation in the output layer. To finish, we compile the model with the Adam optimizer and sparse categorical cross-entropy as the loss function.
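The dense model described above can be sketched with the Sequential API like this (a reconstruction from the description, not the original code):

```python
import tensorflow as tf

# Flatten the 100x100x3 input, four ReLU hidden layers
# (1024 -> 512 -> 256 -> 128), softmax over the 40 classes.
dnn = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(100, 100, 3)),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(40, activation="softmax"),
])
dnn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels from the loader
    metrics=["accuracy"],
)
```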
Here is the model summary.
Convolutional Neural Network
The CNN (Convolutional Neural Network) is a type of deep learning model built around the convolution operation, and it is the model most commonly applied to image analysis. A CNN uses a layer called Conv2D, which takes a tensor as input and the size of its 2-dimensional kernels as a parameter.
Our model starts with an input layer that rescales the image so each pixel has a value between 0 and 1. It then has 3 convolutional layers, each with 32 filters of kernel size 3×3. Each convolutional layer is followed by a max pooling layer that downsamples the image by taking the maximum value within each pool_size window.
After the convolutional layers, we flatten the feature maps with a Flatten layer, resulting in a 1-dimensional tensor of size 3200. Then comes a densely connected layer with 128 neurons and the output layer with 40 neurons, one per class.
For the activation function, I use ReLU in the convolutional layers and the hidden dense layer, and softmax in the output layer to produce the predictions. We then compile the model with the Adam optimizer and sparse categorical cross-entropy as the loss function.
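A sketch of this CNN as a Sequential model, reconstructed from the description (with valid-padding 3×3 convolutions and 2×2 pooling, the spatial size shrinks 100 → 98 → 49 → 47 → 23 → 21 → 10, so flattening 10×10×32 gives exactly the 3200 features mentioned above):

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    # Rescale 0-255 pixels into [0, 1] at the front of the model.
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(100, 100, 3)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),  # 10 x 10 x 32 = 3200 features
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(40, activation="softmax"),
])
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```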
Here is the model summary.
I trained the models on a personal computer using the TensorFlow API with GPU support enabled. I trained each model for a number of epochs, using the validation dataset as validation data, and measured the training time with Python's time package. Here are the training results for the dense neural network and the convolutional neural network.
DNN
I set 100 epochs for training and use Keras early stopping as a callback. Training stopped after only 20 epochs, reaching 97% training accuracy and 83% validation accuracy. Those 20 epochs took around 17.2 seconds.
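The training loop with early stopping and timing can be sketched as a small helper. The EarlyStopping settings here (monitoring validation loss with a patience of 3) are assumptions, since the post does not state the exact configuration:

```python
import time

import tensorflow as tf


def timed_fit(model, train_ds, val_ds, max_epochs=100, patience=3):
    """Train with early stopping and measure wall-clock training time."""
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",          # stop when validation loss stalls
        patience=patience,           # assumed value, not from the post
        restore_best_weights=True,
    )
    start = time.time()
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=max_epochs,
        callbacks=[early_stop],
    )
    elapsed = time.time() - start
    return history, elapsed
```

`len(history.history["loss"])` then gives the number of epochs actually run, which is how a 100-epoch budget can end after only 20.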
CNN
For the CNN model, I trained for 10 epochs without the early stopping callback. After 10 epochs it reached 99% training accuracy and 94% validation accuracy, and training took around 8.15 seconds.
Both models show good results for image classification. Although the DNN model appears to overfit, given the gap between its training and validation accuracy, it still classified 27 out of 30 test images correctly. The CNN model shows better validation accuracy, meaning it generalizes rather than just memorizing the training data; it classified 29 out of 30 test images correctly.
After getting the training results from the DNN and CNN models, I plotted the accuracy and loss curves with the Matplotlib library. They show that both models reached convergence.
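A sketch of how such curves can be plotted from a Keras History object (the figure layout and labels are illustrative, not the original plotting code):

```python
import matplotlib.pyplot as plt


def plot_history(history, title):
    """Plot training vs. validation accuracy and loss side by side."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    # Accuracy curves per epoch.
    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history["val_accuracy"], label="validation")
    ax_acc.set_title(f"{title} accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()
    # Loss curves per epoch.
    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history["val_loss"], label="validation")
    ax_loss.set_title(f"{title} loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()
    fig.tight_layout()
    return fig
```

Calling it as `plot_history(history, "DNN")` and `plot_history(history, "CNN")` produces one figure per model.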
DNN Graph
CNN Graph
After comparing the DNN and CNN models for image classification, we conclude that the CNN is the better solution. That said, because the photos in this problem share a specific background image and a fixed size, image classification with a DNN is still possible and gives a good result. For this specific problem and these models, the DNN needed 20 epochs and about 17 seconds of training time to reach convergence, while the CNN needed only 10 epochs and about 8 seconds.