Let's dive into the project. First, open a new project using Jupyter Notebook or any other environment you like. We start by importing the libraries:
import tensorflow as tf
import cv2
import os
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
OpenCV-Python is a library of Python bindings designed to solve computer vision problems. The cv2.imread() method loads an image from the specified file. If the image cannot be read (because of a missing file, improper permissions, or an unsupported or invalid format), the method returns None rather than raising an error. We can read the image through the following line of code:
img_array = cv2.imread('train/0/Training_3908.jpg')
To check the size of the image, we use:
img_array.shape
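Since cv2.imread() signals failure by returning None, it is worth guarding against that before touching the array. A minimal check (the error message here is just illustrative):
if img_array is None:
    raise FileNotFoundError("Could not read the image - check the path")
print(img_array.shape)  # e.g. (48, 48, 3): height, width, channels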
The matplotlib function imshow() renders an image from a numpy array. For a 2-dimensional array there is one square per element, and the color of each square is determined by the value of the corresponding element and the color map used by imshow(); for a 3-dimensional array like ours, the last axis is treated as the color channels. Note that OpenCV loads images in BGR order, so the colors may look swapped until we convert to RGB later.
plt.imshow(img_array)
Now we'll create a variable containing the directory name and a list containing the names of the folders inside that directory. In my case, I have renamed the folders according to the emotion labels.
Datadirectory = "Training/"
Classes = ["0","1","2","3","4","5","6"]
The ImageNet dataset contains fixed-size 224×224 images with RGB channels, but FER2013 images are only 48×48, so we'll have to resize them. To resize an image, OpenCV provides the cv2.resize() function; the cv2.cvtColor() method is used to convert an image from one color space to another.
img_size = 224
new_array = cv2.resize(img_array, (img_size, img_size))
plt.imshow(cv2.cvtColor(new_array, cv2.COLOR_BGR2RGB))
plt.show()
We resize because we're using transfer learning: a pretrained deep learning classifier expects inputs with the same dimensions it was originally trained on. Now we'll read all the images and convert them to arrays.
training_Data = []
def create_training_Data():
    for category in Classes:
        path = os.path.join(Datadirectory, category)
        class_num = Classes.index(category)  # the folder name's position doubles as the label
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (img_size, img_size))
                training_Data.append([new_array, class_num])
            except Exception as e:
                pass  # skip unreadable or corrupt images
Let’s call the function:
create_training_Data()
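As a quick sanity check, confirm that the images were actually loaded (for the full FER2013 training set this should be in the tens of thousands):
print(len(training_Data))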
The images were read folder by folder, so the data is ordered by class; shuffling the sequence keeps the network from seeing all examples of one emotion before the next:
import random
random.shuffle(training_Data)
Let's separate the features and labels. We'll use the MobileNet deep learning architecture, which expects 4-dimensional input (batch, height, width, channels), so we'll reshape the features list.
X = []
y = []
for features, label in training_Data:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, img_size, img_size, 3)  # 3 is the channel count for RGB
y = np.array(y)  # convert the labels to a numpy array as well
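A quick look at the resulting shapes confirms the reshape (the sample count depends on how many images were read):
print(X.shape)  # (num_samples, 224, 224, 3)
print(y.shape)  # (num_samples,)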
Among the best practices for training a neural network is to normalize your data to obtain a mean close to 0. Normalizing the data generally speeds up learning and leads to faster convergence. Let's normalize the pixel values before training:
X = X / 255.0
Now we'll build and train our deep learning model using transfer learning.
from tensorflow import keras
from tensorflow.keras import layers
Keras Applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.
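For example, any of them can be instantiated with ImageNet weights in one line; ResNet50 is shown here purely as an illustration of an alternative:
resnet = tf.keras.applications.ResNet50(weights="imagenet")  # just an example of another available model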
Now we’ll use MobileNetV2
model = tf.keras.applications.MobileNetV2()
Let's take the pretrained model's input as our base input:
base_input = model.layers[0].input
Since we want seven classes, let's cut off the original classification head and keep the output of the second-to-last layer:
base_output = model.layers[-2].output
The dense layer is a neural network layer that is deeply connected, meaning each neuron in the dense layer receives input from all neurons of the previous layer. The activation function is a mathematical "gate" between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that turns the neuron output on or off depending on a rule or threshold. Here we're using ReLU as the activation function.
final_output = layers.Dense(128)(base_output)
final_output = layers.Activation('relu')(final_output)
final_output = layers.Dense(64)(final_output)
final_output = layers.Activation('relu')(final_output)
final_output = layers.Dense(7, activation='softmax')(final_output)
Let’s create our new model.
new_model = keras.Model(inputs = base_input, outputs = final_output)
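Before compiling, it's worth confirming that the new head was attached as intended; the end of the summary should show the Dense(128), Dense(64), and Dense(7, softmax) layers we just added:
new_model.summary()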
compile() defines the loss function, the optimizer, and the metrics. That's all; it has nothing to do with the weights, and you can compile a model as many times as you want without affecting the pre-trained weights. You need a compiled model to train, because training uses the loss function and the optimizer. We use sparse_categorical_crossentropy because our labels are plain integers rather than one-hot vectors.
new_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
Train the model for 25 epochs.
new_model.fit(X, y, epochs=25)
Here is the code to save the model.
new_model.save('Final_model_95p07.h5')
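If you come back to this later (for example, to run the webcam test below without retraining), the saved model can be loaded back like this:
new_model = tf.keras.models.load_model('Final_model_95p07.h5')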
The code below tests the model on a live webcam feed.
import cv2  # pip install opencv-python (or opencv-contrib-python for the full package)
path = "haarcascade_frontalface_default.xml"
font_scale = 1.5
font = cv2.FONT_HERSHEY_PLAIN
# set the rectangle background to white
rectangle_bgr = (255, 255, 255)
#make a black image
img = np.zeros((500, 500))
#set some text
text = "Some text in a box!"
# get the width and height of the text box
(text_width, text_height) = cv2.getTextSize(text, font, fontScale=font_scale, thickness=1)[0]
# set the text start position
text_offset_x = 10
text_offset_y = img.shape[0] - 25
#make the coords of the box with a small padding of two pixels
box_coords = ((text_offset_x, text_offset_y), (text_offset_x + text_width + 2, text_offset_y - text_height - 2))
cv2.rectangle(img, box_coords[0], box_coords[1], rectangle_bgr, cv2.FILLED)
cv2.putText(img, text, (text_offset_x, text_offset_y), font, fontScale=font_scale, color=(0, 0, 0), thickness=1)

cap = cv2.VideoCapture(1)
# Check if the webcam is opened correctly
if not cap.isOpened():
    cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise IOError("Cannot open webcam")

while True:
    ret, frame = cap.read()
    faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, 1.1, 4)
    for x, y, w, h in faces:
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = frame[y:y+h, x:x+w]
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        facess = faceCascade.detectMultiScale(roi_gray)
        if len(facess) == 0:
            print("Face not detected")
        else:
            for (ex, ey, ew, eh) in facess:
                face_roi = roi_color[ey:ey+eh, ex:ex+ew]  # crop the face

    final_image = cv2.resize(face_roi, (224, 224))     # same input size as training
    final_image = np.expand_dims(final_image, axis=0)  # the model needs a fourth (batch) dimension
    final_image = final_image / 255.0                  # normalize, just like the training data
    Predictions = new_model.predict(final_image)

    font = cv2.FONT_HERSHEY_PLAIN
status = "Angry"x1,y1,w1,h1 = 0,0,175,75
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))elif (np.argmax(Predictions)==1):
status = "Disgust"x1,y1,w1,h1 = 0,0,175,75
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))elif (np.argmax(Predictions)==2):
status = "Fear"x1,y1,w1,h1 = 0,0,175,75
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))elif (np.argmax(Predictions)==3):
x1,y1,w1,h1 = 0,0,175,75
status = "Happy"
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))elif (np.argmax(Predictions)==4):
status = "Sad"x1,y1,w1,h1 = 0,0,175,75
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))elif (np.argmax(Predictions)==5):
status = "Surprise"x1,y1,w1,h1 = 0,0,175,75
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))else:
status = "Neutral"x1,y1,w1,h1 = 0,0,175,75
#Draw black background rectangle
cv2.rectangle(frame, (x1, x1), (x1 + w1, y1 + h1), (0,0,0), -1)
#Addd text
cv2.putText(frame, status, (x1 + int(w1/10), y1 + int(h1/2)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,0,255), 2)cv2.putText(frame, status,(100,150),font, 3,(0, 0, 255),2,cv2.LINE_4)cv2.rectangle(frame, (x,y), (x+w, y+h), (0, 0, 255))cv2.imshow('Face Emotion Recognition', frame)if cv2.waitKey(2) & 0xFF == ord('q'):
breakcap.release()
cv2.destroyAllWindows()
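If you'd rather sanity-check the model without a webcam, here is a minimal sketch that runs the same preprocessing and prediction on a single image; the path is a placeholder for any face image you have:
labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
img = cv2.imread('some_face.jpg')          # placeholder path
img = cv2.resize(img, (224, 224))          # same input size as training
img = np.expand_dims(img, axis=0) / 255.0  # batch dimension + normalization
print(labels[np.argmax(new_model.predict(img))])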