Recently I’ve gotten into the idea of using machine learning to recognize microexpressions on people’s faces. On my journey to learning how to make an algorithm myself, I thought I’d start off with the basics and make an algorithm that can detect normal macro expressions.
Using the CK+48 dataset found here, I was able to replicate a model that reached a top accuracy level of 66%. This accuracy isn’t the greatest due to the small size of the dataset and I plan on trying to improve this in the future, but for now, creating this model was a good learning experience.
So here we go, how to detect facial expressions using PyTorch, line by line.
Import the needed modules
import numpy as np
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
Here we are importing the necessary modules we need to make the model. import numpy as np imports the NumPy library so we can create and operate on arrays. import torch imports the PyTorch library, which has several functions for building networks with much less code. from torch import nn imports the nn module from PyTorch, which allows for the easy creation of layers for our network. from torch import optim imports the optim module, which gives us easy access to optimization (error-decreasing) functions. import torch.nn.functional as F imports the torch.nn.functional module so it can be called with the shorthand “F.” from torchvision import datasets, transforms, models imports the pieces we need from the torchvision library. datasets helps us turn our images into datasets that can be operated on, transforms gives us the different transformations we can apply to our images, and models gives us access to torchvision’s pre-trained convolutional networks.
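If you want to make sure the imports worked (and see whether Colab has given you a GPU runtime), a quick one-off check like this is harmless. Note that the rest of this walkthrough runs fine on CPU as written.

print(torch.__version__)           # confirm PyTorch imported correctly
print(torch.cuda.is_available())   # True if a GPU runtime is attached (not required here)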
Upload the dataset to Google Colab
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))
from google.colab import files imports the files library on Google Colab. uploaded = files.upload() creates a variable holding a dictionary of the uploaded files. The for loop then prints the name and size of each file that was uploaded. After running this code an “Upload” button will pop up, which will allow you to select your zip file.
Extract the compressed data file
from zipfile import ZipFile
file_name = "Emotions.zip"
with ZipFile(file_name, 'r') as zip:
    zip.extractall()
    print("Done")
from zipfile import ZipFile imports the ZipFile class from the zipfile module. file_name = “Emotions.zip” creates a variable called file_name whose value is the name of the uploaded file, “Emotions.zip.” with ZipFile(file_name, ‘r’) as zip: opens the archive named by file_name for reading and refers to it as zip. zip.extractall() extracts all the files in zip. print(“Done”) is used to tell us when the extraction is done.
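If you want to double-check where everything ended up, a quick listing like the one below helps. (The /content/Emotions/CK+48 path and the train/test split are assumptions based on how my zip was organized; adjust them to match yours.)

import os

data_root = '/content/Emotions/CK+48'   # assumed extraction path; change it if your zip unpacks elsewhere
for split in ('train', 'test'):
    print(split, '->', sorted(os.listdir(os.path.join(data_root, split))))   # should show one folder per emotion class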
Transform the data
test_dir = '/content/Emotions/CK+48/test'
train_dir = '/content/Emotions/CK+48/train'

train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(100),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor()])
The top two lines assign the paths of the test and train data folders to their corresponding directory variables. train_transforms holds all the transformations that will be applied to the photos in the training set. RandomRotation(30) rotates each photo by a random angle between -30 and +30 degrees. RandomResizedCrop(100) crops a random portion of each photo and resizes it to 100 x 100 pixels. RandomHorizontalFlip() randomly mirrors some of the photos left to right. ToTensor() turns the photos into tensors so that they can later be processed by the model. Normalize([0.5,0.5,0.5],[0.5,0.5,0.5]) shifts and scales the color values of each pixel into a value between -1 and 1, so they can more easily be processed by the model. test_transforms holds the transformations that will be applied to the photos in the testing set. Resize(255) resizes each photo so its shorter side is 255 pixels. CenterCrop(224) then crops out the central 224 x 224 region of the photo.
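To see what the training transforms actually produce, you can run a single image through them. The snippet below is just a sanity check; it assumes each emotion class has its own subfolder inside train_dir, which is also what ImageFolder expects in the next step. Normalize maps each channel value v to (v - 0.5) / 0.5, which is why the output lands between -1 and 1.

from PIL import Image
import glob

sample_path = glob.glob(train_dir + '/*/*')[0]   # grab any one training image
img = Image.open(sample_path).convert('RGB')
x = train_transforms(img)
print(x.shape)                # expect torch.Size([3, 100, 100]) after RandomResizedCrop(100)
print(x.min(), x.max())       # roughly between -1 and 1 thanks to Normalize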
Create computable datasets
train_data = datasets.ImageFolder(train_dir, transform=train_transforms)
trainloader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

test_data = datasets.ImageFolder(test_dir, transform=test_transforms)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=True)
The first line creates a variable for the training data that consists of the images in the train folder and applies the previously mentioned transformations to them. The second line splits this newly transformed training data into batches of 32 images each and shuffles the order in which the photos are drawn on every pass through the data. The last two lines do the same thing as the first two but with the testing data.
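As another optional sanity check, you can look at the label mapping ImageFolder discovered (the class names come from your folder names, so yours may differ) and the shape of one batch:

print(train_data.class_to_idx)        # e.g. {'anger': 0, 'fear': 1, ...} depending on your folder names
images, labels = next(iter(trainloader))
print(images.shape, labels.shape)     # expect torch.Size([32, 3, 100, 100]) and torch.Size([32])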
Loading in our pre-trained model
model = models.alexnet(pretrained=True)

for param in model.parameters():
    param.requires_grad = False
The first line uses the “models” library imported at the beginning to load the pre-trained convolutional network AlexNet into the code. pretrained=True means the network comes with weights that have already been trained rather than randomly initialized. These networks ship with torchvision and have been trained on ImageNet, a dataset of over a million photos, so they are good for general-purpose image recognition.
The last two lines iterate through each parameter (the weights and biases of every layer) in the network and turn off gradient tracking for each of them with the param.requires_grad = False command. Since these parameters are already pre-trained, we don’t want our training to change them.
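If you want to confirm the freeze actually took effect, counting the parameters that still require gradients is an easy check. At this point the count should be zero; it will go up again once we attach our own classifier.

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{trainable} trainable out of {total} total parameters")   # expect 0 trainable here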
Create and add a classifier to the model
classifier = nn.Sequential(nn.Linear(9216, 1000),
                           nn.ReLU(),
                           nn.Dropout(0.2),
                           nn.Linear(1000, 5),
                           nn.LogSoftmax(dim=1))

model.classifier = classifier
The first line defines a classifier made up of five pieces stacked in order. nn.Linear(9216, 1000) is a fully connected layer that takes 9216 inputs and outputs 1000 values. nn.ReLU() is an activation function that keeps positive values as they are and sets negative values to 0, letting the network model non-linear relationships. nn.Dropout(0.2) randomly turns off nodes during training, each node having a 20% chance of being turned off. This helps prevent overfitting by forcing the model to generalize more. nn.Linear(1000, 5) is another fully connected layer that takes 1000 inputs and spits out 5 outputs, one for each class of emotions we are trying to classify. nn.LogSoftmax(dim=1) sends the five outputs through a softmax function, which turns them into probabilities between 0 and 1 that sum to 1, and then takes the logarithm of each, giving the log-probability of the image belonging to each class of emotions.
The last line just replaces the original classifier of the alexnet model with the classifier we just created.
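A quick way to make sure the new classifier is wired up correctly is to push a dummy batch through the model and check the output shape. The 224 x 224 size here is arbitrary for this check; torchvision’s AlexNet pools its convolutional features down to 256 x 6 x 6 = 9216 values either way, which is where the 9216 input size comes from.

dummy = torch.randn(2, 3, 224, 224)   # two fake RGB images
with torch.no_grad():
    out = model(dummy)
print(out.shape)                      # expect torch.Size([2, 5]), one log-probability per emotion class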
Create loss and backpropagation functions
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.classifier.parameters(), lr=0.003)
criterion = nn.NLLLoss() creates a variable called criterion which, when called, computes the Negative Log-Likelihood Loss on whatever predictions and labels it is given. This measures how wrong the model’s predictions are by comparing the predicted log-probabilities with the actual class of each image.
optimizer = optim.SGD(model.classifier.parameters(), lr=0.003) creates a variable for the optimizer that performs the backpropagation updates, in this case stochastic gradient descent (SGD). It is applied only to the parameters of the classifier we just created, since everything else in the network is frozen. The learning rate, lr=0.003, determines how large a step the optimizer takes when it changes the weights and biases of the nodes.
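To see how LogSoftmax and NLLLoss fit together, here is a tiny toy example with made-up numbers (one image, five classes, nothing to do with the real data):

logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3]])    # made-up raw scores for one image
log_probs = F.log_softmax(logits, dim=1)                # what our classifier's last layer produces
target = torch.tensor([0])                              # pretend the true class is index 0
print(nn.NLLLoss()(log_probs, target))                  # equals -log_probs[0, 0]; smaller when the model is confident and correct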
Create variables for training
epochs = 8
steps = 0
running_loss = 0
print_every = 5
epochs is the number of times all the data gets fed forward and backward through the model. steps is the number of batches run through the training model; it starts at zero because no batches have been fed forward yet. running_loss is the accumulated value of the outputs from the loss function (the criterion established earlier) during the training phase. print_every determines after how many steps we do a test run on the model.
Create the training loop
for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        optimizer.zero_grad()
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
for epoch in range(epochs): runs the training loop as many times as the value of epochs says, in this case 8 times. for inputs, labels in trainloader: iterates through each batch of inputs and corresponding labels in trainloader, our training dataset. steps += 1 adds 1 to our “steps” variable every time a batch is sent through the training loop.
Because PyTorch accumulates gradients with each backward pass rather than resetting them (see the small demo below), optimizer.zero_grad() is used to reset the gradients of each node at the start of every batch. logps = model.forward(inputs) feeds the inputs from our training batch forward through our model and stores the outputs in a variable called “logps”. loss = criterion(logps, labels) calculates the loss for the batch using the criterion function we made earlier. loss.backward() calculates the derivative of the loss with respect to each of the classifier’s weights and biases (the parameters we left with gradient tracking on). optimizer.step() updates the weights and biases of each node based on those derivatives. running_loss += loss.item() adds the loss value of each batch to our “running_loss” variable so that we can track the total training loss. This should get smaller over time.
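Here is a tiny standalone demo of that accumulation behavior (made-up numbers, unrelated to the emotion data):

w = torch.ones(1, requires_grad=True)
(w * 3).sum().backward()
print(w.grad)        # tensor([3.])
(w * 3).sum().backward()
print(w.grad)        # tensor([6.]), gradients accumulate, which is why we call zero_grad() each batch
w.grad.zero_()
print(w.grad)        # tensor([0.])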
Creating testing loop conditions and variables
if steps % print_every == 0:
    test_loss = 0
    accuracy = 0
    model.eval()
The top line makes it so that the succeeding code is only executed when “steps” divided by “print_every” has a remainder of 0, so essentially testing will be done every five batches. Note that this if statement sits inside the training loop above.
test_loss = 0 is the same as the running_loss but specifically for the testing data. accuracy = 0 just creates a variable for the accuracy of the model that starts out at 0. model.eval() puts the model into evaluation mode, which among other things disables the dropout layer so the test results are consistent (see the short demo below).
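If it isn’t obvious why switching modes matters, this tiny demo shows how the Dropout layer behaves differently in train and eval mode:

drop = nn.Dropout(0.2)
x = torch.ones(5)
drop.train()
print(drop(x))   # some entries zeroed out, survivors scaled up to 1.25
drop.eval()
print(drop(x))   # all ones, dropout is disabled during evaluation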
Creating the testing loop
NOTE: The code below falls under the if statement above.
with torch.no_grad():
    for inputs, labels in testloader:
        logps = model.forward(inputs)
        batch_loss = criterion(logps, labels)
        test_loss += batch_loss.item()
torch.no_grad() stops gradients from being tracked for the testing data. This is important because we aren’t trying to train the model here but rather test its accuracy in its current state. for inputs, labels in testloader: iterates through the inputs and corresponding labels in the testloader, our testing dataset. logps = model.forward(inputs) feeds the testing data inputs forward through the current model. batch_loss = criterion(logps, labels) calculates the loss for the testing batch. test_loss += batch_loss.item() adds the loss of each batch to our “test_loss” variable.
Checking the accuracy of our model
ps = torch.exp(logps)
top_p, top_class = ps.topk(1, dim=1)
equals = top_class == labels.view(*top_class.shape)
accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
These four lines continue inside the testing loop above. ps = torch.exp(logps) converts the log-probabilities back into ordinary probabilities of an image being in each of the five emotion classes. top_p, top_class = ps.topk(1, dim=1) stores the highest probability for each image in “top_p” and the index of the class it belongs to in “top_class”. equals = top_class == labels.view(*top_class.shape) checks whether the class with the highest probability is the class the image is actually labeled as. accuracy += torch.mean(equals.type(torch.FloatTensor)).item() turns the “equals” variable into a float tensor (because before it held Boolean values rather than numbers), takes the fraction of predictions in the batch that were correct, and adds this value to our accuracy.
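If the topk/equals step feels opaque, here is a small made-up example with three images and five classes showing how a batch’s accuracy comes out:

fake_logps = torch.log(torch.tensor([[0.7, 0.1, 0.1, 0.05, 0.05],
                                     [0.1, 0.6, 0.1, 0.1, 0.1],
                                     [0.2, 0.2, 0.2, 0.2, 0.2]]))   # made-up log-probabilities
fake_labels = torch.tensor([0, 1, 4])                                # pretend true classes
ps = torch.exp(fake_logps)
top_p, top_class = ps.topk(1, dim=1)
equals = top_class == fake_labels.view(*top_class.shape)
print(torch.mean(equals.type(torch.FloatTensor)).item())            # 0.667, two of the three predictions are correct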
Printing the results and setting the model back into train mode
print(f"Epoch {epoch+1}/{epochs}.. "
f"Train loss: {running_loss/print_every:.3f}.. "
f"Test loss: {test_loss/len(testloader):.3f}.. "
f"Test accuracy: {accuracy/len(testloader):.3f}")running_loss = 0
model.train()
The first statement uses the variables built up during training to print out the current epoch, the training loss, the test loss, and most importantly the test accuracy. running_loss = 0 sets the running_loss back to 0 so it can start accumulating fresh for the next set of batches. model.train() takes the model out of evaluation mode and puts it back into training mode.
You Finished!
There you have it! We just created a facial emotion recognition model line by line. You should be able to get an accuracy of above 60%.
If you’d like to check out the code on GitHub, click here. Thanks for reading, and have fun detecting those emotions! 🙂