*This article provides insights into the basics of deep metric learning methods and how they help achieve state-of-the-art results for tasks like face verification and face recognition. During a past internship I worked on these tasks, and seeing the power of deep metric learning methods there motivated me to write this article. Besides face recognition/verification, deep metric learning has proven quite effective in a number of other applications; anomaly detection and three-dimensional (3D) modelling are a few of them.*

**Prerequisites**

**A tour of the basics**

- Face verification, Face Recognition
- ANNs (Training phase and Inference phase)

**Image Classification for face verification?**

- One-shot learning

**Metric**

**Metric Learning**

- Mahalanobis Distance Metric

**Deep Metric Learning**

- Contrastive Loss — Siamese Networks
- Triplet loss — Triplet Networks
- Softmax loss
- A-Softmax loss
- Large Margin Cosine Loss (LMCL)
- Arcface loss

**References**

The reader requires a good knowledge of linear algebra and familiarity with the basic concepts in machine learning to understand this article. I hope you will enjoy learning deep metric learning.

**Keywords:** Metric learning, Triplet loss, Softmax loss, Face recognition, Face verification, DCNN, ArcFace, SphereFace, CosFace.

Let us first understand a few basic terminologies and establish a solid ground to enhance our understanding of deep metric learning.

**Face verification** is the task of determining whether a given pair of images belongs to the same person or not. In simple words, given an image, we try to answer the question: **Is that you?** It is a **1:1** authentication problem.

**Face recognition,** on the other hand, is a combination of two tasks: **face identification** and **face verification**. Face identification is the task of recognizing a person from a database of images. Given an image database (say, a gallery of k persons), we try to answer the question: **Who are you?** So, face recognition is a **1:k** authentication problem.
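To make the 1:k idea concrete, here is a minimal sketch of identification as a nearest-neighbour search over a gallery of embeddings. The embedding vectors and the distance threshold below are hypothetical toy values, not part of any real system:

```python
import numpy as np

def identify(probe, gallery, threshold=1.0):
    """Return the index of the closest gallery identity, or None if no match.

    probe: (d,) embedding of the query face.
    gallery: (k, d) embeddings, one per enrolled person.
    threshold: maximum accepted distance (hypothetical value).
    """
    dists = np.linalg.norm(gallery - probe, axis=1)  # distance to each of the k identities
    best = int(np.argmin(dists))
    return best if dists[best] <= threshold else None

# Toy gallery of k = 3 identities in a 2-D embedding space
gallery = np.array([[0.0, 0.0], [5.0, 5.0], [-4.0, 2.0]])
print(identify(np.array([0.1, -0.2]), gallery))   # close to identity 0
print(identify(np.array([50.0, 50.0]), gallery))  # far from everyone -> no match
```

Verification (1:1) is the special case where the gallery holds a single claimed identity.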

It is quite helpful to know the basic training and inference process of a simple **artificial neural network** {please skip this part if you are already familiar with the topic}.

## TRAINING PHASE

1. We **randomly initialize** the weights and biases according to some probability distribution.
2. We feed the data set into the **untrained** neural network architecture.
3. **Forward propagation**: each hidden layer accepts its input, applies an **activation** function, and passes the activations on to the next layer. We propagate the results forward until we obtain the predicted output.
4. After generating the predictions, we calculate the **loss** by comparing the **predicted** results to the ground-truth labels.
5. **Backward propagation**: we compute the **gradients** of the loss with respect to the weights and biases, and adjust them by subtracting a small quantity proportional to each gradient.
6. We repeat steps 2 to 5 for the entire training set; one full pass is one epoch. We train for more epochs until the error is minimized or the prediction score is maximized.
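The steps above can be sketched with a tiny NumPy example: a one-layer network (logistic regression) trained by full-batch gradient descent. The toy data, learning rate, and epoch count are illustrative choices, not anything from a real face-recognition pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: label is 1 when x1 + x2 > 0 (illustrative, not a face dataset)
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

# Step 1: randomly initialize the weights and biases
W = rng.normal(scale=0.1, size=(2, 1))
b = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate (illustrative choice)
for epoch in range(100):
    # Forward propagation: compute activations from input to prediction
    p = sigmoid(X @ W + b)
    # Loss: binary cross-entropy between predictions and ground-truth labels
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # Backward propagation: gradients of the loss w.r.t. weights and biases
    grad_z = (p - y) / len(X)
    grad_W = X.T @ grad_z
    grad_b = grad_z.sum(axis=0, keepdims=True)
    # Update: subtract a small quantity proportional to each gradient
    W -= lr * grad_W
    b -= lr * grad_b

accuracy = float(np.mean((sigmoid(X @ W + b) > 0.5) == y))
```

Every deep network follows this same loop; only the architecture, loss, and optimizer grow more elaborate.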

## INFERENCE PHASE

In this phase, no calculation of gradients or adjustment of parameters takes place. The network uses the same weights and biases learned during training to evaluate an unseen test dataset.

**Deep neural networks** are the go-to models for achieving state-of-the-art performance on computer vision tasks. Learning **invariant** and **discriminative features** from data is the fundamental goal for achieving good results on any computer vision task. Deep learning methods have proven quite effective for **feature learning**, the very reason being their ability to learn **hierarchical** feature representations by building high-level features from low-level ones.

Can we use image classification to solve the task of image verification? We might be able to train a fairly robust deep convolutional neural network that classifies all the employee images in an organization excellently, while also accounting for factors like pose, expression, and illumination.

But this is usually achieved only when we have a good amount of data, and by a good amount I mean thousands of examples for each class/employee. In image classification, if the number of **data points per class** is small, it might lead to **overfitting** and yield very poor results. Image classification also generally works well only when the number of classes is small.

However, this is generally not the case with person/image verification tasks. In fact, it is quite the opposite: here we usually have a very large number of classes, and the number of examples per class is quite small. This is where one-shot learning comes into the picture.

**One-shot Learning: **A classification problem that aims to learn about object categories from one/few training examples/images. [Wikipedia]. In simple words, given just one example/image of a person, you need to recognize him/her. To build a face recognition system, we need to solve this one-shot learning problem.

But deep neural nets usually require vast amounts of data to train on to excel at a particular task, which is not always available. Deep learning models won’t work well with just one training example, {one-shot learning problem}. How to address this issue? We learn a **similarity function, **which helps us to solve the one-shot learning problem.
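As a concrete illustration of a similarity function, here is a minimal sketch using cosine similarity between toy, hand-written face embeddings. The embedding vectors and the threshold `tau` are hypothetical; in practice the embeddings would come from a trained network and `tau` would be tuned on validation data:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(emb1, emb2, tau=0.8):
    """Verification decision ('Is that you?'); tau is a hypothetical threshold."""
    return cosine_similarity(emb1, emb2) >= tau

# Toy embeddings: two images of person A and one of person B
a1 = np.array([0.9, 0.1, 0.4])
a2 = np.array([0.8, 0.2, 0.5])
b1 = np.array([-0.7, 0.6, 0.1])

print(same_person(a1, a2))  # two images of the same person
print(same_person(a1, b1))  # images of different people
```

The key point: once a good similarity function is learned, a single enrollment image per person suffices, which is exactly what one-shot learning asks for.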

Let us start to understand deep metric learning by understanding the basics.

A metric is a non-negative function between two points x and y, say **d(x, y)**, that formalizes the notion of **‘distance’** between these two points. There are several properties that a metric must satisfy:

- **Non-negativity**: d(x, y) ≥ 0, and d(x, y) = 0 **iff x = y**.
- **Triangle inequality**: d(x, y) ≤ d(x, z) + d(z, y).
- **Symmetry**: d(x, y) = d(y, x).
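These properties can be checked numerically for the Euclidean distance, a quick sanity sketch with random points standing in for x, y, and z:

```python
import numpy as np

def euclid(x, y):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(x - y))

rng = np.random.default_rng(42)
x, y, z = rng.normal(size=(3, 4))  # three random points in R^4

assert euclid(x, y) >= 0                             # non-negativity
assert euclid(x, x) == 0                             # d(x, y) = 0 iff x = y
assert np.isclose(euclid(x, y), euclid(y, x))        # symmetry
assert euclid(x, y) <= euclid(x, z) + euclid(z, y)   # triangle inequality
```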

EXAMPLES:

**A. The Euclidean Metric:** In a **d**-dimensional vector space, the metric is d(x, y) = √( Σᵢ₌₁ᵈ (xᵢ − yᵢ)² ).
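In code, this is just the square root of the sum of squared coordinate differences; `np.linalg.norm` computes the same quantity (the vectors here are arbitrary toy values):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# Sum of squared coordinate differences, then the square root
d_manual = float(np.sqrt(np.sum((x - y) ** 2)))
d_numpy = float(np.linalg.norm(x - y))  # NumPy's built-in Euclidean norm
print(d_manual, d_numpy)
```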