This article is part of a series I’m writing on using Deep Learning in NLP. I was originally writing an article on text classification with a perceptron, but I realized it would be better to first review some basics, such as activation and loss functions.

A loss function, also called an objective function, is one of the main building blocks of supervised machine learning, which is based on labeled data. It guides the training algorithm to update the model’s parameters in the right direction. Put simply, a loss function takes a truth **(y)** and a prediction **(ŷ)** as input and outputs a real-valued score indicating how close the prediction is to the truth. The higher this value, the worse the model’s prediction, and vice versa.

In this article, I present three of the most commonly used loss functions.

The Mean Squared Error loss function, known as MSE, is mostly used in regression problems, where the target (y) and prediction (ŷ) take continuous values. MSE is the average of the squared differences between the target and predicted values. There are alternatives to MSE, such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), but all of these functions are based on computing a real-valued distance between the targets and the predictions (outputs).
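To make this concrete, here is a minimal sketch of MSE and its alternatives computed with NumPy. The article does not yet fix a framework, so this is framework-agnostic, and the `y` / `y_hat` values are made up purely for illustration:

```python
import numpy as np

# Illustrative truth (y) and prediction (y_hat) vectors — made-up values.
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 2.0, 2.0])

# MSE: average of the squared differences between target and prediction.
mse = np.mean((y - y_hat) ** 2)

# The alternatives mentioned above:
mae = np.mean(np.abs(y - y_hat))  # Mean Absolute Error
rmse = np.sqrt(mse)               # Root Mean Squared Error

print(round(float(mse), 4))   # 0.4167
print(round(float(mae), 4))   # 0.5
print(round(float(rmse), 4))  # 0.6455
```

Note that a worse prediction (a larger distance between `y_hat` and `y`) yields a larger loss value, which is exactly the behavior described above.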

The mathematical formula of MSE: