Model evaluation with the metrics API for regression and classification
In this article, we will discuss various metrics for regression and classification in machine learning. Building a good machine learning model involves several steps. One of them is choosing metrics that measure how good the model is. After we fit a model and make predictions, we want to know how large the error is and how accurate the predictions are. This article explains various error measurement methods for regression and classification, with examples using scikit-learn.
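scikit-learn provides three different ways to evaluate the prediction quality of a model:
- Metric functions: the functions in the sklearn.metrics module, which we study in this article.
- Estimator score method: estimators have a score method that provides a default evaluation criterion for the problem they are designed to solve.
- Scoring parameter: tools that use cross-validation, such as model_selection.GridSearchCV and model_selection.cross_val_score, accept a scoring parameter that tells them which metric to use when evaluating the model. (Older releases exposed these tools as grid_search.GridSearchCV and cross_validation.cross_val_score.)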
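The three approaches can be compared side by side. This is a minimal sketch that assumes a small made-up dataset, with LinearRegression and Ridge chosen as the estimators. None of these choices come from the article. For regressors, score() returns R², so in this example the metric function and the score method agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical toy data (not from the article): y is roughly 2 * x + 1
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 17.2])

model = LinearRegression().fit(X, y)

# 1. Metric function: compare the true values with the predictions yourself
metric_value = r2_score(y, model.predict(X))

# 2. Estimator score method: for regressors, score() reports R^2
score_value = model.score(X, y)

# 3. Scoring parameter: tell a cross-validation tool which metric to use
cv_scores = cross_val_score(LinearRegression(), X, y, cv=4, scoring="r2")
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                      scoring="neg_mean_squared_error", cv=4)
search.fit(X, y)

print(metric_value, score_value)
print(cv_scores)
print(search.best_params_)
```

The scoring strings "r2" and "neg_mean_squared_error" are names that scikit-learn recognizes. Any other registered scorer name could be used in the same place.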
Basic definition
- Estimator: an object that learns from data through its fit method and then makes predictions on new data points, for example through its predict method.
Tricks to know
Two naming conventions make the evaluation functions easier to read:
- First, functions whose names end with _score (for example r2_score or accuracy_score) return a value where higher is better.
- Second, functions whose names end with _error or _loss (for example mean_squared_error or hinge_loss) return a value where lower is better.
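This convention matters when a metric is passed as a scoring parameter. Scorer objects always treat higher values as better, so error and loss metrics are negated. The sketch below shows this with mean_absolute_error. The toy data and the LinearRegression model are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, make_scorer, mean_absolute_error

# Hypothetical toy data for illustration
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
model = LinearRegression().fit(X, y)

# Plain metric function: an error, so lower is better
mae = mean_absolute_error(y, model.predict(X))

# Built-in scorer: the "neg_" prefix flips the sign so that higher is better
neg_mae_scorer = get_scorer("neg_mean_absolute_error")

# Custom scorer: greater_is_better=False makes the scorer negate the metric
custom_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

print(mae)
print(neg_mae_scorer(model, X, y))
print(custom_scorer(model, X, y))
```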
Metrics in Regression
The metrics for evaluating performance in regression are given below:
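1. Explained Variance Score: This metric measures how much of the variance (dispersion) of the true values is captured by the predictions. The best possible score is 1.0, and lower values are worse.
- The formula of this metric is shown below:
$$\text{explained\_variance}(y, \hat{y}) = 1 - \frac{\operatorname{Var}\{y - \hat{y}\}}{\operatorname{Var}\{y\}}$$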
Example in python
# import explained_variance_score from sklearn
from sklearn.metrics import explained_variance_score
true_values = [5, 2.5, 3, 6]
predicted_values = [4.5, 2.9, 3, 7]
print(explained_variance_score(true_values, predicted_values))
# output: 0.8525190839694656
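To see what the formula does, the sketch below recomputes the explained variance with NumPy and compares the result with explained_variance_score. The article itself only calls scikit-learn, so using NumPy here is an assumption made for illustration. The sketch also shows a property worth knowing. Adding a constant offset to every prediction does not change the variance of the residuals, so the score stays the same. In other words, explained variance does not penalize systematic bias, whereas R² does.

```python
import numpy as np
from sklearn.metrics import explained_variance_score

true_values = np.array([5, 2.5, 3, 6])
predicted_values = np.array([4.5, 2.9, 3, 7])

# 1 - Var(residuals) / Var(true values)
residuals = true_values - predicted_values
manual_ev = 1 - np.var(residuals) / np.var(true_values)

print(manual_ev)
print(explained_variance_score(true_values, predicted_values))

# Shifting every prediction by a constant leaves the residual variance unchanged
biased_predictions = predicted_values + 1.0
print(explained_variance_score(true_values, biased_predictions))
```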
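2. Max Error: This metric computes the maximum residual error, that is, the largest absolute difference between a true value and its predicted value. It captures the worst-case error of the model.
- The formula for max error is shown below:
$$\text{MaxError}(y, \hat{y}) = \max_{i} \left| y_i - \hat{y}_i \right|$$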
Example in python
# import max_error from sklearn
from sklearn.metrics import max_error
true_values = [5, 2.5, 3, 6]
predicted_values = [4.5, 2.9, 3, 8]
print(max_error(true_values, predicted_values))
# output: 2.0
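Max error is simply the largest absolute residual, which the sketch below checks with NumPy (NumPy is an assumption here). A single bad prediction decides the score. In this example only the last prediction (8 instead of 6) contributes.

```python
import numpy as np
from sklearn.metrics import max_error

true_values = np.array([5, 2.5, 3, 6])
predicted_values = np.array([4.5, 2.9, 3, 8])

# Largest absolute difference between a true value and its prediction
abs_residuals = np.abs(true_values - predicted_values)
manual_max = np.max(abs_residuals)

print(abs_residuals)
print(manual_max, max_error(true_values, predicted_values))
```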
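3. Mean absolute error: This metric computes the mean of the absolute differences between the true values and the predicted values. It corresponds to the expected value of the l1-norm loss.
- The formula for this metric is shown below:
$$\text{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$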
Example in python
# import mean_absolute_error from sklearn
from sklearn.metrics import mean_absolute_error
true_values = [5, 2.5, 3, 6]
predicted_values = [4.5, 2.9, 3, 7]
print(mean_absolute_error(true_values, predicted_values))
# output: 0.475
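The sketch below recomputes the MAE with NumPy and then shows how mean_absolute_error handles several target columns at once. The two-column arrays are made-up example data. With multioutput="raw_values" the function returns one error per column. With the default setting it averages those errors.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

true_values = np.array([5, 2.5, 3, 6])
predicted_values = np.array([4.5, 2.9, 3, 7])

# Mean of the absolute residuals
manual_mae = np.mean(np.abs(true_values - predicted_values))
print(manual_mae)

# Hypothetical two-output data: one column per target
true_2d = np.array([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
pred_2d = np.array([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

per_output = mean_absolute_error(true_2d, pred_2d, multioutput="raw_values")
averaged = mean_absolute_error(true_2d, pred_2d)  # uniform average of the columns
print(per_output, averaged)
```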
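4. Mean squared error: This metric computes the mean of the squared differences between the true values and the predicted values, that is, the expected quadratic (l2-norm) loss. Because the differences are squared, large errors count much more than small ones.
- The formula is shown below:
$$\text{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$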
Example in python
# import mean_squared_error from sklearn
from sklearn.metrics import mean_squared_error
true_values = [5, 2.5, 3, 6]
predicted_values = [4.5, 2.9, 3, 7]
print(mean_squared_error(true_values, predicted_values))
# output: 0.3525
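The sketch below recomputes the MSE with NumPy. It also takes the square root to get the root mean squared error (RMSE), which is in the same units as the target. RMSE is not one of the metrics listed in this article. It is included here only as a common companion to MSE, computed with np.sqrt.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

true_values = np.array([5, 2.5, 3, 6])
predicted_values = np.array([4.5, 2.9, 3, 7])

# Mean of the squared residuals
manual_mse = np.mean((true_values - predicted_values) ** 2)
mse = mean_squared_error(true_values, predicted_values)

# RMSE: square root of MSE, expressed in the units of the target
rmse = np.sqrt(mse)

print(manual_mse, mse, rmse)
```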
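5. R-squared Score: This metric, generally called the “coefficient of determination”, measures the proportion of the variance of the true values that the model explains. It does this by comparing the model's squared errors with those of a baseline that always predicts the mean. The best possible score is 1.0. A model that always predicts the mean scores 0.0, and worse models can score below zero.
- The formula of this metric is given below:
$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$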
Example in python
# import r2_score from sklearn
from sklearn.metrics import r2_score
true_values = [5, 2.5, 3, 6]
predicted_values = [4.5, 2.9, 3, 7]
print(r2_score(true_values, predicted_values))
# output: 0.8277862595419847
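The sketch below recomputes R² from the residual and total sums of squares. It then checks the two reference points mentioned above: predicting the mean gives 0, and worse predictions give a negative score. The bad_predictions values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

true_values = np.array([5, 2.5, 3, 6])
predicted_values = np.array([4.5, 2.9, 3, 7])

# Residual sum of squares vs. total sum of squares around the mean
ss_res = np.sum((true_values - predicted_values) ** 2)
ss_tot = np.sum((true_values - true_values.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot
print(manual_r2, r2_score(true_values, predicted_values))

# Baseline that always predicts the mean: R^2 is 0
mean_baseline = np.full_like(true_values, true_values.mean())
print(r2_score(true_values, mean_baseline))

# Hypothetical predictions that are worse than the mean: R^2 is negative
bad_predictions = np.array([2.0, 6.0, 7.0, 1.0])
print(r2_score(true_values, bad_predictions))
```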
Metrics in Classification
The metrics for evaluating performance in classification are given below:
1. Accuracy score: This metric computes the fraction of samples whose predicted label exactly matches the true label. If the normalize parameter is set to False, it returns the number of correctly predicted samples instead.
- The formula is given below, where 1(x) is the indicator function:
$$\text{accuracy}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} 1(\hat{y}_i = y_i)$$
Example in python
# import accuracy_score from sklearn
from sklearn.metrics import accuracy_score
true_values = [5, 2, 3, 6]
predicted_values = [4, 3, 3, 6]
print(accuracy_score(true_values, predicted_values))
# output: 0.5
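The normalize parameter described above switches accuracy_score from a fraction to a count. The sketch below shows both settings and reproduces the fraction by comparing the labels element by element with NumPy. In this example two of the four predictions match.

```python
import numpy as np
from sklearn.metrics import accuracy_score

true_values = [5, 2, 3, 6]
predicted_values = [4, 3, 3, 6]

# Fraction of exact matches (default, normalize=True)
fraction_correct = accuracy_score(true_values, predicted_values)

# Number of exact matches
count_correct = accuracy_score(true_values, predicted_values, normalize=False)

# Same fraction computed by hand: mean of the element-wise equality test
manual_fraction = np.mean(np.array(true_values) == np.array(predicted_values))

print(fraction_correct, count_correct, manual_fraction)
```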
2. Classification report: This function builds a text report with the precision, recall, F1-score, and support (number of true samples) for each class of the classification problem.
Example in Python
# import classification_report from sklearn
from sklearn.metrics import classification_report
true_values = [3, 4, 3, 6]
predicted_values = [4, 3, 3, 6]
# labels are sorted (3, 4, 6), so 3 -> Apple, 4 -> Orange, 6 -> Kiwi
target_names = ['Apple', 'Orange', 'Kiwi']
print(classification_report(true_values, predicted_values, target_names=target_names))
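Passing output_dict=True makes the report easier to use in code, because it is returned as a nested dictionary instead of text. The sketch below reuses the same labels. The hand-checked values follow from the data. Apple (label 3) is predicted correctly once out of two predictions and two true samples. Orange (label 4) is never predicted correctly. Kiwi (label 6) is predicted perfectly.

```python
from sklearn.metrics import classification_report

true_values = [3, 4, 3, 6]
predicted_values = [4, 3, 3, 6]
# Names are assigned to the sorted labels: 3 -> Apple, 4 -> Orange, 6 -> Kiwi
target_names = ['Apple', 'Orange', 'Kiwi']

report = classification_report(true_values, predicted_values,
                               target_names=target_names, output_dict=True)

# Each class maps to a dict with precision, recall, f1-score and support
for name in target_names:
    scores = report[name]
    print(name, scores["precision"], scores["recall"],
          scores["f1-score"], scores["support"])
print("accuracy", report["accuracy"])
```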
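3. Hinge loss: This loss measures how far each decision value falls short of the required margin. In the binary case, with labels encoded as +1 and -1 and a decision value d, the loss for one sample is max(0, 1 - y·d), and hinge_loss returns the average over all samples. Wrong predictions, and correct ones with too small a margin, are penalized. It is the loss used by support vector machines (SVMs), which are maximum-margin classifiers.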
Example in python
# import the libraries
from sklearn.svm import LinearSVC
from sklearn.metrics import hinge_loss
# data set in x and y values
x_values = [[3], [2]]
y_values = [-1, 1]
# using a linear SVC model
svm_linear = LinearSVC(random_state=0)
# fitting the model
svm_linear.fit(x_values, y_values)
# making decision predictions
pred_decision = svm_linear.decision_function([[-2], [3], [0.5]])
print(hinge_loss([-1, 1, 1], pred_decision))
# output: 1.333372678152829
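The binary hinge loss formula can be checked by hand. The sketch below uses made-up decision values rather than the LinearSVC output above, so the result does not depend on the solver. It computes max(0, 1 - y·d) for each sample with NumPy and averages the results. Here the first sample has a margin of only 0.5, the second is beyond the margin and costs nothing, and the third has a margin of 0.25. The mean is therefore (0.5 + 0 + 0.75) / 3.

```python
import numpy as np
from sklearn.metrics import hinge_loss

# True labels encoded as -1 / +1
y_true = np.array([-1, 1, 1])
# Hypothetical decision values (not taken from a fitted model)
decision_values = np.array([-0.5, 2.0, 0.25])

# Per-sample hinge loss: max(0, 1 - y * d)
margins = y_true * decision_values
per_sample = np.maximum(0, 1 - margins)
manual_hinge = per_sample.mean()

print(per_sample)
print(manual_hinge, hinge_loss(y_true, decision_values))
```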
Conclusion:
These are some of the metrics that can be used to evaluate model performance in regression and classification. Many more are available, including metrics for regression, binary classification, and multi-class classification.