What is the Confusion Matrix?

This blog gives a brief overview of the confusion matrix and how it is used for classification problems in machine learning.

The confusion matrix is one of the most widely used tools for evaluating predictive models in machine learning. It is the standard way to check the performance of a classification-based ML model.

It shows how a classifier has performed by setting the correctly classified examples against the misclassified ones.

Let’s discuss the concept of confusion matrix in detail.

A confusion matrix is a table that summarizes the number of correct and incorrect predictions made by a classifier (or classification model) by comparing the actual values with the predicted values. It is most often introduced for binary classification, but it applies to any number of classes.

In simple words, “A confusion matrix is a performance measurement for machine learning”. By visualizing the confusion matrix, one can judge the accuracy of the model from the diagonal values, which count the correct classifications.

As for the structure, the size of the matrix depends on the number of output classes: a problem with N classes gives an N×N matrix, so the confusion matrix is always square. By one common convention the columns represent the actual values and the rows the predicted values; the opposite layout is also used, so check which one a given tool follows. Specifically:

A confusion matrix presents the ways in which a classification model becomes confused while making predictions.
A good matrix (model) will have large values along the diagonal and small values off the diagonal.
Examining a confusion matrix gives better insight into what our classification model is getting right and what types of errors it is making.
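
As a rough illustration of the structure described above, here is a minimal Python sketch that builds an N×N confusion matrix from two lists of labels. The function name, the three class labels and the data are made up for illustration, and the layout (rows for actual classes, columns for predicted classes) is just one of the two conventions mentioned above.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return an N x N table: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical three-class data, for illustration only.
labels = ["cat", "dog", "bird"]
actual    = ["cat", "cat", "dog", "dog", "bird", "bird", "bird", "cat"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat",  "bird", "cat"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")

# Correct predictions sit on the diagonal of the matrix.
correct = sum(matrix[i][i] for i in range(len(labels)))
print("correct:", correct, "of", len(actual))
```

Three classes give a 3×3 table, and the diagonal sum is the number of correct predictions.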

For classification problems in machine learning, a confusion matrix is a performance measurement method. In the binary case it is a table of the four possible combinations of predicted and actual values. Positive and Negative describe the class the model predicted, while True and False say whether that prediction matches the actual value. These four elements are the fundamental building blocks of a confusion matrix.

2×2 Confusion Matrix

Now, let’s understand the classification concept in terms of True vs False and Positive vs Negative with some examples.

Case 1: The simple story of a boy and a wolf

For fun, a boy shouted “Wolf!” even though there was no wolf. The villagers ran out to save themselves but soon got angry when they realized the boy was playing a joke.

One day the boy really did see a wolf and called out “Wolf is coming!”, but the villagers refused to be fooled again and stayed at home. The hungry wolf then ravaged the village and destroyed its crops, and the entire village suffered for it.

Making definitions:

“Wolf” is a positive class
“No wolf” is a negative class

Now, the wolf prediction can be laid out as a 2×2 confusion matrix that reflects all four possible outcomes:

Classification as True vs False and Positive vs Negative

From the above discussion, we can say that:

A true positive is an outcome where the model correctly predicts the positive class.
A true negative is an outcome where the model correctly predicts the negative class.
A false positive is an outcome where the model incorrectly predicts the positive class when the actual class is negative.
A false negative is an outcome where the model incorrectly predicts the negative class when the actual class is positive. (Reference)
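
The four definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the short lists of "wolf" / "no wolf" observations are invented for the example, with "wolf" taken as the positive class as defined earlier.

```python
def count_outcomes(actual, predicted, positive="wolf"):
    """Count TP, TN, FP and FN for a binary problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # The model said "positive": right if the actual class agrees.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # The model said "negative": right if the actual class is negative too.
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical observations: what really happened vs. what was announced.
actual    = ["wolf", "no wolf", "no wolf", "wolf",    "no wolf", "wolf"]
predicted = ["wolf", "wolf",    "no wolf", "no wolf", "no wolf", "wolf"]

outcomes = count_outcomes(actual, predicted)
print(outcomes)
```

The second observation is the false alarm from the story: "wolf" was announced although no wolf was there.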

Case 2: An example of cricket

Making definitions:

The batsman is NOT OUT, a positive class or logic 1.
The batsman is OUT, a negative class or logic 0.

Now, in terms of the 2×2 confusion matrix:

True Positive: The umpire gives a batsman NOT OUT when he is actually NOT OUT.
True Negative: The umpire gives a batsman OUT when he is actually OUT.
False Positive (Type 1 error): The umpire gives a batsman NOT OUT when he is actually OUT.
False Negative (Type 2 error): The umpire gives a batsman OUT when he is actually NOT OUT.

A confusion matrix gives information about the errors made by the classifier and about the types of errors being made.
It reflects where a classification model gets confused while making predictions.
This helps overcome the limitations of relying on classification accuracy alone.
It is especially useful when the classification problem is heavily imbalanced and one class predominates over the others.
Recall, precision, specificity and accuracy can be calculated directly from the confusion matrix, and the AUC-ROC curve is built from the confusion matrices obtained at different decision thresholds.
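
To see why accuracy alone can mislead on imbalanced data, consider the following small sketch. The data is hypothetical (5 positives among 100 examples), and the "model" is assumed to predict the negative class every time.

```python
# Hypothetical imbalanced data: 5 positive examples and 95 negative ones.
actual = [1] * 5 + [0] * 95
# A lazy "model" that always predicts the majority (negative) class.
predicted = [0] * 100

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
# Share of actual positives that the model found.
found_positives = tp / (tp + fn)

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("accuracy:", accuracy)
print("share of positives found:", found_positives)
```

The accuracy looks high, yet the four counts show that not a single positive example was detected.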

Precision

Precision explains how many of the values predicted as positive actually turned out to be positive. Put simply, it is the number of correct positive predictions out of all the positive predictions made by the model. It indicates how far a positive prediction from the model can be trusted. It is useful in conditions where a false positive is a greater concern than a false negative. The formula for calculating precision is:

Precision: TP/(TP+FP)
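
The formula translates directly into code. In this sketch the counts are hypothetical, and returning 0.0 when the model made no positive predictions is an assumption made here to avoid dividing by zero, not a universal convention.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    # Assumption: report 0.0 when nothing was predicted positive.
    return tp / predicted_positive if predicted_positive else 0.0

# Hypothetical counts: 40 true positives and 10 false positives.
print(precision(tp=40, fp=10))  # 0.8
```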

Recall

Recall describes how many of the actual positive values the model predicted correctly. It is useful when false negatives are a greater concern than false positives. The formula for calculating recall is:

Recall: TP/(TP+FN)
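
Recall can also be computed straight from the labels. The sketch below uses made-up labels with 1 as the positive class; as an assumption of this example, it returns 0.0 when the data contains no actual positives.

```python
def recall(actual, predicted, positive=1):
    """Recall = TP / (TP + FN), computed from two lists of labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    # Assumption: report 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels: five actual positives, three of which were found.
actual    = [1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1, 0, 0]

print(recall(actual, predicted))  # 0.6
```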

For a given model, raising the decision threshold to increase precision usually decreases recall, and vice versa. This is known as the precision/recall tradeoff.
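
The tradeoff is easiest to see by moving the decision threshold of a classifier that outputs scores. The scores and labels below are invented for illustration, and the rule "score >= threshold means positive" is an assumption of this sketch.

```python
# Hypothetical model scores (higher means "more likely positive") and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
actual = [1,    1,    0,    1,    1,    0,    0,    0]

def precision_recall_at(threshold):
    """Return (precision, recall) when scores >= threshold are called positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for threshold in (0.8, 0.5, 0.25):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.8 to 0.25 raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.67.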

Accuracy

Accuracy is one of the most significant parameters for evaluating classification problems. It explains how often the model predicts the correct output, and is measured as the ratio of the number of correct predictions to the total number of predictions made by the classifier. The formula is:

Accuracy: (TP+TN)/(TP+TN+FP+FN)
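
As a quick sketch, the accuracy formula can be written as a small function over the four cells of the matrix. The counts used here are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical confusion matrix cells for 100 predictions.
print(accuracy(tp=40, tn=30, fp=10, fn=20))  # 0.7
```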

F-Measure

When one model has low precision and high recall and another has the opposite, it becomes hard to compare them. To solve this issue we can use the F-score.

“F-score is a harmonic mean of Precision and Recall”.

By calculating the F-score, we can evaluate recall and precision at the same time. Because it is a harmonic mean, the F-score is pulled towards the smaller of the two values; for a given average of precision and recall it is highest when the two are equal. It can be calculated using the formula below:

F-measure: (2*Recall*Precision)/(Recall+Precision)
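
A short sketch shows how the harmonic mean behaves compared with a plain average. The precision and recall values are hypothetical, and returning 0.0 when both are zero is an assumption made to avoid division by zero.

```python
def f_measure(precision: float, recall: float) -> float:
    """F-measure = 2 * precision * recall / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # Assumption: define the score as 0 when both inputs are 0.
    return 2 * precision * recall / (precision + recall)

# Two hypothetical models with the same arithmetic mean (0.5) of precision and recall.
balanced = f_measure(0.5, 0.5)
lopsided = f_measure(0.9, 0.1)

print("balanced:", round(balanced, 3))
print("lopsided:", round(lopsided, 3))
```

Both models average 0.5 on precision and recall, but the lopsided one gets a much lower F-measure because the harmonic mean is dominated by the smaller value.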

Besides the parameters discussed above, the following are other important terms related to the confusion matrix that help in determining the effectiveness of a classification model:

Null Error rate: It defines how often the model would be incorrect if it always predicted the majority class, which makes it a useful baseline. As the accuracy paradox shows, however, the best classifier for a particular application can sometimes have a higher error rate than the null error rate.
Receiver Operating Characteristic (ROC) Curve: It is a graph that reflects the performance of the classifier across all possible thresholds. The graph plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis).

A ROC curve, Source
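
The points of a ROC curve can be computed by sweeping the threshold over the scores a model produces. This is a minimal sketch on invented data; it assumes labels of 1 (positive) and 0 (negative), at least one example of each class, and that a score at or above the threshold counts as a positive prediction.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # Threshold above every score: nothing predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical scores and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
actual = [1,    1,    0,    1,    1,    0,    0,    0]

points = roc_points(actual, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each point corresponds to one confusion matrix, so the curve is a summary of many matrices rather than of a single one.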

Area Under the Curve (AUC): It measures the ability of a binary classification model to separate the two classes. The higher the AUC, the greater the chance that a randomly chosen actual positive is assigned a higher probability of being positive than a randomly chosen actual negative.
Misclassification rate: Also known as the error rate, it explains how often the model yields wrong predictions. It is measured as the number of incorrect predictions over the total number of predictions made by the classifier. The formula for the error rate is:

Error Rate: (FP+FN)/(TP+TN+FP+FN)
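
Both quantities can be sketched in a few lines. The AUC function below uses the ranking interpretation given above (the share of positive/negative pairs in which the positive example gets the higher score, with ties counted as half); this pairwise approach, the function names and the data are choices made for this illustration, and the approach is only practical for small datasets.

```python
def auc_by_ranking(actual, scores):
    """Share of (positive, negative) pairs where the positive has the higher score."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # Ties count as half a win.
    return wins / (len(pos) * len(neg))

def error_rate(tp, tn, fp, fn):
    """Error rate = (FP + FN) / (TP + TN + FP + FN)."""
    return (fp + fn) / (tp + tn + fp + fn)

# Hypothetical scores, labels and confusion matrix cells.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
actual = [1,    1,    0,    1,    1,    0,    0,    0]

print("AUC:", auc_by_ranking(actual, scores))
print("error rate:", error_rate(tp=40, tn=30, fp=10, fn=20))
```

Note that the error rate is simply one minus the accuracy computed from the same four counts.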

Cohen’s Kappa: It measures how well the classifier performed compared with how well it would have performed simply by chance. In other terms, a model has a high Kappa score only if its accuracy is far above the accuracy expected from chance agreement.
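
For a binary confusion matrix, Cohen's Kappa and the null error rate can be sketched as follows. The sketch uses the standard definition kappa = (observed agreement - expected agreement) / (1 - expected agreement); the function names and the counts are hypothetical.

```python
def null_error_rate(tp, tn, fp, fn):
    """Error rate of a model that always predicts the majority class."""
    actual_pos = tp + fn
    actual_neg = tn + fp
    return min(actual_pos, actual_neg) / (actual_pos + actual_neg)

def cohens_kappa(tp, tn, fp, fn):
    """Kappa = (observed - expected) / (1 - expected) for a 2x2 matrix."""
    total = tp + tn + fp + fn
    observed = (tp + tn) / total
    # Chance agreement: product of actual and predicted class shares, summed over classes.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    # Not handled here: kappa is undefined when expected agreement equals 1.
    return (observed - expected) / (1 - expected)

# Hypothetical confusion matrix cells for 100 predictions.
cells = dict(tp=45, tn=40, fp=5, fn=10)

print("null error rate:", null_error_rate(**cells))
print("kappa:", round(cohens_kappa(**cells), 3))
```

Here the accuracy is 0.85 while chance agreement is 0.5, which gives a Kappa of about 0.7.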

A confusion matrix is a valuable tool for evaluating a classification model. It gives a detailed picture of how correctly the model has classified each class on the data it was given, and of how the classes are misclassified.

As for the measuring parameters, among precision, recall, accuracy and F-measure, precision and recall are the most widely used, since the tradeoff between them is a practical measure of prediction success. The ideal model would have both high precision and high recall, but that is achievable only when the data is perfectly separable.
