Some Python magic: Measures of Fit in action
I can hear you thinking: “Micha, this is interesting, but how can I use this in Python?”. Let me show you!
Let’s start by defining some true values and some predictions. I will use deviations that are easy to see and calculate. This is a nice way to get a good feeling for the discussed measures of fit, so that next time you work with a model, you can use the measures yourself in a larger scenario.
y_true = [-10, -5, 0, 5, 10, 15]
y_always_five = [-15, -10, -5, 0, 5, 10]
y_small_large = [-7, -2, 3, 12, 17, 22]
y_outlier = [-10, -5, 0, 5, 10, 45]
Now that we have y_true and the predictions, we can calculate whether our predictions are any good. We’ll go over the measures one by one. The Python code will only show examples for y_always_five; Table 1 will show the measures for the other predictions as well.
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print("MAE y_always_five: ", mean_absolute_error(y_true, y_always_five))
MAE y_always_five: 5.0
print("MSE y_always_five: ", mean_squared_error(y_true, y_always_five))
MSE y_always_five: 25.0
print("RMSE y_always_five: ", sqrt(mean_squared_error(y_true, y_always_five)))
RMSE y_always_five: 5.0
print("R2 y_always_five: ", r2_score(y_true, y_always_five))
R2 y_always_five: 0.6571428571428571
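To see where these numbers come from, here is a sketch that recomputes the same measures straight from their formulas in plain Python, for all three predictions at once (a do-it-yourself version of Table 1; the helper function names are my own):

```python
from math import sqrt

y_true = [-10, -5, 0, 5, 10, 15]
predictions = {
    "y_always_five": [-15, -10, -5, 0, 5, 10],
    "y_small_large": [-7, -2, 3, 12, 17, 22],
    "y_outlier": [-10, -5, 0, 5, 10, 45],
}

def mae(y, y_hat):
    # mean of the absolute errors
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    # mean of the squared errors
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    # 1 minus (residual sum of squares / total sum of squares)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot

for name, y_pred in predictions.items():
    print(f"{name}: MAE={mae(y_true, y_pred):.2f} "
          f"MSE={mse(y_true, y_pred):.2f} "
          f"RMSE={sqrt(mse(y_true, y_pred)):.2f} "
          f"R2={r2(y_true, y_pred):.2f}")
```

These hand-rolled versions give the same results as the sklearn functions, which is a nice sanity check on both the formulas and the library calls.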
A bunch of different numbers. Time to interpret them and find our best fit. The table below displays the different predictions and their corresponding measures (Table 1). But first, some code to plot these predictions. The result is shown in Figure 1.
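The plotting code itself did not survive into this section; a minimal sketch with matplotlib (the styling choices are my own) could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

y_true = [-10, -5, 0, 5, 10, 15]
y_always_five = [-15, -10, -5, 0, 5, 10]
y_small_large = [-7, -2, 3, 12, 17, 22]
y_outlier = [-10, -5, 0, 5, 10, 45]

x = range(len(y_true))
# plot the true values and each prediction as its own line
plt.plot(x, y_true, marker="o", label="y_true")
plt.plot(x, y_always_five, marker="o", label="y_always_five")
plt.plot(x, y_small_large, marker="o", label="y_small_large")
plt.plot(x, y_outlier, marker="o", label="y_outlier")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("predictions.png")
```

A plot like this makes the outlier at the last point of y_outlier immediately visible, before any measure is calculated.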
As previously stated, MAE doesn’t handle outliers very well. The y_outlier prediction has a large outlier at x=5, but still shows a MAE of 5. The other three measures are better at exposing this outlier. MSE and RMSE give much higher values than before, and R² even turns negative, showing that the predicted model is worse at predicting the values than a horizontal line. You can also notice that MSE has a much higher value than the other measures. As discussed earlier, this is because MSE is of order two while the data is of order one. Finally, notice the difference between y_always_five and y_small_large when you look at RMSE: there is a small difference in error, which shows that the larger errors in y_small_large weigh heavier than the smaller ones.