Explaining a model using concepts from game theory
In a growing number of domains, machine learning models are being held to a higher standard. Predictions alone are no longer enough: companies can be held responsible for spurious predictions their models produce. With this shift, model explainability has arguably taken priority over raw predictive power. Metrics such as accuracy and R² have taken a back seat, while the ability to explain model predictions has become more and more important. We’ve looked at several ways to explain your models and gain a better understanding of how they work. Here we’ll look at SHAP values, a powerful way to explain the predictions of a machine learning model.
SHAP (SHapley Additive exPlanations) is a method for explaining individual predictions from a machine learning model. It goes beyond the commonly used approach of relying on coefficients for model interpretation, which was discussed in another article.
How do they work?
SHAP is based on Shapley values, a concept from game theory developed by economist Lloyd Shapley. The method helps us explain a model by allowing us to see how much each feature contributes to the model’s prediction. Each feature in our model will represent a “player”, while the “game” would be the prediction of the model. In effect, we will be trying to see how much each player contributes to the game.
The process involves calculating the model’s prediction both with and without a given feature. The difference between these two predictions tells us how much that feature contributes to the model’s prediction; this is the feature’s marginal contribution. We do this for every subset of the remaining features and take a weighted average of these contributions to get the feature’s Shapley value.
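Formally, this weighted average over subsets is the Shapley value. For a feature i and full feature set N, where v(S) denotes the prediction of the model trained on the subset S, it is written as:

\[
\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
\]

The bracketed term is the marginal contribution of feature i to the subset S, and the factorial weights are exactly the 1/3 and 1/6 weights we will arrive at in the worked example below.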
Computing marginal contributions
For our example, let’s say we have a model that predicts the price of a house. The image above shows this in the form of a graph. We have three features: Rooms, Age, and Location. In total, there are 8 different subsets of features (2³, including the empty set). Each node in the graph represents a separate model, so we also have 8 different models. We train each model on its corresponding subset of features and have each one predict the same row of data.
Each node in our graph is connected to another node by a directed edge. Node 1 has no features, meaning it simply predicts the average value seen in our training data ($100k). Following the blue-colored edge to node 2, we see that the model with the single feature Rooms predicts a lower value of $85k. This means the marginal contribution of Rooms to the model that has Rooms as its only feature is -$15k ($85k - $100k). We have done this for one model, but there are several models where Rooms is a feature, so we repeat this calculation for every model where the Rooms feature is added.
The image above highlights each edge where the Rooms feature is added and shows the marginal contribution of the feature in each model. The next step is to take the average of these marginal contributions. The only question is how to weigh each of them in the average. You might think we could weigh them all equally, but this is not the case: in a model with fewer features, each feature accounts for a larger share of the prediction, so its marginal contribution tends to be larger. Instead, we give models with the same number of features the same weight.
We can group our graph into rows, as seen above, where each row contains the models with a given number of features. When averaging the marginal contributions, we want each row to carry equal weight. Since the Rooms feature is added in models with one, two, or three features, we have 3 rows, and each row gets a weight of 1/3. The row of 2-feature models contains two models with the Rooms feature, so each of those models gets a weight of 1/6. Our breakdown of the weight for each ‘type’ of model is as follows:
- 1-feature models: 1/3
- 2-feature models: 1/6 each
- 3-feature models: 1/3
And our final calculation would look something like this:
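Spelled out for the Rooms feature, with MC(S) standing for the marginal contribution of Rooms when added to the subset S (read off the highlighted edges in the graph above), the weighted average is:

\[
\text{SHAP}_{\text{Rooms}} = \tfrac{1}{3}\,MC(\varnothing) + \tfrac{1}{6}\,MC(\{\text{Age}\}) + \tfrac{1}{6}\,MC(\{\text{Location}\}) + \tfrac{1}{3}\,MC(\{\text{Age, Location}\})
\]

where, for example, MC(∅) = $85k - $100k = -$15k is the contribution of adding Rooms to the model with no features.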
Our SHAP value for the Rooms feature would therefore be -$10.5k. We can then repeat this process for each feature in our model to find values for all of them. What is good about this method is that we can see how features affect individual predictions, rather than just an average effect over all examples in our dataset.
Implementation
Looks like a pain to implement all of this from scratch, right? Thankfully, as with most data science tasks in Python, there is a library we can use to do this, called shap. In line with our illustrated example above, we’ll use a real estate dataset from Kaggle. I’ll go over the code for doing this below.
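A minimal sketch of the workflow, assuming the data has been downloaded as housing.csv with a MEDV target column (both names are placeholders for whatever the Kaggle dataset actually uses), looks like this:

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load the data -- "housing.csv" and the "MEDV" target are placeholders.
data = pd.read_csv("housing.csv")
X = data.drop(columns=["MEDV"])
y = data["MEDV"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a tree-based model (a Random Forest in this example).
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# TreeExplainer computes SHAP values efficiently for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Force plot for a single prediction: each feature pushes the prediction
# up (red) or down (blue) from the base value.
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])
```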
The plot above shows us several important pieces of information. The value for this house is predicted as 15.42. To the right of the plot, we see the base value of 23.16. We also see two distinct sets of features, colored red and blue respectively. The features highlighted in red pushed the prediction higher, while the features in blue pushed it lower. The size each feature takes up in the plot shows how much it affected the prediction. We see that the feature LSTAT (% lower status of the population) contributed most to lowering the prediction, while the feature CRIM (per capita crime rate by town) contributed most to increasing it.
Another thing you may have noticed is that I used a class called TreeExplainer. This is because we used a tree-based model (a Random Forest) in this example. There are several ‘explainers’ in the shap library. A more general explainer that works for non-tree-based models is the KernelExplainer. Alternatively, you could use the DeepExplainer for deep learning models.
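Continuing from the snippet above, switching to a model-agnostic explainer only changes how the explainer is constructed; the background sample size of 100 here is just an illustrative choice:

```python
# KernelExplainer is model-agnostic: it only needs a prediction function.
# It is much slower, so it is usually run against a small background sample.
background = shap.sample(X_train, 100)
kernel_explainer = shap.KernelExplainer(model.predict, background)
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:10])
```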
Another way we can look at this individual prediction is through the decision plot above. We see a solid black vertical line at the base value of 23.16. Starting at the bottom and moving up the plot, we see how each feature we encounter shifts the model’s prediction until we reach the top, which is the final prediction for this particular row of data. There are many other ways to visualize SHAP values from a model; these examples are just to get you started.
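Continuing from the earlier snippet, the decision plot for the same row can be produced with a call along these lines:

```python
# Decision plot: traces the path from the base value to the final prediction,
# adding one feature's contribution at a time.
shap.decision_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])
```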
Conclusion
We have looked at SHAP values, a way to explain the predictions from a machine learning model. Through this method, we can look at individual predictions and see how each feature affects the outcome. We stepped through an example calculation of SHAP values by looking at a model that determines the price of a house. We also looked at the shap library in Python to be able to quickly compute and visualize SHAP values. In particular, we covered the force_plot and the decision_plot for visualizing SHAP values.