Uncovering the Magic: interpreting Machine Learning black-box models

Have you ever developed a machine learning model with a great accuracy and an awesome AUC


Fabricio Pretto

3 years ago | 14 min read


The trade-off between predictive power and interpretability is a common issue to face when working with black-box models, especially in business environments where results have to be explained to non-technical audiences. Interpretability is crucial to being able to question, understand, and trust AI and ML systems.

It also provides data scientists and engineers better means for debugging models and ensuring that they are working as intended.


This tutorial aims to present different techniques for approaching model interpretation in black-box models.

Disclaimer: this article seeks to introduce some useful techniques from the field of interpretable machine learning to the average data scientist and to motivate their adoption. Most of them have been summarized from Christoph Molnar’s highly recommended book, Interpretable Machine Learning.

The entire code used in this article can be found in my GitHub


  1. Taxonomy of Interpretability Methods
  2. Dataset and Model Training
  3. Global Importance
  4. Local Importance

1. Taxonomy of Interpretability Methods

  • Intrinsic or Post-Hoc? This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post-hoc).
  • Model-Specific or Model-Agnostic? Linear models have a model-specific interpretation, since the interpretation of the regression weights is specific to that kind of model. Similarly, decision tree splits have their own specific interpretation. Model-agnostic tools, on the other hand, can be used on any machine learning model and are applied after the model has been trained (post-hoc).
  • Local or Global? Local interpretability refers to explaining an individual prediction, whereas global interpretability refers to explaining the model’s general behavior in the prediction task. Both types of interpretation are important and there are different tools for addressing each of them.

2. Dataset and Model Training

The dataset used for this article is the Adult Census Income from UCI Machine Learning Repository. The prediction task is to determine whether a person makes over $50K a year.

Since the modelling phase of the ML pipeline is not the focus of this article, only minimal feature engineering was performed before fitting an XGBoost classifier to the data.

The performance metrics obtained for the model are the following:

Fig. 1: Receiver Operating Characteristic (ROC) curves for Train and Test sets.

Fig. 2: XGBoost performance metrics

The model’s performance seems to be pretty acceptable.

3. Global Importance

The techniques used to evaluate the global behavior of the model will be:

3.1 - Feature Importance (evaluated by the XGBoost model and by SHAP)
3.2 - Summary Plot (SHAP)
3.3 - Permutation Importance (ELI5)
3.4 - Partial Dependence Plot (PDPBox and SHAP)
3.5 - Global Surrogate Model (Decision Tree and Logistic Regression)

3.1 - Feature Importance

  • XGBoost (model-specific)
feat_importances = pd.Series(clf_xgb_df.feature_importances_, index=X_train.columns).sort_values(ascending=True)

Fig. 3: XGBoost Feature Importance

When working with XGBoost, one must be careful when interpreting feature importances, since the results might be misleading.

This is because the model calculates several importance metrics, each with a different interpretation. It creates an importance matrix: a table whose first column lists the names of all the features actually used in the boosted trees, and whose remaining columns hold the resulting ‘importance’ values calculated with different metrics (Gain, Cover, Frequency). A more thorough explanation of these can be found here.

Gain is the most relevant metric for interpreting the relative importance of each feature: it reflects the improvement in accuracy contributed by the splits that use that feature.

  • SHAP

In general, the SHAP library is considered a model-agnostic tool for addressing interpretability (we will cover SHAP’s intuition in the Local Importance section). However, the library has a model-specific method for tree-based machine learning models such as decision trees, random forests and gradient boosted trees.

explainer = shap.TreeExplainer(clf_xgb_df)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, plot_type = 'bar')

Fig. 4: SHAP Feature Importance

The XGBoost feature importance was used to evaluate the relevance of the predictors for the Train dataset, and the SHAP importance to evaluate it for the Test dataset, in order to assess whether the most important features were similar across both approaches and sets.

The most important variables of the model are the same in both cases, although in a different order of importance (age appears much more relevant in the Test set under the SHAP approach).
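As a rough illustration of how a ranking like the one in Fig. 4 is obtained: SHAP's bar plot orders features by their mean absolute SHAP value across instances. The sketch below uses made-up SHAP values, and `mean_abs_shap` is a hypothetical helper written for this example, not part of the SHAP library.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (rows = instances, columns = features)."""
    n_rows = len(shap_matrix)
    scores = {}
    for j, name in enumerate(feature_names):
        scores[name] = sum(abs(row[j]) for row in shap_matrix) / n_rows
    # Sort from most to least important
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up SHAP values for 4 instances and 3 features (illustration only)
toy_shap = [
    [0.25, -0.5, 0.0],
    [-0.25, 0.5, 0.125],
    [0.5, -0.25, 0.0],
    [0.0, 0.75, -0.125],
]
ranking = mean_abs_shap(toy_shap, ["age", "married_1", "sex"])
print(ranking)
```

Because the absolute value is taken before averaging, a feature that pushes some predictions up and others down still ranks as important.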

3.2 - Summary Plot (SHAP)

The SHAP Summary Plot is a very interesting plot to evaluate the features of the model, since it provides more information than the traditional Feature Importance:

  • Feature Importance: variables are sorted in descending order of importance.
  • Impact on Prediction: the position on the horizontal axis indicates whether the values of the dataset instances for each feature have more or less impact on the output of the model.
  • Original Value: the color indicates, for each feature, whether the value is high or low (relative to that feature’s own range).
  • Correlation: the correlation of a feature with the model output can be analyzed by evaluating its color (its range of values) and the impact on the horizontal axis. For example, it is observed that the age has a positive correlation with the target, since the impact on the output increases as the value of the feature increases.
shap.summary_plot(shap_values, X_test)

Fig. 5: SHAP Summary Plot

3.3 - Permutation Importance (ELI5)

Another way to assess the global importance of the predictors is to randomly shuffle the values of one feature across the instances of the dataset, which breaks its relationship with the target, and then predict with the trained model.

If the evaluation metric does not change substantially after this shuffling, the feature is not very relevant. If instead the metric degrades, the feature is considered important to the model. This process is carried out individually for each feature.
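The procedure just described can be sketched from scratch in a few lines. This is a minimal illustration and not the ELI5 implementation: it uses accuracy instead of AUC to stay dependency-free, and `toy_model`, `X_toy` and `y_toy` are hypothetical stand-ins for a trained classifier and its data.

```python
import random


def accuracy(model, X, y):
    """Share of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_features, seed=0):
    """Drop in accuracy after shuffling each feature column, one at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return drops


# Toy data: the label depends only on feature 0; feature 1 is pure noise
data_rng = random.Random(42)
X_toy = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_toy = [int(row[0] > 0.5) for row in X_toy]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: it only looks at feature 0."""
    return int(row[0] > 0.5)


drops = permutation_importance(toy_model, X_toy, y_toy, n_features=2)
print(drops)
```

Shuffling the feature the model relies on produces a large drop in the metric, while shuffling the ignored feature leaves it unchanged.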

To evaluate the trained XGBoost model, the Area Under the Curve (AUC) of the ROC Curve will be used as the performance metric. Permutation Importance will be analyzed in both Train and Test:

import eli5
from eli5.sklearn import PermutationImportance

# Train
perm = PermutationImportance(clf_xgb_df, scoring = 'roc_auc', random_state=1984).fit(X_train, y_train)
eli5.show_weights(perm, feature_names = X_train.columns.tolist())

# Test
perm = PermutationImportance(clf_xgb_df, scoring = 'roc_auc', random_state=1984).fit(X_test, y_test)
eli5.show_weights(perm, feature_names = X_test.columns.tolist())

Fig. 6: Permutation Importance for Train and Test sets.

Even though the order of the most important features changes, the most relevant ones remain the same. It is interesting to note that, unlike in the XGBoost Feature Importance, the age variable has a fairly strong effect in the Train set (as shown by the SHAP Feature Importance in the Test set).

Furthermore, the 6 most important variables according to the Permutation Importance are the same in Train and Test (the difference in order may be due to the distribution of each sample).

The coherence between the different approaches to approximate the global importance generates more confidence in the interpretation of the model’s output.

3.4 - Partial Dependence Plot (PDPBox and SHAP)

The Partial Dependence Plot (PDP) indicates the marginal effect that a feature has individually on the predicted output. For this, the feature is modified, ceteris paribus, and the changes in the mean prediction are observed. The process carried out is as follows:

1) Select feature
2) Define grid of values
3) For each value of the grid:
3.1) Replace feature with grid value
3.2) Average predictions
4) Plot curve
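The steps above can be sketched as follows. This is a minimal from-scratch illustration, not the PDPBox implementation; `toy_predict` and the small dataset are hypothetical, and the plotting step is left out.

```python
def partial_dependence(predict, X, feature_idx, grid):
    """Average prediction when every row's feature is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature_idx] = value  # step 3.1: replace feature with grid value
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))  # step 3.2: average predictions
    return curve


def toy_predict(row):
    """Hypothetical model: the score rises with feature 0, plus a bonus from feature 1."""
    return 0.25 * row[0] + 0.5 * row[1]


X_toy = [[1, 0], [2, 1], [3, 0], [4, 1]]
grid = [0, 2, 4]  # step 2: grid of values for feature 0
curve = partial_dependence(toy_predict, X_toy, feature_idx=0, grid=grid)
print(curve)
```

Each point of the curve is an average over the whole dataset, so the other features keep their observed values while only the selected feature is overwritten.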

The PDP can indicate if the relationship between the feature and the output is linear, monotonic or if it is more complex. It is relevant to note that the observed relationship is with the prediction, not with the target variable. However, depending on the performance of the model, an intuition of the dependence of the target for the evaluated feature could be generated.

The advantage of PDP is that it is very easy to implement and it is quite intuitive: the function in a particular feature represents the average prediction if all data points are forced to assume each particular value.

We will analyze Partial Dependence Plots using PDPBox and SHAP.

  • PDPBox

As an example, the PDP for 2 of the most relevant observed features will be analyzed:

from pdpbox import pdp

# Create the data that we will plot
pdp_education = pdp.pdp_isolate(model=clf_xgb_df, dataset=X_test, model_features=X_test.columns, feature='education.num')
pdp_age = pdp.pdp_isolate(model=clf_xgb_df, dataset=X_test, model_features=X_test.columns, feature='age')

# Plot it
pdp.pdp_plot(pdp_education, 'education.num',figsize=(12, 6))
pdp.pdp_plot(pdp_age, 'age', figsize=(12, 6))

Fig. 7: Partial Dependence Plot for education.num

It looks like there is a linear relationship between the years of education (from 7 years onwards) and the probability of earning more than $50K. The impact of this feature in the model’s output proved to be as high as 0.6 (out of 1).

Fig. 8: Partial Dependence Plot for age

It seems that people are more likely to earn more than $50K in their 50s.

  • SHAP Dependence Plot

The same plots will be generated using the SHAP approach. In addition to showing the marginal effect the feature has on the model’s output, this library colors the points by the feature with which the plotted feature interacts most.

shap.dependence_plot('education.num', shap_values, X_test)
shap.dependence_plot('age', shap_values, X_test)

Fig. 9: SHAP Dependence Plot for education.num

Even though the y-axis scale is different from the PDPBox plot (we will see why in the Local Importance section), the trend for “education.num” appears to be the same as in the previous approach. In addition, SHAP has identified “married_1” as the feature with which it interacts most (this means that, for the model, married people with a high number of education years are more likely to earn more than $50K).

Fig. 10: SHAP Dependence Plot for age

The trend for age in this method is consistent with the PDPBox approach. The feature with which it interacts the most is “education.num”.

Having stated the advantages PDP contributes to the interpretability field, it is worth it (and fair) to also present the disadvantages:

  • It does not take into consideration the distribution of the feature: regions with very little data can be misinterpreted, because every data point is forced to take those rare values, which over-represents them and may lead to erroneous conclusions.
  • Assumption of independence of the features: this is one of the biggest drawbacks of PDP. It is assumed that the feature for which the partial dependence is computed is not correlated with the rest of the predictors.
  • Heterogeneous effects may be hidden: this is because only average marginal effects are computed. In the extreme case, the PDP could be a horizontal line, with the individual effects evenly distributed above and below it, which would wrongly suggest that the feature has no effect on the prediction.

To overcome some of the disadvantages of PDPs, Individual Conditional Expectation (ICE) and Accumulated Local Effects (ALE) plots can be used. Even though the implementations of these methods are not covered in this article, we will briefly explain them to show how they improve the PDP approach.

Individual Conditional Expectation (ICE)

An ICE plot is the PDP equivalent for individual data points. The plot displays one line per instance of the dataset, indicating how the prediction for that instance varies as the value of the feature varies. A PDP is the average of all the lines in an ICE plot. ICE plots make it possible to visualize the variance in the marginal effects and thus to detect heterogeneous effects.
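A small sketch may help to see how ICE curves expose an effect that the PDP averages away. The model below is hypothetical and deliberately built so that the effect of feature 0 flips sign depending on feature 1; it is an illustration under that assumption, not a statement about the census model.

```python
def ice_curves(predict, X, feature_idx, grid):
    """One prediction curve per instance, varying a single feature over the grid."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_idx] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def pdp_from_ice(curves):
    """The PDP is the point-wise average of the ICE curves."""
    n = len(curves)
    return [sum(points) / n for points in zip(*curves)]


def toy_predict(row):
    """Hypothetical model where the sign of feature 0's effect depends on feature 1."""
    return row[0] if row[1] == 1 else -row[0]


X_toy = [[1, 1], [1, 0], [2, 1], [2, 0]]
grid = [0, 1, 2]
curves = ice_curves(toy_predict, X_toy, feature_idx=0, grid=grid)
pdp = pdp_from_ice(curves)
print(curves)
print(pdp)
```

Half of the ICE lines rise and half fall, yet their average is a flat line at zero: exactly the situation in which a PDP alone would suggest that the feature has no effect.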

Accumulated Local Effects (ALE) Plot

PDPs present serious problems when a feature is highly correlated with other predictors, since they average predictions for synthetic instances that are very unlikely to occur in reality (for example, it would be very unlikely that age were 16 and education.num were 10 simultaneously). This can generate a significant bias when estimating the effect of the feature. ALE plots, in addition to being faster to compute, are an unbiased alternative for calculating the effect of a feature on model predictions, since they evaluate over its conditional distribution. That is, for a value x1 of the grid, they estimate the effect using only the predictions of the instances whose value is similar to x1, thus avoiding instances that are improbable in reality.

Furthermore, in order to estimate the effect of a feature on the prediction, instead of using the average (which mixes the effect of the feature with the effects of all the correlated predictors), they calculate the differences between predictions.
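As a simplified sketch of this idea, the function below computes a first-order ALE curve for one feature: within each bin it only uses the instances that actually fall there, takes the difference in prediction between the bin's upper and lower edge, averages those differences, and accumulates them. The final centering step of a full ALE implementation is omitted for brevity, bins are treated as half-open intervals, and `toy_predict` and the data are hypothetical.

```python
def ale_curve(predict, X, feature_idx, edges):
    """Uncentered first-order ALE, evaluated at the upper edge of each bin."""
    accumulated = []
    total = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Keep only instances whose feature value really lies in [lower, upper)
        in_bin = [row for row in X if lower <= row[feature_idx] < upper]
        diffs = []
        for row in in_bin:
            at_upper, at_lower = list(row), list(row)
            at_upper[feature_idx] = upper
            at_lower[feature_idx] = lower
            # Local effect of moving the feature across the bin for this instance
            diffs.append(predict(at_upper) - predict(at_lower))
        if diffs:
            total += sum(diffs) / len(diffs)
        accumulated.append(total)
    return accumulated


def toy_predict(row):
    """Hypothetical model with an interaction between two correlated features."""
    return row[0] * row[1]


# Feature 1 moves together with feature 0 (perfectly correlated toy data)
X_toy = [[0.5, 0.5], [1.5, 1.5], [2.5, 2.5]]
ale = ale_curve(toy_predict, X_toy, feature_idx=0, edges=[0, 1, 2, 3])
print(ale)
```

Every difference is computed on an instance whose other feature values are realistic for that bin, so no prediction is requested for an implausible combination such as a low feature 0 with a high feature 1.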

Differences between ICE and ALE

ICE plots solve the problem of heterogeneous effects that PDPs can present, but not the bias due to correlated features. ALE plots, in contrast, solve the bias problem by taking into consideration the conditional distribution of the feature and its correlation with the rest of the predictors.

3.5 - Global Surrogate Model

A global surrogate model is an interpretable model that is trained to approximate the predictions of a black-box model. We can draw conclusions about the black box model by interpreting the surrogate model. In Christoph Molnar’s words: “Solving machine learning interpretability by using more machine learning!”
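A minimal sketch of this definition, assuming a single feature and a depth-1 regression tree as the interpretable model: the surrogate is fitted to the black box's predictions (not to the true labels), and R-squared measures how faithfully it reproduces them. `black_box`, `fit_stump` and the data are hypothetical and written only for this illustration.

```python
def r_squared(y_true, y_pred):
    """R-squared of y_pred as an approximation of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_stump(x, target):
    """Depth-1 regression tree on one feature: pick the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(x))[1:]:
        left = [t for v, t in zip(x, target) if v < threshold]
        right = [t for v, t in zip(x, target) if v >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, v):
    threshold, left_mean, right_mean = stump
    return left_mean if v < threshold else right_mean


def black_box(v):
    """Hypothetical stand-in for the black-box model's predicted probability."""
    return 0.1 + 0.01 * v if v < 10 else 0.7 + 0.01 * v


x = list(range(1, 17))                    # e.g. years of education
bb_preds = [black_box(v) for v in x]      # the surrogate's training target
stump = fit_stump(x, bb_preds)
surrogate_preds = [stump_predict(stump, v) for v in x]
fidelity = r_squared(bb_preds, surrogate_preds)
print(stump, fidelity)
```

Here the black-box predictions play the role of the reference values in R-squared, so a value close to 1 means the simple rule "below or above the threshold" captures most of what the black box does on this data.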

We will try to approximate the XGBoost using a Logistic Regression and a Decision Tree as global surrogate models.

  • Logistic Regression
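from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score

# Train
log_clf = LogisticRegression().fit(X_train, y_train)

# Predictions
y_pred_train_log = log_clf.predict(X_train)
y_proba_train_log = log_clf.predict_proba(X_train)[:, 1]
y_pred_test_log = log_clf.predict(X_test)
y_proba_test_log = log_clf.predict_proba(X_test)[:, 1]

# R-squared (y_proba_train and y_proba_test are the XGBoost probabilities from the model-training step;
# r2_score expects (y_true, y_pred), so the first argument is treated as the reference)
print('R-squared Train RL-XGB: ', r2_score(y_proba_train_log, y_proba_train))
print('R-squared Test RL-XGB: ', r2_score(y_proba_test_log, y_proba_test))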

Fig. 11: R-squared between Logistic Regression and XGBoost predictions.

The R-squared is negative for both Train and Test sets. This happens when the fit is worse than simply using the mean. Therefore, it is concluded that Logistic Regression is not a good surrogate model.
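For reference, the usual definition of R-squared makes the negative case easy to see. Here $y_i$ are the reference values, $\hat{y}_i$ the predictions being compared against them, and $\bar{y}$ the mean of the reference values (the symbols are introduced only for this note):

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

The fraction compares the squared errors of the predictions with those of a constant prediction equal to the mean. Whenever the numerator exceeds the denominator, the fraction is greater than 1 and R-squared becomes negative.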

  • Decision Tree
from sklearn import tree

# Train
tree_clf = tree.DecisionTreeClassifier(random_state=0, max_depth=4).fit(X_train, y_train)

# Predictions
y_pred_train_tr = tree_clf.predict(X_train)
y_proba_train_tr = tree_clf.predict_proba(X_train)[:, 1]
y_pred_test_tr = tree_clf.predict(X_test)
y_proba_test_tr = tree_clf.predict_proba(X_test)[:, 1]

# R-squared against the XGBoost probabilities
print('R-squared Train DT-XGB: ', r2_score(y_proba_train_tr, y_proba_train))
print('R-squared Test DT-XGB: ', r2_score(y_proba_test_tr, y_proba_test))

# Metrics (clf_metrics is a custom helper from the accompanying code, not a library function)
clf_metrics(y_pred_train_tr, y_proba_train_tr, y_train, y_pred_test_tr, y_proba_test_tr, y_test)

Fig. 12: R-squared between Decision Tree and XGBoost predictions, and performance metrics for the former.

The variance in the XGBoost model predictions is fairly well approximated by the Decision Tree, so it can serve as a surrogate model for interpreting the main model. In fact, the performance metrics are also quite close to the original model.

It is important to note that while the variance of the XGBoost predictions is well explained by the Decision Tree, it is not guaranteed that the latter uses the features in the same way as the former. It could happen that the Tree approximates the XGBoost correctly in some areas of the input space, but behaves drastically differently in other regions.

The resulting tree will be analyzed in order to assess whether the features used correspond to the most important features that have been detected so far:

# Plot tree
fig, ax = plt.subplots(figsize=(30, 10))
tree.plot_tree(tree_clf, feature_names= X_train.columns.to_list(), ax=ax, filled=True)

Fig. 13: Trained Decision Tree

The 5 features that the tree used to estimate the Income, in order of importance, are:

1. married_1
2. education.num
3. capital.gain
4. capital.loss
5. age

These features correspond to the most important ones that have been detected by the other methodologies.

4. Local Importance

Local surrogate models are interpretable models that are used to explain individual predictions of black-box machine learning models.

4.1 - Local Interpretable Model-agnostic Explanations (LIME)

LIME analyzes what happens to the model’s predictions when variations are made to the input data. It generates a new dataset of perturbed samples and their corresponding predictions from the original model. On this synthetic set LIME trains an interpretable model (Logistic Regression, Decision Tree, LASSO Regression, etc.), weighting each sampled instance by its proximity to the instance of interest.

The explanation for instance X will be that of the surrogate model that minimizes the loss function (performance measure -e.g. MSE- between the prediction of the surrogate model and the prediction of the original model), keeping the complexity of the model low.
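The core idea can be sketched for a single feature: perturb the instance, query the black box, weight the samples by proximity, and fit a weighted straight line. This is a simplified illustration and not the LIME library's algorithm: it perturbs on a fixed symmetric grid instead of sampling randomly, and `black_box`, `kernel_width` and `radius` are assumptions made for the example.

```python
import math


def local_linear_surrogate(predict, x0, kernel_width=0.5, n_samples=41, radius=1.0):
    """Fit a proximity-weighted straight line to the black box around x0."""
    # Perturb the instance on a symmetric grid around x0
    step = 2 * radius / (n_samples - 1)
    xs = [x0 - radius + k * step for k in range(n_samples)]
    ys = [predict(x) for x in xs]
    # Exponential kernel: samples closer to x0 receive larger weights
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    # Closed-form weighted least squares for one feature
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept


def black_box(x):
    """Hypothetical non-linear model."""
    return x ** 2


slope_at_1, _ = local_linear_surrogate(black_box, x0=1.0)
slope_at_3, _ = local_linear_surrogate(black_box, x0=3.0)
print(slope_at_1, slope_at_3)
```

The same black box yields a different local slope at each instance, which is why LIME explanations are valid only in the neighborhood of the instance being explained.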

import lime.lime_tabular
# Generate explainer
explainer = lime.lime_tabular.LimeTabularExplainer(X_train.values, mode='classification',feature_names=X_train.columns.tolist(), discretize_continuous=False, random_state=1984)

# Generate explanation for instance i
i = 546
exp = explainer.explain_instance(X_test.values[i], clf_xgb_array.predict_proba)

# Plot
fig = exp.as_pyplot_figure();

Fig. 14: Relative importance of each feature in individual prediction.

from lime import submodular_pick
# Generate explanation for sample
sp_obj = submodular_pick.SubmodularPick(explainer, X_test.values, clf_xgb_array.predict_proba, sample_size=3, num_exps_desired=3)
[exp.show_in_notebook() for exp in sp_obj.sp_explanations]

Fig. 15: Individual explanation for sampled instances.

It is observed that capital.gain is the most influential feature for separating the classes in all individual explanations. After it, depending on the instance, the most relevant predictors are married, education.num, age and sex. These are the same features that were identified by the global importance methods.

4.2 - SHapley Additive exPlanations (SHAP)

SHAP is a method to explain individual predictions based on the calculation of Shapley Values, a method from coalitional game theory. It seeks to answer the question “How much has each feature value contributed to the prediction, relative to the average prediction?”. To do this, the Shapley Values assign “payments” to “players” depending on their contribution to the “total payment”. Players cooperate in a coalition and receive certain rewards for such cooperation.

In the machine learning context, the “game” is the prediction task for an instance of the dataset. The “total payment” is the prediction for that instance, minus the average prediction for the entire dataset.

The “players” are the values of the features for the instance, which cooperate in a coalition to receive the “payment”. The Shapley Value is the average marginal contribution of a feature value across all possible coalitions. It indicates how the “total payment” (the prediction minus the average prediction) is distributed among all “players” (the feature values).
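To make the game-theory description concrete, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating every coalition. It is an illustration of the definition rather than the SHAP library's algorithm: features outside a coalition are filled in from a small background dataset, and `toy_predict`, `background` and `instance` are made up for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values; features outside a coalition come from background rows."""
    n = len(instance)

    def coalition_value(present):
        # Average prediction with the 'present' features fixed to the instance's values
        total = 0.0
        for row in background:
            mixed = [instance[j] if j in present else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(set(subset) | {i})
                without_i = coalition_value(set(subset))
                phi += weight * (with_i - without_i)  # marginal contribution of feature i
        values.append(phi)
    return values


def toy_predict(row):
    """Hypothetical model with an interaction between features 1 and 2."""
    return 2 * row[0] + row[1] * row[2]


background = [[0, 0, 0], [1, 1, 1], [2, 0, 1], [1, 1, 0]]
instance = [3, 1, 1]
phi = shapley_values(toy_predict, instance, background)
average_prediction = sum(toy_predict(r) for r in background) / len(background)
print(phi, sum(phi), toy_predict(instance) - average_prediction)
```

The values add up to the prediction for the instance minus the average prediction over the background data, which is the “total payment” being distributed. Enumerating all coalitions grows exponentially with the number of features, so this brute-force approach is only practical for toy examples.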

One innovation that SHAP brings to the table is that the Shapley value explanation is represented as an additive feature attribution method, namely a linear model. In this way, SHAP connects the benefits of LIME with the Shapley Values.

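# Create explainer
# (feature_dependence='independent' is the argument name in older SHAP releases; newer ones use feature_perturbation='interventional')
explainer = shap.TreeExplainer(clf_xgb_df, model_output='probability', feature_dependence='independent', data=X_test)

# Generate explanation for instance i
i = 150
data_for_prediction = X_test.iloc[i, :]
shap_values = explainer.shap_values(data_for_prediction)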

There are several methods for visualizing SHAP’s explanations. We will cover two of them in this article: Force Plot and Decision Plot.

Force Plot

shap.force_plot(explainer.expected_value, shap_values, data_for_prediction)

Fig. 16: SHAP Force Plot explanation for a single instance

The force plot indicates, for each feature, the impact it had on the prediction. There are two relevant values to notice: the output value (model prediction for the instance) and the base value (average prediction for the entire dataset). A bigger bar means a higher impact and the color indicates if the feature value moved the prediction from the base value towards 1 (red) or 0 (blue).

Decision Plot

shap.decision_plot(explainer.expected_value, shap_values, data_for_prediction)

Fig. 17: SHAP Decision Plot explanation for a single instance

The Decision Plot shows essentially the same information as the Force Plot. The grey vertical line is the base value and the red line indicates if each feature moved the output value to a higher or lower value than the average prediction.

This plot can be a little bit more clear and intuitive than the previous one, especially when there are many features to analyze. In the Force Plot the information may look very condensed when the number of predictors is high.


This article is meant to help data scientists get a better understanding of how their machine learning models work and to be able to explain the results in a clear way. It is also useful for debugging models and ensuring that they are working as intended.

We have presented different classifications of interpretability methods (intrinsic/post-hoc, model-specific/model-agnostic, local/global) and we used several libraries and techniques for assessing both global and local importance.

In summary, the libraries and techniques used are:

  • XGBoost: Feature Importance
  • ELI5: Permutation Importance
  • PDPBox: Partial Dependence Plot
  • Global Surrogate Model: Logistic Regression, Decision Tree
  • LIME: Local Importance
  • SHAP: Feature Importance, Summary Plot, Partial Dependence Plot, Local Importance

So, which is the single best library to address ML model interpretability? In my opinion, the use of several libraries and techniques helps to build credibility on the model’s output (provided that the results are consistent). However, if I had to choose one, I would definitely go for SHAP.

SHAP has made a great contribution to the field of interpretable Machine Learning. With SHAP, the global interpretations are consistent with the individual explanations, since the Shapley Values (which have a solid theoretical foundation in Game Theory) are the “atomic unit” of the global interpretations. If, for example, LIME were used for local explanations and PDP or Permutation Importance for global interpretations, there would be no common theoretical foundation between the methods.

I hope this article serves its purpose as a general guide into cracking black-box models. The entire code can be found in my GitHub

In my next article I will be addressing model fairness, which has been gaining increasing awareness over the past years. This field aims to assess how fair the model is when treating pre-existing biases in data: is it fair that a job-matching system favors male candidates for CEO interviews, because that matches historical data?

Stay tuned!

