Model Explainability¶
In this article we look at model explainability using model fingerprints
Machine learning models can be hard to understand and explain: it is often unclear how a model combines its features to produce its predictions. Frequently we need to show how each feature impacts the predictions. This is not just a theoretical problem - laws may eventually compel us to explain how and why a model arrived at its output.
A model fingerprint is an explanation of a model that describes what effect each of the model's features has on the target. It lets you look inside the model feature by feature and see how each one impacts the predictions. The feature effects are broken down into three types:
- linear: a change in the feature results in a constant change in the target
- non-linear: a change in the feature does not result in a constant change in the target
- interaction: effects that result from two or more features combining to produce an impact on the target different from the effect each feature has in isolation
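A tiny synthetic target makes the three effect types concrete. The function and coefficients below are made up for illustration and are not part of the article's model:

```python
# Illustrative target combining all three effect types:
#   3 * x1      -> linear effect of x1
#   x2 ** 2     -> non-linear effect of x2
#   2 * x1 * x3 -> interaction between x1 and x3
def toy_target(x1, x2, x3):
    return 3 * x1 + x2 ** 2 + 2 * x1 * x3

# Linear: bumping x1 by 1 (with x3 held at 0) always moves the target by 3.
# Non-linear: the same-sized bump in x2 moves the target by varying amounts.
# Interaction: the effect of bumping x1 depends on the value of x3.
```

The fingerprint estimates and normalizes exactly these kinds of contributions for each feature of a fitted model.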
Model Fingerprint
The implementation of model fingerprints comes from the mlfinlab library. mlfinlab depends on scipy, numba and a few other scientific Python libraries that can be tricky to install locally, but it installs without problems on Google Colab. If you want to run the notebook, uncomment the line below and run it to install mlfinlab.
# !pip install mlfinlab
from sklearn.datasets import load_boston
import pandas as pd

# Note: load_boston is deprecated and was removed in scikit-learn 1.2;
# running this cell requires an older scikit-learn version.
data = load_boston()
X = pd.DataFrame(data['data'], columns=data['feature_names'])
y = pd.Series(data['target'])
from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(n_estimators=60, random_state=77)
clf.fit(X, y)
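As a point of comparison, scikit-learn's built-in impurity-based importances give a single number per feature rather than the fingerprint's linear/non-linear/interaction decomposition. A minimal sketch on synthetic stand-in data (the column names and coefficients here are made up for illustration, not the Boston data):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data: y depends strongly on 'a'; 'b' is pure noise.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({'a': rng.normal(size=300), 'b': rng.normal(size=300)})
y_demo = 2 * X_demo['a'] + 0.1 * rng.normal(size=300)

gbr = GradientBoostingRegressor(n_estimators=60, random_state=77).fit(X_demo, y_demo)

# Impurity-based importances: one normalized score per feature.
importances = dict(zip(X_demo.columns, gbr.feature_importances_))
```

The informative feature dominates the importance scores, but nothing here tells you whether its effect is linear, non-linear, or driven by interactions - that is the gap the fingerprint fills.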
from mlfinlab.feature_importance import RegressionModelFingerprint
reg_fingerprint = RegressionModelFingerprint()
reg_fingerprint.fit(clf, X, num_values=20,
pairwise_combinations=[('CRIM', 'ZN'), ('RM', 'AGE'), ('LSTAT', 'DIS')])
linear_effect, non_linear_effect, pairwise_effect = reg_fingerprint.get_effects()
pd.DataFrame({'feature': list(linear_effect['norm'].keys()),
'effect': list(linear_effect['norm'].values())})\
.set_index('feature').sort_values('effect', ascending=False)
pd.DataFrame({'feature': list(non_linear_effect['norm'].keys()),
'effect': list(non_linear_effect['norm'].values())})\
.set_index('feature').sort_values('effect', ascending=False)
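The two 'norm' dictionaries can also be combined into a single DataFrame for side-by-side comparison. A sketch using hypothetical values in place of the real get_effects() output:

```python
import pandas as pd

# Hypothetical effect dictionaries standing in for the 'norm' entries
# returned by get_effects(); real values come from the fitted fingerprint.
linear_norm = {'RM': 0.55, 'LSTAT': 0.30, 'CRIM': 0.15}
non_linear_norm = {'RM': 0.20, 'LSTAT': 0.65, 'CRIM': 0.15}

# One row per feature, one column per effect type.
summary = (pd.DataFrame({'linear': linear_norm, 'non_linear': non_linear_norm})
             .sort_values('linear', ascending=False))
```

This makes it easy to spot features whose influence is mostly non-linear even when their linear effect looks modest.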
To visualize the model fingerprint we call plot_effects. This produces a chart with three panes, one for each of the effects we are interested in.
import matplotlib.style as style
style.use('seaborn')  # renamed to 'seaborn-v0_8' in matplotlib >= 3.6
fig = reg_fingerprint.plot_effects()
fig.set_size_inches((14,5))
Store the Fingerprint and the Model
One really useful thing you can do with scikit-learn models is to store additional properties on them with plain Python attribute assignment. This lets you attach information to a model that will still be there after you save and reload it. In our case we would like to save the fingerprint and the feature names.
clf.fingerprint = reg_fingerprint
clf.feature_names = X.columns.to_list()
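A quick sketch of why this works: attributes attached this way survive a pickle round-trip, so the fingerprint and feature names travel with the saved model. Shown here with a small LinearRegression and an illustrative feature_names attribute, rather than the fitted model above:

```python
import pickle
from sklearn.linear_model import LinearRegression

# Fit a tiny model on a perfect line y = x.
model = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])

# Attach extra metadata as a plain attribute (illustrative name,
# mirroring the fingerprint/feature_names idea above).
model.feature_names = ['x']

# Pickle round-trip: custom attributes are serialized with the estimator.
restored = pickle.loads(pickle.dumps(model))
```

The same pattern applies when persisting to disk with pickle or joblib instead of an in-memory round-trip.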