Machine learning models can be hard to understand and explain in terms of how they produce their predictions. Often we need to show how each feature impacts the predictions. This is not just a theoretical problem - there may be laws in the future that compel us to explain how and why a model arrived at its output.

A model fingerprint is an explanation of a model that describes the effect each of the model's features has on the target. It lets you look inside the model feature by feature and see how each one influences the predictions. The feature effects are broken down into three types (a short sketch after the list shows how the linear and non-linear parts can be estimated):

  • linear: a change in the feature results in a constant change in the target
  • non-linear: a change in the feature does not result in a constant change in the target
  • interaction: the effect of two or more features combining, which differs from the effect each feature has in isolation
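
To make the linear/non-linear split concrete, here is a minimal sketch of how the two effects for a single feature can be estimated from its partial dependence curve: sweep the feature across a grid, record the average prediction at each grid value, fit a straight line to that curve, then measure how much of the movement the line captures (linear) and how much is left over (non-linear). This is an illustration of the idea only, not the mlfinlab implementation, which also normalises the effects across all features.

import numpy as np

def single_feature_effects(model, X, feature, num_values=20):
    # Sweep the feature across a grid while holding everything else fixed
    grid = np.linspace(X[feature].min(), X[feature].max(), num_values)
    pdp = np.empty(num_values)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[feature] = value                  # force the feature to a single value
        pdp[i] = model.predict(X_mod).mean()    # average prediction = partial dependence
    # Straight-line fit to the partial dependence curve
    slope, intercept = np.polyfit(grid, pdp, 1)
    linear_fit = slope * grid + intercept
    linear_effect = np.mean(np.abs(linear_fit - pdp.mean()))   # movement captured by the line
    non_linear_effect = np.mean(np.abs(pdp - linear_fit))      # what the line misses
    return linear_effect, non_linear_effect

Once the model below has been fitted, calling single_feature_effects(clf, X, 'LSTAT') gives a rough, unnormalised analogue of the numbers the library reports later.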

Model Fingerprint

The implementation of model fingerprints comes from the mlfinlab library. mlfinlab depends on scipy, numba and a few other scientific Python libraries that can be tricky to install locally, but it installs without problems on Google Colab. If you want to run the notebook, uncomment the cell below and run it to install mlfinlab.

# !pip install mlfinlab

Load Data from sklearn.datasets

To demonstrate, we first load the Boston housing data from sklearn.datasets.

from sklearn.datasets import load_boston
import pandas as pd

# Boston housing data: 13 numeric features, target is the median house value
data = load_boston()
X = pd.DataFrame(data['data'], columns=data['feature_names'])
y = pd.Series(data['target'])

Create a Model

The model we will use is a GradientBoostingRegressor

from sklearn.ensemble import GradientBoostingRegressor

clf = GradientBoostingRegressor(n_estimators=60, random_state=77)
clf.fit(X,y)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=60,
                          n_iter_no_change=None, presort='auto',
                          random_state=77, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

Get the Model Fingerprint

The class that implements the model fingerprint is RegressionModelFingerprint. We fit it on the regressor we trained earlier, passing the data, the number of grid values per feature, and the tuples of features for which we wish to see interaction effects.

from mlfinlab.feature_importance import RegressionModelFingerprint

reg_fingerprint = RegressionModelFingerprint()
reg_fingerprint.fit(clf, X, num_values=20, 
                    pairwise_combinations=[('CRIM', 'ZN'), ('RM', 'AGE'), ('LSTAT', 'DIS')])
linear_effect, non_linear_effect, pairwise_effect = reg_fingerprint.get_effects()

Linear Effects

pd.DataFrame({'feature': list(linear_effect['norm'].keys()),
              'effect': list(linear_effect['norm'].values())})\
.set_index('feature').sort_values('effect', ascending=False)
feature    effect
LSTAT      0.332337
RM         0.228615
DIS        0.107575
PTRATIO    0.085696
NOX        0.059710
AGE        0.056358
TAX        0.043767
B          0.036869
CRIM       0.024518
CHAS       0.009183
RAD        0.006958
INDUS      0.005403
ZN         0.003011

Non-Linear Effects

pd.DataFrame({'feature': list(non_linear_effect['norm'].keys()),
              'effect': list(non_linear_effect['norm'].values())})\
.set_index('feature').sort_values('effect', ascending=False)
feature    effect
RM         2.832436e-01
LSTAT      2.364719e-01
DIS        2.050034e-01
NOX        5.881008e-02
TAX        5.800090e-02
PTRATIO    5.147495e-02
CRIM       4.230346e-02
AGE        2.961393e-02
RAD        1.434398e-02
INDUS      1.024114e-02
B          7.571608e-03
ZN         2.921031e-03
CHAS       5.831799e-16
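
Pairwise Effects

The fit above also estimated interaction effects for the feature pairs we passed in. They can be displayed in the same way as the other effects; the sketch below assumes pairwise_effect exposes the same {'norm': ...} dictionary layout as linear_effect and non_linear_effect.

pd.DataFrame({'pair': list(pairwise_effect['norm'].keys()),
              'effect': list(pairwise_effect['norm'].values())})\
.set_index('pair').sort_values('effect', ascending=False)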

Plot Feature Effects

To visualize the model fingerprint we call plot_effects. This produces a chart with three panes, one for each of the effects we are interested in.

import matplotlib.style as style
style.use('seaborn')
fig = reg_fingerprint.plot_effects()
fig.set_size_inches((14,5))

Store the Fingerprint and the Model

One really useful thing you can do with scikit-learn models is store additional properties on them using Python's attribute (dot) syntax. This lets you attach information to a model that will still be available after the model is saved and reloaded. In our case we would like to save the fingerprint and the feature names.

clf.fingerprint = reg_fingerprint
clf.feature_names = X.columns.to_list()
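
Because these extra attributes live in the estimator's __dict__, they survive pickling. Below is a minimal sketch of the round trip using joblib, assuming the fingerprint object itself pickles cleanly; the file name is just an example.

import joblib

# Save the model together with the fingerprint and feature names we attached
joblib.dump(clf, 'gbr_with_fingerprint.joblib')

# Reload it later and the extra attributes come back with it
restored = joblib.load('gbr_with_fingerprint.joblib')
print(restored.feature_names)
restored.fingerprint.plot_effects()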