Novetta applies machine learning models to help customers make better decisions across a wide range of use cases. Increasingly, decision makers want to understand why a model makes the decisions that it does. Model explanations help our customers gain an understanding of how our models function, while enabling us to debug our models and assess them for bias and fairness. In this blog post, we describe the tools and techniques we use to generate explanations from the predictions of our machine learning models.
Interpretable Models
Selecting an inherently interpretable “white-box” model is an obvious way to gain insight into how features contribute to a prediction. A linear model is the most easily interpretable, describing feature importance through its coefficients, but that simplicity can come at the cost of performance because a linear model cannot capture non-linear relationships between the features and the target. Decision tree-based models, especially gradient-boosted decision trees such as XGBoost and LightGBM, have become popular white-box choices that characterize feature importance while performing well on many structured data tasks.
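As a quick illustration of this built-in interpretability, the sketch below (our own minimal example on a scikit-learn sample dataset, not drawn from any customer work) reads feature importance directly from a linear model’s coefficients and a gradient-boosted ensemble’s importance scores:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two white-box models trained on the same data.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

linear = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
boosted = GradientBoostingClassifier(random_state=0).fit(X, y)

# Linear model: standardized coefficients give each feature's direction and weight.
coefficients = dict(zip(X.columns, linear[-1].coef_[0]))

# Tree ensemble: impurity-based importances rank features globally
# (XGBoost and LightGBM expose the same `feature_importances_` attribute).
importances = dict(zip(X.columns, boosted.feature_importances_))

print(sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:5])
```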
For data scientists who need to dig further into white-box model behavior, InterpretML [1] is a framework from Microsoft that offers explanations, as shown below, for the following:
- global phenomena, the model’s general behavior when using features to make predictions across all inputs
- local phenomena, the model’s behavior for a specific prediction
Explanations may take the form of global and local feature importance rankings, or of partial dependence plots that describe the relationship between a feature and the target label. InterpretML also includes an implementation of the Explainable Boosting Machine (EBM), a type of Generalized Additive Model (GAM) that has shown performance comparable to XGBoost and LightGBM on certain tasks.
InterpretML

Left: A global explanation of how the model’s income predictions change with age. Right: A local explanation of how each feature (notably capital gain) contributed to a particular prediction. [1]
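A minimal sketch of how explanations like those above can be produced with InterpretML’s glassbox API follows; the UCI Adult income dataset is a stand-in chosen to mirror the figure, not necessarily the exact setup behind it:

```python
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Adult income dataset as a placeholder; rows with missing values dropped for simplicity.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
X = X.dropna()
y = y[X.index]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

show(ebm.explain_global())                       # global: feature importances and shape functions
show(ebm.explain_local(X_test[:5], y_test[:5]))  # local: per-feature contributions per prediction
```

The `show` calls render interactive global and local explanation dashboards when run in a notebook.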
- TabNet [2] is an interpretable deep learning architecture from Google AI that ingests tabular data and uses a sequential attention mechanism to select features at each decision step, providing both per-step and aggregate feature importance masks (a usage sketch follows the figure below).
- In computer vision, convolutional neural networks can be explained by generating class activation mappings (CAM) [3] from deep layers of the network, visualizing which regions of an image are used to make a prediction (see the sketch after this list).
- Lastly, in language modeling, explainability tools such as exBERT [4] provide plots visualizing the attention given to each token in a sentence, helping us understand how transformer models form representations and model language.
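For the computer vision case, the sketch below computes a class activation map with a torchvision ResNet-18; it assumes torchvision 0.13+ for the `weights=` argument, and the random tensor simply stands in for a preprocessed image. Treat it as an illustration of the CAM idea rather than production code:

```python
import torch
from torchvision import models

# Pre-trained ResNet-18; layer names follow torchvision's ResNet implementation.
model = models.resnet18(weights="IMAGENET1K_V1").eval()

# Capture the final convolutional feature maps with a forward hook.
features = {}
model.layer4.register_forward_hook(lambda m, i, o: features.update(conv=o))

img = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image
with torch.no_grad():
    cls = model(img).argmax(dim=1).item()
    # CAM: weight the final conv feature maps by the FC weights of the predicted class.
    cam = torch.einsum("c,chw->hw", model.fc.weight[cls], features["conv"][0])

cam = torch.relu(cam)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
# `cam` (7x7 here) can be upsampled to the input size and overlaid on the image.
```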
TabNet

Feature importance masks from a TabNet model trained for income classification, showing the importance of each feature for 20 inputs. [5]
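A usage sketch for masks like these, using the community `pytorch-tabnet` package (one implementation of TabNet [2], and an assumption on our part rather than the code behind the figure), with random placeholder data:

```python
import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier

# Random tabular data standing in for a real dataset.
rng = np.random.default_rng(0)
X_train = rng.random((1024, 10)).astype(np.float32)
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(np.int64)

clf = TabNetClassifier(verbose=0)
clf.fit(X_train, y_train, max_epochs=20)

# Aggregate feature importances across all decision steps.
print(clf.feature_importances_)

# Per-instance explanation: attention masks for each decision step.
explanations, masks = clf.explain(X_train[:20])
```

The returned `masks` dictionary holds one attention mask per decision step, while `feature_importances_` aggregates them across steps.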
Surrogate Modeling
“Black-box” models such as deep neural networks do not provide direct visibility into how predictions are made, and are therefore less explainable. “Surrogate modeling” leverages white-box interpretability to explain a black-box model’s outputs. Global surrogate modeling trains a white-box model on the predictions of a black-box model (rather than the ground truth) to approximate the black-box model’s general behavior. The interpretable global surrogate model can then offer explanations like those discussed above, such as feature importance rankings.
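A minimal global-surrogate sketch, with a neural network and a shallow decision tree standing in (by our own choice) as the black-box and white-box models:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Black-box model: a small neural network.
black_box = make_pipeline(StandardScaler(),
                          MLPClassifier(max_iter=1000, random_state=0)).fit(X_train, y_train)

# Train the surrogate on the black-box model's predictions, not the ground truth.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how closely the surrogate reproduces the black-box model on held-out data.
print("fidelity:", accuracy_score(black_box.predict(X_test), surrogate.predict(X_test)))
```

The fidelity score indicates how faithfully the surrogate mimics the black-box model, and therefore how much weight its explanations can carry.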
Surrogate modeling can also be applied at the local level by training a surrogate to model the behavior of a black-box model around a particular input. Local Interpretable Model-Agnostic Explanations (LIME) [6] implements this approach: it perturbs the input of interest by slightly altering its feature values, creating many modified copies, and obtains the black-box model’s prediction for each copy. The copies and their predictions are then used to train a linear local surrogate model whose coefficients form the explanation, often presented as a plot showing how each feature contributed to the prediction.
LIME

Left to right: model’s prediction, contributions of each feature towards the prediction, and the value of each feature. [6]
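A sketch of how an explanation like the one above can be generated with the `lime` package; the dataset and model are placeholders of our own choosing:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Stand-in black-box model.
data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one prediction: LIME perturbs this row, queries the model, and fits a
# local linear surrogate whose weights become the explanation.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())
```

`exp.as_list()` returns the weighted feature rules behind the local linear surrogate, and `exp.show_in_notebook()` renders a plot similar to the figure above.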
Counterfactual Instances
Counterfactual instances enable data scientists to explore the relationship between a model’s inputs and its predictions by asking what minimal change to an input’s feature values would alter the model’s prediction. Similar perturbation techniques are most commonly associated with adversarial attacks against machine learning models, but here they are applied to explain the decision-making process of a black-box model.
What-If Tool

Counterfactual instance (right) for a person the model predicted to earn less than $50K (left). [8]
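To make the idea concrete, the toy sketch below runs a naive greedy search of our own that nudges feature values until a model’s prediction flips; libraries such as DiCE [8] and Alibi [7] implement far more principled counterfactual methods, so treat this purely as an illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in model and data; the search itself is deliberately naive.
X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

x = X[0].copy()
original = model.predict([x])[0]
target_class = 1 - original
step = 0.05 * X.std(axis=0)          # small per-feature step sizes

cf = x.copy()
for _ in range(200):
    if model.predict([cf])[0] == target_class:
        break
    # Try a small +/- step on every feature and keep the single change that
    # most increases the probability of the target class (greedy hill climbing).
    trials = [cf + sign * step * direction
              for direction in np.eye(len(x))
              for sign in (-1.0, 1.0)]
    probs = model.predict_proba(trials)[:, target_class]
    cf = trials[int(np.argmax(probs))]

changed = np.flatnonzero(~np.isclose(cf, x))
print("features changed to flip the prediction:", changed)
```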
The What-If Tool [9] from Google can also find counterfactuals for an instance, using an alternative approach: rather than generating synthetic counterfactuals, it finds the closest instance with a different prediction in the dataset used to train the model, measured by L1 or L2 distance. Beyond counterfactuals, the What-If Tool lets users easily spot misclassifications by visualizing inference results, and explore different fairness metrics by adjusting decision thresholds, all without writing additional code.
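A hedged sketch of embedding the What-If Tool in a Jupyter notebook follows; the scikit-learn model, the `tf.Example` packing, and the prediction adapter reflect our understanding of the witwidget API and are assumptions rather than anything specific to the figure above:

```python
import tensorflow as tf
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# Stand-in model and data, chosen only to make the sketch self-contained.
data = load_breast_cancer(as_frame=True)
df, target = data.data, data.target
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(df, target)

def to_examples(frame, labels):
    """Pack each row and its label into a tf.Example proto, the format WIT expects."""
    examples = []
    for (_, row), label in zip(frame.iterrows(), labels):
        ex = tf.train.Example()
        for col, val in row.items():
            ex.features.feature[col].float_list.value.append(float(val))
        ex.features.feature["target"].int64_list.value.append(int(label))
        examples.append(ex)
    return examples

def predict_fn(examples):
    """Adapter WIT calls for predictions: tf.Example protos in, class probabilities out."""
    rows = [[ex.features.feature[col].float_list.value[0] for col in df.columns]
            for ex in examples]
    return model.predict_proba(rows)

config = WitConfigBuilder(to_examples(df[:200], target[:200])).set_custom_predict_fn(predict_fn)
WitWidget(config, height=600)  # renders the interactive tool inside a Jupyter notebook
```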
Conclusion
Interpretable models, surrogate modeling, and counterfactual instances are a few examples of how we gain insight into machine learning model predictions with human-interpretable explanations. These insights help us reduce bias, improve prediction accuracy, and increase decision-makers’ and customers’ understanding of the outputs they rely on.
References
[1] Nori, H., Jenkins, S., Koch, P.L., & Caruana, R. (2019). InterpretML: A Unified Framework for Machine Learning Interpretability. ArXiv, abs/1909.09223.
[2] Arik, S., & Pfister, T. (2019). TabNet: Attentive Interpretable Tabular Learning. ArXiv, abs/1908.07442.
[3] Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., & Torralba, A. (2016). Learning Deep Features for Discriminative Localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2921-2929.
[4] Hoover, B., Strobelt, H., & Gehrmann, S. (2019). exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models. ArXiv, abs/1910.05276.
[5] Mueller, Z. (2020, April 14). 04_TabNet. Retrieved 2020, from https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Tabular%20Notebooks/04_TabNet.ipynb
[6] Ribeiro, M.T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[7] Klaise, J., Van Looveren, A., Vacanti, G., & Coca, A. (2020). Alibi: Algorithms for monitoring and explaining machine learning models (0.4.0) [Computer software]. Seldon. https://github.com/SeldonIO/alibi
[8] Mothilal, R.K., Sharma, A., & Tan, C. (2020). Explaining machine learning classifiers through diverse counterfactual explanations. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.
[9] Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., & Wilson, J. (2019). The What-If Tool: Interactive Probing of Machine Learning Models. IEEE Transactions on Visualization and Computer Graphics, 26, 56-65.