Novetta applies machine learning models to help customers make better decisions across a wide range of use cases. Increasingly, decision makers want to understand why a model makes the decisions that it does. Model explanations help our customers gain an understanding of how our models function, while enabling us to debug our models and assess them for bias and fairness. In this blog post, we describe the tools and techniques we use to generate explanations from the predictions of our machine learning models.
Selecting an inherently interpretable “white-box” model is an obvious way to gain insights on how features contribute to a prediction. A linear model is the most easily interpretable model, describing feature importance through its coefficients, but this simplicity may come at the cost of performance by failing to capture non-linear relationships between features. Decision tree-based models, especially gradient boosting decision trees such as XGBoost and LightGBM, have become popular white-box model choices that characterize feature importance while performing well for many structured data tasks.
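The idea of reading global feature importance directly off a linear model's coefficients can be sketched in a few lines. This is an illustrative example (the data and feature names are synthetic, not from any Novetta project), using scikit-learn:

```python
# Sketch: reading global feature importance off a white-box linear model.
# The data and feature names are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Target depends strongly on feature 0, weakly on feature 1, not at all on feature 2.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
for name, coef in zip(["f0", "f1", "f2"], model.coef_):
    print(f"{name}: {coef:+.2f}")
```

The fitted coefficients recover the underlying relationship (roughly +3.0, +0.5, and ~0), which is exactly the interpretability a linear model offers, and exactly what it cannot express when the true relationship is non-linear.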
For data scientists who need to dig further into white-box model behavior, InterpretML is a framework from Microsoft that offers explanations of both:
- global phenomena, the model’s general behavior when using features to make predictions across all inputs
- local phenomena, the model’s behavior for a specific prediction
Explanations might come in the form of global and local feature importance rankings, or of relationships between a feature and the target label shown through partial dependence plots. InterpretML also includes an implementation of the Explainable Boosting Machine (EBM), a type of Generalized Additive Model (GAM) that has shown comparable performance to XGBoost and LightGBM for certain tasks.
Deep neural networks (DNNs) are a popular choice for difficult machine learning tasks, but their complexity makes it difficult to explain why a model made a certain prediction. Practitioners are making DNNs interpretable by leveraging the internals of the network.
- TabNet is an interpretable deep learning architecture from Google AI that ingests tabular data; the layers that comprise its sequential attention mechanism provide both decision-step and aggregate feature importance masks.
- In computer vision, convolutional neural network predictions can be explained by generating class activation mappings (CAM) from deep layers of the network, visualizing which regions of an image contribute to a prediction.
Lastly, in language modelling, explainability tools such as exBERT provide plots visualizing the attention given to each token in a sentence to better understand how transformer models form representations and model a language.
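The CAM computation mentioned above is itself simple: a class activation map is the per-location weighted sum of the final convolutional feature maps, using the classifier weights for the class of interest. A minimal NumPy sketch (shapes and values are illustrative, not from a real network):

```python
# Sketch of the CAM computation: CAM(h, w) = sum_c w_c * F_c(h, w), where
# F_c are the last conv layer's feature maps and w_c are the FC weights for
# the predicted class. All values here are illustrative random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7))   # (channels, height, width)
class_weights = rng.random(8)          # FC weight per channel for one class

# Contract the channel axis: weighted sum of feature maps -> (7, 7) map.
cam = np.tensordot(class_weights, feature_maps, axes=1)

# Normalise to [0, 1] so the map can be upsampled and overlaid on the image.
cam = (cam - cam.min()) / (cam.max() - cam.min())
print(cam.shape)
```

High values in the normalised map mark the spatial regions the network weighted most heavily toward that class.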
Using interpretable models can be particularly useful for decisions such as loan approval or job candidate filtering. With insight into whether a white-box model depends on sex or race to determine a classification, a data scientist can identify the corrections the model needs.
“Black-box” models such as DNNs do not provide direct visibility into how predictions are made, and are therefore less explainable. “Surrogate modeling” leverages white-box interpretability to explain a black-box model’s outputs. Global surrogate modeling trains a white-box model on the predictions of a black-box model (rather than on ground truth) to approximate the black-box model’s general behavior. The interpretable global surrogate model can then offer explanations like those discussed above, such as feature importance.
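A global surrogate is straightforward to build with off-the-shelf tools. In this illustrative sketch (the dataset and model choices are not from the post), a shallow decision tree is trained on a random forest's predictions, and its fidelity — how often it agrees with the black box — is measured:

```python
# Sketch of global surrogate modelling: a white-box decision tree is trained
# on the *predictions* of a black-box random forest, then checked for
# fidelity. Dataset and model choices are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X, y)
black_box_preds = black_box.predict(X)  # surrogate targets: predictions, not ground truth

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, black_box_preds)

# Fidelity: how often the interpretable surrogate agrees with the black box.
fidelity = accuracy_score(black_box_preds, surrogate.predict(X))
print(f"surrogate fidelity: {fidelity:.2f}")
```

A high-fidelity surrogate can then be inspected directly (its split rules and feature importances), while a low fidelity score warns that the surrogate's explanations do not faithfully describe the black box.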
Surrogate modeling can also be applied at the local level by training local surrogate models to mimic the behavior of a black-box model for a particular input. Local Interpretable Model-Agnostic Explanations (LIME) implements this approach: it perturbs the input of interest by minimally altering its feature values and obtains the predicted label of each perturbed copy from the black-box model. The copies and predictions are used to train a linear local surrogate model whose coefficients form the explanation, often rendered as plots showing how each feature contributed to the prediction.
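The LIME recipe — perturb, query the black box, weight copies by proximity, fit a weighted linear model — can be sketched without the `lime` library itself. This is a simplified reimplementation under synthetic data, not the package's actual API:

```python
# Minimal LIME-style sketch (not the lime library): perturb one instance,
# query the black box, weight the copies by proximity, and fit a local
# linear surrogate whose coefficients explain that single prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X[:, 0] * 2.0 + np.sin(X[:, 1])          # feature 2 is irrelevant
black_box = RandomForestRegressor(random_state=0).fit(X, y)

x0 = X[0]                                               # instance to explain
perturbed = x0 + rng.normal(scale=0.3, size=(200, 3))   # minimal perturbations
preds = black_box.predict(perturbed)                    # black-box labels for the copies

# Closer perturbations get higher weight (an RBF proximity kernel).
distances = np.linalg.norm(perturbed - x0, axis=1)
weights = np.exp(-(distances ** 2) / 0.5)

local_surrogate = Ridge().fit(perturbed, preds, sample_weight=weights)
print(local_surrogate.coef_.round(2))  # per-feature contribution near x0
```

The surrogate's coefficient on feature 0 lands near the true local slope of 2, while the irrelevant feature's coefficient stays near zero — a local explanation of one prediction rather than a global summary.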
Counterfactual instances enable data scientists to explore the relationship between a model’s inputs and its predictions by asking what minimal change to an input would flip the prediction. Perturbations of this kind are perhaps best known from adversarial attacks against machine learning models, but they can also be used to probe the decision-making process of a black-box model.
Optimal counterfactual approaches will (1) quickly generate counterfactual instances, (2) minimally perturb the instance, (3) show coherent relationships between feature values and the input space, and (4) be model-agnostic. In our experience, tools that meet these criteria include Alibi from Seldon and Diverse Counterfactual Explanations (DiCE) from Microsoft. Both generate high-quality synthetic counterfactual instances and allow the user to control which features can be changed and their range of possible values.
The What-If Tool from Google takes an alternative approach: rather than generating synthetic counterfactuals, it finds them within the dataset used to train the model, using L1 or L2 distance. Beyond finding counterfactuals, the What-If Tool lets users easily detect misclassifications by visualizing inference results, and it can optimize models for different fairness metrics without the user writing additional code.
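The dataset-search idea behind the What-If Tool reduces to a nearest-neighbor query: among the instances the model labels differently, return the closest one. The sketch below reimplements that search in NumPy (it is not the tool's actual API, and the data and model are illustrative):

```python
# Sketch of the What-If Tool's dataset-search idea (reimplemented here, not
# its actual API): the nearest counterfactual for x is the closest instance
# in the dataset, by L1 or L2 distance, that the model labels differently.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def nearest_counterfactual(x, X, model, norm_ord=2):
    """Closest instance in X whose predicted label differs from x's."""
    pred = model.predict(x.reshape(1, -1))[0]
    labels = model.predict(X)
    candidates = X[labels != pred]              # differently-labeled instances
    distances = np.linalg.norm(candidates - x, ord=norm_ord, axis=1)
    return candidates[distances.argmin()]

x = X[0]
cf = nearest_counterfactual(x, X, model)
print(x, "->", cf)
```

Because the counterfactual is drawn from real data rather than synthesized, it is guaranteed to be a plausible instance — the trade-off being that no closer counterfactual can be found than what the dataset happens to contain.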
Interpretable models, surrogate modeling, and counterfactual instances are a few examples of how we gain insight into machine learning model predictions with human interpretable explanations. These insights help us reduce bias, improve prediction accuracy, and increase decision-makers’ and customers’ understanding of the outputs they rely on.
Nori, H., Jenkins, S., Koch, P.L., & Caruana, R. (2019). InterpretML: A Unified Framework for Machine Learning Interpretability. ArXiv, abs/1909.09223.
Arik, S., & Pfister, T. (2019). TabNet: Attentive Interpretable Tabular Learning. ArXiv, abs/1908.07442.
Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., & Torralba, A. (2016). Learning Deep Features for Discriminative Localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2921-2929.
Hoover, B., Strobelt, H., & Gehrmann, S. (2019). exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models. ArXiv, abs/1910.05276.
Mueller, Z. (2020, April 14). 04_TabNet. Retrieved 2020, from https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Tabular%20Notebooks/04_TabNet.ipynb
Ribeiro, M.T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Klaise, J., Van Looveren, A., Vacanti, G., & Coca, A. (2020). Alibi: Algorithms for monitoring and explaining machine learning models (0.4.0) [Computer software]. Seldon. https://github.com/SeldonIO/alibi
Mothilal, R.K., Sharma, A., & Tan, C. (2020). Explaining machine learning classifiers through diverse counterfactual explanations. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.
Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., & Wilson, J. (2019). The What-If Tool: Interactive Probing of Machine Learning Models. IEEE Transactions on Visualization and Computer Graphics, 26, 56-65.