Achieving AI Interpretability

Interest in artificial intelligence (AI) has grown tremendously over the past decade for several reasons: significantly larger storage capacity makes it possible to keep vast amounts of data, much faster chips process that data quickly, and unprecedented computing power has enabled great strides in AI algorithms.

With the rapid development and application of AI, people have become increasingly aware of the need to understand why AI models make the decisions they do. As one example, Professor Bertsimas said at a research conference in 2018 [1] that, when he worked with hospitals to build AI models to help decide which patients should be scheduled next for a kidney transplant, the doctors refused to accept any recommendation from the models unless they could also clearly understand the reasons behind it. As a second example, I commented that if AI models were to support jury members deliberating on whether someone should go to jail, the jurors would clearly need to know why the models made certain judgment calls. Self-driving cars were discussed as a third example: people’s lives can be at stake, and after a traffic accident a court needs to determine the party at fault by reviewing and understanding the AI model’s driving decisions. In one recent fatal car accident involving a Tesla in China, Tesla was eventually found not responsible and the driver was found to be at fault [2].

Google Trends data also indicate growing interest in interpretable AI. Comparing the two figures below, interest in interpretable AI seems to have gained significant momentum over the past three years, while interest in AI in general has grown steadily over the past decade.

Figure 1. Google Trends: Trends in searches for “explainable AI” from 2012 to 2021

Figure 2. Google Trends: Trends in searches for “Artificial Intelligence” from 2012 to 2021

Broadly speaking, there are two approaches to improving the interpretability of AI models. The first is to explain existing AI models. Some of the most powerful AI models, such as deep learning models, tend to be black boxes, and specific tools can be built to help understand their recommendations. The second is to make a conscious effort to build AI models that are themselves more interpretable. We review both approaches in this article.

Interpreting Black Box Models

This approach can be model-specific or model-agnostic, and it attempts to provide insight into a model’s predictions. One very popular tool is SHAP (SHapley Additive exPlanations) [3], which is based on cooperative game theory and local explanations. SHAP is model-agnostic and explains an individual prediction by computing the contribution of each feature to that prediction. SHAP has the following desirable properties:

Local accuracy: the explanation model should match the original model’s output for the input being explained

Missingness: a missing feature gets no attribution

Consistency: when a model changes so that the marginal contribution of a feature value increases or stays the same, the Shapley value should not decrease
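To make these properties concrete, the sketch below computes exact Shapley values for a small hand-written model by enumerating every feature coalition. It is an illustration only, not the SHAP library’s implementation: the `model` function, the input `x`, and the single `background` reference point are all hypothetical, and features outside a coalition are assumed to take their background values.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model: linear terms plus one interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2] + 3.0 * x[3]


def coalition_value(subset, x, background):
    # Assumption: features in `subset` keep their value from x,
    # all other features are replaced by the background value.
    z = [x[i] if i in subset else background[i] for i in range(len(x))]
    return model(z)


def shapley_values(x, background):
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(s | {i}, x, background) - coalition_value(s, x, background)
                phi[i] += weight * gain
    return phi


x = [1.0, 2.0, 4.0, 5.0]           # input to explain (hypothetical)
background = [0.0, 0.0, 0.0, 5.0]  # reference point; feature 3 equals its baseline
phi = shapley_values(x, background)

print("attributions:", [round(p, 6) for p in phi])
print("sum of attributions:", round(sum(phi), 6))
print("f(x) - f(background):", model(x) - model(background))
```

Under these assumptions the attributions sum to the difference between the model’s output at `x` and at the background point, which is the local accuracy property, and the fourth feature, which sits at its background value, receives zero attribution. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.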

We explain SHAP’s usage with an example output from [4]. The SHAP summary plot below shows, for every data point in the dataset, one SHAP value per feature. Each row corresponds to the feature named on the left-hand side and is color-coded so that high feature values are red and low feature values are blue. Points to the right have a positive impact on the output, and points further to the left have a negative impact. The plot quantifies and visualizes each feature’s contribution to the model’s output and clearly enhances our understanding of the model. SHAP is becoming quite popular because it seems to speak to human intuition more directly than many other methods.

Figure 3. Example SHAP summary plot

Building Interpretable AI Models (White Box Models)

One very good effort on this front is the work of Bertsimas and Dunn [5], who formulated classification problems as Mixed Integer Programming (MIP) problems (Optimal Classification Trees, or OCT). The formulation minimizes the misclassification rate globally over the whole tree instead of only maximizing purity at the next split, as CART does. The resulting classification accuracies are 1–5% better than those of CART on 53 real-world UCI datasets. At the 2018 Princeton Day of Optimization conference [1], they also reported that their approach outperformed state-of-the-art models such as XGBoost and random forest by 2–7% when the maximum tree depth exceeded 4.
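The difference between greedy and global tree construction can be sketched on a tiny example. The code below is not the MIP formulation of [5]: it swaps in brute-force enumeration of depth-2 trees as a stand-in for a global optimizer, and the dataset, feature layout, and function names are all hypothetical. The label is the XOR of features a and b, while feature c is only a weakly informative distractor.

```python
# Hypothetical toy data: label y = a XOR b; feature c agrees with y on 6 of 8 rows.
ROWS = [  # (a, b, c, y)
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 1, 1), (0, 1, 1, 1),
    (1, 0, 1, 1), (1, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 0),
]
FEATURES = (0, 1, 2)  # column indices of a, b, c


def leaf_errors(rows):
    """Misclassified rows when a leaf predicts its majority label."""
    ones = sum(r[3] for r in rows)
    return min(ones, len(rows) - ones)


def gini(rows):
    """Gini impurity of a node with binary labels."""
    if not rows:
        return 0.0
    p = sum(r[3] for r in rows) / len(rows)
    return 2.0 * p * (1.0 - p)


def split(rows, f):
    """Partition rows on binary feature f."""
    return [r for r in rows if r[f] == 0], [r for r in rows if r[f] == 1]


def greedy_errors(rows, depth):
    """CART-style: take the split with the lowest weighted Gini, then recurse."""
    if depth == 0 or leaf_errors(rows) == 0:
        return leaf_errors(rows)

    def weighted_gini(f):
        left, right = split(rows, f)
        return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)

    best = min(FEATURES, key=weighted_gini)
    left, right = split(rows, best)
    return greedy_errors(left, depth - 1) + greedy_errors(right, depth - 1)


def optimal_errors(rows, depth):
    """Exhaustive search: try every feature at every node, keep the best tree."""
    if depth == 0 or leaf_errors(rows) == 0:
        return leaf_errors(rows)
    best = leaf_errors(rows)
    for f in FEATURES:
        left, right = split(rows, f)
        best = min(best, optimal_errors(left, depth - 1) + optimal_errors(right, depth - 1))
    return best


print("greedy depth-2 errors: ", greedy_errors(ROWS, 2))
print("optimal depth-2 errors:", optimal_errors(ROWS, 2))
```

On this data the greedy procedure splits on c first, because a and b offer no purity gain on their own, and it ends with two misclassified rows. The exhaustive search splits on a and then on b and classifies every row correctly. Enumeration does not scale beyond toy problems, which is where an optimization formulation such as OCT comes in.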

Beyond its excellent accuracy, the bigger breakthrough of OCT is that, with its roots in a MIP formulation, the method is highly interpretable. This is very encouraging in several ways:

A. As mentioned above, the MIP approach (Optimal Classification Trees, or OCT, and OCT with hyperplanes, or OCT-H) achieved similar or even better out-of-sample accuracy than very strong models such as XGBoost and deep learning on many datasets.

B. Given its roots in deterministic optimization, Optimal Classification Trees is clearly a white-box model. The classification results are now very interpretable and transparent compared with the traditional AI models. This example shows that very effective AI models do not have to be black-box solutions.

C. The more powerful models with hyperplanes (OCT-H) are relatively easy to train within the optimization formulation, which is not the case with the traditional models.

D. Just as we have witnessed in our work in industry, Bertsimas’ research also showed that Integer Programming models no longer automatically imply poor computational performance, given the huge advances in MIP algorithms and computing power: OCT and OCT-H are both solved within practical times [6].

In this article, we reviewed two distinct approaches to AI model interpretability: explaining black box models and building white box models. Both have reached maturity for industrial applications. In areas such as medicine, law, and transportation, interpretability is becoming a necessary condition for the application of AI, and the future certainly holds significant potential for AI interpretability.


1. Princeton Day of Optimization, Princeton, NJ, September 28, 2018.

2. Tesla not responsible for E China’s Taizhou car accident, the driver takes full responsibility. Global Times, June 6, 2021.

3. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774.

4. Lantos, D.; Ogunlami, A.; Regunath, G. How to explain your machine learning model using SHAP? Advancing Analytics, July 2021.

5. Bertsimas, D., Dunn, J.: Optimal classification trees. Mach. Learn. 106, 1039–1082 (2017)

6. Xu Y. (2019) Solving Large Scale Optimization Problems in the Transportation Industry and Beyond Through Column Generation. In: Fathi M., Khakifirooz M., Pardalos P. (eds) Optimization in Large Scale Problems. Springer Optimization and Its Applications, vol 152. Springer, Cham.

Author Bio

Yanqi Xu’s analytics experience spans several industries, and as part of the data science leadership team, Yanqi has helped companies such as United Airlines, Avis, Princess Cruises, Raytheon (Flight Options) and Verizon make significant strides in improving revenue and profits by developing award-winning models (2020 Edelman Award Finalist) in machine learning, price optimization, marketing, combinatorial optimization, and customer analytics.

