
The growing use of artificial intelligence in medicine is paralleled by growing concern among many policymakers, patients, and physicians about the use of black-box algorithms. In a nutshell, it’s this: We don’t know what these algorithms are doing or how they are doing it, and since we aren’t in a position to understand them, they can’t be trusted and shouldn’t be relied upon.

A new field of research, dubbed explainable artificial intelligence (XAI), aims to address these concerns. As we argue in Science magazine, together with our colleagues I. Glenn Cohen and Theodoros Evgeniou, this approach may not help and, in some instances, can hurt.


Artificial intelligence (AI) systems, especially machine learning (ML) algorithms, are increasingly pervasive in health care. They are used for things like evaluating cardiovascular images, identifying eye disease, and detecting bone fractures. Many of these systems, and most of those cleared or approved for use by the Food and Drug Administration, rely on so-called black-box algorithms. While the notion of what constitutes a black-box algorithm is somewhat fluid, we think of it as an algorithm that is exceedingly difficult, or even impossible, for ordinary humans to understand.

Examples of black-box AI models include the class of algorithms ordinarily labeled "deep learning," such as neural networks with many layers, convolutional architectures, backpropagation training, and the like.


There are two key ways to understand how an AI system operates. The first is simple and intuitive: The system’s maker can stop using the black box for making predictions and use a transparent system — a white-box model — instead. While white-box models are also a fluid concept, examples include simple decision trees or ordinary regression with a few variables, where it is easy to tell how the variables combine to form the system’s predictions. For example, many doctors use a point scoring system for calculating patients’ heart disease or stroke risk based on their blood pressure, cholesterol levels, age, and other characteristics. Let’s call these white-box systems interpretable AI (IAI).
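A point scoring system of the kind described above can be written down in a few lines. The sketch below is purely illustrative: the thresholds and point values are invented for this example and do not correspond to any validated clinical score. What matters is that every step is inspectable, which is exactly what makes the model a white box.

```python
def stroke_risk_points(age: int, systolic_bp: int, smoker: bool) -> int:
    """Sum points from simple, human-readable rules.

    All thresholds and weights here are hypothetical, chosen only to
    illustrate how an interpretable point-scoring model works.
    """
    points = 0
    if age >= 65:
        points += 2
    elif age >= 55:
        points += 1
    if systolic_bp >= 140:
        points += 2
    elif systolic_bp >= 120:
        points += 1
    if smoker:
        points += 2
    return points


def risk_category(points: int) -> str:
    """Map the total score to a coarse risk band."""
    if points >= 5:
        return "high"
    if points >= 3:
        return "moderate"
    return "low"
```

A clinician can trace any output back to its inputs by hand: a 70-year-old smoker with a systolic pressure of 150 scores 2 + 2 + 2 = 6 points, landing in the "high" band, and it is obvious which rule contributed what.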


Interpretable AI is great for increasing transparency and helping one understand how a model works. It is simple, intuitive, and easy to grasp. And to the extent that such a simple white box can be substituted for a complex black box, we are all for it. But herein lies the problem: For many medical applications, developers need to use a more complicated model.

One example is an application that relies on image recognition, in which the number of predictor variables is extremely large and the features are often highly engineered. Another example is an application that relies on genetic data. In such cases, developers generally won't want to replace an advanced deep-learning system with, for example, a simple decision tree. IAI is therefore not an adequate alternative, as it may not reach the levels of accuracy that more complex black-box models can achieve.

To placate those who are worried about trust and transparency, developers who insist on using black-box systems turn to the second alternative, namely XAI. Here’s how it works: Given a black-box model that is used to make predictions or diagnoses, a second explanatory algorithm is developed that approximates the outputs of the black box. This second algorithm (itself a white-box model) is trained by fitting the predictions of the black box and not the original data. It is typically used to develop post-hoc explanations for the black-box outputs and not to make actual predictions.

In other words, the approach relies on this dual process: a black box for predictions together with a white box for after-the-fact explanations. Using stroke risk as an example, the white-box explanatory algorithm might tell a patient that their high risk of stroke, as it was predicted by the black-box model, is consistent with a linear model that relies on their age, blood pressure, and smoking behavior.
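The dual process described above can be sketched in a few lines of NumPy. Everything here is a stand-in: the "black box" is an arbitrary nonlinear function, and the feature names and coefficients are invented for illustration. The key step is that the linear surrogate is fit to the black box's outputs, not to any real patient data, and its coefficients then serve as the post-hoc "explanation."

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    """Stand-in for an opaque model: a nonlinear risk score over
    age, systolic blood pressure, and smoking status (all hypothetical)."""
    age, bp, smoke = X[:, 0], X[:, 1], X[:, 2]
    z = 0.04 * age + 0.02 * bp + 0.8 * smoke + 0.3 * np.sin(age / 10) - 5
    return 1 / (1 + np.exp(-z))

# Simulated patient features: age, systolic BP, smoker (0/1).
X = np.column_stack([
    rng.uniform(30, 90, 500),    # age
    rng.uniform(100, 180, 500),  # systolic BP
    rng.integers(0, 2, 500),     # smoker
])

# The surrogate is trained on the BLACK BOX'S predictions,
# not on the original outcomes.
y_bb = black_box(X)
A = np.column_stack([X, np.ones(len(X))])  # add an intercept column
coef, *_ = np.linalg.lstsq(A, y_bb, rcond=None)

# coef now "explains" the black box's outputs as a weighted sum of
# age, blood pressure, and smoking, even though that linear story is
# not the mechanism the black box actually uses.
```

Note what the surrogate delivers: a tidy linear account that tracks the black box's outputs closely while saying nothing about how those outputs were really produced. That gap is the subject of the next paragraphs.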

But notice that the post-hoc explanation is not the actual mechanism by which the black-box prediction was generated. Indeed, it is easy to imagine many other explanations that are also consistent with the black-box prediction. For example, the patient's risk of stroke could also be consistent with a decision tree that relies on their gender and diabetes status instead of their blood pressure and smoking status. Similar patients may get very different post-hoc explanations. Because of the after-the-fact and fickle nature of these explanations, we call the understanding that XAI generates ersatz understanding.

When a user is provided with such an explanation, they are no closer to understanding what is happening inside the black box; rather, they are left with the false impression that they understand it better. This type of XAI is "fool's gold" in this regard. The understanding it provides is comparable to being told, after observing the two events occur together a number of times, that the street lights might come on at night because the sun goes down. Such explanations can lead to further epistemic risks, such as narrative fallacy — believing in a story that is simply false — or to overconfidence if, for example, the provided (wrong) explanation reinforces the user's prior beliefs.

Because this form of XAI is fool's gold, it is unlikely to provide the benefits often claimed for it. For example, since it does not add to one's understanding of a black-box system, it is unlikely to increase trust in it. Likewise, since it does not enable others to open up the black box, so to speak, it is unlikely to help make AI/ML systems more accountable.

Requiring explainability for health care artificial intelligence and machine learning may also limit innovation: restricting developers to algorithms that can be explained sufficiently well can undermine accuracy.

Instead of focusing on explainability, the FDA and other regulators should closely scrutinize the aspects of AI/ML that affect patients, such as safety and effectiveness, and consider subjecting more health-related products based on artificial intelligence and machine learning to clinical trials. Human factors play an important role in the safe use of technology, and regulators, product developers, and researchers need to consider them carefully when designing AI/ML systems that can be trusted.

Boris Babic is an assistant professor of philosophy and of statistics at the University of Toronto. Sara Gerke is an assistant professor of law at Penn State Dickinson Law. This essay was adapted from a longer article in Science magazine by Boris Babic, Sara Gerke, Theodoros Evgeniou, and I. Glenn Cohen.
