What are the most important ethical considerations for artificial intelligence (AI) in health care?

The World Health Organization tried to answer this question in its recent report “Ethics and Governance of Artificial Intelligence for Health.” It offers recommendations on how to design safe, transparent, and equitable AI products and applications that can help providers make informed medical decisions and help patients achieve positive outcomes. The report’s recommendations include:

  • Humans should remain in control of health care systems and medical decisions.
  • AI products should be required to meet standards for safety, accuracy, and effectiveness within well-defined use cases.
  • AI developers should be transparent about how products are designed and function before they’re used.
  • Health care businesses that rely on AI should ensure it is used under appropriate conditions by trained personnel.
  • AI must be designed to encourage inclusiveness and equality.

All of these are laudable. But as someone deeply involved in applying AI to health care, I found one element grating: Highlighting inclusivity and equality as things to be “encouraged” is not the way forward, especially with something as important as health care.


Inclusivity and equality must be built into the DNA of a product from day one, and that does not happen by simply checking a politically correct box.

Remarkable progress has been made in medicine over the last century. But it is not equal progress, largely due to inequality in the development of drugs and diagnostics. For example, it was only in 1993 that the United States mandated adequate inclusion of women in clinical trials for medicines. And systemic racism can still be seen in revised U.S. guidelines that now, as a result of Covid-19, recommend that women under age 50 skip annual mammograms even though breast cancer diagnoses are at their highest for Black, Hispanic, and Asian women in their 40s.


There are troubling indications that significant disparities exist in the performance of AI-enabled health care solutions. One algorithm widely used by health care providers recommended extra care for white patients twice as often as it did for Black patients. Several studies have found that deep-learning solutions trained to detect skin cancer are less accurate for people with darker complexions. Even the most thoughtful and proficient AI product is in danger of perpetuating exclusion if bias or omissions in the data prevent it from working equally well for everyone it aims to help.

Unfortunately, bias exists everywhere in the world and permeates the datasets used to develop and test AI products.

Conscious bias can actually be a positive, useful tool in developing data for training AI applications, but it must be controlled. A developer makes trade-offs in the model: a little bit of improvement here means less performance there until a balance is struck that gives the optimal outcomes for the use case. Think of it as playing Whac-A-Mole — eliminate one poor area of performance in a model and another one can spring up unexpectedly somewhere else. The job of data scientists using AI to improve health outcomes is to whack as many “data moles” as possible. Developers need to focus the training sets for their models to ensure they’re using diverse data, or risk creating conscious bias.

Unconscious bias in data is a product of age-old inequalities within society at large. It is challenging to address precisely because people are often blind to it. The data itself is not at fault here. Individuals creating AI tools must change their mindsets, challenge their assumptions, and assess their own biases continuously throughout the development and use of AI solutions. This requires dedicated research efforts to develop methodology for ensuring fair AI algorithms. Otherwise, developers will just continue to build more solutions for the privileged.

What can be done?

The lack of diverse data to develop and train AI products is currently the primary inhibitor to making substantive progress in this area.

Potential biases need to be identified from the ground up using effective validation frameworks including subgroup analysis. Mitigation strategies need to be carried throughout the entire development cycle, from conceptualization of a potential product to post-deployment assessment of biases in the predictions. This means using real-world, representative clinical data to develop and validate AI products as opposed to using overly cleaned and selected datasets that are fraught with potential bias. This also requires the development of methodology for detecting and identifying causes of bias, and demands transparent reporting of trials and frameworks for auditing algorithms such that the reported results and claimed performance can be scrutinized.
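To make "subgroup analysis" concrete, here is a minimal sketch of what such a check might look like for a binary classifier. It is an illustration under stated assumptions, not a description of any particular product's validation pipeline: the function names, the group labels "A" and "B", and the toy records are all hypothetical, and the choice of sensitivity and specificity as the metrics is just one reasonable option.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """Compute sensitivity and specificity separately for each subgroup.

    Each record is a (group, y_true, y_pred) tuple with binary 0/1 labels.
    The record layout is an assumption made for this illustration.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "tn" if y_pred == 0 else "fp"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        metrics[group] = {
            "n": positives + negatives,
            # None marks a metric that cannot be computed for this subgroup.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return metrics


def max_gap(metrics, name):
    """Largest difference in one metric between any two subgroups."""
    values = [m[name] for m in metrics.values() if m[name] is not None]
    return max(values) - min(values) if values else 0.0


# Toy, made-up records for two hypothetical subgroups; not real clinical data.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 1),
]

metrics = subgroup_metrics(records)
sensitivity_gap = max_gap(metrics, "sensitivity")

for group, m in sorted(metrics.items()):
    print(group, m)
print("sensitivity gap:", sensitivity_gap)
```

A model can look acceptable on pooled data while missing far more cases in one group than another, which is why the metrics are reported per subgroup rather than only in aggregate. In practice the subgroup sample sizes also matter: a gap measured on a handful of patients says little, so the per-group counts are kept alongside the metrics.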

I work in an industry at the intersection of AI technology and health care applications. My colleagues and I must constantly assess AI products for bias to make sure we are delivering solutions that give any person, no matter who or where they are, a better chance at positive health outcomes. It is a painstaking task but entirely worth it.

Developers of AI-enabled health care products have a unique opportunity to address current health inequalities. Doing this will require being creative and sometimes thinking outside the box to develop innovative solutions that may even reduce health disparities. Choosing to solve medical issues that affect women and minorities, making sure that every health care solution is developed and tested for all of the people it will impact, and considering how these technologies can advance health care in the developing world could together usher in a new era of equality in health care.

Ben Glocker is the head of machine learning research at Kheiron Medical and a reader in machine learning for imaging in the Department of Computing at Imperial College London.
