As more machine learning tools reach patients, developers are starting to get smart about the potential for bias to seep in. But a growing body of research emphasizes that even carefully trained models — ones built to ignore race — can breed inequity in care.

Researchers at the Massachusetts Institute of Technology and IBM Research recently showed that algorithms based on clinical notes — the free-form text providers jot down during patient visits — could predict the self-identified race of a patient, even when the data had been stripped of explicit mentions of race. It’s a clear sign of a big problem: Race is so deeply embedded in clinical information that straightforward approaches like race redaction won’t cut it when it comes to making sure algorithms aren’t biased.

“People have this misconception that if they just include race as a variable or don’t include race as a variable, it’s enough to deem a model to be fair or unfair,” said Suchi Saria, director of the machine learning and health care lab at Johns Hopkins University and CEO of Bayesian Health. “And the paper’s making clear that, actually, it’s not just the explicit mention of race that matters. Race information can be inferred from all the other data that exists.”
