Although “garbage in, garbage out” has been a software truism for more than half a century, programmers can’t seem to avoid the first part — as researchers discovered when they investigated an algorithm widely used by hospitals to decide which patients get access to extra health care services. The artificial intelligence software equated health care spending with health, and it had a disturbing result: It routinely let healthier white patients into the programs ahead of black patients who were sicker and needed them more.
It was one of the clearest demonstrations yet that some, and perhaps many, of the algorithms that guide the health care given to tens of millions of Americans unintentionally replicate the racial blind spots and even biases of their developers.
This important research is the winner of the Editors’ Pick award in the 2020 STAT Madness contest. The work was led by Ziad Obermeyer of the University of California, Berkeley, and Sendhil Mullainathan of the University of Chicago’s Booth School of Business.
The annual competition to identify the year’s top discoveries in biomedicine started with 128 entries from U.S. research institutions, including an improved CRISPR genome editing tool; a gene therapy for “bubble boy” disease, which leaves newborns with a nonfunctioning immune system; and the discovery of a gut microbe that underlies alcoholic liver disease. We selected 64 for an NCAA basketball tournament-style bracket, based on originality, scientific rigor, and potential impact. Readers voted for a winner in each pairing until, after six rounds and 699,315 votes, a new technology for seeing tiny ovarian tumors, developed by MIT’s Koch Institute for Integrative Cancer Research, bested a new treatment for damaged hearts from the Texas Heart Institute and Rice University. STAT staffers evaluated the 64 to come up with this Editors’ Pick.
It wasn’t an easy choice. But the research on racially biased algorithms rose to the top. In addition to being rigorous, important, and innovative, it exemplifies a growing challenge in health care and biomedicine: separating hype from reality in terms of what artificial intelligence can do, from discovering drugs to diagnosing patients.
The researchers didn’t just publish their work and move on. Instead, they worked with the builders of the algorithm to fix it. And after hearing from insurers, hospitals, and others concerned that their algorithms, too, might be racially biased, they established an initiative at the Booth School to work pro bono with health systems and others to remedy that.
The goal of the algorithm examined by Obermeyer and Mullainathan was to identify patients most likely to benefit from care management programs, which devote additional resources to those with high medical need. Among other things, the programs give patients access to dedicated phone lines, home visits, and prompt doctor appointments; reconcile prescriptions; and assign a nurse to a patient with, say, diabetes and heart disease to help her avoid hospitalization by seeing her primary care doctor more frequently.
“They’re like VIP programs,” Obermeyer said. “They get to know you and your medical problems.”
The bias comes from how the algorithm’s developers decided to identify patients at high risk of worsening health: by health care spending. More spending, they assumed, meant worse health and therefore greater health care need. “That’s not unreasonable,” Obermeyer said. “Very sick people do generate high health care costs.”
The problem is that white Americans spend more on health care than black Americans even when their health situations are identical. Whites, as a group, have more disposable income, more and better insurance, and greater access to medical providers; they therefore spend more on medications, doctor visits, preventive care, and elective surgeries.
And that’s how the algorithm tripped up. Since whites’ spending is higher, the algorithm concluded — mistakenly — that many white patients were sicker than medically identical or sicker black patients. It automatically tagged patients in the top 3% of medical spending for the VIP care management program intended for patients with complex medical needs. But the algorithm couldn’t distinguish between a 60-year-old white American’s spending of thousands of dollars on a knee replacement to improve his tennis game and a 60-year-old black American’s spending of the same amount to keep diabetes from killing him.
Result: White patients were much more likely to receive preferential access to the care management program even though, at the same amount of medical spending, black Americans had more chronic illnesses and significantly poorer health. They were the ones who should have gotten the extra attention.
As Mullainathan, a computational and behavioral science researcher, told STAT last year, “we haven’t told algorithms yet to do things without racial bias. We haven’t learned yet that this is an objective that we have to build into it.”
The algorithm was developed by the Optum unit of UnitedHealth Group, which sold it to academic medical centers and other hospitals. Before publishing their paper, the researchers emailed Optum (“Hey, you don’t know us, but …”). “They were incredibly responsive,” Obermeyer said. The company’s technical teams replicated the analysis in a larger, national dataset, confirming the racial bias, and worked with the researchers to fix it.
They found that using proxies other than health care spending — in particular, how many chronic conditions a patient has and physiological variables that predict when a chronic condition such as diabetes will flare up and require emergency care or hospitalization — identified patients who truly needed care management. That more than doubled the percentage of black patients identified for automatic enrollment.
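The mechanism the researchers describe — a biased proxy label changing who ranks highest — can be sketched with toy data. Everything below (patient names, condition counts, dollar figures, the cutoff of three) is hypothetical and only illustrates the general idea, not the Optum model itself:

```python
# Illustrative sketch (hypothetical data, not the actual Optum algorithm):
# how choosing "spending" vs. "number of chronic conditions" as the
# enrollment criterion changes who gets flagged for care management.

# Each patient: (name, race, chronic_conditions, annual_spending_usd).
# Spending is assumed to run higher for white patients at the same illness
# level, mirroring the access gap described in the article.
patients = [
    ("A", "white", 1, 12000),   # e.g., elective knee replacement
    ("B", "white", 2, 11000),
    ("C", "black", 4,  9000),   # diabetes + heart disease, lower spending
    ("D", "black", 3,  8000),
    ("E", "white", 0,  7000),
    ("F", "black", 5,  6000),   # sickest patient, lowest spending
]

def top_tier(rows, key, n=3):
    """Return the names of the n patients ranked highest by `key`."""
    return [name for name, *_ in sorted(rows, key=key, reverse=True)[:n]]

# Cost-as-proxy ranking: the sickest (black) patients are passed over.
by_cost = top_tier(patients, key=lambda p: p[3])

# Health-based ranking: enroll by chronic-condition count instead.
by_need = top_tier(patients, key=lambda p: p[2])

print("Flagged by spending:", by_cost)   # ['A', 'B', 'C']
print("Flagged by illness: ", by_need)   # ['F', 'C', 'D']
```

In this toy example, ranking by spending enrolls the two healthiest white patients while the sickest patient never qualifies; swapping the label to chronic-condition counts reverses that, which is the shape of the fix the researchers and Optum arrived at.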
After hearing from insurers and providers wondering whether their algorithms might also be racially biased, the researchers launched the project at the Booth School. They have been working with health care systems to analyze the potential for racial bias in algorithms already in use and those the systems are considering buying, and speaking to state attorneys general about the problem.
“It’s rare, when you’re an academic, that your research helps people this directly and so quickly,” Obermeyer said. For that, we think his work deserves the Editors’ Pick.