An algorithm commonly used by hospitals and other health systems to predict which patients are most likely to need follow-up care classified white patients overall as being more ill than black patients — even when they were just as sick, a new study finds.

Overall, only 18% of the patients the algorithm identified as needing more care were black, while about 82% were white. If the algorithm had reflected the true proportions of the sickest black and white patients, those figures would have been about 46% and 53%, respectively. The research was published Thursday in Science.

All told, health systems use the algorithm for some 100 million people across the country.

The study’s authors then trained a new algorithm on patients’ biological data, rather than the insurance claims data the original program used, and found an 84% reduction in bias. Previously, the algorithm had failed to account for nearly 50,000 chronic conditions among black patients. After the retraining, that number dropped to fewer than 8,000. The reduction underscored what many in the health technology field believe: Algorithms may only be as good as the data behind them.
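
As a rough check on those figures, a drop from about 50,000 to about 8,000 unaccounted-for conditions is consistent with the reported 84% reduction. The sketch below uses the article's rounded numbers, not the study's exact counts, and `relative_reduction` is a helper name introduced here only for illustration.

```python
# Rounded figures quoted in the article; the exact counts are not given there.
missed_before = 50_000  # "nearly 50,000" chronic conditions unaccounted for
missed_after = 8_000    # "fewer than 8,000" after retraining


def relative_reduction(before, after):
    """Return the fraction by which `after` is smaller than `before`."""
    return (before - after) / before


reduction = relative_reduction(missed_before, missed_after)
print(f"{reduction:.0%}")  # prints 84%
```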

Sendhil Mullainathan (Photo: John Zich)

STAT spoke with Sendhil Mullainathan, a computational and behavioral science researcher at the University of Chicago’s Booth School of Business and senior author of the new study, to learn more. This interview has been condensed and lightly edited for clarity.

What was the inspiration for this study?

There’s a lot of discussion about algorithmic bias in the news, but we actually don’t know much about algorithms that are already implemented at a large scale. You don’t really get access from the inside. Part of our interest was [in knowing if there] is bias, how these things could play out.

Why is there a suspicion of bias in such programs?

I think the suspicion comes because, in general, these algorithms are built on data and those reflect systemic biases, and so won’t the algorithm also reflect the biases?

The other part is that algorithms are only as good as the objectives we give them. Much like people: you can’t ask a person to do “A” and then be disappointed that they didn’t do “B.” Algorithms are the same: you give them a very narrow objective. But we haven’t told algorithms yet to do things without racial bias. We haven’t learned yet that this is an objective that we have to build into them.

Who uses this algorithm you examined?

It’s a category of algorithms of which there are several manufacturers. [Looking at] past health records, [they] predict future health. It’s applied by health systems to over 100 million people, so quite widespread. And it’s used for care coordination programs: these algorithms are meant to flag people for the chance that in the next year, they’ll need extra care. Health care systems take these algorithms and use them to rank people and say these are some of the people who might need extra help.

For example, if you have diabetes and heart disease, you’re going to be at the hospital a lot. We might give you a dedicated nurse who can tell you whether to come in or not. Sometimes it’s making sure they don’t come in to the [emergency room], but go to a special desk instead.

What did you find?

We thought, if this algorithm were unbiased, it shouldn’t matter whether a person is black or white. But we found that similarly sick black patients are ranked much lower [by the algorithm] — as less sick — than white patients.

This basically says that care coordination programs, this part of how we’re coordinating such programs, is having massive gaps.

Was this surprising?

[This] is a gigantic number, and the magnitude of it is huge. But it’s not immediately obvious why there should have been bias going in. We always fear bias, but I think you’d have to be particularly pessimistic to believe there’s bias everywhere.

It’s not clear why there would be bias in this direction. If I told you that the American health care system doesn’t serve black patients very well, then you would expect them to be sicker. But if they’re sicker, then you’d expect them to be flagged more by the algorithm. The bias should have been going in the other direction.

How do you account for genetic predispositions?

The goal [of the algorithm] isn’t to get to the causes of risk, but simply to identify people at risk. Whatever the reason for the gap, [our findings] mean that these sick people we are trying to target, we end up missing.

How do you think the bias was introduced?

It kind of arose in a subtle way. When you look at the literature, “sickness” could be defined by the health care we provide you. In other words, you’re sicker the more dollars we spend on you. The other way we could define sickness is physiologically, with things like high blood pressure.

It turns out that the algorithm was trained on dollars spent and the kind of care we deliver rather than on the underlying physiology. It looks like the algorithm took the system-wide problem of the difference between these two [definitions] that we often take to be synonymous (though there’s a difference between them, especially for black patients versus white patients) and expanded and magnified it.
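
A toy illustration of that point, not the study's data or method: suppose two hypothetical groups, "A" and "B", are equally sick, but less money is spent on group B for the same number of chronic conditions. Ranking patients by cost then flags fewer group B patients than ranking by conditions does. The groups, the spending figures, and every name in the sketch are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    group: str       # hypothetical group label, "A" or "B"
    conditions: int  # number of active chronic conditions (health need)
    cost: float      # dollars spent on care (what a claims-based label sees)


def make_patients():
    """Build two equally sick groups; assume less is spent per condition on B."""
    patients = []
    for conditions in range(10):
        for group, spend_per_condition in (("A", 1000.0), ("B", 700.0)):
            cost = conditions * spend_per_condition
            patients.append(Patient(group, conditions, cost))
    return patients


def flag_top(patients, key, k):
    """Flag the k patients ranked highest by the chosen definition of sickness."""
    return sorted(patients, key=key, reverse=True)[:k]


def share_of_group(flagged, group):
    """Fraction of flagged patients who belong to `group`."""
    return sum(p.group == group for p in flagged) / len(flagged)


patients = make_patients()
by_cost = flag_top(patients, key=lambda p: p.cost, k=6)
by_need = flag_top(patients, key=lambda p: p.conditions, k=6)

print(f"{share_of_group(by_cost, 'B'):.0%}")  # prints 33%
print(f"{share_of_group(by_need, 'B'):.0%}")  # prints 50%
```

In this made-up population the two groups have identical health needs, so the gap between the two printed shares comes entirely from the choice of label.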

Is there a way to fix it?

When we retrained a new algorithm on actual physiological variables, the gap then completely disappears. And it turns out that you can get almost all of the efficiency properties of the original algorithm without the racial gap. This, to me, is the bigger lesson for algorithms as a whole: Concepts that we as humans tend to take synonymously — like care in dollars and care in biological terms — algorithms take them literally.

How can we prevent things like this in the future?

When there’s new technology like this, it takes time for it to go from prototype to a larger scale, and some of these things don’t show up until later. If we have a prototype, there need to be more questions asked [such as about racial bias]. When these things go to scale, we just need a different way to be smarter about them.

Maybe the lesson here is to be very careful about the data the algorithm is being trained on. We should also spend a lot of time defining what our goals are. Do we really care only about insured people [if we’re using claims data to train algorithms], or also about the uninsured? Have I asked [the algorithm to do] everything I care about?

I think we understate the enormity of the problem in front of us. Because it’s new technology, the learning phase of it is not in the algorithm learning but in us learning about bigger, structural implications.

Leave a Comment

  • Sadly, as a person of color I am not surprised by this at all. You have to remember that an algorithm is only as good as the person(s) who created it. We have knowingly lived with and experienced this type of racism, and all of the insecurities and unjust treatment that come with it, for many years. I pray that one day we will all see each other as human beings; the color of your skin does not define who you are as a person. I never understood the concept of inequality and hatred. What a true waste of energy.

  • Does this have anything to do with the lack of care given black women who go to the hospital to give birth? I would like to know as they are the most likely to die… even in so-called “good” hospitals such as Cedars Sinai in Los Angeles/Beverly Hills.

    And I am talking from experience.

  • Correlation is not causation. Or did I miss part of the story? This story, as written, seems to clearly indicate an “economic” bias in the algorithm to be sure.
    If you weren’t coming to the doctor as often or hadn’t had more procedures in the past, you weren’t as sick. So you were less likely to need more treatment in the future.
    That being said, “anyone” without enough cash, or who is underinsured, or who just has an aversion to the doctor would be “missed” by this algorithm.
    That means poor white people as well, correct?
    This algorithm seems clearly designed to benefit one group: hypochondriacs.

    • You are totally right! The algorithm is biased economically, not racially. It’s a shame they used such a sensationalist title for their paper, one that doesn’t indicate the real issue. They should have named it “Dissecting economic bias in an algorithm used to manage the health of populations”. The algorithm is still bad, but its flaws have nothing to do with race. In fact, in the paper they say: “Notably, the algorithm specifically excludes race.” I don’t understand how Science could publish it with such an inaccurate title. It’s so f*cing sad that science is becoming more and more politicized.

  • Algorithms are sophisticated mathematical systems based on data and formulas, and they are only as good as the humans who designed them. To use health insurance data in the algorithm and apply it to everyone in a nation that has vast discrepancies in health insurance access and affordability is a gigantic flaw. This thoughtless design is quite serious if health systems use this algorithm for 100 million people.

  • Usually when an algorithm is used, the individuals using it have no idea of the color of an individual’s skin. So, are you saying the designers of the algorithm are biased, promoting misleading data?

    • They don’t have to be consciously biased. The healthcare data set the designers had to work with didn’t have enough representation of black patients to be accurate, because black patients are less likely to spend lots of money on treatments even when they are sick. That’s why it’s important to look at the real-life results and rework algorithms.