The daily email that arrived in physician Samantha Wang’s inbox at 8 a.m., just before morning rounds, contained a list of names and a warning: These patients are at high risk of dying within the next year.

One name that turned up again and again belonged to a man in his 40s, who had been admitted to Stanford University’s hospital the previous month with a serious viral respiratory infection. He was still much too ill to go home, but Wang was a bit surprised that the email had flagged him among her patients least likely to be alive in a year’s time.

This list of names was generated by a machine, an algorithm that had reached its conclusions by scanning the patients’ medical records. The email was meant as something of a nudge, to encourage Wang to broach a delicate conversation with her patient about his goals, values, and wishes for his care should his condition worsen.


It left her pondering: Why him? And should she heed the suggestion to have that talk?

Those kinds of questions are increasingly cropping up among clinicians at the handful of hospitals and clinics around the country deploying cutting-edge artificial intelligence models in palliative care. The tools spit out cold actuarial calculations to spur clinicians to ask seriously ill patients some of the most intimate and deeply human questions: What are your most important goals if you get sicker? What abilities are so central to your life that you can’t imagine living without them? And if your health declines, how much are you willing to go through in exchange for the possibility of more time?


Hospitals and clinics are running into thorny challenges and making weighty judgment calls as they try to weave an algorithm with such momentous implications into the fabric of a clinical team’s already complex and frenetic workflow, a STAT examination found. STAT spoke with 15 clinicians, researchers, developers, and experts in AI and palliative care to understand how such AI models are being deployed at Stanford, the University of Pennsylvania, and a community oncology practice near Seattle — and how they might be received by patients and providers if they’re rolled out more widely.

At the same time, clinicians and researchers experimenting with these systems say they’re gathering early data suggesting the algorithms are triggering important conversations that might otherwise have happened too late, or not at all. That’s badly needed, they say, in a health care system where doctors have long been stretched too thin and lacked the training to prioritize talking with seriously ill patients about end-of-life care.

“A lot of times, we think about it too late — and we think about it when the patient is decompensating, or they’re really, really struggling, or they need some kind of urgent intervention to turn them around,” said Wang, an inpatient medicine physician at Stanford. She has broached advance care planning with her patients long before an AI model started urging her to do so — but the algorithm has made her judgment sharper, she said, because it “makes you aware of that blind spot.”

There’s plenty of room for improvement. When it comes to picking out patients who could benefit from such conversations, most hospitals and clinics currently use an “ad hoc system of identifying patients that really allows for the introduction of all kinds of explicit and implicit biases — and that means we only engage some of the right patients some of the time,” said Justin Sanders, a palliative care physician at Dana-Farber Cancer Institute and Brigham and Women’s Hospital in Boston.

Sanders, who is not involved in any of the rollouts of the AI models, works with health systems across the country as part of a program to improve care of the seriously ill. Using AI has promise, he said, because “any systematic approach to identifying people for serious illness conversations is an improvement over what happens now.”

The architects of the AI systems described making careful choices about how much information to disclose to their users — and, on the flip side, what information to withhold from clinicians and patients. The daily emails sent to Stanford clinicians during last winter’s rollout, for instance, didn’t include any numbers alongside patients’ names, such as the algorithm’s calculation of the probability that a flagged patient will die in the next 12 months.


All of the leaders of the clinical rollouts contacted for this story said they discourage clinicians from mentioning to patients that they were identified by an AI system. Doctors say they don’t want to bring it up, either. “It can seem very awkward, and like a big shock — this ominous force has predicted that you could pass away in the next 12 months,” Wang said.

Clinicians on the front lines, too, are figuring out how to balance their own judgments with the AI’s predictions, which are often spotty when it comes to identifying the patients who actually end up dying, according to early data. Like Wang, several providers told STAT they’re sometimes surprised by which of their patients the algorithm decides to flag.

They also described having to decide what to do when they disagree with the algorithm — whether that’s because they think a patient is in better health than the AI does, or because they want to initiate a conversation with a patient they’re worried about, but who hasn’t been named by the model. It’s not as if physicians can ask the AI its reasoning.

And even when clinicians agree with the algorithm’s recommendations, they still must decide when and how to broach such a sensitive subject with patients, and which conversations to prioritize when the list of names is long or a day is particularly hectic.

Consider the patient with the viral infection who kept showing up on Wang’s list in January. During his time in the hospital, which included a stay in the intensive care unit, he’d also been diagnosed with rheumatologic and heart conditions and been put on more than half a dozen medications.

This man wouldn’t normally have stood out to Wang as an urgent priority for an end-of-life conversation, because she guessed there was maybe a 50-50 chance he’d be doing well in a year’s time. But since he kept getting flagged by the algorithm, Wang decided to talk to him about his experience being intubated in the ICU, with invasive lines and strong drugs helping to keep him alive.

What happened to you was really, really scary, Wang recalled telling him on the winter day she decided to raise the topic. She asked him: What did you think about that experience? And would you be willing to go through it again — in order to have a bit more time?

Yes, the man told Wang. He would do it again.

The intimate conversations being prompted by these AI systems are the result of countless design choices — about which data to analyze, which patients to flag, and how to nudge busy clinicians.

The models are generally built around data stored in electronic health records and rely on various machine learning techniques. They’re trained and tested on thousands of data points from patients who’ve been treated before, including their diagnoses, their medications, and whether they deteriorated and died. Some models also sample from socioeconomic data and information from insurance claims.

Once deployed, the models are tasked with sifting through current patients’ medical records to predict whether they’re at elevated risk of dying in the coming weeks or months. They rely on different thresholds to determine which patients to flag as being at high risk — a bit like how Google decides which results to put on the first page of a search.

Consider an algorithm developed by researchers at Penn that’s being used to surface cancer patients in the health system there. It starts by identifying only those it deems to have at least a 10% chance of dying in the next six months — and then flags some of those patients to clinicians.

Other models — like a commercial one developed by Jvion, a Georgia-based health care AI company — flag patients based on how they stack up against their peers. When it’s rolled out in an oncology practice, Jvion’s model compares all of the clinic’s patients — and then flags to clinicians the 1% or 2% of them it deems to have the highest chance of dying in the next month, according to John Frownfelter, a physician who serves as Jvion’s chief medical information officer.
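The two flagging strategies described above — an absolute risk cutoff versus ranking patients against their peers — can be sketched in a few lines of code. This is a minimal illustration only; the risk scores, function names, and cutoffs below are invented, not taken from the Penn or Jvion systems.

```python
# Two ways a mortality-risk model's scores might be turned into a list
# of flagged patients. All names and numbers here are hypothetical.

def flag_by_threshold(risk_scores, threshold=0.10):
    """Flag every patient whose predicted risk meets an absolute cutoff,
    as in the Penn-style approach (at least a 10% chance of dying)."""
    return [pid for pid, risk in risk_scores.items() if risk >= threshold]

def flag_top_fraction(risk_scores, fraction=0.02):
    """Flag the highest-risk 1-2% of a clinic's patients relative to
    their peers, as in the Jvion-style approach."""
    n_flag = max(1, int(len(risk_scores) * fraction))
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return ranked[:n_flag]

# Toy example: five patients with made-up risk scores.
scores = {"A": 0.03, "B": 0.12, "C": 0.08, "D": 0.40, "E": 0.01}
print(flag_by_threshold(scores))  # patients at or above the 10% cutoff
print(flag_top_fraction(scores))  # the single highest-risk patient here
```

The key design difference: a fixed threshold lets the number of flagged patients swell or shrink with the patient population, while a top-fraction rule always surfaces the same share of a clinic’s caseload.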

Jvion’s tool is being piloted in several oncology practices around the country, including Northwest Medical Specialties, which delivers outpatient care to cancer patients at five clinics south of Seattle. Every Monday, a patient care coordinator at Northwest sends out an email to the practice’s clinicians listing all of the patients that the Jvion algorithm has identified as being at high or medium risk of dying within the next month.

Those notifications, too, are the product of careful consideration on the part of architects of the AI systems, who were mindful of the fact that frontline providers are already flooded with alerts every day.

At Penn, physicians participating in the project never get any more than six of their patients flagged each week, their names delivered in morning text messages. “We didn’t want clinicians getting fed up with a bunch of text messages and emails,” said Ravi Parikh, an oncologist and researcher leading the project there.

The architects of Stanford’s system wanted to avoid distracting or confusing clinicians with a prediction that may not be accurate — which is why they decided against including the algorithm’s assessment of the odds that a patient will die in the next 12 months.

“We don’t think the probability is accurate enough, nor do we think human beings — clinicians — are able to really appropriately interpret the meaning of that number,” said Ron Li, a Stanford physician and clinical informaticist who is one of the leaders of the rollout there.

After a pilot over the course of a few months last winter, Stanford plans to introduce the tool this summer as part of normal workflow; it will be used not just by physicians like Wang, but also by occupational therapists and social workers who care for and talk with seriously ill patients with a range of medical conditions.

All those design choices and procedures build up to the most important part of the process: the actual conversation with the patient.

Stanford and Penn have trained their clinicians on how to approach these discussions using a guide developed by Ariadne Labs, the organization founded by the author-physician Atul Gawande. Among the guidance to clinicians: Ask for the patient’s permission to have the conversation. Check how well the patient understands their current state of health.

And don’t be afraid of long moments of silence.

There’s one thing that almost never gets brought up in these conversations: the fact that the discussion was prompted, at least in part, by an AI.

Researchers and clinicians say they have good reasons for not mentioning it.

“To say a computer or a math equation has predicted that you could pass away within a year would be very, very devastating and would be really tough for patients to hear,” Stanford’s Wang said.

It’s also a matter of making the most of the brief time that clinicians have with each patient, the system architects say.

“When you have 30 minutes or 40 minutes to talk with somebody, you don’t want to begin a conversation by saying an algorithm flagged you — and then waste their other 29 minutes answering their questions about it,” said Stanford biomedical informaticist Nigam Shah, one of the leaders of the rollout there.

The decision to initiate an advance care planning conversation is also informed by many other factors, such as a clinician’s judgment and a patient’s symptoms and lab results.

“What we explicitly said to clinicians was: ‘If the algorithm would be the only reason you’re having a conversation with this patient, that’s not enough of a good reason to have the conversation — because the algorithm could be wrong,’” said Penn’s Parikh.

In the strictest technical terms, the algorithms can’t really be wrong: They’re just predicting which patients are at elevated risk of dying soon, not whether patients will definitely die. But those risk estimates are just estimates — the systems sometimes flag patients who don’t end up dying in the coming weeks or months, or miss patients who do, a small stream of early research suggests.

In a study of the Penn algorithm, researchers looked at how more than 25,000 cancer patients fared after the AI system predicted their risk of dying in the next six months. Among those patients that the algorithm predicted were at high risk of dying in that period, 45% actually died, compared to 3% of patients that the model predicted were at low risk of death during that period.

At Northwest, there’s close to a 40% chance that patients flagged as high risk by the Jvion model will go on to die in the next month, according to Jvion’s Frownfelter.

Eric Topol, a cardiologist and AI expert at Scripps Research in San Diego, said that without more accurate models, he’s skeptical of the role AI systems could play in palliative care. “I wouldn’t think this is a particularly good use for AI unless and until it is shown that the algorithm being used is extremely accurate,” Topol said. “Otherwise, it will not only add to the burden of busy clinicians, but may induce anxiety in families of affected patients.”

There is also a mismatch between the task of these models — predicting a patient’s odds of death — and how they’re actually being used — to try to identify who will benefit most from an advance care planning conversation.


As Stanford’s Shah put it: “The label you want is: ‘Will benefit from palliative care.’ But the label you’re predicting for is: ‘Will die.’”

Even when the models flag the wrong patients, the emerging data indicate, they have the potential to spur more conversations about end-of-life care — and perhaps to spur better care, too.

Crucially, these AI models have yet to be tested using a gold-standard study design that would compare outcomes when some clinics or patients are randomly assigned to use the AI tool, and others are randomly assigned to the usual strategies for encouraging conversations about end-of-life care. Instead, the studies presented so far have largely focused on comparing outcomes at a given hospital or practice before and after the tool was implemented.

Consider the data presented in May on the models from Penn and Jvion at the virtual gathering of the American Society of Clinical Oncology, the big annual cancer meeting that’s closely watched by oncologists around the world.

In another study of the Penn algorithm, researchers found that when the health system’s oncology clinics started using the algorithm, 4% of patient visits involved a documented conversation about a patient’s wishes and goals — compared to 1.2% of visits in the weeks before the algorithm was rolled out.

A study on the rollout of Jvion’s model at Northwest found that the rate at which palliative care consults were conducted increased 168% in the 17 months after the tool was deployed, compared to the five months prior. And the rate at which Northwest patients were referred to hospice jumped eightfold.

During the study period, Jvion’s AI model identified 886 Northwest patients as being at high risk of death in the next month. One of them was an elderly woman; she lived alone and had breast cancer that had spread to her liver. She had been a Northwest patient for years, and had been responding well to an oral chemotherapy, even though she complained often about how the treatment wore her down.

That was why her doctor, Sibel Blau, was so surprised to see her show up one day last year on her list of high-risk patients. Concerned, Blau arranged for the patient to come in for a visit later that afternoon. A friend drove her to the clinic, where she got her blood drawn. Everything seemed fine, and the patient was sent home.

Then the clinic got a call from the friend: The patient had collapsed as soon as she got home. It turned out she had a urinary tract infection that had caused her to be septic; she could have died if she hadn’t gotten prompt treatment, Blau said.

The patient responded to her antibiotics, and soon seemed back to normal. But not long after, the AI model flagged her again.

Blau called the patient into the clinic once more, and sat down to talk with her, with the goal of investigating what was wrong. This time, though, it was a malady of a different sort.

When I tell you I’m tired of this chemo, I really mean it, Blau recalled the patient saying. The patient told her: I just want to go. My body is getting weaker.

In each instance, Blau was grateful that the Jvion model had flagged her patient. She doesn’t have time to talk about end of life during every one of her patient visits, she said, nor would it be appropriate to do so. “This is a tool that takes me one step closer to asking the right question,” Blau said.

The second time, asking the right question led the patient to decide to stop chemotherapy and enter hospice care.

A few months later, she died peacefully.

This is part of a yearlong series of articles exploring the use of artificial intelligence in health care that is partly funded by a grant from the Commonwealth Fund.

