ROCHESTER, Minn. — It would be easy to wonder what Zachi Attia is doing in the cardiac operating rooms of one of America’s most prestigious hospitals. He has no formal medical training or surgical expertise. He cannot treat arrhythmias, repair heart valves, or unclog arteries. The first time he watched a live procedure, he worried he might faint.
But at Mayo Clinic, the 33-year-old machine learning engineer has become a central figure in one of the nation’s most ambitious efforts to revamp heart disease treatment using artificial intelligence.
Working side by side with physicians, he has built algorithms that in studies have shown a remarkable ability to unmask heart abnormalities long before patients begin experiencing symptoms. Much of that work involves training computers to sift through mountains of patient data to pluck out patterns invisible to even the most skilled cardiologists.
While his research is highly technical, Attia’s installation on the front lines of care is an attempt to address a much broader and more pressing question facing hospitals and technology companies as they seek to integrate AI into routine medical practice: How can algorithms be constructed to not only produce headline-grabbing results in papers, but also improve outcomes for real patients?
That is the long, last mile AI must travel to prove its worth in medicine, a mile that divides progress from empty promises, value from waste. Mayo is betting that, by exposing Attia and other software engineers to the daily business of delivering care, it can produce algorithms whose performance will hold up amid the innumerable complexities of clinical settings.
“That’s the real acid test,” said Dr. Eric Topol, a cardiologist and AI expert at the Scripps Research Translational Institute in San Diego. “You don’t want to go forward with something validated [on a computer] and start using it on patients. We know there are all sorts of issues that are different than they are on some pristine, cleaned dataset.”
Algorithms must work on real patients with diverse backgrounds and complex medical issues. They must be scrubbed for bias and unintended impacts, such as “false positives” that can lead to unnecessary care and increased costs. They also must be designed to fit into doctors’ and nurses’ routines: It is not enough to simply surface new information. That information must be clinically meaningful, easily understood, and delivered in a time window in which practitioners can use it to improve care for patients. The final — and perhaps the most daunting — hurdle is for AI developers to win physicians’ trust in an algorithm.
Attia, who started his career as a software developer and cybersecurity expert, is the co-director of an artificial intelligence team created by Dr. Paul Friedman, an electrophysiologist who chairs Mayo’s cardiology department. He is one of five software engineers and data scientists who accompany physicians on rounds, observe procedures, and discuss ways to use AI to address gaps in care.
In just the past three years, the team has published more than two dozen studies on AI in cardiology, and it is now field-testing an algorithm to detect a weak heart pump in dozens of primary care clinics. Most of the algorithms under development do not involve surgeries or emergency care, but the physiological events that occur far upstream, where abnormalities silently take root and begin to compromise heart function. Each year in the U.S., hundreds of thousands of patients die from utterly treatable cardiac illnesses because these problems are not detected in time, causing their hearts to sputter, skip, and sometimes stop.
That, Attia said, is where artificial intelligence can make a huge difference.
“We want to make sure that, for anything that is treatable, we catch it as soon as we can,” he said. “If we can solve it before it becomes a big deal, then we can really affect how cardiac care is delivered.”
By learning the daily practice of medicine, Attia is learning how to hit that target. He’s dissected cadavers to understand the anatomy of the heart and the types of abnormalities his team’s algorithms are seeking to detect. He’s watched physicians place stents in arteries and perform workups on patients to understand how they process and apply physiological information. And he’s learned to interpret clinical data not just to connect lines of code in a product, but to detect subtle differences in heart function that offer hints of impending disease in a patient.
“If I wasn’t here [at Mayo] I’d look at my results and say, ‘Well, I have an accuracy of X and a sensitivity of Y,’” Attia said. “But because I’m here I can go to physicians with examples from the data and ask, ‘What do you think is happening here, or what is going on there?’ And then I can improve the algorithm.”
In some cases, those conversations have resulted in decisions to exclude data that might have skewed the output of an algorithm. In others, Attia said, they have helped him home in on signals that are more relevant to clinical circumstances and help address blindspots in patient care.
“You learn to build the AI in a way that fits the physician practice, and not the other way around,” he said. “Physicians are very pragmatic. They are not interested in cool, futuristic tests. They want to see how something is going to improve the prognosis for their patients.”
There is no better example of AI’s ability to improve heart care, and the complexity of doing so, than Mayo’s attempt to tackle one of the field’s most vexing conditions, an irregular rhythm known as atrial fibrillation.
A key portion of that work has unfolded in a small office Attia shares with Friedman in Mayo’s electrophysiology unit. It is a messy space where scattered papers and books — and a marker-stained whiteboard — speak to hours they have spent diagramming a-fib and other arrhythmias, and the machine-learning architectures meant to pinpoint signs of deteriorating heart function in haystacks of patient data.
The largest share of the data is derived from electrocardiograms (EKGs), a century-old technology that is commonly used to evaluate heart function by recording electrical pulses that cause the heart to beat. About 250,000 EKGs are performed every year at Mayo, which has a digital dataset of 7 million records stretching back to the mid-1990s.
EKGs have been able to detect a-fib for decades, but Mayo is seeking to take it a step further — by trying to predict which patients will experience this arrhythmia in the future.
When atrial fibrillation occurs, the upper chambers of the heart (the atria) beat irregularly instead of operating efficiently to move blood through the heart. This can allow blood to pool and clot, and if a clot breaks loose and lodges in the brain, a stroke results. The challenge for physicians is that the condition occurs intermittently, is difficult to detect, and can worsen over time.
Using an AI-enabled EKG could help physicians identify patients before symptoms appear and intervene to avert a stroke or other harm.
To analyze the data, Attia and Friedman used a type of machine learning system known as a convolutional neural network. Perhaps best known for their use in facial recognition technology, these networks separately analyze discrete aspects of the input data, such as regions of an image, to form a conclusion about what it represents. It is essentially a form of superpowered machine vision that uses math to detect subtleties that humans overlook.
They fed the system 10-second snippets of EKG data labeled in two categories — patients with atrial fibrillation and those without the condition. Because he’d worked so closely with the cardiologists, Attia knew the “a-fib” group should include patients with EKGs labeled “a-flutter,” a condition that is distinct from a-fib but treated the same way.
Attia and the cardiologists devised a novel method of training the AI — instead of showing it EKGs with patients’ hearts in atrial fibrillation, they showed it EKGs in normal rhythm from both groups of patients. Once exposed to hundreds of thousands of EKGs, the AI zeroed in on an electrical signature of the arrhythmia, a kind of digital code that indicates whether someone has had the condition previously, and is therefore likely to experience it again.
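The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mayo's code: the `Ekg` record, the rhythm label strings, and the `build_training_pairs` helper are all hypothetical names introduced here. The sketch keeps only normal-rhythm recordings and labels each one by whether the same patient has any a-fib or a-flutter recording on file.

```python
from dataclasses import dataclass

# Rhythm labels treated as the positive class. Per the article, a-flutter is
# grouped with a-fib. The label strings themselves are hypothetical.
POSITIVE_RHYTHMS = {"a-fib", "a-flutter"}
NORMAL_RHYTHM = "normal"


@dataclass(frozen=True)
class Ekg:
    """Hypothetical record for one 10-second EKG recording."""
    patient_id: str
    rhythm: str  # rhythm label attached to this recording


def build_training_pairs(ekgs):
    """Return (ekg, label) pairs using only normal-rhythm recordings.

    The label is 1 if the patient has any a-fib or a-flutter recording
    on file, and 0 otherwise.
    """
    positive_patients = {
        e.patient_id for e in ekgs if e.rhythm in POSITIVE_RHYTHMS
    }
    return [
        (e, 1 if e.patient_id in positive_patients else 0)
        for e in ekgs
        if e.rhythm == NORMAL_RHYTHM
    ]


# Made-up example records, for illustration only.
records = [
    Ekg("p1", "normal"),
    Ekg("p1", "a-fib"),
    Ekg("p2", "normal"),
    Ekg("p3", "a-flutter"),
    Ekg("p3", "normal"),
]
pairs = build_training_pairs(records)
labels = [(e.patient_id, label) for e, label in pairs]
```

In this toy data, the normal-rhythm recordings from patients p1 and p3 land in the positive group and the one from p2 lands in the negative group, which is the kind of input a classifier would then be trained on.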
“That was the one that surprised me more than any of the others,” Friedman said of the a-fib algorithm. “It’s like looking out at the ocean on a calm day and asking, ‘Were there big waves here yesterday?’ It’s stunning.”
In a study published in August, Mayo reported that the algorithm identified patients with a-fib with 80% accuracy. On a recent afternoon, its power was displayed in the case of a patient who had undergone EKGs over a 30-year period but had never been diagnosed with a-fib. Inside a conference room, a group of engineers and cardiologists scanned the peaks and valleys of the data projected on a screen for any sign of an abnormality.
Dr. Samuel Asirvatham, an electrophysiologist who reads EKGs as automatically as most people drive a flat stretch of interstate, jumped up from his chair to take a closer look. He flipped forward in the series of EKGs and then back, but nothing seemed to call out a certainty of atrial fibrillation. However, the AI system, when it was shown the same data, detected a hidden pattern pinpointing two occasions when the patient’s risk of atrial fibrillation had increased dramatically.
As it turned out, both of those EKGs preceded cryptogenic strokes, or strokes of unknown cause, that, in hindsight, may have been caused by the a-fib.
“If we’d have used this system, presumably [this patient] would have been treated with anticoagulants before these strokes,” said Friedman, referring to blood thinners used to prevent clots from forming in patients with a-fib.
“Yes, seven years before,” added Attia.
It is at this juncture where the use of AI gets especially tricky. Blood thinners carry substantial risk of bleeding and are typically prescribed only when patients have had a verified episode of atrial fibrillation. Topol, the cardiologist and AI expert from Scripps, said the algorithm could be helpful in treating patients who suffer cryptogenic strokes if it showed the patient had an extremely high risk of a-fib.
“But you’d still want to nail it,” Topol said. “You’d still like to see the atrial fibrillation [first]. Putting someone on blood thinners means that for the rest of their life they are going to have [an increased risk] of a serious bleed” and may be limited from participating in certain activities, such as contact sports.
After the meeting to review the EKGs, Mayo’s patient did experience a documented episode of atrial fibrillation, making it easier for physicians to decide on a course of action. But the ultimate goal of the AI is to intervene earlier in the disease process, which means relying at least in part on the ability of an algorithm to flag something in the data that physicians cannot verify with their own eyes.
Before that can happen, Mayo must pursue clinical studies to examine how the use of the algorithm affects doctors’ decision-making and patient outcomes. For now, Attia has built a dashboard inside the electronic medical record system that allows doctors to scroll through a patient’s EKGs and examine the AI’s conclusions on the risk of a-fib. But it is only being used as an advisory tool, rather than an actionable screening test.
As it seeks to incorporate algorithms into clinical care, Mayo is already bumping into the complexity of real-life interactions with patients.
Earlier this year, the health system launched a randomized trial — one of the first of its kind in the country — to evaluate the impact of another algorithm on physicians’ decision-making. The algorithm, which also analyzes EKG data, is designed to detect a weak heart pump, known clinically as low ejection fraction. The condition is treatable with interventions ranging from surgery to simple lifestyle changes, but millions of Americans — about 3% to 6% of the population — don’t know they have it, making them more vulnerable to developing heart failure.
In a paper published last January, Mayo’s research team found that the algorithm was able to accurately identify a weak heart pump in 86% of patients who were found to have the condition based on a follow-up echocardiogram. The result raised the possibility that the algorithm could improve detection by efficiently screening patients who should be referred for echocardiograms, a costly imaging test.
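The 86% figure is a sensitivity: the share of patients who truly have the condition, as confirmed by echocardiogram, that the algorithm flagged. A minimal sketch of that arithmetic follows. The `sensitivity` function name and the counts are hypothetical, chosen only so the result matches the reported rate. They are not taken from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of patients with the condition that the test flagged."""
    total_with_condition = true_positives + false_negatives
    if total_with_condition == 0:
        raise ValueError("need at least one patient with the condition")
    return true_positives / total_with_condition


# Hypothetical counts: of 100 patients with a confirmed weak heart pump,
# the algorithm flags 86 and misses 14.
rate = sensitivity(true_positives=86, false_negatives=14)
```

Sensitivity alone says nothing about false positives, which is why the rate of incorrect flags among healthy patients also matters when an algorithm is used to decide who gets a costly follow-up test.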
In the ongoing trial, one group of primary care doctors has been given access to the algorithm through the electronic medical record, while a control group is using traditional methods, such as having the EKGs read by cardiologists to identify patients in need of follow-up. The hope is that the doctors with access to the algorithm will use it to suss out patients with a weak heart pump who don’t know they have it — and order an echocardiogram to confirm.
But in the early days of the trial, a question repeatedly arose about how much patients would be charged for the echocardiograms, which can cost hundreds of dollars with insurance, and significantly more than that without it.
“We didn’t have a good way of explaining that early on,” said Dr. David Rushlow, the chair of family medicine at Mayo. “Our docs were kind of left holding the bag on not knowing whether the echo would be covered.”
Physicians participating in the trial were eventually given a billing code to help with coverage determinations. But there is no research budget to fund such testing, so it can still result in significant costs to the patient.
In many circumstances, the algorithm is also flagging patients who are already very sick. That’s because it is being used to examine EKGs ordered for any reason, such as for a patient in palliative care or hospice care, who is complaining of chest pains. Each positive test triggers a notification in the patient’s electronic medical record as well as a direct email notification to the doctor.
“There are quite a few cases where it’s not a surprise and the doc says, ‘Well, yeah, I know that patient has low [ejection fraction],’” Rushlow said. Such unnecessary notifications, while not problematic for patients, can add to administrative burdens and alert fatigue.
Rushlow added, however, that the algorithm is successfully identifying patients who don’t know they have a low ejection fraction. So far, it has been used to examine about 14,000 EKGs taken by primary care physicians. The data have not been fully collected or reviewed, so it is unclear how often it is picking up on patients who have not been previously diagnosed, and whether it is performing better than the traditional approach, in which doctors can choose to have a specialist read the EKG.
Dr. Peter Noseworthy, an electrophysiologist who is part of the research team running the trial, said mathematical analysis of the results will drive the final assessment of the algorithm. Researchers are also interviewing doctors and patients involved in the study about how the algorithm affected their thinking.
From the trial’s inception, he said, Mayo’s cardiologists were focused on a particular kind of patient: a stoic person who doesn’t often go to the doctor, who maybe has shortness of breath but hasn’t complained about it. The data suggest a lot of those patients are out there, and the algorithm, at least on paper, seems adept at finding them.
“But we’re definitely seeing the reality is much messier than we anticipated,” Noseworthy said. “It’s easy to develop the algorithm and prove something in a single database, but to be able to apply it in [clinics] and try to get real results is a whole different challenge. That’s what this study is about.”
This is the first of a yearlong series of articles exploring the use of artificial intelligence in health care that is partly funded by a grant from the Commonwealth Fund.