A series of studies, starting as a steady drip and quickening to a deluge, has reported the same core finding amid the global spread of Covid-19: Artificial intelligence could analyze chest images to accurately detect the disease in legions of untested patients.
The results promised a ready solution to the shortage of diagnostic testing in the U.S. and some other countries and triggered splashy press releases and a cascade of hopeful headlines. But in recent days, the initial burst of optimism has given way to an intensifying debate over the plausibility of building AI systems during an unprecedented public health emergency.
On one side are AI developers and researchers who argue that training and testing methods can, and should, be modified to fit the contours of the crisis; on the other are skeptics who point to flaws in study designs and the limited number of lung scans available from coronavirus patients to train AI algorithms. They also argue that imaging should be used sparingly during the pandemic because of the risk of spreading the infection through contaminated equipment. For those and other reasons, many radiological societies have recommended against using such scans, whether supported by AI or not, for Covid-19 screening.
As AI makes increasing inroads into health care — it’s already being used by radiologists, for example, to help detect strokes and flag potentially cancerous lesions — the coronavirus outbreak is providing the first opportunity for developers to test the feasibility of creating reliable algorithms on the fly. But it’s not at all clear that a process that typically takes many months or years can be sped up to help the first wave of patients.
“I’ve looked at the work around AI and Covid, and I’m not a big fan,” said Ella Kazerooni, a radiologist at the University of Michigan Medical Center. She questioned whether AI can differentiate between Covid-19 and other lower respiratory illnesses when the scans the systems have been trained on focus so heavily on critically ill patients and don’t represent the nuances they will encounter in real clinical settings. “When you’re just comparing normal and abnormal scans,” and nothing in between, she said, “it’s not very helpful.”
A key promise of the work involving chest images is that these scans could provide an alternative to the tests currently used to detect the virus, which are in severely short supply in many places. Those tests involve collecting a sample with a throat or nasal swab and then running a polymerase chain reaction, or PCR, to look for tiny snippets of the genetic code of the virus. But critics argue that it’s not clear that even algorithms trained on vast, diverse, and well-labeled data would be able to plug the gap quickly and reliably enough.
The dozens of researchers and companies that are trying to fill the vacuum say their algorithms and imaging products can give caregivers guidance and help improve treatment of patients. These AI models have a range of goals, including screening for the illness, monitoring which infected patients are likely to get worse, and gauging response to experimental drugs that may increasingly be used to try to treat patients.
The papers that have popped up in leading journals and on preprint servers, which post papers before they are peer-reviewed, describe AI models that have been trained and tested using chest images captured via X-rays or computed tomography (CT) scans. The images from Covid-19 patients are generally captured while they are at the hospital, often as inpatients due to severe symptoms; chest images from patients with different lung conditions are being used as controls to develop the models.
Many of these models rely heavily on images from Chinese patients infected in the early weeks of the outbreak. While those scans make up the largest dataset to date, there is a question of whether models trained on them will apply to patients of different racial backgrounds in Europe and the United States, where social, economic, and environmental factors may influence the presentation of the infection.
The scans generated in Chinese hospitals also skew heavily toward patients with advanced symptoms of infection, which may limit their value in screening patients who have just started feeling ill and don’t know if they have Covid-19. That mismatch causes a problem known as selection bias, in which research conclusions become skewed because the data on which they are based do not adequately reflect the population intended to be analyzed.
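The arithmetic behind that concern can be sketched in a few lines of Python. The figures are hypothetical, chosen only for illustration and not drawn from any of the studies: the sketch assumes a model that catches 95% of advanced cases but only 50% of mild ones, and compares a hospital-style dataset in which 90% of cases are advanced with a screening population in which 20% are.

```python
def expected_sensitivity(advanced_share, sens_advanced, sens_mild):
    """Share of infected patients a model would catch, given the case mix."""
    mild_share = 1.0 - advanced_share
    return advanced_share * sens_advanced + mild_share * sens_mild

# Hypothetical detection rates (assumptions, not measured values).
SENS_ADVANCED = 0.95
SENS_MILD = 0.50

# Same model, two different mixes of advanced and mild cases.
hospital = expected_sensitivity(0.90, SENS_ADVANCED, SENS_MILD)
screening = expected_sensitivity(0.20, SENS_ADVANCED, SENS_MILD)

print(f"Hospital-style dataset: {hospital:.1%} of cases detected")
print(f"Screening population:   {screening:.1%} of cases detected")
```

Under these assumptions, the same model appears to catch about 90% of cases on the hospital-style data but only about 59% in a screening population, even though nothing about the model itself has changed.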
Some AI developers assert that the extremely high accuracy levels reported in their initial studies are holding up in the real world as they begin to deploy their products in hospitals. “We have processed hundreds of additional patient cases in China and Russia, and we have started in Italy … and we are seeing similar results and performance,” said Moshe Becker, chief executive of RADLogics, a developer of software for use in radiology whose researchers co-authored a preprint posted last week.
A company spokesman said RADLogics has applied for an emergency use authorization from the Food and Drug Administration to use the Covid-19 AI in the United States and is seeking a similar approval in Europe. He said the AI is already being used in 10 hospitals in China and Russia.
In 2018, RADLogics received a warning letter from the FDA that accused the company of improperly marketing AI capabilities that it was not authorized to provide. The company later resolved the complaint by removing the offending language from its website.
Becker said his company began building its Covid-19 model when reports of the infection began to surface several months ago. Its study, co-authored by physicians from Mount Sinai Hospital in New York and the University of Maryland, states that the algorithm is designed not only to detect the coronavirus but also to provide a “corona score” that quantifies the extent of the infection in patients, a use Becker suggested could become more helpful as the outbreak worsens.
“The scale and patient volume is large and rapidly increasing for coronavirus,” he said. “We see our solution being used primarily to augment caregivers in testing and measuring the progression of the disease in patients that have already been tested positive for COVID-19 and admitted to the hospital.”
That application could help caregivers gauge patients’ response to different therapies, said Kazerooni, the Michigan radiologist. However, she said, it still runs into the problem of potentially spreading the infection to caregivers and other patients through contamination of imaging equipment. There’s also the question of whether imaging is the best way to measure response to drugs that doctors are just beginning to try on patients.
“We haven’t really started down that path yet, and I think most doctors are making decisions based on patients’ symptoms — how well they’re doing, can you get them off the ventilator,” she said. “We don’t know yet if there’s a role for [chest imaging] clinically.”
Since the start of the outbreak, the use of CT imaging has declined significantly, according to Aidoc, an AI imaging company that collected usage data from 300 clients around the globe. The data show an overall decline of 20% across all the company’s sites last week, versus the same week in February. The greatest decline was in Europe, at 39%, with U.S. sites dropping 20%. The company did not cite reasons for the decline, but a probable explanation is that providers are canceling CT exams for most patients to reduce risk of contamination and to focus care on those infected with Covid-19.
Meanwhile, companies that make imaging equipment and algorithms are scrambling to highlight the potential value of their products in responding to the outbreak, even if the needs of caregivers and patients are not yet entirely clear.
Butterfly Network, a maker of a portable ultrasound scanner, said its product is being used to help treat patients in the U.S., Italy, and Spain, and on Friday it received approval from Canadian regulators. The company has created a page on its website to post ultrasound images of Covid-19 patients to help pinpoint the physiological signatures of infection in the lungs.
“We’re literally updating this daily with more information,” said John Martin, a physician and the company’s chief medical officer. “We want to help the world consolidate their efforts, identify best practices, and be an educational portal by which we all can learn.”
Prior to the outbreak, however, ultrasound technology was not widely used by doctors to capture images of the lungs, which means the lack of evidence and knowledge around its use could be a barrier to uptake. Martin said the product’s portability could make it particularly useful, because it can be used on patients in isolation and significantly reduces the problem of cross contamination created by CT scanners and other types of imaging equipment.
“Artificial intelligence tools on top of that will help us differentiate between features of influenza A versus the Covid infection, because their appearances are similar but there are some distinct features artificial intelligence can help us with,” Martin added. “Cultures at the moment take time, and you’d like to make that distinction right away.”
But it remains unclear whether AI can actually differentiate between different kinds of infections based on the available data.
Most of the AI systems featured in recent studies are being trained on data from dozens or hundreds of patients — not the many thousands that would ideally be available to train algorithms on the distinct features of the infection. Larger volumes of data are needed to examine how the disease affects patients of different ages who may also have a range of complicating factors, such as chronic respiratory illnesses or conditions such as diabetes.
Some AI researchers, however, think they can make do with relatively few chest images from Covid-19 patients.
Joseph Paul Cohen, of the University of Montreal, is trying to build a public dataset of chest images that his team and other researchers can use to develop open-source AI tools to, among other tasks, distinguish Covid-19-induced pneumonia from pneumonias caused by other viruses or bacteria. He’s sourcing those images from Covid-19 studies that are published under Creative Commons licenses.
He estimated that AI models could be built using dozens or a few hundred X-ray images from Covid-19 patients because there is an abundance of existing X-ray images from patients with other lung conditions. He pointed to databases maintained by the National Institutes of Health, Stanford University, Massachusetts Institute of Technology, and a team of researchers in Spain, which collectively contain hundreds of thousands of non-Covid chest images that can be used as controls to help train models.
On the other hand, Cohen said, detecting Covid-19 with models built on CT scans will be harder, because no comparably large dataset of those images exists. “We’d be starting from scratch there,” Cohen said.
In recommending against the use of chest X-rays and CT scans to screen patients for Covid-19, the American College of Radiology called out the lack of data and consensus on physiological markers of the infection. “We want to emphasize that knowledge of this new condition is rapidly evolving, and not all of the published and publicly available information is complete or up-to-date,” the organization wrote in boldface on its website. “Generally, the findings on chest imaging in COVID-19 are not specific, and overlap with other infections, including influenza, H1N1, SARS and MERS.”
Other AI researchers recognize the challenges inherent in developing such algorithms for detecting Covid-19, but suggest that these models could be particularly useful in certain limited cases.
For example, hospitals could use the models to analyze the scans of patients whose PCR tests came back negative for Covid-19 but whose results are suspected of being false negatives, said Enhao Gong, CEO of Subtle Medical. The Silicon Valley startup markets AI software that has been cleared for use to enhance certain medical scans, with the goal of speeding up processing.
The idea behind using algorithms for certain patients who’ve tested negative, Gong said, would be to “detect any missing cases and trigger an alert.” Even though the algorithm might not be perfect, he said, in these cases “a false alarm is acceptable.”
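A back-of-the-envelope sketch shows the trade-off Gong describes. All inputs are hypothetical and not taken from any study or product: 1,000 patients with negative PCR results, 50 of whom are actually infected, and a model that flags 80% of infected patients and wrongly flags 10% of uninfected ones.

```python
def second_read(pcr_negative, truly_infected, sensitivity, false_alarm_rate):
    """Return (missed cases caught, false alarms) for an AI second read."""
    uninfected = pcr_negative - truly_infected
    caught = round(truly_infected * sensitivity)
    false_alarms = round(uninfected * false_alarm_rate)
    return caught, false_alarms

# Hypothetical inputs (assumptions for illustration only).
caught, false_alarms = second_read(1000, 50, 0.80, 0.10)

print(f"Missed cases caught: {caught} of 50")
print(f"False alarms: {false_alarms}")
```

In this made-up scenario, the second read recovers 40 of the 50 missed infections at the cost of 95 false alarms, each of which would need follow-up. Whether that price is acceptable depends on how costly a missed case is compared with an unnecessary alert.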
This is part of a yearlong series of articles exploring the use of artificial intelligence in health care that is partly funded by a grant from the Commonwealth Fund.