BETHESDA, Md. — This is the moment of truth for the FDA’s regulation of artificial intelligence in medicine.

That was the unmistakable theme of a two-day meeting here this week that focused on how the agency will keep tabs on the safety and effectiveness of new medical imaging devices that use AI to automate tasks performed by radiologists.

Following a string of approvals, these products are now beginning to filter into hospitals and clinics around the country, posing a test of the agency’s review processes and ability to trace the impact of AI on doctors and patients in real-world settings.

“Understanding the risks really does not stop with FDA approval,” said David Kent, a physician and director of predictive analytics at Tufts Medical Center. “Even after these are released, we’re going to have to put a lot of effort into understanding whether they are improving outcomes for our patients.”

Kent was among many doctors and AI developers who offered advice to the FDA during two daylong workshops designed to assess the risks and benefits of AI systems that automate triage and interpretation of medical images, and help guide users in capturing scans of the heart and other organs. These technologies not only seek to improve detection of diseases, but also allow key portions of the work to be done by people with limited training.

A failure to place adequate guardrails around such technologies can lead to severe consequences, as it has in other industries. Perhaps the most dramatic example is the case of Boeing’s 737 Max airplane, where investigators are examining what went wrong with flight control software that automatically pushed down the nose of the plane, resulting in crashes that killed 346 people. In the ongoing investigation, both Boeing and the Federal Aviation Administration are now facing questions about whether the software was adequately vetted, and whether pilots were properly trained to use it.

Artificial intelligence will be applied differently in the context of radiology, but the FDA faces similar challenges and consequences. So far, most of the devices it has approved are designed to augment, but not entirely automate, the process of reviewing images and making diagnoses. But it is beginning to give the green light to autonomous products such as IDx-DR, an AI-enabled device that detects diabetic retinopathy using retinal images that can be taken by anyone with a high school education.

A few weeks ago, it approved another product by San Francisco-based Caption Health that uses artificial intelligence to help capture ultrasound images of the heart, also known as echocardiograms. Such images are typically taken by specialists, but Caption Health’s product can be used by nurses who receive only a couple of days of training.

During a presentation at this week’s FDA meeting, one of the company co-founders, Ha Hong, said the product can help relieve a “severe bottleneck” in heart disease treatment and diagnosis by greatly expanding the pool of users who can obtain high-quality images of the heart. The product uses machine learning to instruct a user on how to ideally position the ultrasound wand, or transducer, to get snapshots needed to assess the heart’s functioning. The diagnosis of the patient is ultimately done by a cardiologist.

The FDA approved the product through its “de novo” pathway for brand-new devices after studies showed that nurses using the software were able to capture quality images on an array of different patients.

Shahram Vaezy, a biomedical engineer in the FDA’s division of radiological health, said the approval establishes a pathway for commercialization of similar devices designed to be used in other clinical circumstances or by different users, such as patients who could eventually take their own ultrasound images in their homes.

The question is what will happen when these AI products, whether designed to acquire images or interpret them, start getting used outside the settings in which they were trained. Will they be reliable? Will the algorithms maintain their accuracy levels? And when paired with humans, will they improve care, or lead to less accurate diagnoses and higher costs?

In a prior generation of AI, certain products approved by the FDA did not deliver their promised benefits. In 1998, the agency approved computer-aided detection (CAD) software for use in breast imaging, and the Centers for Medicare and Medicaid Services increased reimbursement for the use of the technology a few years later. Since then, however, studies have shown that the use of CAD, which increased costs by more than $400 million a year, has not been associated with an improved rate of cancer detection.

“Not only did we find there was no improvement with CAD, but really alarming was that … cancer detection was worse at centers where they were using CAD,” said Constance Lehman, the director of breast imaging at Massachusetts General Hospital, who co-authored a study on the technology in 2015 and spoke at this week’s FDA meeting.

She also pointed out, however, that the potential for newer and more powerful AI models should be considered in the context of current human performance, which is widely variable in breast imaging. “You could go to a center where the radiologist who interprets your mammogram has a sensitivity of 40%, missing 60% of all cancers that come through for that individual,” Lehman said. “Or you could go to a center where the radiologist has a very consistent sensitivity of 95%, only missing 5% of cancers.” 

Artificial intelligence could help reduce such variation by giving radiologists more consistent and precise information in assessing the risks facing their patients. 

But seizing that benefit requires careful monitoring to track the impact of AI systems as they are deployed in communities with different patient populations and varying levels of resources and clinical expertise, specialists said. One danger is that once doctors start using AI systems to interpret images, they could begin to lean too heavily on the machines and fail to exercise appropriate oversight. Another is that a lack of diversity in the data used to train and validate a product could result in inaccurate readings when it is deployed in certain settings.

Lehman said those risks could be offset by using existing performance checks in breast imaging and other specialties.

“We do have the benefit of required audits, and I think we have an opportunity to leverage that and really look at what the performance is as centers are integrating AI into their programs,” she said. “We might even consider a higher bar for performance reporting if AI is used autonomously.”  

This is part of a yearlong series of articles exploring the use of artificial intelligence in health care that is partly funded by a grant from the Commonwealth Fund.
