SAN FRANCISCO — The discovery that fingertip oxygen-measuring devices might contribute to health disparities because they appear to work less well on patients with darker skin has roiled the world of pulse oximetry, a $2 billion industry that now faces stricter regulations and pressure to address bias in the development and testing of its devices.
In the search for solutions, regulators from the Food and Drug Administration have turned to a single small lab in San Francisco whose visionary founder helped develop modern blood monitoring tools. For decades, the Hypoxia Lab at the University of California, San Francisco, has quietly worked to assess and improve the precision of this low-cost device that revolutionized health care by allowing fast, cheap, and non-invasive monitoring of blood oxygen levels. These instruments are critical for many aspects of medical care, from the treatment of Covid and pneumonia to neonatal monitoring, and the lab tests more than 60 each year for manufacturers and others in a small room packed with monitors, oxygen tanks, ultrasound machines, breathing tubes, and an operating room gurney.
Founded in 1958 by John Severinghaus, a physicist turned anesthesiologist who’s been described as a “master tinkerer,” the Hypoxia Lab was one of the first to publish analyses questioning the accuracy of pulse oximeters on darker skin. Severinghaus went from designing radar systems in World War II to inventing the world’s first blood gas analyzer, a machine now housed in the Smithsonian. He had a deep interest in understanding how the human body copes with low oxygen; he also studied subjects at a lab some call “the Hypoxia Hilton,” which is still in use at 12,470 feet in California’s White Mountains.
As blood gas monitoring evolved and pulse oximeters became ubiquitous in health care by the late 1980s, Severinghaus and his lab spent time evaluating how well they worked. That led them to publish papers in the mid-2000s suggesting that pulse oximeters were less accurate in patients with darker skin. Historical photos show Severinghaus testing the devices on Black patients decades ago, a time when clinical research subjects were predominantly white. The finding was something that nagged at him.
“He always talked about it. When the devices got popular, he began to wonder how accurate they really were in people with darker skin,” said Philip Bickler, a professor of anesthesia and perioperative care at UCSF who took over running the lab when Severinghaus retired.
It’s been frustrating to Bickler, who was first author on a 2005 paper assessing the effect of skin tone on pulse oximeter readings, that it took numerous new studies and a horrifying pandemic, in which pulse oximeters became critical in determining who received hospitalization and treatment, to raise widespread interest in the issue.
“All the while, we were saying, ‘Yes, this is what we were trying to tell you.’ But it just wasn’t on people’s radar as a concern,” Bickler said. “There was no attention to health equity then.”
Now, with new attention focused on health equity and the devices, the lab’s profile has risen markedly. This month, its leaders allowed STAT to spend a day observing detailed testing procedures as they worked to determine how much of a factor skin pigment plays in the accuracy of the devices — a crucial unknown as regulators seek to understand how much those errors may affect treatment decisions. As the lab grappled with a host of issues — from how to assess skin tone to unpredictable variations in readings among different human subjects — one thing became immediately clear: Nothing about testing this simple device is simple.
It was time to draw blood. Diamond Luong, a 23-year-old research coordinator at UCSF who had volunteered for the session, sat upright on the gurney, her arm numbed with lidocaine, as Bickler gently guided a catheter into her radial artery while watching an ultrasound screen. A pulse oximeter was placed on each of her fingers and a breathing tube was inserted in her mouth. Her nose was pinched shut.
For the next 20 minutes, lab workers compared the oxygen readings on the devices with levels in the blood drawn intermittently from her arm and analyzed on the spot as she was “desaturated,” or given less and less oxygen to breathe. Blood gas measurements taken from blood are considered the gold standard.
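The comparison described above, device readings set against blood measurements taken at the same moment, is commonly summarized with two numbers: the mean bias (whether a device reads high or low on average) and the root-mean-square error (how far off it is overall). The sketch below is only an illustration of that arithmetic. The function name and the sample values are hypothetical, and it is not the lab's actual procedure or data.

```python
import math

def bias_and_rms_error(device_readings, blood_readings):
    """Summarize paired oxygen saturation readings (in percent).

    Returns (mean bias, root-mean-square error), where each difference
    is the device reading minus the blood measurement.
    """
    if not device_readings or len(device_readings) != len(blood_readings):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [d - b for d, b in zip(device_readings, blood_readings)]
    bias = sum(diffs) / len(diffs)
    rms_error = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return bias, rms_error

# Hypothetical paired readings taken as oxygen is lowered (not real data).
oximeter = [98.0, 94.0, 89.0, 83.0, 76.0, 71.0]
blood_gas = [97.0, 92.0, 88.0, 80.0, 75.0, 70.0]

bias, rms_error = bias_and_rms_error(oximeter, blood_gas)
print(f"bias: {bias:+.2f} points, RMS error: {rms_error:.2f} points")
```

A positive bias means the device reads higher than the blood measurement, which is the direction of error that could hide low oxygen from clinicians.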
Six researchers scurried about the small, one-room testing space as Taylor Swift’s music softly played. Luong was breathing rapidly and deeply. “Satting 100,” Bickler called out as Caroline Hughes, a clinical research coordinator, drew a sample of bright red blood and popped it into one of two blood gas analyzers that spit out results in seconds. While the lab needs precise numbers, the deoxygenation of blood was visible to the eye: Samples of Luong’s less oxygenated blood were a much darker, cranberry red.
Readings from the 10 pulse oximeters on Luong’s fingers flickered on a huge screen in the corner of the lab, which has multiple cameras so other researchers or manufacturers can Zoom in to watch proceedings remotely.
Luong’s oxygen levels were twice taken down to 70%, far below the normal range but just briefly. She said she was in no discomfort. Volunteering for the research is popular; a short session pays about $200. The lab has long relied on volunteers who come mainly from within UCSF. Michael Lipnick, an associate professor of anesthesiology at UCSF and Hypoxia Lab investigator, said the lab is interested in recruiting a more diverse population, including people with darker skin, but wants to think more deeply about the ethical issues involved in recruiting community research participants.
Luong is Asian with a medium skin tone. Other participants on the day STAT observed had darker skin. But the person on whom pulse oximeters performed the worst was one of the lightest-skinned volunteers tested. And in many cases, the pulse oximeter readings were lower than the levels measured in blood, while the concern in clinical studies has been that the devices showed erroneously higher oxygen levels in people with darker skin, meaning clinicians might miss dangerous hypoxemia, or low oxygen. These results show that the issue of how much skin pigment affects pulse oximeters is not as clear-cut as many believe.
The individual on whom the devices worked poorly (some readings were up to 10% off) was healthy, but had low perfusion, or blood circulation, in her fingers, which may have contributed to the inaccuracies. Perfusion can be affected by a wide range of issues, from illness to something as simple as how warm a subject’s hands are. Some tests of the devices start by warming a subject’s hands, which may be one reason they get better results, said Lipnick.
In the real world, patients may have cold hands, may be sick, may move around too much to get a good reading, or may have small fingers that don’t fit well in the devices. This variability between patients, even between different fingers of the same person, is something the lab is contending with. “Is it skin color? Is it perfusion? Is it blood pressure?” asked Lipnick. “Physiology, especially when it comes to oxygen, is so dynamic.”
The lab’s work escalated in the early days of the Covid-19 pandemic, when many nongovernmental organizations and philanthropists wanted to donate pulse oximeters to under-resourced nations. Global health is a major interest of Lipnick, who works part of the year in Uganda and is associate director of UCSF’s Center for Health Equity in Surgery and Anesthesia. The lab’s drawers are full of pulse oximeters awaiting testing, some costing thousands, some costing as little as $10.
Many of the devices didn’t work well in the lab’s tests, but it was unclear whether that information was getting to donors or the recipients of the devices. To help, Lipnick recently created openoximetry.org, a project to test numerous devices, both hospital-grade and cheaper models consumers can purchase for home use, and post performance data online.
The lab also tests devices for manufacturers; demand has been high in recent years with the growth of health monitoring devices and fitness trackers. Such studies cost about $40,000 to run and are critical for new devices seeking FDA approval. These tests are often shrouded in secrecy because they involve new technologies; employees of the device manufacturers sometimes sweep the lab for security risks and make lab researchers sign non-disclosure agreements.
Business is good. The lab is booked out for eight months, said Deleree Schornack, a clinical research coordinator who maintains the growing wait list.
Busy as the lab is, it’s gotten even busier of late with the new questions over whether skin pigment affects device accuracy and influences patient care. Several foundations are funding the lab to conduct studies that may help answer questions being raised by the FDA, including how best to measure skin tone in device performance tests. The FDA requires that devices seeking regulatory approval be tested on a group that includes at least two “darkly pigmented individuals,” or 15% of subjects, but that’s been problematic because the wording is vague and may be interpreted widely.
Skin color may seem easy to assess, but the lab has found that it’s actually quite difficult. Researchers here have used various color scales used in dermatology and for other tech applications. There’s the six-tone Fitzpatrick scale used to assess sunburn risk, which doesn’t have nearly enough dark colors. There’s the new Monk scale, which has a more equitable range, and the Von Luschan scale with up to 36 tones. Researchers punch holes into these paper scales so they can be held directly against a subject’s skin.
But these paper scales, which are similar to paint chips, make the researchers uneasy for a number of reasons, Lipnick said. For one thing, they’re too subjective. For another, the paper scales are printed — and they can vary from printer to printer or appear different depending on the lighting in a room. Skin tone also changes on different parts of the body and if someone is warm or ill.
So the lab supplements paper scales with spectrophotometers, expensive technology that analyzes the amount of light reflected back to its sensors to assess skin tone, which is influenced largely by melanin but also by other skin and blood pigments such as hemoglobin, carotene, and bilirubin. Lab researchers take readings at several places, from the fingers where oximeters are placed, of course, but also from the nose, both sides of the ears, and the upper arms, which generally see little sunlight and therefore aren’t darkened by tanning.
The devices don’t read out a color or a tone but “use lots of maths,” said Greg Leeb, an Australian anesthesiologist who works in the lab, to generate something called an ITA number, which may be more standardizable between labs. The Hypoxia Lab is working with a range of experts, from sociologists to dermatologists, to determine the most reliable way to assess color; it’s one of the key questions the FDA is trying to nail down.
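ITA stands for individual typology angle, a single number derived from two color coordinates in the CIELAB color space: L* (lightness) and b* (the yellow-blue axis). The widely published formula is the arctangent of (L* − 50) / b*, expressed in degrees, with higher values indicating lighter skin. The sketch below shows only that arithmetic. The function name and the sample values are hypothetical, and whether the lab's instruments compute the number in exactly this way is an assumption.

```python
import math

def individual_typology_angle(l_star, b_star):
    """Return the ITA in degrees from CIELAB L* and b* values.

    Uses atan2, which matches the usual arctan((L* - 50) / b*) formula
    whenever b* is positive (typical for skin) and avoids dividing by
    zero when b* is 0.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))

# Hypothetical readings, not measurements from the lab.
lighter = individual_typology_angle(65.0, 15.0)
darker = individual_typology_angle(35.0, 15.0)
print(f"lighter sample: {lighter:.1f} degrees, darker sample: {darker:.1f} degrees")
```

Because the result is a continuous number rather than a match against a printed swatch, it does not depend on a printer or on an observer's judgment, though the measurement itself can still vary with where on the body it is taken.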
Because it’s become clear that the devices do not work as well in the chaos of an ER or ICU as they do in ideal lab testing settings, the FDA has commissioned the Hypoxia Lab to study how well the devices work in the real world, on hospitalized patients with a range of skin tones. Kelvin Moore Jr., a Black second-year UCSF medical student and lab volunteer, helped organize the project after reading reports that pulse oximeters work less well in patients with darker skin like his.
“It really laid heavy on me,” said Moore, who jumped at the chance to join the team. “I was like, ‘Sign me up.’ I believe that people doing the research looking like people they are researching is important and doesn’t happen enough.”
Carolyn Hendrickson, a pulmonologist who directs the medical intensive care unit at Zuckerberg San Francisco General Hospital, is leading the study and has recruited about 90 ICU patients, she said.
One major challenge the study faces is that unlike in the lab, where oxygen levels can be lowered safely in healthy volunteers, any oxygen-level drop in an ICU patient is promptly treated by clinicians, making it challenging to get readings at these lower levels. “Clinical staff respond very quickly,” Hendrickson said. “We have to have research staff close by and available to catch transient and unpredictable episodes so that we can collect data.”
Those in the lab hope their various studies will generate data to strengthen device testing, from how many subjects need to be included and how dark their skin needs to be, to whether the devices need to be tested in hospitals. While it could be years before more precise pulse oximeters hit the market, the team hopes in the meantime that its work better informs the public about current devices’ safety, possibly through a “black box” warning added to pulse oximeters to inform clinicians about any inaccuracies and how to account for them in patient care.
“The FDA is asking for more data and that’s already making a difference,” Bickler said.
Severinghaus died last year at the age of 99, but would be pleased, Bickler said, to know the lab he founded is pursuing questions of racial equity he raised so long ago. “If we can fix this issue,” Bickler said, “it could be a model for health disparities.”
This story has been updated to clarify that foundations are funding some of the lab’s studies on pulse oximeters.
This is part of a series of articles exploring racism in health and medicine that is funded by a grant from the Commonwealth Fund.