It was an audacious undertaking, even for one of the most storied American companies: With a single machine, IBM would tackle humanity’s most vexing diseases and revolutionize medicine.

Breathlessly promoting its signature brand — Watson — IBM sought to capture the world’s imagination, and it quickly zeroed in on a high-profile target: cancer.

But three years after IBM began selling Watson to recommend the best cancer treatments to doctors around the world, a STAT investigation has found that the supercomputer isn’t living up to the lofty expectations IBM created for it. It is still struggling with the basic step of learning about different forms of cancer. Only a few dozen hospitals have adopted the system, which is a long way from IBM’s goal of establishing dominance in a multibillion-dollar market. And at foreign hospitals, physicians complained its advice is biased toward American patients and methods of care.

STAT examined Watson for Oncology’s use, marketing, and performance in hospitals across the world, from South Korea to Slovakia to South Florida. Reporters interviewed dozens of doctors, IBM executives, artificial intelligence experts, and others familiar with the system’s underlying technology and rollout.

The interviews suggest that IBM, in its rush to bolster flagging revenue, unleashed a product without fully assessing the challenges of deploying it in hospitals globally. While it has emphatically marketed Watson for cancer care, IBM hasn’t published any scientific papers demonstrating how the technology affects physicians and patients. As a result, its flaws are getting exposed on the front lines of care by doctors and researchers who say that the system, while promising in some respects, remains undeveloped.

“Watson for Oncology is in their toddler stage, and we have to wait and actively engage, hopefully to help them grow healthy,” said Dr. Taewoo Kang, a South Korean cancer specialist who has used the product.

At its heart, Watson for Oncology uses the cloud-based supercomputer to digest massive amounts of data, from doctors’ notes to medical studies to clinical guidelines. But its treatment recommendations are not based on its own insights from these data. Instead, they are based exclusively on training by human overseers, who laboriously feed Watson information about how patients with specific characteristics should be treated.

IBM executives acknowledged Watson for Oncology, which has been in development for nearly six years, is in its infancy. But they said it is improving rapidly, noting that by year’s end, the system will offer guidance about treatment for 12 cancers that account for 80 percent of the world’s cases. They said it’s saving doctors time and ensuring that patients get top-quality care.

“We’re seeing stories come in where patients are saying, ‘It gave me peace of mind,’” Watson Health general manager Deborah DiSanzo said. “That makes us feel extraordinarily good that what we’re doing is going to make a difference for patients and their physicians.”

But contrary to IBM’s depiction of Watson as a digital prodigy, the supercomputer’s abilities are limited.

Perhaps the most stunning overreach is in the company’s claim that Watson for Oncology, through artificial intelligence, can sift through reams of data to generate new insights and identify, as an IBM sales rep put it, “even new approaches” to cancer care. STAT found that the system doesn’t create new knowledge and is artificially intelligent only in the most rudimentary sense of the term.

While Watson became a household name by winning the TV game show “Jeopardy!”, its programming is akin to a different game-playing machine: the Mechanical Turk, a chess-playing robot of the 1700s, which dazzled audiences but hid a secret — a human operator shielded inside.

Watson on Jeopardy
“Jeopardy!” champions Ken Jennings (left) and Brad Rutter watch Watson beat them to the buzzer to answer a question during a practice round in 2011. Seth Wenig/AP

In the case of Watson for Oncology, those human operators are a couple dozen physicians at a single, though highly respected, U.S. hospital: Memorial Sloan Kettering Cancer Center in New York. Doctors there are empowered to input their own recommendations into Watson, even when the evidence supporting those recommendations is thin.

The actual capabilities of Watson for Oncology are not well understood by the public, or even by some of the hospitals that use it. It has taken nearly six years of painstaking work by data engineers and doctors to train Watson in just seven types of cancer and to keep the system updated with the latest knowledge.

“It’s been a struggle to update, I’ll be honest,” said Dr. Mark Kris, Memorial Sloan Kettering’s lead Watson trainer. He noted that treatment guidelines for every metastatic lung cancer patient worldwide recently changed in the course of one week after a research presentation at a cancer conference. “Changing the system of cognitive computing doesn’t turn around on a dime like that,” he said. “You have to put in the literature, you have to put in cases.”

Watson grew out of an effort to transform IBM from an old-guard hardware company to one that operates in the cloud and along the cutting edge of artificial intelligence. Despite its use in an array of industries — from banking to manufacturing — it has failed to end a streak of 21 consecutive quarters of declining revenue at IBM. In the most recent quarter, revenue even slid from the same period last year in IBM’s cognitive solutions division — which is built around Watson and is supposed to be the future of its business.

In response to STAT’s questions, IBM said Watson, in health care and otherwise, remains on an upward trajectory and “is already an important part” of its $20 billion analytics business. Health care is a crucial part of the Watson enterprise. IBM employs 7,000 people in its Watson health division and sees the industry as a $200 billion market over the next several years. Only financial services, at $300 billion, is considered a bigger opportunity by the company.

At stake in the supercomputer’s performance is not just the fortunes of a famed global company. In the world of medicine, Watson is also something of a digital canary — the most visible attempt to use artificial intelligence to identify the best ways to prevent and treat disease. The system’s larger goal, IBM executives say, is to democratize medical knowledge so that every patient, no matter the person’s geography or income level, will be able to access the best care.

But in cancer treatment, the pursuit of that utopian ideal has faltered.

STAT’s investigation focused on Watson for Oncology because that product is the furthest along in clinical care, though IBM sells separate packages to analyze genomic information and match patients to clinical trials. The company is also applying Watson to other tasks, including honing preventive medicine practices and reading medical images.

Doctors’ reliance on Watson for Oncology varies among hospitals. While institutions with fewer specialists lean more heavily on its recommendations, others relegate the system to a background role, like a paralegal whose main skill is researching existing knowledge.

Hospitals pay a per-patient fee for Watson for Oncology and other products enabled by the supercomputer. The amount depends on the number of products a hospital buys, and ranges between $200 and $1,000 per patient, according to DiSanzo. The system sometimes comes with consulting costs and is expensive to link with electronic medical records. At hospitals that don’t link it with their medical records, more time must be spent typing in patient information.

At Jupiter Medical Center in Florida, that task falls to nurse Jean Thompson, who spends about 90 minutes a week feeding data into the machine. Once she has completed that work, she clicks the “Ask Watson” button to get the supercomputer’s advice for treating patients.

On a recent morning, the results for a 73-year-old lung cancer patient were underwhelming: Watson recommended a chemotherapy regimen the oncologists had already flagged.

“It’s fine,” Dr. Sujal Shah, a medical oncologist, said of Watson’s treatment suggestion while discussing the case with colleagues.

He said later that the background information Watson provided, including medical journal articles, was helpful, giving him more confidence that using a specific chemotherapy was a sound idea. But the system did not directly help him make that decision, nor did it tell him anything he didn’t already know.

Jupiter is one of two U.S. hospitals that have adopted Watson for Oncology. The system has generated more business in India and Southeast Asia. Many doctors in those countries said Watson is saving time and helping more patients get quality care. But they also said its accuracy and overall value are limited by differing medical practices and economic circumstances.

Despite IBM’s marketing blitz, with years of high-profile Watson commercials featuring celebrities from Serena Williams to Bob Dylan to Jon Hamm, the company’s executives are not always gushing. In interviews with STAT, they acknowledged the system faces challenges and needs better integration with electronic medical records and more data on real patients to find patterns and suggest cutting-edge treatments.

“The goal as Watson gets smarter is for it to make some of those recommendations in a more automated way, to sort of suggest now may be the time and let us flip the switch” when a promising treatment option emerges, said Dr. Andrew Norden, a former IBM deputy health chief who left the company in early August. “As I describe it, you’re probably getting a sense it’s really hard and nuanced.”

Such nuance is absent from the careful narrative IBM has constructed to sell Watson.

Alex Hogan, Ike Swetlitz/STAT

It is by design that there is not one independent, third-party study that examines whether Watson for Oncology can deliver. IBM has not exposed the product to critical review by outside scientists or conducted clinical trials to assess its effectiveness.

While it’s not unheard of for companies to avoid external vetting early on, IBM’s circumstances are unusual because Watson for Oncology is not in development — it has already been deployed around the world.

Yoon Sup Choi, a South Korean venture capitalist and researcher who wrote a book about artificial intelligence in health care, said IBM isn’t required by regulatory agencies to do a clinical trial in South Korea or America before selling the system to hospitals. And given that hospitals are already using the system, a clinical trial would be unlikely to improve business prospects.

“It’s too risky, right?” Choi said. “If the result of the clinical trial is not very good — [if] there’s a marginal clinical benefit from Watson — it’s really bad news to the whole IBM.”

Pilar Ossorio, a professor of law and bioethics at University of Wisconsin Law School, said Watson should be subject to tighter regulation because of its role in treating patients. “As an ethical matter, and as a scientific matter, you should have to prove that there’s safety and efficacy before you can just go do this,” she said.

Norden dismissed the suggestion IBM should have been required to conduct a clinical trial before commercializing Watson, noting that many practices in medicine are widely accepted even though they aren’t supported by a randomized controlled trial.

“Has there ever been a randomized trial of parachutes for paratroopers?” Norden asked. “And the answer is, of course not, because there is a very strong intuitive value proposition. … So I believe that bringing the best information to bear on medical decision making is a no-brainer.”

IBM said in its statement that it has collaborated with the research community and presented data on Watson at industry gatherings and in peer-reviewed journals. Some doctors said they didn’t need to see more research to know that the system is valuable. “Artificial intelligence will be adopted in all medical fields in the future,” said Dr. Uhn Lee, who runs the Watson program at Gachon University Gil Medical Center in South Korea. “If that trend, that change is inevitable, then why don’t we just start early?”

So far, the only studies about Watson for Oncology are conference abstracts. The full results haven’t been published in peer-reviewed journals — and every study, save one, was either conducted by a paying customer or included IBM staff on the author list, or both. Most trumpet positive results, showing that Watson saves doctors time and has a high concordance rate with their treatment recommendations.

The “concordance” studies comprise the vast majority of the public research on Watson for Oncology. Doctors will ask Watson for its advice for treating a slew of patients, and then compare its recommendations to those of oncologists. In an unpublished study from Denmark, the rate of agreement was about 33 percent — so the hospital decided not to buy the system. In other countries, the rate can be as high as 96 percent for some cancers. But showing that Watson agrees with the doctors proves only that it is competent in applying existing methods of care, not that it can improve them.

IBM executives said they are pursuing studies to examine the impact on doctors and patients, although none has been completed to date.

Questions about Watson have begun spilling into public view, including in a recent Gizmodo story headlined “Why Everyone is Hating on IBM Watson — Including the People Who Helped Make It.” The most prominent failure occurred last February when MD Anderson Cancer Center, part of the University of Texas, cancelled its partnership with Watson.

The MD Anderson alliance was essentially the early face of Watson in health care. The Houston hospital was among IBM’s first partners, and it was using the system to create its own expert oncology adviser, similar to the one IBM was developing with Memorial Sloan Kettering. But the project disintegrated amid internal allegations of overspending, delays, and mismanagement. In all, MD Anderson spent more than three years and $60 million — much of it on outside consultants — before shelving the effort.

The hospital declined to answer questions. But the project leader, Dr. Lynda Chin, in her first media interview on the subject, told STAT about the challenges she faced. Chin left MD Anderson before the project collapsed; a subsequent audit flagged several violations of procurement rules under her leadership.

Chin said that Watson is a powerful technology, but that it is exceedingly difficult to make functional in health care. She and her team encountered numerous roadblocks, some of which still have not been fully addressed by IBM — at MD Anderson or elsewhere.

The cancer hospital’s first major challenge involved getting the machine to deal with the idiosyncrasies of medical records: the acronyms, human errors, shorthand phrases, and different styles of writing. “Teaching a machine to read a record is a lot harder than anyone thought,” she said. Her team spent countless hours on that problem, trying to get Watson to extract valuable information from medical records so that it could apply them to its recommendations.

Chin said her team also wrestled with deploying the system in clinical practice. Watson, even if guided by doctors, is as close as medicine has ever gotten to allowing a machine to help decide the treatments delivered to human beings. That carries with it thorny questions, such as how to test the safety of a digital treatment adviser, how to ensure its compliance with regulations, and how to incorporate it into the daily work of doctors and nurses.

“Importantly,” Chin said, “how do we create an environment that can ensure the most important tenet in medicine: Do no harm?”

Finally, the project ran into a bigger obstacle: Even if you can get Watson to understand patient variables and make competent treatment recommendations, how do you get it access to enough patient data, from enough different sources, to derive insights that could significantly advance the standard of care?

Chin said that was a showstopper. Watson did not have a connected network of institutions feeding data about specific cohorts of patients. “You may have 10,000 patients for lung cancer. That is still not a very big number when you think about it,” she said.

With data from many more patients, Chin said, you could see patterns — “subsets [of patients] that respond a certain way, subsets that don’t, subsets that have a certain toxicity. That pattern would help with better personalized and precision medicine. But we can’t get there without the ability to actually have a way of aggregating them.”

IBM told STAT that Chin’s work was separate from the effort to create Watson for Oncology, which was validated by cancer specialists at Memorial Sloan Kettering prior to its deployment. The company said that Watson for Oncology can extract and summarize substantial text from patient records, though the information must be verified by a clinician, and that it has made significant progress in obtaining more data to improve Watson’s performance. It pointed to partnerships with the health care publisher Elsevier and the analytics firm Doctor Evidence.

To date, more than 50 hospitals on five continents have agreements with IBM, or intermediary technology companies, to use Watson for Oncology to treat patients, and others are using the genomics and clinical trials products.

But the partnership with Memorial Sloan Kettering, and the product that grew out of it, resulted in complications that IBM has papered over with carefully parsed statements and misleading marketing.

Watson Korean hospital
Tae-hyun Cho (right), the first Korean to be treated with assistance from Watson for Oncology, reviews his medical information with oncologists at Gachon University Gil Medical Center. Gachon University Gil Medical Center

In its press releases, IBM celebrates Memorial Sloan Kettering’s role as the only trainer of Watson. After all, who better to educate the system than doctors at one of the world’s most renowned cancer hospitals?

But several doctors said Memorial Sloan Kettering’s training injects bias into the system, because the treatment recommendations it puts into Watson don’t always comport with the practices of doctors elsewhere in the world.

Given the same clinical scenario, doctors can — and often do — disagree about the best course of action, whether to recommend surgery or chemotherapy, or another treatment. Those discrepancies are especially wide for second- and third-line treatments given after an initial therapy fails, where evidence of benefits is slimmer and consensus more elusive.

Rather than acknowledge this dilemma, IBM executives, in marketing materials and interviews, have sought to downplay it. In an interview with STAT, DiSanzo, the head of Watson Health, rejected the idea that Memorial Sloan Kettering’s involvement creates any bias at all.

“The bias is taken out by the sheer amount of data we have,” she said, referring to patient cases and millions of articles and studies fed into Watson.

But that mischaracterizes how Watson for Oncology works. (IBM later claimed that DiSanzo was referring to Watson in general.)

The system is essentially Memorial Sloan Kettering in a portable box. Its treatment recommendations are based entirely on the training provided by doctors, who determine what information Watson needs to devise its guidance as well as what those recommendations should be.

When users ask Watson for advice, the system also searches published literature — some of which is curated by Memorial Sloan Kettering — to provide relevant studies and background information to support its recommendation. But the recommendation itself is derived from the training provided by the hospital’s doctors, not the outside literature.

Doctors at Memorial Sloan Kettering acknowledged their influence on Watson. “We are not at all hesitant about inserting our bias, because I think our bias is based on the next best thing to prospective randomized trials, which is having a vast amount of experience,” said Dr. Andrew Seidman, one of the hospital’s lead trainers of Watson. “So it’s a very unapologetic bias.”

Seidman said the hospital is careful to keep its training grounded in clinical evidence when the evidence exists, but it is not shy about giving its recommendations when it doesn’t. “We want cancer care to be democratized,” he said. “We don’t want doctors who don’t have the thousands and thousands of patients’ experience on a more rare cancer to be handicapped. We want to share that knowledge base.”

At a recent training session of Watson on Manhattan’s Upper East Side, the tensions involved in programming the system were on full display. STAT sat in as Memorial Sloan Kettering doctors, led by Seidman, gathered with IBM engineers to train Watson to treat bladder cancer. Five IBM engineers sat on one side of the table. Across from them were three oncologists — one specializing in surgery, another in radiation, and a third in chemotherapy and targeted medicines.

Several minutes into the discussion, the question arose of which treatment to recommend for patients whose cancers persisted through six rounds of chemotherapy. The options in such cases tend to be as slim as the evidence supporting them. Should Watson recommend a radical surgery to remove the bladder? Dr. Tim Donahue, the surgical oncologist, noted that such surgery seldom cures patients and is not associated with improved survival in his experience.

Then what about another course of chemotherapy combined with radiation?

When Watson gives its recommendations, it puts the top recommendation in green, alternative options in orange, and not recommended options in red.

But in some clinical scenarios, it’s difficult to tell the colors apart.

“This is the hard part of this whole game,” Dr. Marisa Kollmeier, the radiation oncologist, said during the training. “There’s a lack of evidence. And you don’t know if something should be in green without evidence. We don’t have a randomized trial to support every decision.”

But the task in front of them required the doctors to press ahead. And they did, rifling through an array of clinical scenarios. In some cases, a large body of evidence backed up their answers. But many others fell into a gray area or were clouded by the inevitable uncertainty of patient preferences.

The meeting was one of many in a months-long process to bring Watson up to speed in bladder cancer. Subsequent sessions would involve feeding it data on real patient cases at Memorial Sloan Kettering, so doctors could reinforce Watson’s training with repetition.

That training does not teach Watson to base its recommendations on the outcomes of these patients: whether they lived, died, or survived longer than similar patients. Rather, Watson makes its recommendations based on the treatment preferences of Memorial Sloan Kettering physicians.

At some institutions using Watson, IBM’s lack of clarity on the cancer center’s role causes confusion. Some seem to think they are getting advice from doctors around the world.

“As we tell the patients, it’s like another consultation, but it’s a worldwide consultation,” said Dr. K. Adam Lee, medical director of thoracic oncology at Jupiter Medical Center, when STAT visited in June.

“Really worldwide,” added Kerri Ward, an oncology nurse at the hospital. “It pulls from 300 journals, just for oncology, the clinical database, so the national clinical database, journals, textbooks, and then Sloan Kettering is the one that’s feeding in the clinical [information] currently.”

Robert Garrett, the CEO of Hackensack Meridian Health, a group in New Jersey that is using a version of Watson for Oncology, said the information in Watson is “global.”

“If you’re a patient that has colon cancer, they have in their database, as I understand it, how colon cancer is treated around the world, by different clinicians, what’s been the most effective treatment for different phases of colon cancer,” Garrett said. “That’s what IBM Watson brings to the table.”

None of that accurately depicts how Watson for Oncology works.

Several doctors who have examined Watson in other countries told STAT that Memorial Sloan Kettering’s role has given them pause. Researchers in Denmark and the Netherlands said hospitals in their countries have not signed on with Watson because it is too focused on the preferences of a few American doctors.

Martijn van Oijen, an epidemiologist and associate professor at Academic Medical Center in the Netherlands, said Memorial Sloan Kettering is packed with top specialists but doesn’t have a monopoly on cancer expertise. “The bad thing is, it’s a U.S.-based hospital with a different approach than some other hospitals in the world,” said van Oijen, who’s involved in a national initiative to evaluate technologies like Watson and is a strong believer in using artificial intelligence to help cancer doctors.

In Denmark, oncologists at one hospital said they have dropped the project altogether after finding that local doctors agreed with Watson in only about 33 percent of cases.

“We had a discussion with [IBM] that they had a very limited view on the international literature, basically, putting too much stress on American studies, and too little stress on big, international, European, and other-part-of-the-world studies,” said Dr. Leif Jensen, who directs the center at Rigshospitalet in Copenhagen that contains the oncology department.

In countries where doctors were trained in the United States, or where they use treatment guidelines similar to those of the Memorial Sloan Kettering doctors, Watson for Oncology can be helpful. Taiwan uses the same guidelines as Americans, so Watson’s advice will be useful there, said Dr. Jeng-Fong Chiou, vice superintendent of the Taipei Cancer Center at Taipei Medical University, which started using Watson for Oncology with patients in July.

But he also said there are differences between American and Taiwanese patients — his patients often receive lower doses of drugs to minimize side effects — and that his oncologists will have to make adjustments from Watson’s recommendations.

The generally affluent population treated at Memorial Sloan Kettering doesn’t reflect the diversity of people around the world. The cases used to train Watson therefore don’t take into account the economic and social issues faced by patients in poorer countries, noted Ossorio, the University of Wisconsin law professor.

“What it’s going to be learning is race, gender, and class bias,” she said. “We’re baking those social stratifications in, and we’re making the biases even less apparent and even less easy for people to recognize.”

Sometimes, the recommendations Watson gives diverge sharply from what doctors would say for reasons that have nothing to do with science, such as medical insurance. In a poster presented at the Global Breast Cancer Conference 2017 in South Korea, researchers reported that the treatment Watson most often recommended for breast cancer patients simply wasn’t covered by the national insurance system.

IBM said it has convened an international group of advisers to gather input on Watson’s performance. It also said that the system can be customized to reflect variations in treatment practices, differences in drug availability and financial considerations, and that the company recently introduced tools to reduce the time and cost of adapting Watson.

In a response to STAT’s questions, Memorial Sloan Kettering said international journals are part of the literature it provides to Watson, including the Lancet, the European Journal of Cancer, Annals of Oncology, and the BMJ. “As we do in all areas of cancer research, we will continue to observe and study how Watson for Oncology impacts care internationally, follow the evidence, and work with IBM to optimize the system,” the hospital said.

Some hospitals abroad are customizing the system for their patients, adding information about local treatments. Nan Chen, who manages the Watson for Oncology program at Bumrungrad International Hospital in Thailand, said his oncologists use Japanese guidelines, not American guidelines, for treating gastric cancer.

But he said doctors can find this localization redundant or unnecessary: They are not that interested in being told the same guidance they just taught Watson.

“Our doctors say, this treatment is our own treatment, we know that,” Chen said. “You don’t need to turn around and put those treatments in Watson, and let Watson tell us what kind of treatment that we are using here in the hospital.”

Chen said this modified system is incredibly beneficial, however — to a hospital in the capital of Mongolia that employs zero oncology specialists.

At UB Songdo Hospital, of which Chen’s company is a majority owner, doctors are following Watson’s suggestions nearly 100 percent of the time. Patients who otherwise would have been treated by generalists with little, if any, cancer training are now benefiting from top-level expertise.

“That is the kind of thing that IBM is dreaming about,” Chen said.

In South Korea, Dr. Taewoo Kang, a surgical oncologist at Pusan National University Hospital who specializes in breast cancer, pointed to another important problem that Watson needs to solve. Right now, it provides supporting evidence for the recommendations it makes, but doesn’t actually explain how it came to recommend that particular treatment for that particular patient.

Kang said that, sometimes, he will ask Watson for advice on a patient whose cancer has not spread to the lymph nodes, and Watson will recommend a type of chemotherapy drug called a taxane. But, he said, that therapy is normally used only if the cancer has spread to the lymph nodes. And, to support the recommendation, Watson will show a study demonstrating the effectiveness of the taxane for patients whose cancer did spread to their lymph nodes.

Kang is left confused as to why Watson recommended a drug that he does not normally use for patients like the one in front of him. And Watson can’t tell him why.

WATSON at ASCO
Louisa Roberts (left) of IBM Watson Health speaks with Merck executive Oliver Maschinsky in the Watson booth at the 2017 ASCO cancer conference in Chicago. Heather Stone for STAT

For all the concerns, some doctors around the world who use Watson insist that artificial intelligence will one day revolutionize health care. They say that clinicians are realizing concrete benefits: saving doctors valuable time searching for studies, better educating patients, and undercutting hierarchies in the clinic that might interfere with evidence-based treatment.

In Taiwan, Chiou said Watson immediately provides the “best data” from the literature about a treatment — survival rates, for example — relieving doctors of the task of searching the literature to compare each possible treatment.

Watson’s information also empowers patients, said Lee, the doctor who runs the Watson program at Gil Medical Center in South Korea. Previously, doctors verbally explained different treatment options to patients. Now, physicians can give patients a comprehensive packet prepared by Watson, which includes potential treatment plans along with relevant scientific articles. Patients can do their own research about these treatments, and maybe even disagree with the doctor about the right course of action.

“This is one of the most important and significant changes,” Lee said.

Watson also holds senior doctors accountable to the data. At Gil Medical Center, patients sit in a room with five doctors and Watson itself, the interface displayed on a flat-screen television in the so-called “Watson center.” Lee said that Watson’s presence has a huge influence on the doctors’ decision-making process, leveling the hierarchy that traditionally prioritized the opinion of the senior doctor over junior colleagues.

Watson gives the junior physicians quick and easy access to data that might prove their elders wrong, displaying on the screen information such as the survival rate right alongside a recommended treatment. It would be humiliating for senior doctors to continue to push for a different treatment in light of this evidence, Lee said.

At Manipal Hospitals in India, Dr. S.P. Somashekhar said that while there are some regional disparities in Watson’s recommendations for patients with rectal and breast cancer, those cases are outliers: For the vast majority of patients, the program matched the recommendations given to patients by the hospital’s tumor board — a group of 20 physicians that typically study their cases for a week and spend an hour discussing them.

That means that in a handful of seconds, Watson did what it takes 20 doctors over a week to accomplish. “That is so precious and very highly valuable,” Somashekhar said. “Our physicians cannot discuss every case. For every case we discuss in the tumor board, there are five cases which we cannot discuss.”

While those benefits are significant, they fall short of breakthrough discoveries that could predict or eradicate disease.

IBM executives said that doesn’t mean Watson can’t accomplish those feats. Norden, the former deputy health officer for Watson for Oncology and Genomics, said the goal is to ultimately bring together streams of clinical trial data and real-world patient data, so that Watson could begin to pinpoint the best treatments on its own.

“My own belief is that over time we will be better at measuring and reporting outcomes, and that data will be increasingly influential,” he said. “Where cancer care is today, I don’t think that any computing system is ready to be let out into the world without a measure of expert human oversight.”

At IBM Watson Health’s Cambridge, Mass., headquarters, prospective customers can view a demonstration of Watson products inside an “immersion room.” The computers at the back of the room are “a famous selfie spot,” an IBM employee told STAT during a tour. Dom Smith/STAT

The bigger question for IBM is not whether health care will see a revolution in artificial intelligence but who will drive it.

One former IBM employee says the company could become a victim of its own marketing success — the unrealistic expectations it set are obscuring real accomplishments.

“IBM ought to quit trying to cure cancer,” said Peter Greulich, a former IBM brand manager who has written several books about IBM’s history and modern challenges. “They turned the marketing engine loose without controlling how to build and construct a product.”

Greulich said IBM needs to invest more money in Watson and hire more people to make it successful. In the 1960s, he said, IBM spent about 11.5 times its annual earnings to develop its mainframe computer, a line of business that still accounts for much of its profitability today.

If it were to make an equivalent investment in Watson, it would need to spend $137 billion. “The only thing it’s spent that much money on is stock buybacks,” Greulich said.

IBM said it created the market for artificial intelligence and is pleased with the pace of Watson’s growth, noting that it and other new business units grew by more than $20 billion in the past three years. “It took Facebook and Amazon more than 13 years to grow $20 billion,” the company said in a statement.

Since Watson’s “Jeopardy!” demonstration in 2011, hundreds of companies have begun developing health care products using artificial intelligence. These include countless startups, but IBM also faces stiff competition from industry titans such as Amazon, Microsoft, Google, and the Optum division of UnitedHealth Group.

Google’s DeepMind, for example, recently displayed its own game-playing prowess, using its AlphaGo program to defeat a world champion in Go, a 3,000-year-old Chinese board game.

DeepMind is working with hospitals in London, where it is learning to detect eye disease and speed up the process of targeting treatments for head and neck cancers, although it has run into privacy concerns.

Meanwhile, Amazon has launched a health care lab, where it is exploring opportunities to mine data from electronic health records and potentially build a virtual doctor’s assistant.

A recent report by the financial firm Jefferies said IBM is quickly losing ground to competitors. “IBM appears outgunned in the war for AI talent and will likely see increasing competition,” the firm concluded.

While not specific to Watson’s health care products, the report said potential clients are backing away from the system because of significant consulting costs associated with its implementation. It also noted that Amazon has 10 times the job listings of IBM, which recently didn’t renew a small number of contractors that worked for the company following its acquisition of Truven, a company it bought for $2.6 billion last year to gain access to 100 million patient records.

In its statement, IBM said that the workers’ contracts ended and that it is continuing to hire aggressively in the Cambridge, Mass.-based Watson Health and other units, with more than 5,000 positions open in the U.S.

But the outlook for Watson for Oncology is challenging, say those who have worked closest with it. Kris, the lead trainer at Memorial Sloan Kettering, said the system has the potential to improve care and ensure more patients get expert treatment. But like a medical student, Watson is just learning to perform in the real world.

“Nobody wants to hear this,” Kris said. “All they want to hear is that Watson is the answer. And it always has the right answer, and you get it right away, and it will be cheaper. But like anything else, it’s kind of human.”

Leave a Comment

  • As for Watson’s difficulty reading hand written notes, why not use speech recognition software? Rather than manually editing, doctors & nurses place a microphone near their mouths, and all medical information, comments, observations are noted with speech recognition. Then the recorder just downloads the commentary into Watson, saves it & prints out a copy for the patient’s file. That is old technology, but a medical person should be there to verify the veracity of the text.

    • Oh if it were just that simple. Today there are different camps of concern. Some of us have shifted to no paper at all. We use fully digital input so no need to transcribe written notes BUT there are practices who are not fully electronic and then there are those who dictate but the software is at best terrible.
      I started to use Dragon Dictate Medical about 20 years ago and hated every moment of it since every 5th word was wrong. Sure there are ways to overcome the tech issues but what about colloquialisms and the like? Us folks down South apparently use different language than the folks up North, and what about trying to communicate with England or any of the EU who don’t use English as their primary language?
      Then there are the plethora of shorthands like BTW and BRB and PITA and the like. This is NOT an issue of data shortage because of technology it is an issue of data shortage thru design. IBM right now is only accepting data after MSK determines it’s worth
      Some of us didn’t go to residency at MSK and prefer to not use their approach to care so according to Watson our opinion isn’t valid
      Dr D

  • Here is the real problem with IBM Watson. Through the span of the engagement, a lot of human resources are used to input data, analyze data and even correct incorrect or ambiguous data. If IBM Watson can mimic human intelligence, the human resources should taper off at some stage which does not happen here. As such, IBM Watson is a consulting service rather than a groundbreaking software product. As someone has mentioned in the comments, humans can work in a space where the context and the problem space is not clearly defined, yet can automatically sense the context, define a problem area scope and carry out unconscious search for existing solutions in the space. If they do not find the answers in the chosen space, they can redefine the boundaries, use discovery to initiate search for known unknowns and unknown unknowns. Eventually, they may be able to find the correct solution to the problem. The current AI system with Bayesian Search, Analogical Reasoning, Evolutionary Functions, Connectome Functions and Semantic Demodulators cannot do all of the above automatically. The Watson engineers who are no doubt a fine bunch of passionate technologists are probably fighting against the revenue obsessed marketing and sales divisions who do not understand that R&D does not work very well under external pressure. The sooner IBM Watson understands its mistakes and corrects them, the better. Otherwise, all future products of Watson will be perceived with doubts about their abilities.

  • Great article – that raises many questions. How does IBM Watson guard against the bias of the individual doctors advising Watson? I quickly looked up Dr Andrew Seidman described as a lead trainer of Watson in Open Payments. In 2016 Dr Seidman received $432,359 from Novartis for research on Tykerb? I’m not saying it’s a bad thing to have lead investigators as advisers but what are the checks and balances?

    • Yes I wonder how much of the delays with this are arguments about filtering out those of Watson’s suggestions which may be financially unproductive for providers.

  • I have seen the process for training Watson . It is a very rudimentary technique of feeding in question answer pairs from a simple form . The more pairs you feed the more is the confidence level of the responses a person gets when he queries Watson . This kind of supervised learning with a small set of data will never scale up . It has to be unsupervised learning with information flowing seamlessly from all possible sources in the world . The earlier IBM realizes this they will survive else it is on its way to becoming another Nokia .

  • So can the queries be treated like database searches? and a print out of the reason for the last choice?

    As for the population and culture, are datapoints set up for that particular region or country when the program is set up? I would think the government or WHO would have stats on that.

    Are preferences of the patient/doctor set up or asked? such as homeopathic means or non-invasive

    As for the reliability of research, I would think that could be pre-set by the doctor or hospital based on their standards of what they think are reputable publications and research organizations. With physician organizations I would think that would be fairly known with hospitals or doctors themselves. A margin of how conservative or risky could be set from that median.

    This however is with little to no information for how this works. I would think if humans need teachers to help learn that AI would be similar.

  • Excellent report! IBM naively expected that they could simulate human intelligence using NLP, Bayesian and text mining algorithms. They should have started with well defined problem with limited scope.

  • With regards to how IBM or any other AI systems will help shape the future of healthcare, one only needs to see how our current communications technologies have changed how we shop, invest, entertain, learn, and reach out to each other. Does anyone remember the Qwest Communications’ TV commercial in late 1999, where the weary traveler asked the front desk of a motel for the entertainment options, and she answered, “all rooms have every movie ever made in any language anytime, day or night.” Qwest was clearly early in making such claims, but nearly 20 years later, no one thinks it is impossible to accomplish. Watson may be early in making certain claims, but there is clear indication that’s where the future is headed. The current version of Watson may be compared to the doctor still in the middle of the oncology fellowship training. If an oncology fellow can master all the expertise of that particular program while in training, the fellow would be lauded as a star with a bright future in an academic or clinical career. On the other hand, would the fellow risk her reputation promoting the ideas of a competing program across town? Unlikely, unless she wishes to bring the wrath of the program director upon her and risk not getting a good letter of recommendation. Would the fellow try a “creative” treatment protocol on a cancer with well established therapies and well documented results? I would hope not. So the tendency would be to play it safe. Follow the literature. Follow the prior experience and the cumulative knowledge and (hopefully) wisdom of experts. This would answer why Watson is not (yet) revolutionary. And it shouldn’t be. It needs to stay within the clinical guidelines set by the experts in the field. It needs to play it safe. That explains why experts would say that Watson does not seem to add much. And it should be that way, at least at this early stage of development.
    Yet, one should also consider that Watson has already learned (and will continue to learn) the major guidelines, at least in English, probably better than most doctors. And it will likely improve at a much faster rate than most doctors, and at some point, its insights would be very close to that of the very best oncology experts for every type of cancer (most oncologists sub-specialize in a narrow range of cancers). Having such an expert at ready disposal at every major hospital will be a huge benefit to the patients. Imagine a cancer patient trying to get a second opinion from a world expert on testicular cancer. He would probably wait several weeks and travel hundreds if not thousands of miles for an appointment, while constantly worrying about the disease advancing. He would likely need to pay several hundreds of dollars (thousands more if additional testing is required) out of pocket upfront as the expert is probably not in the network of the patient’s insurance plan. He may then get an opinion that may or may not agree with the local expert’s recommendation. Now, compare that scenario with this: The patient goes to the local testicular cancer expert, has the data entered into Watson, and gets a probabilistic recommendation that would match closely that of the world expert. The patient would likely become more assured of the adequacy of the recommendation of the local expert, or at least have a basis from which to discuss the options in case there is a variance of recommendations. All while staying at home every night and saving lots of money.
    We should avoid projecting our (currently) unrealistic expectations on Watson’s abilities then criticize it for not having achieved it yet.
    Watson is being promoted as a tool to help physicians make treatment decisions, not to assume the role of the primary decision maker. The best part is that it will get better, to the benefit of the patients, provided that IBM continues to support it financially, technologically, and intellectually, and the medical community learns to accept it and train it, like we train the best of our medical students.

    • Don’t understand your analogy. Providing movies to a hotel room is trivial compared to treating cancer and yet the range of options twenty years after the Qwest commercial is still pathetic.

  • pharmvet1 I am not saying I like the shift I am saying it is inevitable
    Also be clear not every patient will be treatable by their PCP but a large share will be and with that a shift away from specialty care. I spent 3 years with the “DC Gang” (Senate special advisor to healthcare management) trying to unravel the ACA nightmare and what I got from the whole medical future is that to provide care to everyone in a society that doesn’t embrace prevention (like ours) we have to accept a different process than we are used to. We can’t have Dr. Welby available for everyone at any moment. We need to prepare for the future that includes more and more services provided by auxiliaries and by PCP’s
    I TOTALLY agree; when we shifted from Psych to PCP we simply reduced a layer of expense and lowered the quality of care commensurately. BUT in all reality what choices do we have? NOT everyone can go to a MSK or MD Anderson or for that matter NIH building 10. We are going to have to accept compromised care and HOPE that technology like Watson can catch up quickly to fill in the gaps
    Let’s be realistic if Watson ever does what it is claiming to do then NO reason why a PCP can’t plug info including an episode of aplastic anemia and get a solution popped out and act on it for 70% of the cases and let the other 30% go to the regional Oncology think tanks. NO, I don’t want type of care for me or my family but then I am not in the situation that I EXPECT free care or care paid for by ANY third party as much of America has become so accustomed to
    Had Americans instead of buying iPhones and Wii instead invested in mandated HSAs and forced saving for future medical care then we wouldn’t be looking at this shift in care
    What I can tell you is that one of the first tiers to go is the corner Pharmacists. The system is already gearing up to eliminate that in favor of individual dosed prescriptions dispensed by “techs” as directed by the physicians mandate thru an EHR system PharmaCos will simply pre package every drug in individual strips and then a tech will rip the strips off like stamps at the old USPS and the patient will leave done and over with. A centralized Pharmacist will be available via video teleconference to oversee the tech and to answer any questions for the patient. The suggested savings is in the BILLIONS. I again don’t agree but the model is being built and tested as we speak and I am betting it rolls out in 10 years MAX. Think about all the $150+K salaries that will be saved when they only need to pay a tech $30K to rip and bag :(. Pharmacists will get MUCH bigger positions in clinics and hospitals but the local corner CVS guys are looking for early retirement
    Dr D

    • Dr Dave, your thesis may be inevitable, but where will all of these newly minted PCP’s come from? The medical school where my dad went (Tulane) and the med school where I went to grad school (Vanderbilt) now cost between $350,000 and $400,000 for the fully loaded four year attendance. Many of these graduates would like to go into primary care but they would be in debt for the rest of their lives. No, the PCP’s are going to come out of the large state universities if they are to increase their numbers, which are actually static or declining, even if they do have the ability to use supercomputer assisted diagnostics, unless of course you wish to expand these capabilities to oncology NP’s which is where I would draw the line.

    • Well, we can draw any line WE want but in the reality of things and as time moves on what was unacceptable a decade or 3 ago is now common practice. When I started many moons ago a surgeon opened and closed his own cases. Now we have certified OR techs and or PAs or NP’s who close up and dress all work we do. Sure a patient can complain they want to see the Oncologist but then they will need to belly up cash since eventually, insurance will pay different amounts for different levels and since THEY determine that a well certified PA is OK to treat lung cancer with the help of a Watson type product and the copay is zero but IF you opt to go to the Oncologist then the copay is $800 per visit I bet MOST people will not be drawing the same line.
      There is no shortage of PCPs as the news would like us to believe sure there is a shortage of everything in the mountains of Wyoming or Nebraska but in most of the USA there is no PCP shortage and for those who do practice they are filled quite nicely and would LOVE to take on specialty work if it meant better control of how their patients were treated and better supervision of therapy from cradle to grave and a wee bit extra cash.
      Take out the liability side of things and WAY more folks will step from pure PCP into PCP guided specialty services. I see it daily with Derm and minor plastics to even minor lacerations and biopsies so the issue is liability and comfort both COULD be solved if someone makes a truly AI dutiful software that info can be plugged into and results graded as to viability are returned.
      Dr D

  • What is being missed here is the BIG picture. Watson is not being built for 2017 or even 2020 it is being built when we shift our entire medical system and focus on primary care and specialty care will be limited to regional facilities and for difficult cases. Watson will allow a PCP to treat basic cancer patients with the ease and experience of a hybrid oncology system
    Over time the idea of having specialty providers except for procedure ones like surgeons and Radiology and the like is going the way of the dinosaur. We can’t afford to have an Oncologist involved at $400K when we can crank some buttons and have a PCP order the same drugs for the same patient at $170K. YES there will always be refractory patients who need the extra care and time and experience of a specialist but in the majority, a PCP can suffice
    It won’t be politically accepted but it will be economically feasible
    My partner and I were part of the very initial concept of Watson. We both fiddled with it and ran some patients thru it but when it was all done the outcomes were nothing that we already didn’t expect. Paying us to crush the data and then another $1K for IBM makes no sense UNLESS the data that IBM is using is FAR greater than we can digest and so far that is not the case
    When and only when the system goes from “support for specialists” to “crutches for PCP” will this pan out. IBM has deep pockets to develop and wait so don’t sell this dog and pony show short it is a waiting game and I am betting on IBM to win
    The issue of having MSK as the “authority” will crumble over time as they will realize that it is all about massive data, not accurate data. Better to have 100K lung cancer patients’ data to mine for common features than to have 10K MSK “opinions.” Society thinks wrongly that cancer or medicine is a switch, that it is on or off, right or wrong, when in fact it is opinions and outcomes all titrated together to get individual outcomes satisfactory to each patient. Some patients refuse surgery, some refuse chemo and so forth, but more importantly some do well with one OR the other and success is found with them both, so suggesting one was wrong is simply crazy. Having only MSK data or opinions will eventually go, and once IBM figures out how to insert the data from every EMR system out there then they can mine terabytes a millisecond for weeks on end to find a solution that no one thought about for a particular situation
    Right now mining small data sets is the limitations of the system using only journals and research is insanity we need to mine ALL patients and that is going to take both economics as well as political change. We have to get rid of HIPAA to allow everyone’s data to be included so that we find the magic secret sauce that we are missing in early cancer diagnosis and eventual therapy.
    Yes right now it is a marketing game to keep it funded and alive but don’t count this out it is going to be the future and I am betting NOT so much on IBM but more on Google they have the much bigger picture in focus and they have far more discretionary funds to waste until this comes to ripeness
    Dr D
    H&N Surgical Oncology

    • Dear Dr Dave. In medicine you generally get what you pay for. Good example is shift from board certified psychiatrists to PCP’s in treating mental illness, or even to NP’s. These folks don’t know therapy, only the current dose of whatever SSRI or stimulant they think they should be writing for. Oncology, you are saying is basically cookbook chemistry? What happens when your patient develops aplastic anemia or some other complication of therapy. Most PCP’s have never even seen a case, let alone recognize one.
