At least 550,000 cases. Maybe 4.4 million. Or something in between.

Like weather forecasters, researchers who use mathematical equations to project how bad a disease outbreak might become are used to uncertainties and incomplete data, and Covid-19, the disease caused by the new-to-humans coronavirus that began circulating in Wuhan, China, late last year, has those everywhere you look. That can make the mathematical models of outbreaks, with their wide range of forecasts, seem like guesswork gussied up with differential equations; the eightfold difference in projected Covid-19 cases in Wuhan, calculated by a team from the U.S. and Canada, isn’t unusual for the early weeks of an outbreak of a never-before-seen illness.

But infectious-disease models have been approximating reality better and better in recent years, thanks to a better understanding of everything from how germs behave to how much time people spend on buses.

“Year by year there have been improvements in forecasting models and the way they are combined to provide forecasts,” said physicist Alessandro Vespignani of Northeastern University, a leading infectious-disease modeler.


That’s not to say there’s not room for improvement. The key variables of most models are mostly the same ones epidemiologists have used for decades to predict the course of outbreaks. But with greater computer power now at their disposal, modelers are incorporating more fine-grained data to better reflect the reality of how people live their lives and interact in the modern world — from commuting to work to jetting around the world. These more detailed models can take weeks to spit out their conclusions, but they can better inform public health officials on the likely impact of disease-control measures.

Models are not intended to be scare machines, projecting worst-case possibilities. (Modelers prefer “project” to “predict,” to indicate that the outcomes they describe are predicated on numerous assumptions.) The idea is to calculate numerous what-ifs: What if schools and workplaces closed? What if public transit stopped? What if there were a 90% effective vaccine and half the population received it in a month?

“Our overarching goal is to minimize the spread and burden of infectious disease,” said Sara Del Valle, an applied mathematician and disease modeler at Los Alamos National Laboratory. By calculating the effects of countermeasures such as social isolation, travel bans, vaccination, and using face masks, modelers can “understand what’s going on and inform policymakers,” she said. For instance, although many face masks are too porous to keep viral particles out (or in), their implicit message, that contagion is possible here, “keeps people away from you” and reduces disease spread, Del Valle said. “I’m a fan of face masks.”

The clearest sign of the progress in modeling comes from flu forecasts in the U.S. Every year, about two dozen labs try to model the flu season, and have been coming ever closer to accurately forecasting its timing, peak, and short-term intensity. The U.S. Centers for Disease Control and Prevention determines which model did the best; for 2018-2019, it was one from Los Alamos.

Los Alamos also nailed the course of the 2003 outbreak of SARS in Toronto, including when it would peak. “And it was spot on in the number of people who would be infected,” said Del Valle: just under 400 in that city, of a global total of about 8,000.

The computers that run disease models grind through calculations that reflect researchers’ best estimates of factors that two Scottish researchers identified a century ago as shaping the course of an outbreak: how many people are susceptible, how many are infectious, and how many are recovered (or dead) and presumably immune.
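For readers who want to see the bookkeeping, the three-compartment scheme can be written in a few lines. This is a minimal sketch, not any lab's actual model: the function name `simulate_sir`, the city of 1 million, the transmission rate of 0.4 per day, and the five-day infectious period are all illustrative assumptions, and the simple fixed-step integration is chosen for readability rather than precision.

```python
def simulate_sir(population, initial_infectious, transmission_rate,
                 infectious_days, days, dt=0.1):
    """Step the susceptible/infectious/recovered counts forward in time.

    transmission_rate: infections one infectious person causes per day when
    everyone else is susceptible (an assumed input, not a measured value).
    """
    recovery_rate = 1.0 / infectious_days
    s = float(population - initial_infectious)
    i = float(initial_infectious)
    r = 0.0
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        # A new infection needs an infectious person to meet a susceptible one.
        new_infections = transmission_rate * i * (s / population) * dt
        new_recoveries = recovery_rate * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Illustrative values only: a city of 1 million with 10 initial cases.
history = simulate_sir(population=1_000_000, initial_infectious=10,
                       transmission_rate=0.4, infectious_days=5.0, days=200)
peak_day, _, peak_infectious, _ = max(history, key=lambda row: row[2])
print(f"Peak: day {peak_day:.0f}, about {peak_infectious:,.0f} infectious at once")
print(f"Ever infected by day 200: {history[-1][3]:,.0f}")
```

Changing `transmission_rate` or `infectious_days` and rerunning is the code-level version of asking how much an error in one estimate shifts the projection.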

That sounds simple, but errors in any of those estimates can send a model wildly off course. In the autumn of 2014, modelers at CDC projected that the Ebola outbreak in West Africa could reach 550,000 to 1.4 million cases in Liberia and Sierra Leone by late January if nothing changed. As it happened, heroic efforts to isolate patients, trace contacts, and stop unsafe burial practices kept the number of cases to 28,600 (and 11,325 deaths).

To calculate how people move from “susceptible” to “infectious” to “recovered,” modelers write equations that include such factors as the number of secondary infections each infected person typically causes and how long it takes from when one person gets sick to when the people she infects do. “These two numbers define the growth rate of an epidemic,” Vespignani said.

The first number is called the basic reproduction number. Written R0 (“R naught”), it varies by virus; a strain that spreads more easily through the air, as by aerosols rather than heavier droplets released when an infected person sneezes or coughs, has a higher R0. It has been a central focus of infectious disease experts in the current outbreak because a value above 1 portends sustained transmission. When the R0 of Covid-19 was estimated several weeks ago to be above 2, social media exploded with “pandemic is coming!” hysteria.

But while important, worshipping at the shrine of R0 “belies the complexity that two different pathogens can exhibit, even when they have the same R0,” the Canadian-U.S. team argues in a paper posted to the preprint site medRxiv. Said senior author Antoine Allard of Laval University in Quebec, “the relation between R0, the risk of an epidemic, and its potential size becomes less straightforward, and sometimes counterintuitive in more realistic models.”

To make models more realistic, he and his colleagues argue, they should abandon the simplistic assumption that everyone has the same likelihood of getting sick from Covid-19 after coming in contact with someone already infected. For SARS, for instance, that likelihood clearly varied.

“Bodies may react differently to an infection, which in turn can facilitate or inhibit the transmission of the pathogen to others,” Allard said. “The behavioral component is also very important. Can you afford to stay at home a few days or do you go to work even if you are sick? How many people do you meet every day? Do you live alone? Do you commute by car or public transportation?”

When people’s chances of becoming infected vary, an outbreak is more likely to be eventually contained (by tracing contacts and isolating cases); it might reach a cumulative 550,000 cases in Wuhan, Allard and his colleagues concluded. If everyone has the same chance, as with flu (absent vaccination), the probability of containment is significantly lower and the outbreak could reach 4.4 million cases there. Or as the researchers warn, “the outbreak almost certainly cannot be contained and we must prepare for a pandemic ….”
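Why the same R0 can carry different risks shows up even in a textbook branching-process calculation, which is far simpler than the method in the team's preprint and should not be mistaken for it. The sketch below assumes two hypothetical pathogens, each with an R0 of 2. In one, the number of people each case infects follows a Poisson distribution (cases are roughly alike); in the other it follows a more spread-out geometric distribution (many cases infect nobody, a few infect many). The code computes the chance that a chain of transmission started by a single case dies out on its own. The function names and the choice of distributions are assumptions made for illustration.

```python
import math


def extinction_probability(offspring_pgf, iterations=500):
    """Smallest solution of q = G(q), found by iterating from q = 0.

    G is the probability generating function of the number of people
    one case infects; q is the chance a single-case chain fizzles out.
    """
    q = 0.0
    for _ in range(iterations):
        q = offspring_pgf(q)
    return q


def poisson_pgf(r0):
    # Every case has the same expected number of secondary infections.
    return lambda s: math.exp(r0 * (s - 1.0))


def geometric_pgf(r0):
    # Same average, but secondary infections are far more uneven.
    return lambda s: 1.0 / (1.0 + r0 * (1.0 - s))


r0 = 2.0  # assumed value, identical for both hypothetical pathogens
q_uniform = extinction_probability(poisson_pgf(r0))
q_varied = extinction_probability(geometric_pgf(r0))
print(f"Chance a single-case chain dies out, uniform spread: {q_uniform:.2f}")
print(f"Chance a single-case chain dies out, varied spread:  {q_varied:.2f}")
```

Under these assumptions the chain fizzles about 20% of the time with uniform spread and 50% of the time with varied spread, even though R0 is the same in both cases.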


Modelers are also incorporating the time between when one person becomes ill and when someone she infects does. If every case infects two people and that takes two days, then the epidemic doubles every two days. If every case infects two people and they get sick four days after the first, then the epidemic doubles every four days.
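The doubling arithmetic generalizes to other values. The sketch below assumes idealized growth in which every case infects the same number of people after exactly the same delay; the function name `doubling_time_days` is made up for this example.

```python
import math


def doubling_time_days(infections_per_case, days_between_generations):
    """Days for new cases to double under idealized, uninterrupted growth."""
    if infections_per_case <= 1:
        raise ValueError("growth requires each case to infect more than one person")
    return days_between_generations * math.log(2) / math.log(infections_per_case)


# The two examples from the text: two infections per case, 2-day and 4-day gaps.
for gap_days in (2, 4):
    print(f"{gap_days}-day gap: doubles every "
          f"{doubling_time_days(2, gap_days):.1f} days")
```

If each case infected four people instead of two, the same two-day gap would give a doubling time of one day, which is why both numbers matter.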

This “serial interval” is related to how quickly a virus multiplies, and it can have a big effect. For a study published this month in Annals of Internal Medicine, researchers at the University of Toronto created an interactive tool that instantly updates projections based on different values of R0 and the serial interval.

Using an R0 of 2.3 and serial interval of seven days, they project 300,000 cases by next week. If the serial interval is even one day less, the number of cases blasts past 1.5 million by then. But if the countermeasures that China introduced in January, including isolating patients, encouraging people to wear face masks, and of course quarantining Wuhan, reduce the effective reproduction number, as has almost certainly happened, those astronomical numbers would plummet: to 100,000 and 350,000 cases, respectively.
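The sensitivity to the serial interval can be seen with back-of-the-envelope compounding. This sketch does not reproduce the Toronto tool or its case counts. It assumes a single starting case, uninterrupted growth, and a hypothetical 70-day horizon, and it counts only the newest generation of cases. The reduced reproduction number of 1.5 is likewise a made-up stand-in for the effect of countermeasures.

```python
def newest_generation_cases(reproduction_number, serial_interval_days,
                            days_elapsed, seed_cases=1):
    """Cases in the latest generation if every case infects
    `reproduction_number` others exactly one serial interval later."""
    generations = days_elapsed / serial_interval_days
    return seed_cases * reproduction_number ** generations


HORIZON_DAYS = 70  # hypothetical time since the first case

base = newest_generation_cases(2.3, 7, HORIZON_DAYS)
shorter_interval = newest_generation_cases(2.3, 6, HORIZON_DAYS)
with_controls = newest_generation_cases(1.5, 7, HORIZON_DAYS)  # assumed effect

print(f"R = 2.3, 7-day interval: about {base:,.0f}")
print(f"R = 2.3, 6-day interval: about {shorter_interval:,.0f}")
print(f"R = 1.5, 7-day interval: about {with_controls:,.0f}")
```

Shaving one day off the interval adds more than a full extra generation of spread over the same 70 days, while lowering the reproduction number shrinks every generation.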

Just as public health officials care how long someone can be infected without showing symptoms (so they know how long to monitor people), so do modelers. “When people are exposed but not infected, they tend to travel and can’t be detected,” Vespignani said. “The more realistic you want your model to be, the more you should incorporate” the exposed-but-not-ill population. This “E” has lately become a fourth category in disease models, joining susceptible, infectious, and recovered.
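Adding the fourth category changes the bookkeeping only slightly. The following is a minimal sketch under stated assumptions, not a production model: the function name `simulate_seir`, the population of 1 million, the five-day latent and infectious periods, and the transmission rate are all illustrative, and people in the “E” group are assumed not to transmit until they move on to the infectious group.

```python
def simulate_seir(population, initial_infectious, transmission_rate,
                  latent_days, infectious_days, days, dt=0.1):
    """Susceptible -> exposed -> infectious -> recovered, in fixed time steps."""
    become_infectious_rate = 1.0 / latent_days
    recovery_rate = 1.0 / infectious_days
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    history = [(0.0, s, e, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_exposed = transmission_rate * i * (s / population) * dt
        new_infectious = become_infectious_rate * e * dt
        new_recovered = recovery_rate * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


# Illustrative values only.
seir_history = simulate_seir(population=1_000_000, initial_infectious=10,
                             transmission_rate=0.4, latent_days=5.0,
                             infectious_days=5.0, days=400)
peak_row = max(seir_history, key=lambda row: row[3])
print(f"Infectious peak on day {peak_row[0]:.0f}: {peak_row[3]:,.0f} infectious, "
      f"plus {peak_row[2]:,.0f} exposed but not yet infectious")
```

The exposed count at the peak is the population that symptom-based monitoring would miss in this simplified picture.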

At Los Alamos, Del Valle and her colleagues are using alternatives to the century-old susceptible/infectious/recovered models in hopes of getting a more realistic picture of an outbreak’s likely course. A bedrock assumption of the traditional models is “homogeneous mixing,” Del Valle said, meaning everyone has an equal chance of encountering anyone. That isn’t what happens in the real world, where people are more likely to encounter others of similar income, education, age, and even religion (church pews can get crowded).

“Ideally, you’d break the population into many groups” and estimate the likelihood of each one’s members interacting with each other and with every kind of outsider, Del Valle said. “Your model would become more accurate.”
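Breaking the population into groups amounts to replacing one transmission rate with a table of them. The sketch below assumes just two hypothetical groups, with made-up sizes and contact rates; `contact_rates[g][h]` is taken to mean the daily rate at which a susceptible member of group `g` would be infected if everyone in group `h` were infectious. It is an illustration of the idea, not the Los Alamos model.

```python
def simulate_groups(sizes, initial_infectious, contact_rates,
                    infectious_days, days, dt=0.1):
    """Susceptible/infectious/recovered bookkeeping kept separately per group."""
    recovery_rate = 1.0 / infectious_days
    groups = range(len(sizes))
    s = [float(sizes[g] - initial_infectious[g]) for g in groups]
    i = [float(initial_infectious[g]) for g in groups]
    r = [0.0 for _ in groups]
    for _ in range(int(round(days / dt))):
        # Risk to each group depends on whom its members mix with.
        force = [sum(contact_rates[g][h] * i[h] / sizes[h] for h in groups)
                 for g in groups]
        for g in groups:
            new_infections = force[g] * s[g] * dt
            new_recoveries = recovery_rate * i[g] * dt
            s[g] -= new_infections
            i[g] += new_infections - new_recoveries
            r[g] += new_recoveries
    return s, i, r


# Hypothetical groups: group 0 mixes heavily with itself, group 1 much less.
sizes = [600_000, 400_000]
contact_rates = [[0.5, 0.1],
                 [0.1, 0.2]]
s, i, r = simulate_groups(sizes, [10, 0], contact_rates,
                          infectious_days=5.0, days=300)
attack_rates = [r[g] / sizes[g] for g in range(len(sizes))]
for g, share in enumerate(attack_rates):
    print(f"Group {g}: {share:.0%} ever infected")
```

With homogeneous mixing both groups would end up with the same share infected; here the more tightly connected group is hit harder.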

Called “agent-based models,” they simulate hypothetical individuals, sometimes tens of millions of them, as they go about their day. That requires knowing things like how many people commute from where to where for work or school, how they travel, where and how often they shop, whether it’s customary to visit the sick, and other key details. Computers then simulate everyone’s movements and interactions, for instance by starting with one infected person leaving home in the morning, chatting with other parents at school drop-off, continuing to work on a bus, standing 2 feet from customers and colleagues, and visiting a pharmacy for her migraine prescription.

The models keep track of people second by second, said Los Alamos computer scientist Geoff Fairchild, “and let you assess the impact of different decisions, like closing schools during flu season.” (Some research shows that can dampen an outbreak.) Although “agent-based models can simulate reality better,” he said, they are less widely used because they require enormous computing power. Even on the Los Alamos supercomputer, a single run of a complicated model can take days or even weeks — not counting the weeks of work modelers spend writing equations to feed the computer.
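A toy version conveys the idea, at a tiny fraction of the scale and detail described here. The sketch below is an assumption-laden illustration: 2,000 hypothetical agents, randomly assigned households and workplaces (or schools), one-day time steps rather than second-by-second tracking, and a made-up 2% daily chance of transmission per infectious contact. The function name and every parameter are invented for the example.

```python
import random
from collections import Counter


def run_agent_model(num_agents=2000, num_households=700, num_workplaces=50,
                    transmit_prob=0.02, infectious_days=5, days=90,
                    workplaces_open=True, seed=1):
    """Simulate individuals who mix at home and, optionally, at work or school."""
    rng = random.Random(seed)
    household = [rng.randrange(num_households) for _ in range(num_agents)]
    workplace = [rng.randrange(num_workplaces) for _ in range(num_agents)]
    state = ["S"] * num_agents          # S = susceptible, I = infectious, R = recovered
    days_left = [0] * num_agents
    state[0], days_left[0] = "I", infectious_days   # one initial case
    daily_infectious = []
    for _ in range(days):
        infectious = [a for a in range(num_agents) if state[a] == "I"]
        daily_infectious.append(len(infectious))
        at_home = Counter(household[a] for a in infectious)
        at_work = (Counter(workplace[a] for a in infectious)
                   if workplaces_open else Counter())
        newly_infected = []
        for a in range(num_agents):
            if state[a] != "S":
                continue
            exposures = at_home[household[a]] + at_work[workplace[a]]
            # Each infectious contact is an independent chance of transmission.
            if exposures and rng.random() < 1 - (1 - transmit_prob) ** exposures:
                newly_infected.append(a)
        for a in infectious:
            days_left[a] -= 1
            if days_left[a] == 0:
                state[a] = "R"
        for a in newly_infected:
            state[a], days_left[a] = "I", infectious_days
    return state, daily_infectious


for is_open in (True, False):
    final_state, _ = run_agent_model(workplaces_open=is_open)
    ever_infected = sum(1 for status in final_state if status != "S")
    print(f"Workplaces and schools open={is_open}: {ever_infected} ever infected")
```

Because the outcome depends on chance, a single run says little; in practice a what-if like a closure would be judged over many runs with different seeds.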


The Los Alamos researchers are still wrestling with their Covid-19 model, which is showing — incorrectly — the outbreak “exploding quite quickly in China,” Del Valle said. It is overestimating how many susceptible people become infected, probably because it’s not accurately accounting for social isolation and other countermeasures. Those seem to have reduced R0 toward the lower range of 2-to-5 that most modelers are using, she said.

In the current outbreak, researchers are building models not only to peek into the future but also to reality-check the present. Working backwards from confirmed infections in countries other than mainland China, researchers at Imperial College London who advise the World Health Organization estimated that Wuhan had 1,000 to 9,700 symptomatic cases as of Jan. 18. Three days later, all of mainland China had officially reported 440 cases, supporting the concerns of global health officials that China was undercounting.

In a more recent model run, Jonathan Read of England’s University of Lancaster and his colleagues estimated “that only about 1 in 20 infections were being detected” in late January, Read said: There were probably 11,090 to 33,490 infections in Wuhan as of Jan. 22, when China reported 547 cases. “It highlights how difficult it is to track down and identify this virus,” Read said, especially with residents of quarantined Wuhan being turned away from overwhelmed hospitals and clinics without being tested for the virus. Using a similar approach, modelers led by Dr. Wai-Kit Ming of Jinan University in Guangzhou estimated that through Jan. 31, China probably had 88,000 cases, not the 11,200 reported.
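The link between a reported count and an estimated total is simple division, which makes the figures easy to sanity-check. This sketch only rearranges numbers quoted in the text; the function names are made up, and the comparison is rough because the reported count of 547 covers all of China while Read's estimate covers Wuhan alone.

```python
def implied_total(reported_cases, detected_fraction):
    """Scale a reported count up by the share of infections assumed detected."""
    if not 0 < detected_fraction <= 1:
        raise ValueError("detected_fraction must be in (0, 1]")
    return reported_cases / detected_fraction


def implied_detected_fraction(reported_cases, estimated_total):
    """Share of estimated infections that show up in the reported count."""
    return reported_cases / estimated_total


# "About 1 in 20" applied to the 547 cases reported as of Jan. 22.
print(f"Implied infections: about {implied_total(547, 1 / 20):,.0f}")

# Detection shares implied by the quoted range of estimates for that date.
for estimate in (11_090, 33_490):
    share = implied_detected_fraction(547, estimate)
    print(f"Estimate of {estimate:,}: {share:.1%} detected")

# The Jan. 31 comparison: 11,200 reported against an estimated 88,000.
print(f"Jan. 31: {implied_detected_fraction(11_200, 88_000):.1%} detected")
```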

Read’s group is updating its model to estimate the fraction of true cases in February; China’s cumulative cases topped 60,000 on Thursday.

For modelers, a huge undercount can corrupt the data they base their equations on. But even with that disadvantage the Covid-19 models “are doing quite well, despite a lot of complicated dynamics on the ground,” said Los Alamos’s Fairchild. While it’s not clear yet if they’ve nailed the true numbers of cases, they are correctly projecting the outbreak’s basic shape: increasing exponentially, the number of cases growing more quickly the more cases there are.

• Trevor A. Merchant says:

Maybe, just maybe, this is repetitive, yet it is my belief that what I write, interpret, and conclude makes a lot of sense within normal communication, even when it seems to be less of the intellectual legacy. My point is that all of the conundrums that we are telling the unread, uneducated, and uninformed do not make sense whatsoever and, even more delicate and dangerous, they do not care, as they are in debt of reading, writing, and therefrom attaining knowledge. So it is with most of what has taken place in China within the past twelve months, up to the current death toll. Tens upon tens of millions of human beings were and still are unaware that there were, and still are, grave crises before this one: in the pork industry, thousands of pigs died first from antimicrobial-resistant superbugs and thereafter from the swine bird flu that has also killed tens of thousands of pigs in China. These millions of people worldwide are not even aware of, or have the least idea about, antimicrobial resistance, what it is and what it means, along with the deaths that it causes. More to come. Trevor Merchant, New York City, Friday, February 14, 2020, at 2:41 p.m. Eastern Standard Time.

• Ken says:

Thank you for this article. My background is Dynamic and statistical comparative modeling of weather. I started looking at the numbers coming out of China to gain understanding of R0. This article confirms what I was understanding. I think that there is more than a simple under-counting in China right now. The numbers are too perfect. It looks like a fraud.

• Peter Daniel says:

F

• Tanny M Martin says:

Really helpful explanation, thank you. Will be watching with you.

• Aussie says:

Wuhan Coronavirus 2019-nCoV Projected Predictions (Covid-19)

There is a very good reason for using a 4-day average or a 9-day average, no different to swing charts and candlestick charts. They will always give fairly good accuracy straight off the bat. But only if the data is reasonably reliable