
Disease-modeling communities around the world have been working tirelessly since January to predict how and where Covid-19 will spread, with some real successes. A host of models have illustrated how, with the right resources, we can create relatively accurate disease forecasts that give communities and public health officials an idea of what to expect — and time to prepare.

But what if we could forecast epidemics regularly, before there’s a global crisis? That notion is inching closer to reality.

If we were able to detect newly emergent diseases, much as we now detect tropical storms, we could project their potential to become epidemics or pandemics. This capability would significantly augment our ability to minimize and prevent the spread of deadly diseases such as Covid-19, the flu, malaria, and others.


To be sure, being able to forecast epidemics and disease patterns won’t be easy. If you think that weather is hard to predict, given all the variables of temperature, air moisture, atmospheric pressure, and the like, imagine predicting disease outbreaks, which are influenced by something much more complex: humans. With our diverse cultures, economic statuses, religions, friends, leaders, and a host of other inputs, human behavior is notoriously difficult to map, yet it has a huge impact on how diseases spread.

Another challenge is that current disease-monitoring systems rely on various sources of data, including patient interviews, medical provider reports, and lab tests, which are sent up a bureaucratic reporting chain and aren’t available to researchers for weeks or longer. During Covid-19, that timeline has accelerated. But when there’s no immediate crisis, the information is too often no longer very useful by the time it is made public.


Real-time data sources such as social media, cellphone data, satellite imagery, and other data streams can help close this gap. Twitter posts, as well as Google and Wikipedia searches, provide real-time information that doesn’t rely on face-to-face interviews and long processing times. We can immediately see what’s trending, such as tweets about illness symptoms or rumors about illnesses spreading. Likewise, Google and Wikipedia can show us when certain diseases or symptoms are the subject of searches.

These anonymized data streams can be mined to see where these posts and searches originate, which can provide an idea of where a disease is spreading and where it might spread next.

But Internet data streams have flaws for forecasting epidemics. For one thing, they lack standardization. For another, not all data output is coupled with geographic information. It’s impossible to verify the information provided. And during outbreaks of novel diseases like Covid-19 and Ebola, more people search those terms, but that does not necessarily mean the searchers have the disease.

Then there’s the question of bias. Everything from age, sex, and race to social status and global reach can affect the reliability of data. Language and cultural differences are another issue. Plus, in many countries where epidemics are common, Internet access is limited or nonexistent. In those cases, social media won’t help.

Another possible stumbling block is that words have multiple meanings, depending on the context. How do we know the 10,000 geolocated tweets using the word “fever” are talking about a viral illness and not Bieber Fever?

Fortunately, the vocabulary challenge can be addressed with advanced natural language processing algorithms that can infer context, recognize events, and deduce sentiments and opinions. Frequently retraining models and carefully scrutinizing irregular patterns also help.
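To make the "fever" problem concrete, here is a deliberately crude sketch of context filtering. It is not the method any forecasting group is known to use, and it is far simpler than the language-processing algorithms described above. The cue words, the figurative phrases, the function name `mentions_illness_fever`, and the sample posts are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical context cues and figurative phrases, for illustration only.
ILLNESS_CUES = {"temperature", "chills", "cough", "sick", "flu", "doctor", "aches"}
NON_ILLNESS_PHRASES = ("bieber fever", "fever pitch", "cabin fever")


def mentions_illness_fever(post: str) -> bool:
    """Return True if 'fever' appears literally and alongside an illness cue."""
    text = post.lower()
    # Strip known figurative uses before looking for the keyword.
    for phrase in NON_ILLNESS_PHRASES:
        text = text.replace(phrase, " ")
    if "fever" not in text:
        return False
    # Require at least one illness-related word elsewhere in the post.
    words = set(re.findall(r"[a-z']+", text))
    return bool(words & ILLNESS_CUES)


# Made-up sample posts, not real data.
posts = [
    "Home sick with a fever and chills",
    "Total Bieber Fever at the concert tonight",
    "Fever pitch at the stadium",
]
flagged = [p for p in posts if mentions_illness_fever(p)]
print(flagged)
```

A fixed word list like this would miss slang, misspellings, and other languages, which is one reason real systems rely on models that are retrained frequently rather than on hand-written rules.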

Cellphone data, satellite imagery, and news stories may complement Internet data to address some of the limitations and barriers to forecasting epidemics, and also provide additional information relevant to disease spread by helping measure behavioral changes. We’ve seen this with Covid-19: cellphone and traffic data have illustrated the changes in people’s movement as a result of shelter-in-place restrictions, which can be useful in understanding how diseases spread based on travel.

Smart watches can be another asset, collecting body temperature and other data in real time, potentially providing early detection of symptomatic individuals.

Researchers around the world are already working on the foundations of such a tool to forecast epidemics. My colleagues and I at Los Alamos National Laboratory, for example, have participated since 2013 in the CDC’s Epidemic Prediction Initiative, which initially focused on forecasting the flu but has since expanded to other diseases. And ours is one of about 24 models or groups forecasting Covid-19 deaths for the U.S. as part of the CDC’s Covid-19 modeling efforts.

To make forecasting global disease a reality, we must first get serious about investing time and resources in the best models. Weather forecasting uses only a handful of vetted models, whereas the epidemic forecasting community uses hundreds of models of varying complexities. We need to focus our efforts on the most accurate and reliable models if we are to move epidemic forecasting forward.

The second thing we need is more data, both genomic and observational. Weather forecasting takes advantage of the stream of data from observations gathered by thousands of automated weather stations around the world and by satellites. The epidemic forecasting community needs something similar: global sensors that can provide real-time observations on environmental, animal, and human interactions, current disease incidence, and — perhaps most important — human behavior.

Although the Weather Bureau (now known as the National Weather Service) was initially established in 1870, it wasn’t until the past three decades that the accuracy of weather forecasting significantly increased, thanks to billions of dollars of investment in data collection and computational advances. Although the epidemic forecasting community has cloud computing at its disposal and foundational mathematical approaches for forecasting, it may be decades before we have the necessary infrastructure to collect, curate, and analyze data in real time around the globe and to forecast all diseases worldwide at very granular scales.

Creating this infrastructure could go a long way toward making it easier to stop the spread of diseases and save lives. A global disease forecasting center would allow us to anticipate and prepare for the next outbreak rather than just reacting after the disease has taken hold. It’s in the realm of the possible. Working together, we can make it happen.

Sara Del Valle is a mathematical epidemiologist at Los Alamos National Laboratory in New Mexico.