By Heather Hamilton, contributing writer
One of the most useful applications of machine learning may be forecasting and modeling the spread of infectious disease. The Centers for Disease Control and Prevention (CDC) aims to build big-data models that accurately track the spread of flu, reports Digital Trends. The CDC has run a flu forecasting initiative for the last four years, refining its research methods to predict the flu season more accurately.
The forecasting initiative invites participants to submit forecasting systems of their own design, which are then judged on accuracy. Each system must forecast when the season will begin, when it will peak, how severe it will be at the peak, and how severe it will be one, two, three, and four weeks out. Participants submit a new forecast covering all of these targets for each of 10 U.S. regions every week throughout flu season, incorporating new data as it arrives. At the end of the season, the forecasts are compared against the actual data.
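To make the shape of a weekly submission concrete, here is a minimal sketch of the targets described above. The field names, values, and layout are illustrative only, not the CDC's actual submission format:

```python
# Illustrative structure for one week's flu forecast submission.
# Field names and values are hypothetical, not the CDC's actual schema.

REGIONS = [f"HHS Region {i}" for i in range(1, 11)]  # the 10 U.S. regions

def make_submission(week):
    """Build a forecast entry for every region for a given week."""
    return {
        region: {
            "submission_week": week,
            "season_onset_week": 46,       # predicted week the season begins
            "season_peak_week": 5,         # predicted week of peak activity
            "season_peak_intensity": 5.2,  # predicted % of doctor visits for flu at peak
            # short-term intensity forecasts, one to four weeks ahead
            "week_ahead": {1: 3.1, 2: 3.8, 3: 4.4, 4: 4.9},
        }
        for region in REGIONS
    }

submission = make_submission(week=48)
print(len(submission))                               # one entry per region
print(submission["HHS Region 1"]["week_ahead"][2])   # two-week-ahead forecast
```

Each week through the season, a participant would regenerate this structure with updated numbers, which is what allows the end-of-season comparison against observed data.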
This year, 28 systems were submitted to the CDC, including two from Carnegie Mellon's Delphi research group. Unsurprisingly, Delphi's two systems claimed the top two spots in the ranking, for the third year in a row.
Right now, the CDC monitors the flu via a surveillance system that tracks what is happening rather than what could happen. The agency anticipates that the current research will significantly improve future flu predictions.
The Delphi group is working on an improvement to CDC surveillance techniques that makes data available in near real time while maintaining accuracy. “It takes a while to collate all these numbers, compile them, check them, and publish them,” said Roni Rosenfeld, who leads the research group, in an interview with Digital Trends. “So as a result, when the CDC publishes their surveillance numbers online, they actually refer to the previous week, not the week that we’re in. So they’re already between one and two weeks old.”
Delphi-Stat is a non-mechanistic model that uses artificial intelligence to predict future patterns from past patterns and existing data, while Delphi-Epicast bases its forecasts on the judgments of volunteers who submit their own weekly predictions: essentially, crowdsourcing.
By combining data the CDC currently collects with Google Trends statistics, social media, retail sales of flu medication, and Wikipedia access logs, the group is attempting to identify how the flu is spreading. Rosenfeld notes that these signals can reflect flu awareness rather than actual flu transmission. “If there’s unusual news coverage of the flu — maybe because a celebrity got the flu or something — you would expect to see that influencing how many people search for flu on Wikipedia or on Google,” said Rosenfeld. “But it would not influence how many people are hospitalized for flu.” Currently, the Delphi group is working to overcome these challenges.
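One simple way such noisy proxy signals could be combined, sketched below purely for illustration and not the Delphi group's actual method, is to weight each signal by how accurate it has been in past seasons, so that historically reliable signals dominate the estimate:

```python
# Toy "signal fusion" sketch: combine noisy proxy estimates of flu activity
# (search trends, medication sales, Wikipedia hits) by weighting each signal
# inversely to its historical error. Illustrative only; not Delphi's method.

def fuse(estimates, historical_errors):
    """Inverse-error weighted average of per-signal estimates."""
    weights = {name: 1.0 / historical_errors[name] for name in estimates}
    total = sum(weights.values())
    return sum(estimates[name] * weights[name] for name in estimates) / total

# Hypothetical per-signal estimates of flu activity (% of doctor visits)
estimates = {"search_trends": 4.0, "med_sales": 3.4, "wikipedia": 5.0}
# Hypothetical mean absolute error of each signal in past seasons
historical_errors = {"search_trends": 0.5, "med_sales": 0.4, "wikipedia": 1.0}

print(round(fuse(estimates, historical_errors), 2))  # a single fused estimate
```

Note that this kind of weighting can downplay an unreliable signal but cannot remove the bias Rosenfeld describes: a burst of celebrity-driven news coverage would inflate search and Wikipedia signals simultaneously, which is part of what makes the problem hard.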
To forecast the flu, the group combines three methods developed over the last few years, joining models of flu dynamics with time-series analysis. Delphi's Epicast system received a skill score of 0.451 and the Stat system a score of 0.438; a perfect prediction would score 1.0. When the CDC combined all 28 submissions into a cumulative forecast, it scored only 0.430.
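To give a feel for these numbers: forecast skill scores of this kind are typically built from the probabilities a forecaster assigned to the outcomes that actually occurred. The sketch below shows a simplified geometric-mean version of that idea; it illustrates why 1.0 means perfection, but it is not the CDC's exact scoring formula:

```python
# Simplified skill score: the geometric mean of the probabilities a
# forecaster assigned to the outcomes that actually happened.
# A score of 1.0 requires predicting every true outcome with certainty.
# Illustrative only; not the CDC's exact evaluation formula.
import math

def skill(probs_assigned_to_truth):
    """Geometric mean of probabilities placed on the true outcomes."""
    logs = [math.log(p) for p in probs_assigned_to_truth]
    return math.exp(sum(logs) / len(logs))

print(round(skill([1.0, 1.0, 1.0]), 3))  # certainty on every target -> 1.0
print(round(skill([0.6, 0.4, 0.5]), 3))  # partial credit for hedged forecasts
```

Under a scheme like this, a score in the 0.4-0.5 range means the forecaster put a little under half of its probability mass, on average, on what actually happened, which makes the gap between Delphi's 0.451 and the 28-model ensemble's 0.430 a meaningful margin.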
The Delphi group works to support the CDC's flu forecasting, which Rosenfeld says is the main driver behind Delphi's research. The work may eventually extend to hospitals, which could use forecasts to plan staffing and equipment needs. Of course, a forecast carries no guarantees, so even a good one can't completely prevent sickness.
The Delphi group is also using its platform to examine dengue fever and hopes to explore HIV, Ebola, and Zika.
Sources: Digital Trends, Epidemic Prediction Initiative, Carnegie Mellon, Delphi
Image Source: Pixabay