Can social media help track the spread of disease?

April 7, 2020

This piece was originally published on March 30 by the Gillings School of Global Public Health.

Disease surveillance means monitoring the spread of disease through populations in order to establish patterns and minimize harm caused by outbreaks. In a recent article, UNC-Chapel Hill researchers explored how to effectively and ethically include social media and broader Internet tracking as part of public health surveillance efforts.

They write: “We are on the precipice of an unprecedented opportunity to track, predict and prevent global disease burdens in the population using digital data.”

Their full article, titled “Social Media- and Internet-Based Disease Surveillance for Public Health,” was published in early 2020 by the Annual Review of Public Health. The co-authors, all from the UNC Gillings School of Global Public Health, are Allison Aiello, PhD, professor of epidemiology and a Carolina Population Center (CPC) Faculty Fellow, along with Audrey Renson and Paul Zivich, both epidemiology doctoral students who work at the CPC.

Data collected via the Internet and social media sites have been used for years as a complement to existing outpatient, hospital and laboratory-based systems. This complement can be extremely useful because traditional surveillance systems track only the individuals who seek medical care, and therefore underestimate the total disease burden.

Figure: Timeline showing the evolution of social media- and Internet-based surveillance systems.

One example of digital public health surveillance is tracking Google search terms: In one study, Google searches for “diarrhea” and “food poisoning” were shown to coincide with an outbreak of Salmonella in peanut butter. In another example, online restaurant review sites (like Yelp) helped identify foodborne disease outbreaks.
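To make the idea of search activity "coinciding" with an outbreak concrete, here is a minimal sketch of one way an analyst might check whether weekly search volume tracks, or runs ahead of, reported case counts. It is not the method used in the studies mentioned above. The function names, the lag range and the weekly numbers are all invented for illustration, and a high correlation on its own would not confirm an outbreak.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def best_lag(searches, cases, max_lag=3):
    """Return (lag, r) for the lead time, in weeks, at which search
    volume correlates most strongly with later case counts."""
    results = []
    for lag in range(max_lag + 1):
        # Pair searches in week t with cases reported in week t + lag.
        s = searches[: len(searches) - lag]
        c = cases[lag:]
        results.append((lag, pearson(s, c)))
    return max(results, key=lambda pair: pair[1])

# Hypothetical weekly numbers, made up for this example only;
# they are not data from any study.
weekly_searches = [10, 12, 30, 55, 40, 20, 12, 10, 11, 10]
weekly_cases = [2, 2, 3, 9, 17, 12, 6, 3, 2, 2]

lag, r = best_lag(weekly_searches, weekly_cases)
print(f"searches lead cases by {lag} week(s), r = {r:.2f}")
```

In this toy series the search spike shows up before the rise in reported cases, which is the kind of early signal that makes digital data attractive as a complement to traditional reporting.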

While Twitter is by far the most frequently used social media platform for digital surveillance, others can be used as well. For example, Facebook “like” patterns correlate strongly with a wide range of health conditions and behaviors, and Instagram timelines have been used to identify adverse drug reactions.

In addition to discussing why some early digital surveillance efforts — like Google Flu Trends — ultimately failed, the researchers delve into the ethics of social media- and Internet-based data collection. They outline five key concepts that must be kept in mind:

Beneficence: understanding that public health surveillance should always improve the health of the target population.
Nonmaleficence: taking thoughtful actions to reduce the potential harms of collecting data.
Respect for autonomy: recognizing individuals’ right to informed consent.
Equity: ensuring that all individuals in a target population have equal opportunity to receive a given public health intervention.
Efficiency: weighing the costs and benefits of any surveillance system.

“In the future, it will be important to identify the most beneficial ways to use digital data sources through hybrid or completely new independent systems,” write the co-authors. “The prospect of a new system, driven solely by digital technology, seems unlikely at present, but the continued advancement of machine learning […] may bring this idea closer to reality in the future. One challenge will be in the training of public health experts in computer science, big data and machine learning to harness novel sources of digital data and support innovation in digital surveillance while reducing possible harms.”