Cell phone data that is routinely collected by telecommunications providers can reveal changes of behavior in people who are diagnosed with a flu-like illness, while also protecting their anonymity, a new study finds. The Proceedings of the National Academy of Sciences (PNAS) published the research, led by computer scientists at Emory University and based on data drawn from a 2009 outbreak of H1N1 flu in Iceland.
“To our knowledge, our project is the first major, rigorous study to individually link passively-collected cell phone metadata with actual public health data,” says Ymir Vigfusson, assistant professor in Emory University’s Department of Computer Science and a first author of the study. “We’ve shown that it’s possible to do so without compromising privacy and that our method could potentially provide a useful tool to help monitor and control infectious disease outbreaks.”
The researchers collaborated with a major cell phone service provider in Iceland, along with public health officials of the island nation. They analyzed data for more than 90,000 encrypted cell phone numbers, which represents about a quarter of Iceland’s population. They were permitted to link the encrypted cell phone metadata to 1,400 anonymous individuals who received a clinical diagnosis of a flu-like illness during the H1N1 outbreak.
“The individual linkage is key,” Vigfusson says. “Many public-health applications for smartphone data have emerged during the COVID-19 pandemic but tend to be based around correlations. In contrast, we can definitively measure the differences in routine behavior between the diagnosed group and the rest of the population.”
The results showed that, on average, those who were diagnosed with a flu-like illness changed their cell phone usage behavior a day before their diagnosis and for two to four days afterward: They made fewer calls, from fewer unique locations. On average, they also spent more time than usual on the calls that they made on the day following their diagnosis.
The study, which began long before the COVID-19 pandemic, took 10 years to complete. “We were going into new territory and we wanted to make sure we were doing good science, not just fast science,” Vigfusson says. “We worked hard and carefully to develop protocols to protect privacy and conducted rigorous analyses of the data.”
Vigfusson is an expert on data security and on developing software and programming algorithms that work at scale.
He shares first authorship of the study with two of his former students: Thorgeir Karlsson, a graduate student at Reykjavik University who spent a year at Emory working on the project, and Derek Onken, a Ph.D. student in the Computer Science department. Senior author Leon Danon — from the University of Bristol, and the Alan Turing Institute of the British Library — conceived of the study.
While only about 40 percent of humanity has access to the Internet, cell phone ownership is ubiquitous, even in lower- and middle-income countries, Vigfusson notes. And cell phone service providers routinely collect billing data that provide insights into the routine behaviors of a population, he adds.
“The COVID pandemic has raised awareness of the importance of monitoring and measuring the progression of an infectious disease outbreak, and how it is essentially a race against time,” Vigfusson says. “More people also realize that there will likely be more pandemics during our lifetimes. It is vital to have the right tools to give us the best possible information quickly about the state of an epidemic outbreak.”
Privacy concerns are a major reason why cell phone data has not been linked to public health data in the past. For the PNAS paper, the researchers developed a painstaking protocol to minimize these concerns.
The cell phone numbers were encrypted, and their owners were not identified by name, but by a unique numerical identifier not revealed to the researchers. These unique identifiers were used to link the cell phone data to de-identified health records.
“We were able to maintain anonymity for individuals throughout the process,” Vigfusson says. “The cell phone provider did not learn about any individual’s health diagnosis and the health department did not learn about any individual’s phone behaviors.”
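As a rough illustration of how two datasets can be joined without exposing names or phone numbers, the Python sketch below replaces each identifier with a keyed hash and links records on that pseudonym alone. It is a minimal sketch under stated assumptions, not the protocol the researchers used: the article does not describe their encryption scheme, and the key handling, field names and sample records here are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hash of an identifier.

    Assumption: the key is held by a party other than the researchers,
    so the researchers cannot recompute or reverse the pseudonym.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(call_records, diagnoses):
    """Attach a diagnosis day to each call record by matching pseudonyms only."""
    diagnosed = {d["pid"]: d["diagnosis_day"] for d in diagnoses}
    return [{**rec, "diagnosis_day": diagnosed.get(rec["pid"])} for rec in call_records]


# Hypothetical sample data; "person-001" stands in for a real identifier.
key = b"held-by-a-trusted-party-only"
call_records = [
    {"pid": pseudonymize("person-001", key), "day": 3, "tower": "A", "duration_s": 120},
    {"pid": pseudonymize("person-002", key), "day": 3, "tower": "B", "duration_s": 45},
]
diagnoses = [{"pid": pseudonymize("person-001", key), "diagnosis_day": 4}]

linked = link_records(call_records, diagnoses)
```

In this sketch the linked records carry only the pseudonym, so whoever analyzes them sees neither a name nor a phone number; records with no matching diagnosis get `None` for the diagnosis day.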
The study encompassed 1.5 billion call record data points, including calls made, the dates of the calls, the cell tower location where the calls originated and the duration of the calls. The researchers linked this data to clinical diagnoses of a flu-like illness made by health providers in a central database. Laboratory confirmation of influenza was not required.
The analyses of the data focused on 29 days surrounding each clinical diagnosis, and looked at changes in mobility, the number of calls made and the duration of the calls. They measured these same factors during the same time period for location-matched controls.
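The kind of per-day summary described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's analysis code: it assumes the 29 days are centered on the diagnosis (14 days before, the day itself and 14 days after), it uses the number of distinct cell towers as a simple stand-in for mobility, and the function name and sample calls are hypothetical.

```python
from collections import defaultdict

WINDOW = 14  # assumed: 14 days before + diagnosis day + 14 days after = 29 days


def daily_features(calls, diagnosis_day, window=WINDOW):
    """Summarize one person's calls for each day relative to diagnosis.

    calls: iterable of (day, tower, duration_s) tuples.
    Returns {offset: (n_calls, n_unique_towers, mean_duration_s)} for every
    offset from -window to +window, with zeros on days without calls.
    """
    by_offset = defaultdict(list)
    for day, tower, duration in calls:
        offset = day - diagnosis_day
        if -window <= offset <= window:
            by_offset[offset].append((tower, duration))

    features = {}
    for offset in range(-window, window + 1):
        rows = by_offset.get(offset, [])
        n_calls = len(rows)
        n_towers = len({tower for tower, _ in rows})
        mean_duration = sum(d for _, d in rows) / n_calls if n_calls else 0.0
        features[offset] = (n_calls, n_towers, mean_duration)
    return features


# Hypothetical calls: (day, tower, duration in seconds); diagnosis on day 10.
calls = [(9, "A", 60), (9, "B", 120), (10, "A", 300), (40, "C", 10)]
features = daily_features(calls, diagnosis_day=10)
```

Running the same function on a matched control, using the case's diagnosis day as the reference point, yields two aligned series that can then be compared day by day.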
“Even though individual cell phones generated only a few data points per day, we were able to see a pattern where the population was behaving differently near the time they were diagnosed with a flu-like illness,” Vigfusson says.
While the findings are significant, they represent only a first step for the possible broader use of the method, Vigfusson adds. The current work was limited to the unique environment of Iceland: an island with only one port of entry and a fairly homogeneous, affluent and small population. It was also limited to a single infectious disease, H1N1, and those who received a clinical diagnosis for a flu-like illness.
“Our work contributes to the discussion of what kinds of anonymous data lineages might be useful for public health monitoring purposes,” Vigfusson says. “We hope that others will build on our efforts and study whether our method can be adapted for use in other places and for other infectious diseases.”
The work was funded by the Icelandic Center for Research, Emory University, the National Science Foundation, the Leverhulme Trust, the Alan Turing Institute, the Medical Research Council and a hardware donation from NVIDIA Corporation.