Adam Sadilek has done some pretty groundbreaking research with social data, including tracking food poisoning. When he was a Ph.D. student at the University of Rochester, he led a team that found that geotagged Tweets about foodborne illness closely aligned with restaurants that received poor scores from the health department. Adam is now a researcher at Google, and you can follow him on Twitter at @Sadilek.
1. Where did your interest in identifying health trends on Twitter come from?
First, it was studying how Twitter can predict flu outbreaks and then looking at identifying food poisoning outbreaks too.
We were interested in how much we can learn about our environment by sifting through the vast amounts of day-to-day chatter online. It turns out that machine learning can identify strong signals that can be used to make predictions about individuals as well as the venues they visit. For example, in our GermTracker.org project, we predicted how likely a Twitter user is to become sick based on how many symptomatic people he or she met recently. We leveraged geotags within the Tweets to estimate people’s encounters. In the nEmesis project, our model identified Twitter users who got sick after eating at a restaurant, which enabled us to rank food establishments by cleanliness.
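As a rough illustration of the encounter idea described above, here is a minimal Python sketch that counts how many distinct symptomatic users a given user was near, based on geotagged Tweets. It is not the GermTracker model: the `Tweet` record, the `symptomatic` flag (assumed to come from some upstream classifier), and the 100 meter and one hour thresholds are all assumptions made for the example. It only computes the encounter count, which could then feed a prediction step.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record; field names are assumptions for this sketch.
    user: str
    lat: float
    lon: float
    time_s: float      # timestamp in seconds
    symptomatic: bool  # assumed output of an upstream text classifier


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def symptomatic_encounters(user, tweets, max_dist_m=100.0, max_gap_s=3600.0):
    """Count distinct symptomatic users who tweeted near `user` in space and time.

    The distance and time thresholds are illustrative assumptions.
    """
    own = [t for t in tweets if t.user == user]
    met = set()
    for other in tweets:
        if other.user == user or not other.symptomatic:
            continue
        for mine in own:
            close_in_time = abs(mine.time_s - other.time_s) <= max_gap_s
            close_in_space = haversine_m(mine.lat, mine.lon, other.lat, other.lon) <= max_dist_m
            if close_in_time and close_in_space:
                met.add(other.user)
                break
    return len(met)


tweets = [
    Tweet("alice", 43.1566, -77.6088, 0.0, False),
    Tweet("bob", 43.1566, -77.6088, 600.0, True),     # nearby, recent, symptomatic
    Tweet("carol", 43.2566, -77.6088, 600.0, True),   # symptomatic but far away
    Tweet("dave", 43.1566, -77.6088, 10000.0, True),  # nearby but too late
    Tweet("erin", 43.1566, -77.6088, 300.0, False),   # nearby but not symptomatic
]
encounters = symptomatic_encounters("alice", tweets)
```

In this toy data only one user qualifies as a symptomatic encounter for `alice`; a real system would also have to deal with sparse geotags and noisy symptom detection.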
2. Your machine learning model can assign restaurants scores, based on Twitter data, for the chance of food poisoning, and those scores match the Health Department’s. Is there any way to make nEmesis data public or available as an add-on to services such as Yelp?
There certainly is. Henry Kautz’s group at the University of Rochester is working on extending GermTracker to capture foodborne illness in real time as well.
3. What are the benefits and disadvantages of using social data over more traditional research on health patterns?
Online social media is very noisy, but significantly more timely. Many months pass between inspections of a typical restaurant. If they get a delivery of spoiled chicken a day after an A+ inspection, it will make their patrons sick anyway. Systems like nEmesis, on the other hand, can very quickly detect that something is going on. The flip side is that it’s hard to be certain on the basis of 140 characters. Therefore, we advocate for a hybrid approach, where inspectors use nEmesis to make better informed decisions. We can replace the current, essentially random inspections with a more adaptive workflow to detect dangerous venues faster.
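To make the adaptive workflow a little more concrete, here is a small Python sketch of how venues might be ordered for inspection. It is an assumption-laden toy, not the nEmesis ranking: the `(venue, user)` visit pairs, the set of users flagged as sick after a visit, and the add-one smoothing of the sick fraction are all hypothetical choices for the example.

```python
from collections import defaultdict


def inspection_priority(visits, sick_users):
    """Order venues from most to least suspicious.

    visits: iterable of (venue, user) pairs (hypothetical input format).
    sick_users: users flagged as sick after a visit (assumed to come
    from an upstream classifier).
    """
    sick_users = set(sick_users)
    visitors = defaultdict(set)
    for venue, user in visits:
        visitors[venue].add(user)

    scores = {}
    for venue, users in visitors.items():
        sick = len(users & sick_users)
        # Add-one smoothing so a venue with a single visitor is not
        # scored as 0% or 100%; this is an illustrative choice.
        scores[venue] = (sick + 1) / (len(users) + 2)

    # Highest score first; ties broken by venue name for a stable order.
    return sorted(scores, key=lambda v: (-scores[v], v))


visits = [
    ("A", "u1"), ("A", "u2"), ("A", "u3"),
    ("B", "u4"), ("B", "u5"),
    ("C", "u6"),
]
ranking = inspection_priority(visits, {"u1", "u2", "u4"})
```

Inspectors would treat such a list as a hint about where to look first rather than as a verdict, which fits the hybrid approach described above.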
4. What else do you think Twitter can tell us about public health?
We did a number of studies, focusing on multiple aspects of our health that can be informed by data mining online social media. Beyond flu and food poisoning, we looked at exposure to air pollution, mental health, commuting behavior, and other lifestyle habits. You can take a look at our publications at http://www.cs.rochester.
If you’re interested in additional interviews with people using social data in research, check out our 25 Data Stories to hear about how researchers used social data to track cholera after Haiti’s earthquake.