While looking at the speakers for the International Conference on Weblogs and Social Media, the premier academic conference for social media, I stumbled across the research of Lada Adamic. Not only was Lada one of the keynote speakers for the conference, her research at the University of Michigan was just plain awesome. Lada’s research included understanding commonly used ingredient substitutions from the 40,000 recipes in Allrecipes.com, understanding how peers rate each other on Couchsurfing, Facebook memes, and more. You can check out all of her research on ladamic.com, follow her on Twitter at @ladamic, and be sure to check out her hilarious blog.
1. Your background focuses on networks and how information spreads. You’ve done multiple projects with different data sources. What are some of the overarching trends you’ve seen?
The only sure thing is the unpredictability of information in a network. Sure, in aggregate some information will go viral, while most will not, but predicting what will go where, that’s not so simple. To complicate matters further, information is not only diffusing, but also evolving, while concurrently spurring changes in the social network itself. One trend I do keep seeing is that social networks’ greatest boosting effect is in the niche. There are lots of ways to find out about something widely popular, but information about that curious interest that you and your friends share — that is more likely to come through your friends.
2. What information do you get from looking at networks vs all the other sources you use?
I think it’s more a question of whether there are any data that I don’t try to represent as networks! All I have to do is identify connections between entities in the data, and presto, I have a network. It’s the structure of these connections that can turn up fascinating results: identifying experts from their online interactions, predicting which recipe is going to be rated more highly, or understanding the structure of federal law from the way it’s strung together.
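Editor's aside: to make the "identify connections, and presto, a network" idea concrete, here is a minimal sketch in Python. It is not taken from Lada's research. The toy `records` list and the `build_network` helper are hypothetical, and the sketch assumes that two entities are connected whenever they appear in the same record.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: each row lists entities that appear together
# (for example, ingredients in one recipe). This is not data from the interview.
records = [
    ["butter", "flour", "sugar"],
    ["butter", "garlic"],
    ["flour", "sugar"],
]


def build_network(rows):
    """Return an undirected, weighted adjacency map built from co-occurrence."""
    graph = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Every unordered pair of distinct entities in a row becomes an edge;
        # the edge weight counts how many rows contain both entities.
        for a, b in combinations(sorted(set(row)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


network = build_network(records)

# Degree (number of distinct neighbors) is one of the simplest structural measures.
degree = {node: len(neighbors) for node, neighbors in network.items()}
```

Once the data is in this form, structural questions (who is most connected, which pairs co-occur most often) become simple lookups on `network` and `degree`.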
3. What is useful, difficult and unique about connections found in social data?
Well, you’re dealing with data by and about humans. Humans are difficult. Humans interacting with other humans, that’s complicated… but also highly informative, because a lot of human interaction is about informing one another. And as they inform one another about what’s worthwhile, their location, their mood, etc., that data can be harnessed to detect trends and patterns in human behavior. And perhaps precisely because this data is so rich and powerful, it is important to be mindful of privacy.
4. You were able to determine commonly used ingredient substitutions by looking at 40,000 recipes from Allrecipes.com. How much did the comments in the recipes help determine substitutions and what other insights do you think could be pulled from recipe comments?
In the research paper we relied entirely on the comments in the user-supplied recipe reviews to figure out how often cooks substituted one ingredient for another in a recipe, whether ingredients can be cut or omitted, and, crucially, whether the recipe needs more or less garlic (our data showed, usually, more). Untapped kinds of information in the reviews include who the recipe was a hit with (the kids, the husband, etc.) and vetted improvements, e.g. “I put the dough in the fridge for 2 hours as the other reviewer suggested…”. I think this is a really fun example of harnessing our collective intelligence. Instead of each cook tweaking recipes in their own kitchen and sharing their recommendations with a few friends, now we can gather millions of tweaks and start to understand food and cooking systematically.
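Editor's aside: for readers curious how substitutions might be pulled out of review text, here is a rough sketch in Python. It is not the method used in the paper, which the interview does not describe in detail. The sample `reviews`, the three phrase patterns, and the `count_substitutions` helper are all hypothetical, and the sketch assumes single-word ingredient names.

```python
import re
from collections import Counter

# Hypothetical review snippets, invented for illustration only.
reviews = [
    "I substituted applesauce for oil and it was great.",
    "Used honey instead of sugar, and added more garlic.",
    "I replaced butter with margarine.",
    "We substituted applesauce for oil too!",
]

# Each entry pairs a phrase pattern with the group numbers of the
# (new ingredient, original ingredient). These patterns are assumptions.
PATTERNS = [
    (re.compile(r"substituted (\w+) for (\w+)", re.IGNORECASE), (1, 2)),
    (re.compile(r"(\w+) instead of (\w+)", re.IGNORECASE), (1, 2)),
    (re.compile(r"replaced (\w+) with (\w+)", re.IGNORECASE), (2, 1)),
]


def count_substitutions(texts):
    """Count (original, replacement) ingredient pairs mentioned in the texts."""
    counts = Counter()
    for text in texts:
        for pattern, (new_idx, old_idx) in PATTERNS:
            for match in pattern.finditer(text):
                original = match.group(old_idx).lower()
                replacement = match.group(new_idx).lower()
                counts[(original, replacement)] += 1
    return counts


substitutions = count_substitutions(reviews)
```

Real reviews are much messier than this (multi-word ingredients, negations, typos), so a simple pattern match like this would only be a starting point.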
5. You’ve used data from a wide variety of sources including Couchsurfing.com, Allrecipes.com, Facebook, etc. What do you look for in a data source?
I’m not too discriminating about data, though sometimes I have a question that only certain data can answer. For example, when my husband and I first started dating, I defended my reluctance to watch Sci-Fi movies by pulling their ratings distribution from the IMDb. On an only slightly more serious note, I turned to online recipes because they comprised lots of data about something that I had no clue about: cooking.
Other times you just know the data is good even if your questions about it are not (yet). Such was the case with the CouchSurfing dataset, which encompassed anonymized user-to-user trust and friendship ratings. The data was so rich that even our initial stumbling steps led to some interesting results about rating human relationships. But it wasn’t until the second and third papers that we really got a handle on how the visibility of the ratings skews them, and some more fundamental insights about the relationship between friendship and trust that are rendered beautifully evident in such a large data set.
6. What study have you done that has surprised you the most? What projects do you see in the future that you think academia should focus on to better understand social data?
Some nice surprises actually came up as I was gathering data for my statistics class. When the Economist published an article about the U-curve of happiness vs. age, I thought, wait a sec, we see the same curve in CouchSurfing ratings: people in their 30s & 40s rate and are rated less enthusiastically than those either younger or older. Then my statistics class used the American Time Use Survey to see how much sleep people were getting, and it was the same curve. Coincidence? I think not!
Another happiness vs. age trend came up in the Adolescent Health data, also analyzed in my stats class. Teens having sex in 8th and 9th grade were less happy on average than their peers who were abstaining, but by senior year, the relationship was reversed. It goes to show that you never know which underused columns in existing data sets hold fun statistics (we also explored the “cheerleading”, “math team” and “wears braces” columns…).
To answer the second question: researchers have only started to take advantage of the abundance of social data. There are many long-standing questions in sociology that were previously studied in small groups. Now these questions can be tested on very large data, just at the time when we really do need to understand how they pertain to changing social interactions as they shift online. Among the questions I’m personally interested in are how online social networks shape media consumption, and how information evolves in social networks.
I should mention that the crucial bottleneck for academics doing this kind of research is access to the data. GNIP is certainly part of the solution (you guys have academic discounts, right?). To anyone else who has interesting data, please consider sharing it with data-starved academics.
Thanks to Lada for her interview (and yes, we’re looking at partner programs for academic researchers!). If you have any other suggestions for Data Stories, please leave a comment.
Data Stories Series:
- Liv Buli of Next Big Sound, the world’s first music data journalist
- Hilary Mason, Chief Data Scientist of bitly
- Rumi Chunara of HealthMap, using social data to identify epidemics
- Sherry Emery of UIC, studying social data and smoking cessation
- Lada Adamic of Michigan on information networks
- Annicka Campbell of SapientNitro on the Digital Love Project