Social Data in Academic Research

Sherry Emery, Abe Kazemzadeh and Jaime Settle discuss the role of social data in their academic research. For more background on this topic, check out our blog interviews with Sherry Emery and Jaime Settle.

Sherry Emery, Abe Kazemzadeh, Jaime Settle, Paul Smalera at Big Boulder

The sheer volume of social data being captured is begging to be studied, but academia currently faces several obstacles: funding and grant issues, a lack of standardized protocols for research projects that use this fluid (and sometimes deletable) data set, a lack of curricula serving both data science and social science, and the different timelines that academics and the companies collecting the data work on. Research labs hope to bridge these gaps through partnerships with social data companies.

Smokin’ Cigarettes, Smokin’ BBQ, Smokin’ Hot Girls
Capturing the whole conversation about smoking on a social media channel for a research project proved difficult, Sherry says. One of the challenges is selecting the right keywords. In one project, approximately 70 million Tweets contained the keyword “smoking.” But when the team used software to categorize that large volume of Tweets, they learned that only a third were about smoking tobacco, her topic of study. Other categories ranged from smoking marijuana to smoking ribs to “smoking hot girls.” Studying the smoking Tweets also revealed an interesting pattern in sentiment: people who tweeted about smoking tobacco cigarettes felt ashamed, while those who tweeted about smoking pot felt proud.
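As a rough illustration of that categorization step, here is a minimal sketch that sorts Tweets containing “smoking” into topics using simple cue words. The category names, the cue words and the sample Tweets are all invented for this example; the software Sherry’s team actually used is not described in the session and was presumably far more sophisticated.

```python
from collections import Counter

# Hypothetical cue words per category; not the research team's actual rules.
CATEGORY_CUES = {
    "tobacco": {"cigarette", "cigarettes", "tobacco", "nicotine", "quit"},
    "marijuana": {"weed", "pot", "marijuana", "joint"},
    "food": {"ribs", "bbq", "brisket", "smoker"},
    "slang": {"hot", "girl", "girls"},
}


def categorize(tweet):
    """Return the first category whose cue words appear in the tweet."""
    words = {w.strip(".,!?\"'#").lower() for w in tweet.split()}
    for category, cues in CATEGORY_CUES.items():
        if words & cues:
            return category
    return "other"


# Made-up sample Tweets, all of which match the keyword "smoking".
sample_tweets = [
    "Trying to quit smoking cigarettes again",
    "Smoking ribs all afternoon for the game",
    "smoking weed with friends tonight",
    "That was one smoking hot girl",
    "Smoking section is closed",
]

counts = Counter(categorize(t) for t in sample_tweets)
for category, n in counts.most_common():
    print(category, n)
```

Even in this toy version, only one of the five “smoking” Tweets lands in the tobacco category, which is the kind of gap between keyword matches and on-topic Tweets that the team ran into.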

How Academics Use Social Data vs How Companies Use Social Data
Academics have the luxury of time when studying social data, but not the luxury of a time machine. Researchers are chasing a more historical perspective on the data, but unless they know and can anticipate the keywords and events that matter, their pursuit could be snuffed out. It’s a double-edged sword. Because social data is streaming, academics can’t always anticipate the necessary keywords early enough to fully capture the events and behaviors they want to study. For example, Sherry’s team serendipitously captured Tweets about a proposed ballot measure in California to increase the price of cigarettes, but the majority of those Tweets didn’t contain the usual smoking keywords “cigarettes,” “smoking” and “tobacco.” By the time the researchers realized this, they had already missed a large portion of the social data. Popular opinion, Sherry says, changed dramatically between the three months before the vote and the voting period itself.
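The keyword problem can be sketched in a few lines. This is only an illustration: the sample stream is made up, and “Prop X” is a placeholder for whatever shorthand people actually used for the ballot measure. The point is that a filter fixed before collection starts never sees the Tweets it didn’t anticipate.

```python
# Keywords chosen before collection started (the "usual" smoking keywords).
TRACKED_KEYWORDS = {"cigarettes", "smoking", "tobacco"}


def matches(tweet, keywords):
    """True if any tracked keyword appears as a word in the tweet."""
    words = {w.strip(".,!?\"'#").lower() for w in tweet.split()}
    return bool(words & keywords)


# Made-up stream; "Prop X" is a placeholder name, not a real measure.
stream = [
    "Smoking bans are spreading",
    "Voting no on Prop X, another tax grab",
    "Prop X would fund cancer research, vote yes",
    "Tobacco companies are spending big this year",
]

captured = [t for t in stream if matches(t, TRACKED_KEYWORDS)]
missed = [t for t in stream if not matches(t, TRACKED_KEYWORDS)]

print(len(captured), "captured;", len(missed), "missed")
```

Both missed Tweets are about the ballot measure, and with a forward-only stream there is no way to go back for them once the filter is corrected.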

Social Media Companies + Academia = Match Made in Data Nerd Heaven
Jaime says that because social data and its tools are forward-looking by nature, they are not designed to retrieve data retroactively or historically. Perhaps this is an opportunity for academia and social media companies to partner and rely on each other as resources. University students need curricula that prepare them both to work at social companies and to become social scientists, and social companies can influence what such curricula in higher ed should teach. This is a gap, and partnerships between these two groups need to be forged if the full potential of social data is to be explored as the demand to understand it grows. Social companies have their own data, and their own social data teams focused on internal needs and business goals. Academics see potential and overlapping goals in the very same data these companies are collecting about users, which could reveal insights into human behavior.

Abe explains the pros and cons of doing research for companies (but says the benefits outweigh the cons).

Pros:
Variety of funding from government grants
Interesting problems companies are facing
College students get an opportunity to work on cool research projects for real world problems
Cons:
Sense of urgency
Funding is on a subscription format; if a company has a bad year, it can cancel its subscription to the research lab’s services

Looking Ahead
Funding seemed to be an overall challenge for academics looking to study social data. Other challenges include the ethical implications of using such a fluid data set on subjects who may not understand they are being studied. There needs to be a standardized protocol for studying, reporting and managing social data, one that respects both the data and the subjects being researched. The current situation is vulnerable to scandal in the event of an invasion of privacy or abuse of data. Institutional review boards need to begin a dialogue with researchers (and social media companies?) about best practices for this new niche of research before an egregious case occurs.

Big Boulder is the world’s first social data conference. Follow along at #BigBoulder, on the blog under Big Boulder, at Big Boulder on Storify and on Gnip’s Facebook page.