Charting the Future of Social Data: The Big Boulder Initiative

In June, we launched the Big Boulder Initiative at our annual Big Boulder Conference. The goal of the Initiative is to establish the foundation for the long-term success of the social data industry by addressing the key challenges facing us as an industry. Since that announcement, there’s been a lot going on and it’s time for an update.

This fall, we held small workshops in 4 different cities – New York, Washington DC, Seattle and San Francisco.  The goal of these workshops was to get 15-20 thought leaders together in each city to discuss the future of social data and the challenges we all face in creating the future we believe is possible. The feedback on these sessions has been fantastic. You can check out the video recap below:

Across all of the workshops, we had over 60 participants, representing a range of different perspectives – publishers, brands, solution providers, analysts, public sector, finance and more.  This included companies such as Adobe, Boeing, Disney, Fidelity, Foursquare, General Dynamics, Microsoft, NASDAQ, Nordstrom, NYSE, Palantir, Salesforce, Thomson Reuters, Twitter, and others.

Out of these workshops, it’s clear there’s no lack of challenges or issues to address.  This isn’t surprising given the early nature of the social data industry and is one of the reasons the opportunities are so exciting. Across the workshops, there were six key areas that bubbled up that we agreed should be the highest priority.

  • Privacy, Trust & Regulation

  • ROI & Value

  • Data Access

  • Data Standardization

  • Cost of Data

  • Data Quality & Validity

The participants in each workshop also elected members to a Board of Directors for the Big Boulder Initiative. The Board has 10 members and represents a great cross-section of the industry, with representation from Publishers, Solution Providers, Brands, Analysts, and emerging verticals like Finance and the Public Sector. The Board members are:

  • Carmen Sutter from Adobe

  • Chris Moody from Gnip

  • Damon Cortesi from Simply Measured

  • Jason Gowans from Nordstrom

  • Jason Thomas from Thomson Reuters Special Services

  • Megan Kelley from Fidelity Investments

  • Stu Shulman from Vision Critical

  • Susan Etlinger from Altimeter Group

  • Tom Watson from NYSE

  • Zach Hofer-Shall from Twitter

With the Board in place, the next step is for the Board to meet in the New Year, discuss all of the inputs from the workshops and chart a path forward. I’ll be serving as interim chair until the Board selects the full-time chair and I expect that one of the key points of discussion will be how to ensure that all voices in the industry are heard and have the opportunity to participate.  If you already know you want to be involved, you can go to and put your name on the list to be notified as things move forward.

It’s been great to get the Big Boulder Initiative off the ground and into the hands of such a strong and capable group to drive things forward. Look for another update in Q1 after the Board has met.


Help Gnip Present at SXSW!

We had a great time at SXSW this year with our Big Boulder: Bourbon & Boots event, and we’ll be heading back again next year. We’ve submitted three speaker proposals this year, and if the topics below are ones you’d like to hear about at SXSW, we’d love an upvote (or three!).

The Anatomy of a Twitter Rumor:
Solo presentation by Gnip’s lead data scientist Dr. Scott Hendrickson 

Like a match to a fireworks factory, the hacked AP account ignited rumors that President Obama had been hurt in a terrorist attack, causing a hundred billion dollar drop in the stock market. What was even more significant about the Hash Crash was the ability of Twitter users to suppress the rumor and cause the market to rally within minutes, despite how quickly and how far the rumor had spread.

This session by Gnip data scientist, Dr. Scott Hendrickson, will look at the anatomy of a Twitter rumor, how it spreads, how Twitter users react with accurate information and how rumors die. Looking at a bank run, the rumors from Hurricane Sandy and the Hash Crash, we’ll see why Twitter users are good at ferreting out fact from fiction and how to recognize the difference on Twitter.

White House Hash Crash

A look at the White House Hash Crash

Beyond Dots on a Map: The Future of Mapping Tweets
Ian Cairns of Gnip and Eric Gundersen of MapBox

Earlier this year Gnip and MapBox collaborated on three different maps using geotagged Tweets and this presentation is an extension of that work.

What can 3 billion geotagged Tweets collected over 18 months tell us? Turns out, a lot. Gnip collaborated with the team at Mapbox to study 3 billion geotagged Tweets in aggregate and visualize the results. That work led to 3 maps showing iOS vs. Android usage, where tourists vs. locals hang out, and language usage patterns. From just these maps there were some surprising findings revealing demographic, cultural and social patterns down to city-level detail, across the entire world. For instance, in the US, Tweets from iOS showed where the wealthy live. The data has many other stories to tell as well. As Twitter use becomes more ubiquitous, it’s increasingly serving as a valid proxy not just for what’s happening “on social media,” but for what’s happening in the world in general. This is the first time social data has been mapped at this scale, and we’ll talk about both lessons gleaned from the data and what we learned about making a visualization this big.

Marketing’s Big Data Dissonance:
Duo Presentation by Rob Johnson of Gnip and Dan Neely of Networked Insights

Marketers know they need big data, but like the velvet rope blocking entrance to a SXSW music event, the perceived barrier is hard to overcome. The problem for the modern marketer: cutting through the noise of all of this data and zeroing in on insights that can help them better reach consumers. Big Data grows every day and marketers are faced with an additional challenge: keeping up with the speed at which new consumer data is created. The good news for marketers is that there’s no shortage of places to get information about consumers, from point-of-sale systems to mobile check-ins and even consumer conversations across the social web. Together, all of these actions add up to an incredible mass of information known as Big Data for marketers. In this session, Networked Insights will be joined by Gnip to discuss the tools and techniques that marketers need in order to turn the mass of Big Data into actionable and understandable insights.


Looking Back at Big Boulder 2013

Liz Phillips Hiking at Big Boulder

Photo courtesy of Liz Philips

Big Boulder was 16 sessions of social data goodness with more than 200 attendees coming together to learn, collaborate, network and maybe get in some hiking. Creating a space where the leaders of the industry can get together in an intimate setting is what we set out to do, and it’s rewarding to hear that others agreed. My favorite piece of feedback came from Adam Laiacano, the data scientist from Tumblr, who said it was “the most Baller conference I’ve ever attended.”

A highlight for me was to see the introduction of the Big Boulder Initiative, whose mission is — “To establish the foundation for the long-term success of the social data industry.” I love that it allows everyone to gather more than once a year to collaborate on what is still a nascent industry. If you’re interested, you can check it at

It was also fun to see how social data is expanding outside the United States. Every single publisher interviewed on stage was focused on international growth, especially in Brazil, India, Europe, China and Japan. One of my favorite takeaways was from the session on Social Data in China about how people constantly look for the Twitter or Facebook of China but you can’t make that kind of comparison.

If you missed a session or couldn’t make it to the conference, don’t worry! We have blog recaps of each and every session (see below) summing up the highlights. We also have pictures of the conference on our Facebook page, a Storify recap, and you can always catch up with the hashtag #BigBoulder.

During Big Boulder, we held a contest to see who could create the best Vine, and we awarded the iPad mini to Carmen Sutter. We had a hard time choosing, so be sure to check out some of the other Vines showing unique Boulder culture, take a spin around Big Boulder, see some of the more creative breakfast options and go for a ride.

You can see the 2013 Big Boulder recap video, which has a great summary of the highlights from the conference.

Social Data and Primetime TV

An interview with Maya Harris from GetGlue on social data and primetime TV. 

Maya Harris of GetGlue

GetGlue is a social entertainment phenomenon to be reckoned with. TV is intrinsically social, and GetGlue is leading the social TV movement. People used to physically gather around the television to watch a show or sporting event, and “social” TV meant gathering around the proverbial water cooler to talk about last night’s episode or game. Now, audiences are cutting ties with traditional cable television and increasingly turning to streaming video and time-shifted video watching via Hulu and Netflix. But social networks are allowing networks and studios to connect and engage with fans, as well as fans to connect and engage with each other, in an efficient way and on a greater scale.

How Are Consumers Using GetGlue?
Users check in on GetGlue about what they’re watching on TV, movies, and sports, and they can earn rewards along the way. Based on a user’s actions, GetGlue puts together a taste profile, which fuels recommendations and a personal guide in calendar format that shows users what they like to watch, what GetGlue thinks they might like to watch, as well as friend recommendations and trends. GetGlue can also tailor recommendations to the user’s preferences and entertainment experiences, e.g. if they prefer HD viewing or “I don’t have HBO so don’t tease me with shows I can’t watch.” Also, its second screen after check-in mashes up multimedia content from other channels like YouTube and Twitter. GetGlue is also refining its commentary platform, where comments from friends will show ahead of those from total strangers, regardless of the timing of the comments.

GetGlue by the Numbers

  • 25 percent of 18-34 year olds comment about what they like and don’t like while watching TV.
  • More than 50 percent of users want to connect with a fellow fan — they are looking for their niche entertainment community.
  • GetGlue partners with 75 networks and 25 studios to create incentives (stickers, for example) to reward fan loyalty.
  • 50 percent of stickers are shared to Facebook and Twitter = a genuine fan endorsement and viral marketing tool across highly visible social channels.
  • 20 percent of consumers start watching a show because of a social impression.
  • 70 percent of users are U.S.-based. International growth at this time is organic and concentrated in Great Britain, Canada, and Australia.
  • GetGlue found that 7 out of 8 people, when they sit down to watch TV, have no idea what they want to watch.
  • 50 percent of users share what they are watching to other social channels like Facebook and Twitter.

What Differentiates GetGlue From Other Social Networks?
There are numerous online channels to discuss TV, so how does GetGlue stand out? It provides a focused place for fans to engage, which is important. GetGlue believes that fans “deserve a place to focus on what they want to say” about TV, movies and sports. Scripted TV is the bread and butter of GetGlue. When fans want to comment on Game of Thrones or The Mindy Project, they may not want to share with all their Facebook or Twitter friends; they want to share with people who are also involved, who care about the show, and who are actively doing the same thing, watching and commenting, in the same moment. This rich commentary is one of the fine points of the social data GetGlue is curating from its users.

Fun fact: Fans of the AMC show The Walking Dead crashed the GetGlue system one night; it was the top scripted cable show on GetGlue for 8 weeks.

And the Winner Is …

GetGlue has had a starring role at the GRAMMYs, an age-old event that is embracing this new world of social TV. This year GRAMMY viewers were rewarded for checking in during the live broadcast and on the website; there was also a prize giveaway, which included tickets to the invite-only event. The GRAMMYs saw a total of 140,000 activities via GetGlue but, through the social networks Facebook and Twitter, reached an audience of 40 million. The amplification effect is where the value is found. Also, the data mirrored peak moments in the live broadcast, from the opening act of Taylor Swift to the repeated wins for Adele. GetGlue was the second-largest driver of all traffic to the GRAMMY website.

Why Should Networks and Brands Care About Social TV?
Because viewers are talking about your content. And if you aren’t listening, you can’t partake in the conversations. Networks understand that this is important and are focusing more resources on figuring out what the value is. Studies show that players in the social TV space have correlated check-ins on GetGlue with Nielsen ratings; networks have told GetGlue that check-ins are the best predictive element for what the ratings will be. Networks are moving beyond the total check-ins, tweets, RTs, comments and likes and are now digging deeper, analyzing the conversations and the value that lies within. Increasing amounts of social chatter contribute to ratings and revenue. It’s important because people are not only tuning in; the fan base is engaged.

Fun fact: The rabid fan base and fan activity on GetGlue single-handedly kept The CW from cutting the show “Nikita.”

Brands are just starting to look at what’s available in this social data space and are spending money on linear ads so they can monitor, on a real-time basis, what’s being said about them (and their competitors). The aim is to optimize current campaigns and longer-term brand messaging and to market to the consumer better.

Stay tuned
There is a wealth of data. Consider the number of eyeballs watching a TV show and evaluate the engagement: check ins, comments, replies, likes, votes … studios, networks and brands can start to get a great picture of the activity around their shows, which opens great opportunities to communicate with fans and engage with them elsewhere. If a fan is using GetGlue or other social networks, there is a higher likelihood that they will adopt their other digital platforms and engage there, too.

Big Boulder is the world’s first social data conference. Follow along at #BigBoulder, on the blog under Big Boulder, at Big Boulder on Storify and on Gnip’s Facebook page.

Data Science: The Sexiest Profession Going

Data scientists Mohammad Shahangian of Pinterest, Kostas Tsioutsiouliklis of Twitter, and Adam Laiacano of Tumblr discuss the challenges and opportunities in social data.

Data Scientists at Big Boulder

As Gnip’s own data scientist Dr. Skippy was joined on stage by three data scientists representing three prolific social networks, Big Boulder Master of Ceremonies Lindsay Campbell couldn’t help gushing to the crowd, “This is by far the sexiest panel this year.” (This was a reference to the Harvard Business Review naming data science the sexiest profession of the 21st century.)

Physical appearance aside, there could hardly be a truer statement to Big Boulder attendees: a legion of self-proclaimed data nerds.

Scott Hendrickson, better known as Dr. Skippy, Data Scientist at Gnip, was joined on stage by Mohammad Shahangian of Pinterest, Kostas Tsioutsiouliklis of Twitter, and Adam Laiacano of Tumblr.

A Look at the Data Science Departments

The conversation began with each guest sharing the size of data science teams and roles at their respective organizations.

The data science team at Twitter currently comprises 7-8 people, and the company is looking to build it into a team of 20 in the near future. Data scientists at Twitter fall into two groups: a business intelligence and insights team of data scientists, and individual data scientists who are embedded into teams. Data scientists embedded into teams become key stakeholders in improving and evolving the product.

The business intelligence team works collaboratively to explore ideas and create reports, even if the findings are not always favorable to the company. As Kostas explains, data scientists are trusted at Twitter. It’s ok to report the truth.

At Pinterest, there are 8 full-time data scientists on the team. The primary goal for data scientists is to understand what users are doing and to put pinners first, a strong company value. Much like Twitter, Pinterest data scientists are integrated into other engineering teams. This blend of engineers and data scientists on the same team enables nimble product iterations. Since data scientists were added to the mix, Pinterest teams have been requesting deeper and deeper metrics to measure success and plan product.

Tumblr’s team of data scientists is also eight strong, split across two roles: a six-person search and discovery team and a two-person business intelligence team. The search and discovery team is tasked with maintaining the quality of the data, building products that make the data usable, and ensuring the end product is something users enjoy. The business intelligence team is highly self-reflective, investigating the actions users take to determine which are indicative of long-term success. The outcome is most frequently reporting.

Data Science Impact on Product

At Tumblr, there is a significant amount of testing around registration and onboarding, including what users see when they land on the site. However, Adam is quick to add that Tumblr has a unique view on its research, stating, “You don’t have to do as much research on your product when you use it yourself.”

Data scientists at Twitter report metrics all the way to the top. The CEO and the executives ask questions about the data around the launch of a new product and value the input of data scientists.

By sharing data with product teams, Pinterest engineers are being driven by the data. Mohammad shares, “After exposing metrics to people, the first instinct is to want to make the metrics better. This brings a culture of people who come to the data science team and seek their input. They take the ideas of product and run some queries to see if the data validates it. We’ve made it very easy for product teams to set up experiments, we don’t even call them experiments anymore.” Expanding on this, he shares an anecdote from a recent rewrite of the entire website. After the launch, the data scientists noticed a dip in follows. The team’s investigation led to the understanding that the enhanced speed of the rewritten website had eliminated a small lag that followed a user’s like, a window of time in which users had been following pinners on the site. Once the team corrected for this, follows went back up.

Who You Callin’ Sexy?

As Dr. Skippy joked about the popularity, ahem sexiness, of the data science title, conversation turned to the lack of an industry standard definition for the role, noting there is often confusion and a lack of differentiation from business analysts and business intelligence roles.

Kostas began by noting that data science is not about analyzing but about prediction. Twitter data scientists are also engineers. Backgrounds of Twitter data scientists include statistics, data mining, machine learning, and engineering.

Further distinguishing data scientists from data analysts, Mohammad points out that analysts aren’t pulling their own data. He added, “If you can’t pull your own data, how can you figure out what you want? A data scientist is skeptical. If results seem too good to be true, they will investigate. Question the data. Analysts will take the data as the data.”

Adam describes a good data scientist as an individual who can get data in any format and clean it up, who can take weird, fuzzy forms and see the layout of the information that is available, connecting the pieces of the puzzle to build a data set that is useful.

The Future For Analysis of Social Data

Much of data science to date has been ad hoc, but the panelists agree that as you look closely at what data scientists do, it’s templates and patterns. Over time this work will become progressively more standardized. With new, faster tools it will move away from ad hoc processes. Teams will build models and tools to solve recurring problems.

Adam of Tumblr added optimistically that the future lies in the work data scientists will do as they collect data across platforms and across multiple streams. It’s up to those developing third-party tools and resources to innovate using all the data.

Lastly, Mohammad chimed in that machine learning and prediction modeling are the sexy amongst the sexy, adding, “That’s what we’re all waiting for.”

Creating and Sharing Content on WordPress

An interview with Paul Maiorana, Vice President of Platform Services at Automattic, about creating and sharing content on WordPress. 

Paul Maiorana Big Boulder

There are a lot of names for the WordPress/Automattic group, so it’s important to distinguish who is who. WordPress, which just celebrated its 10-year anniversary in May, is an open source platform, free to use and free to download. Automattic (named for its founder, Matt Mullenweg) is the organization providing services around WordPress and handling its infrastructure. Lastly, Jetpack is the plugin used to add features to a WordPress site, powered by the cloud infrastructure.

Paul Maiorana, Automattic’s VP of Platform Services, dove into VIP, a solution for large media organizations and enterprises. You can run WordPress anywhere in the world, and Automattic is the largest user of and contributor to the open source platform. They’ve built a significant amount of knowledge around scaling the product and now provide this knowledge to enterprises. Huge organizations like Turner Broadcasting, federal agencies and a wide spectrum of other groups are customers.

“Biggest Home of Users on the Web”

WordPress has a philosophy when building their open source software: the idea of the independent web. Paul says they like to think of WordPress as a digital hub and your home on the web. At the end of the day, they try to give you (the user) the tools to create and export content and put it where you want. The user will always own WordPress as much as the company does. “A place on the web you can call your own, where you own the data, you own the experience,” says Paul, is part of the DNA at WordPress. More than 18% of the top 10 million websites are WordPress, and 70 million WordPress websites are hosted between and other sources.

Blogging and Enterprise

While WordPress’ roots have always been in blogging, they see themselves as more of a content management system. The perception of WordPress as a blogging tool has persisted because of its reputation. But over the last couple of years, they’ve brought in tools that let users customize their sites and make them more than just a blog. More and more organizations are using WordPress as a CMS these days instead of just a blog. On an enterprise level, major websites like CBS are using WordPress as a CMS. It’s a testament to how the tool has evolved over recent years.

Product Roadmap

Paul says product decisions have an interesting relationship with the open source portion of WordPress. At the end of the day, WordPress has little control over what happens on that side. Unlike other CMS platforms, WordPress updates three times a year. It is updated without breaks to make it seamless for people to use the best WordPress there is. Within Automattic, they’ve built a lot of enterprise solutions and open source solutions to help make WordPress better for everyone.

Mobile is also a huge focus, and it will continue to shape their roadmap. For now, it’s a big initiative in two ways: the front-end user experience and the dashboard admin experience. The past three releases have focused on a default theme that is responsive, and they will continue to do so. For the admin experience, mobile is perfect for “of the moment” publishing. With apps for iOS, Android, BlackBerry, and Windows, more content publishers will have the ability to publish on the go efficiently. They’ve seen real-world use cases too, with reporters catching stories first because they were able to publish from mobile.

WordPress and Social

Blogging is inherently social, and it’s no accident that comments are an important part of the WordPress software. The conversation is an important part of publishing on the web. Paul said WordPress spends a lot of time thinking about additional social features they can add (likes, re-blogging, following, subscribing to updates). Looking forward, they’re hoping to expose the idea of consuming content within WordPress. They’re experimenting with a reader interface that gives users the ability to subscribe to content they like from topics or specific blogs, then see it all in one place and interact with it socially.

Measuring Impact on Facebook

An interview with Daniel Slotwiner, the Head of Measurement Solutions Group for Facebook, on measuring impact on Facebook. 

Daniel Slotwiner, Head of Measurement at Facebook

“There are a lot of misconceptions about Facebook and data,” Chris Moody eloquently opened the interview with Daniel Slotwiner, Head of Measurement Solutions Group for Facebook. For Daniel’s team, their job is to build tools and methods of analysis to highlight the value of Facebook’s media business. But as Daniel explained, it’s not a win for a brand to measure a brand campaign by the CTR it gets. Instead, he emphasized the importance of working with an advertiser who is defining objectives and setting the right measurement program alongside. The Measurement Solutions Group not only tries to build the tools the industry can use, but also educate and work with them to get the most out of the ecosystem. The hope is that the ecosystem will be self-sufficient.


Last year Facebook announced their partnership with Datalogix, initially for measurement. However, with Datalogix’s comprehensive roster of US households, Facebook realized the impact of the information they could provide to advertisers. Datalogix and Facebook have been able to append data on frequent shoppers with consumer purchase decisions. This has aided in analyzing the impact of Facebook in driving offline sales. With more than 80 campaigns executed with these tools, Facebook can see which segments are responding to the advertising and make smarter campaigns. At the end of the day, the value of this data is not just to calculate ROI; rather, the scale allows for in-depth analysis and huge learnings for not only Facebook, but also advertisers.


If the unique advantage of Twitter is that everything is public, Facebook’s advantage is knowing who is saying what. The uniqueness of this data is twofold: scale and the concept of identity (demographically and geographically). If advertisers can understand the value of this data, they have a fantastic starting point.

It’s hard to argue Facebook isn’t doing a good job of scaling their users. “Obviously we love new users,” Daniel said, and it’s still a huge focus for Facebook as it expands internationally. And they’re prioritizing serving everyone in the world, especially through segmenting. When it comes to the level of use, Facebook has found light users are more receptive to advertising than heavy users. For advertisers, understanding this user segmentation can help shape campaigns and execution on the social network. Facebook is intent on building these insights back into the advertising systems to help advertisers make better decisions.

Value in Multi-Point Attribution

The world of influencing consumers is only getting more complex. In one sense it’s because there are so many touch points. Facebook is focused on making sure the measurement systems are keeping pace with the world, but this is virtually impossible. There are a lot of approaches, but Facebook is pretty focused on multi-touch attribution systems for measurement. One way they can look into this is through mobile.

Because almost all users access Facebook using mobile, they get to observe a lot and measure the information around mobile usage. This is information Facebook eventually wants to share with the industry. The platform allows them to see the different paths to purchase because Facebook has so much visibility into the touch points. Facebook is in an excellent position to observe how many devices people have and how content is distributed across them.

At the end of the day, there’s a lot of data that can be utilized from Facebook. However, Daniel urges proper use of the data. Research, for example, is a huge opportunity given the quality of the data. Daniel cautions against using the data for prediction. While a brand may use the discussion online to respond to an emergency or to participate in the conversation, it’s not clear if they should use it as an objective to drive more sales online.

Twitter Certified Partners and International Expansion

An interview with Conway Chen and Zach Hofer-Shall of Twitter on Twitter Certified Partners and International Expansion.

Zach Hofer Shall and Conway Chen of Twitter

As Chris Moody sat down with Conway Chen and Zach Hofer-Shall of Twitter this morning, the conversation began with shared optimism about the increased talk about Twitter data. All panelists were quick to praise the recent conversation with Twitter CEO Dick Costolo on All Things D, where the Twitter data stream was the star of the conversation.

Conway explained this emerging interest in data with an anecdote about Twitter’s early expectations when opening the data stream, expectations that were little to none. Instead, it is the innovation built using the data that is making Twitter infinitely more valuable.

Twitter Data Is Special

Four things set Twitter data apart:

1. It is real time

2. It is public

3. It is conversational: people aren’t just speaking into the ether; the conversation goes both ways

4. It is distributed

Honor Thy User

It is a delicate balance to simultaneously respect the users creating the data while also wanting to get data out there and ensure it is monetizable. Zach is quick to mention strict adherence to and support of one of Twitter’s core values: defend and respect the user’s voice. He continues by stating that if this goes wrong, the whole system falls apart.

Twitter has mindfully created a structure that honors this, a key component of which is data resellers. Data resellers enable Twitter to maintain values and still be able to scale. These partnerships have allowed Twitter to encourage and foster innovation in ways they would not have been able to.

Sustainability and Long-term Growth

Conway: We are absolutely committed to the success of Twitter data and the ecosystem around it. We continue to ask: Is the data we are pushing out correct? Is the way we are pushing it out helping resellers and developers to innovate and build on it? Twitter data, and the strategy around it, is pivotal to how Twitter sees its growth.

Data is a core part of the business that wasn’t always seen as a core part of the business. We are so invested in the success of Twitter data long term that we are committed to seeing it scale. And a key part of that is improving efficiency.

There is an understanding now that Twitter data is important- this speaks volume to the sustainability of the system. People don’t need a sell on the access to the data, they are instead interested in how resellers can make that data useful to them.

Twitter Certified Partner Program

Zach defines the Twitter Certified Partner Program as the answer to skeptics who claim that Twitter doesn’t like its ecosystem. The program was established to help the ecosystem grow, help its members succeed and grant providers Twitter’s seal of approval.

The program ultimately acts as a tool to empower innovation on the Twitter stream. Twitter does not have the capacity to create these tools and resources independently. Less than a year old, the program has been adding 5 to 10 strategic companies each quarter. Factors when selecting certified partners include innovative uses of the data (beyond analytics and engagement) and strategic international partnerships.

Certified partners benefit from the instant credibility that membership in the program provides when talking to investors and customers, from access to prioritized developer support and from promotion by the Twitter sales team. Twitter sales team members are trained on and knowledgeable about certified partner products. As the team sells promoted content, they are also able to suggest and recommend partners to fill needs Twitter cannot.

International Growth: Not Just Language Localization

Conway identifies two areas of growth that are current bright spots: Europe and Japan. In identifying new markets, Twitter is looking for existing ecosystems where it can bolster and support what’s already happening. Brazil, Japan, South Korea and India are four regions appealing to Twitter now.

Localization isn’t just about language; there is localization of analytics and data types as well.

International tools looking to join the Twitter Certified Partner Program need to match the same high standards of other partners. Twitter works with products in new markets to bring them to their standards.


Conway calls for service providers to develop tools that empower advertisers to move to ROI-driven decisions. He encourages developers to focus on tools that provide actionable insights to inform ad spend.

The Future of Twitter Data

In a word: Media. In the last year Twitter has blossomed beyond the 140 characters to the media attached to them. Innovation in the data will include tapping into what is attached to the Tweet, not just the Tweet itself.


With Twitter self-defining as a mobile-first company, Conway identifies explaining why geodata remains so scarce as one of his biggest pain points. The balance is to respect users’ privacy first while acknowledging that delivering a better consumer experience depends on the inclusion of geodata. Ultimately, Conway categorizes it as a product-side problem: getting users to opt in to share their data.

Big Boulder is the world’s first social data conference. Follow along at #BigBoulder, on the blog under Big Boulder, Big Boulder on Storify and on Gnip’s Facebook page.

Social Data in Academic Research

Sherry Emery, Abe Kazemzadeh and Jaime Settle discuss the role of social data in their academic research. For more background on this topic, check out our blog interviews with Sherry Emery and Jaime Settle.

Sherry Emery, Abe Kazemzadeh, Jaime Settle, Paul Smalera at Big Boulder

The volume of social data being captured is just begging to be studied, but academia currently faces several problems: funding and grant issues, a lack of standardized protocols for research projects using this fluid (and sometimes deletable) data set, a lack of curricula for data science and social science purposes, and the different timelines facing academics and the companies collecting the data. Research labs hope to bridge these gaps through partnerships with social data companies.

Smokin’ Cigarettes, Smokin’ BBQ, Smokin’ Hot Girls
Capturing the whole conversation about smoking on a social media channel for a research project proved difficult, Sherry says. One of the challenges is selecting the right keywords. In one project, she learned, approximately 70 million Tweets contained the keyword “smoking.” When the team used software to categorize that large volume of Tweets, they found that only a third were about smoking tobacco, her topic of study. Other categories ranged from smoking marijuana to smoking ribs to “smoking hot girls.” Studying the smoking Tweets also revealed an interesting difference in sentiment: people who tweeted about smoking tobacco cigarettes felt ashamed, while those who tweeted about smoking pot felt proud.
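
The session did not describe the software Sherry's team used, so the following is only a minimal sketch of the general idea: sorting Tweets that match a broad keyword into topic buckets with simple cue words. The category names, the cue lists in `CATEGORY_CUES` and both function names are hypothetical and chosen purely for illustration; a real study would need a far richer vocabulary or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical cue words per category (illustrative only, not from the study).
CATEGORY_CUES = {
    "tobacco": ("cigarette", "cigarettes", "tobacco", "nicotine"),
    "marijuana": ("weed", "pot", "marijuana"),
    "food": ("ribs", "bbq", "brisket", "barbecue"),
    "attractiveness": ("hot girl", "hot girls"),
}


def categorize(text):
    """Return the first category whose cue appears in the text, else 'other'."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))
    for category, cues in CATEGORY_CUES.items():
        for cue in cues:
            if " " in cue:
                # Multi-word cues are matched as substrings.
                if cue in lowered:
                    return category
            elif cue in tokens:
                # Single-word cues must match a whole token.
                return category
    return "other"


def category_shares(tweets):
    """Return the fraction of tweets that falls into each category."""
    counts = Counter(categorize(t) for t in tweets)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}
```

Running `category_shares` over a keyword-matched sample gives a rough picture of how much of the "smoking" conversation is actually on topic, which is the kind of breakdown described above.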

How Academics Use Social Data vs How Companies Use Social Data
Academics have the luxury of time when studying social data, but not the luxury of a time machine. Researchers are chasing a more historical perspective of the data, but unless they are aware of and can anticipate the keywords and events that matter, their pursuit could be snuffed out. It’s a double-edged sword. Social data is streaming, which means academics can’t always anticipate the necessary keywords to pull in data early enough to fully capture the events and behaviors they want to study. For example, Sherry’s team serendipitously captured Tweets about a proposed ballot measure in California to increase the price of cigarettes, but the majority of Tweets didn’t contain the usual smoking keywords of “cigarettes,” “smoking” and “tobacco.” By the time the researchers realized this, they had already missed a large portion of the social data. Popular opinion, Sherry says, changed dramatically between the three months before the vote and the voting period itself.

Social Media Companies + Academia = Match Made in Data Nerd Heaven
Jaime says that because social data and its tools are forward looking by nature, they are not designed to get data retroactively or historically. Perhaps this is an opportunity for academia and social media companies to partner and rely on each other as resources. There is a need for curricula that prepare university students both to work at social companies and to become social scientists, and social companies can influence what needs to be taught in such curricula in higher ed. This is a gap, and partnerships need to be forged between these two groups so that the full potential of social data can be explored as the demand to understand it grows. Social companies have their own data and social data teams that serve their internal needs and business goals. Academics see potential and overlapping goals in the very same data that these companies are collecting about users, which could reveal insights into human behavior.

Abe explains the pros and cons of doing research for companies (but says the benefits outweigh the cons).

  • Variety of funding from government grants
  • Interesting problems companies are facing
  • College students get an opportunity to work on cool research projects for real world problems
  • Sense of urgency
  • Funding is on a subscription format; if a company has a bad year, it cancels its subscription to research lab services

Looking Ahead
Funding seemed to be an overall challenge for academics looking to study social data. Challenges also include the ethical implications of using such a fluid data set on subjects who may not understand they are being studied. There needs to be a standardized protocol for studying, reporting and managing social data, one that respects both the data and the subjects being researched. The current situation is vulnerable to scandal in the case of an invasion of privacy or abuse of data. Institutional review boards need to begin a dialogue with researchers (and social media companies?) about best practices for this new niche of research before an egregious case occurs.

Mining Consumer Opinion in Comments

An interview with Daniel Ha and Steve Roy from Disqus on mining opinion in comments. 

Commonly known as a comment system, Disqus facilitates comments from over 2.5 million sites. The team at Disqus, Daniel Ha and Steve Roy, like to think of themselves as a community of other communities. But how do they distinguish themselves?

Communities and Identity

Any discussion that happens on Disqus is, by its nature, its own community. Disqus found that the majority of users’ time was spent below the fold, in the comments. Part of what fuels this is the ability to act under a pseudonym. Disqus maintains that by embracing a pseudonym, people can act as their “real” self. They find that people who embrace a pseudonym reveal a more passionate interest than they normally would. It gives people a voice they wouldn’t typically be able to use, enabling a user to pursue things that mainstream media may not be covering, or to be part of a community they couldn’t otherwise.


Brands can tap into Disqus in a few ways:

  1. On their properties utilizing Disqus: Brands like HP have launched destination websites with Disqus to participate in the conversation naturally happening.
  2. Disqus’ ad product: Brands can pay to have a presence in other websites (like a Tumblr blog) and place their content above the comment feed. The response to this placement of content is higher as well because it’s located where the audience is more engaged.
  3. Learned Insights: Brands can use pattern detection to learn what stories are being told about their brands. A great example is a product recall, because much of the discussion around a recall takes place in the comments.

Data Learnings

Disqus recently achieved a major milestone, reaching 1 billion monthly unique visitors. Although Disqus is often considered US-focused, the majority of its growth in recent months has been international. Disqus supports 40+ languages worldwide. Through its many users, Disqus has been able to understand the behavior patterns on its network and noted 3 things in particular:

  • Comment Length: The number of characters can tell a lot about users’ level of interest. Steve says 57% of all comments are essentially the length of Tweets (under 140 characters) and contain no links.
  • Time of Day: The worldwide pattern for commenting shows a peak in volume at 10 am in every time zone. Not only do more people comment at this time of day, they also read and engage with other comments then too.
  • Categories: Disqus buckets its sites into about 45 different types, and each category has its own statistics. For instance, gamer sites average about 10 characters per comment. Religious sites, on the other hand, average closer to 600 characters per comment. For a brand, this is valuable data that can help shape how it engages with users.
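
As a rough illustration of how length statistics like these could be computed, here is a minimal sketch. It is not Disqus's method: the function names, the link-detection pattern and the `(category, comment)` input shape are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed, simplistic definition of "contains a link".
LINK_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
TWEET_LENGTH = 140


def is_tweet_like(comment):
    """True if the comment is under 140 characters and contains no link."""
    return len(comment) < TWEET_LENGTH and LINK_PATTERN.search(comment) is None


def tweet_like_share(comments):
    """Fraction of comments that are Tweet-length and link-free."""
    if not comments:
        return 0.0
    return sum(is_tweet_like(c) for c in comments) / len(comments)


def average_length_by_category(records):
    """Average comment length per site category.

    `records` is an iterable of (category, comment) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [total chars, count]
    for category, comment in records:
        totals[category][0] += len(comment)
        totals[category][1] += 1
    return {cat: chars / count for cat, (chars, count) in totals.items()}
```

Comparing the per-category averages from `average_length_by_category` is one simple way a brand could see whether a community tends toward short reactions or long-form discussion.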

Disqus is proud of the use cases of their data too. Several examples were mentioned, like Gooqus, a search engine utilizing both Google custom search and Disqus. This allows a user not only to see the top Google results but also to get an added layer of richness, allowing more sentiment to be derived from the data.
