Expion Joins Plugged In to Gnip and Adds Tumblr & GetGlue as Data Sources

This week I’m on a panel at Expion’s third annual “Mission Possible” Conference in Raleigh, which makes it even more fun to announce that Expion is now a Plugged In to Gnip partner and will be adding Tumblr, Disqus and GetGlue to their social data sources. It is especially significant to be making this announcement among all of Expion’s incredible customers because, ultimately, this partnership is all about providing them with the best social data out there.

Expion’s leadership position in the social media marketing and engagement industry makes them an ideal Plugged In partner, as they’re committed to building complete, reliable and sustainable social data into their analytics products. The world’s largest brands and agencies use Expion to effectively monitor and engage with their customers in real time across multiple geographic locations and myriad digital channels. The marketplace is changing rapidly, and we are seeing industry leaders like Expion marshal the best mix of social data sources to serve their customers. By adding sources such as GetGlue, Tumblr and Disqus, Expion is creating a competitive advantage for their customers, giving them a much more complete picture of their brand.

I’ve had the pleasure of working with Expion for the past year and have been particularly impressed by their commitment to innovation. The social media landscape is constantly evolving, and they have always been eager to dive into the latest products and data sources, such as Tumblr and GetGlue, to ensure that their products, and their customers, stay ahead of the curve.

Stay tuned for a case study demonstrating what Expion’s customers are doing with access to these data sources!

Gnip at Expion Conference


Gnip is Now The Provider of GetGlue Social Data

During Big Boulder this year, Maya Harris from GetGlue talked about the community on GetGlue and how they banded together to keep the TV show Nikita on the air when it was on the verge of being cancelled. Maya also showed the strong correlation between the volume of GetGlue check-ins and Nielsen ratings for a show like “The Walking Dead.” With examples like these, we’re incredibly excited about the insights our customers will develop now that they have access to the full firehose of check-ins and comments from GetGlue.

GetGlue is a recognized leader in second screen engagement and social television. It’s a community of more than 4 million people bantering about their favorite shows and movies, the cliffhangers and the surprise endings. With GetGlue, users can use their phones and computers to check-in, like, comment and engage with other fans around the TV shows, movies and sports that they love. More than 75 major television networks and 25 movie studios use GetGlue to promote their shows and movies and engage with their fans.

The possibilities we see for analysis of this data are immense. Looking for a realtime measure of a TV show’s popularity? GetGlue check-ins are closely correlated to Nielsen ratings.
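
For the quantitatively minded, that claim is easy to test once you have both series in hand. Here is a minimal sketch that computes the Pearson correlation between per-episode check-in volume and ratings; the numbers are hypothetical stand-ins, not real GetGlue or Nielsen data.

```python
# A quick way to test the claim, given per-episode series of check-in volume
# and ratings. The numbers below are hypothetical stand-ins, not real GetGlue
# or Nielsen data.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

checkins = [12000, 18500, 22000, 30500, 41000]  # hypothetical check-ins per episode
ratings = [3.1, 4.2, 4.8, 6.1, 7.3]             # hypothetical Nielsen ratings

print("r = %.3f" % pearson(checkins, ratings))  # values near 1.0 mean strong correlation
```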

GetGlue Walking Dead Infographic

Want to get an early measure on the box office success of a big movie release? You can use the check-ins on GetGlue to get a realtime measure of the most popular movies on any given weekend. Trying to figure out if fans of Game of Thrones are also fans of The Walking Dead? Now you can.

Check out gnip.com/getglue to learn more or shoot us an email at info@gnip.com if you’d like to get in touch.

The Big Boulder Initiative Launches To Drive The Social Data Industry

Chris Moody, COO of Gnip, kicks off the conference with a talk about the social data ecosystem and introduces the Big Boulder Initiative. 

Chris Moody of Gnip

At the beginning of Big Boulder, Gnip COO Chris Moody launched the Big Boulder Initiative, a way for industry leaders “to establish the foundation for the long-term success of the social data industry.”

As Chris noted, Big Boulder isn’t just a group of people coming together; it’s a gathering of the leaders of social data, one of the most important industries to develop in the last two decades. The attendees consume over 4 billion social data activities every day and ultimately serve 95% of the Fortune 500.

Chris asked attendees for the next two days to leave their titles behind and instead to think of themselves as the leaders of the social data industry. And as leaders, they all need to think about where social data is going and how to address our collective challenges.

Big Boulder is only two days out of the year, so the Big Boulder Initiative will allow the leaders of the industry to come together multiple times a year.

Chris outlined five questions the industry is facing and wants to hear what others are thinking as well. The issues are the following:

  • Return – How do we remove any remaining doubt about the value of social data? There are open-ended questions around consistency, bias and measurement that need to be addressed.

  • Trust – How do we build trust and understanding with the most important people in social media, the content creators?

  • Access – We’re only able to analyze a fraction of the public social data that’s out there. How can we get access to more data to improve our products?

  • Sustainability – How do we convince the world to make long-term investments in an industry that is so new?

  • Costs – How do we manage the growing costs to store, index and serve ever-increasing volumes of data?

Those who weren’t able to attend Big Boulder but want to be involved in shaping these industry issues can learn more at bigboulderinitiative.com.

Big Boulder is the world’s first social data conference. Follow along at #BigBoulder, on the blog under Big Boulder, on Storify and on Gnip’s Facebook page.

Enhanced Filtering for PowerTrack

Gnip is always looking for ways to improve its filtering capabilities, and customer feedback plays a huge role in these efforts. We are excited to announce enhancements to our PowerTrack product that allow for more precise filtering of the Twitter Firehose, a feature enhancement request that came directly from you, our customers.

Gnip PowerTrack rules now support OR and Grouping using ().  We have also loosened limitations on the number of characters and the number of clauses per rule. Specifically, a single rule can now include up to 10 positive clauses and up to 50 negative clauses (previously 10 total clauses).  Additionally, the character limit per rule has grown from 255 characters to 1024.

With these changes, we are now able to offer our customers a much more robust and precise filtering language to ensure you receive the Tweets that matter most to you and your business.  However, these improvements bring their own set of specific constraints that are important to be aware of.  Examples and details on these limitations are as follows:

OR and Grouping Examples

  • apple OR microsoft
  • apple (iphone OR ipad)
  • apple computer -(fruit OR green)
  • (apple OR mac) (computer OR monitor) new -fruit
  • (apple OR android) (ipad OR tablet) -(fruit green microsoft)

Character Limitations

  • A single rule may contain up to 1024 characters including operators and spaces.

Clause Limitations

  • A single rule must contain at least 1 positive clause
  • A single rule supports a max of 10 positive clauses throughout the rule
  • A single rule supports a max of 50 negative clauses throughout the rule
  • Negated ORs are not allowed. The following are examples of invalid rules:
      • -iphone OR ipad
      • ipad OR -(iphone OR ipod)

Precedence

  • An implied “AND” takes precedence in rule evaluation over an OR

For example, a rule of:

  • android OR iphone ipad would be evaluated as android OR (iphone ipad)
  • ipad iphone OR android would be evaluated as (ipad iphone) OR android
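
To make these constraints concrete, here is a minimal client-side pre-flight check. It is an illustrative sketch only: the tokenization is a rough approximation (it doesn’t catch negated ORs, for instance), and Gnip’s server-side validation remains authoritative.

```python
import re

MAX_CHARS, MAX_POSITIVE, MAX_NEGATIVE = 1024, 10, 50
TOKEN = re.compile(r"-\(|\(|\)|\bOR\b|-?[#@\w]+")

def count_clauses(rule):
    """Count (positive, negative) clauses, treating every term inside a
    -( ... ) group as negative. A rough approximation, not Gnip's parser."""
    positives = negatives = 0
    stack = []  # True marks a negated group
    for tok in TOKEN.findall(rule):
        if tok == "-(":
            stack.append(True)
        elif tok == "(":
            stack.append(False)
        elif tok == ")":
            if stack:
                stack.pop()
        elif tok == "OR":
            continue
        elif tok.startswith("-") or any(stack):
            negatives += 1
        else:
            positives += 1
    return positives, negatives

def check_rule(rule):
    """Apply the documented limits to a candidate rule before submitting it.
    (Detecting negated ORs is omitted here for brevity.)"""
    if len(rule) > MAX_CHARS:
        return "rule exceeds %d characters" % MAX_CHARS
    positives, negatives = count_clauses(rule)
    if positives == 0:
        return "rule needs at least one positive clause"
    if positives > MAX_POSITIVE:
        return "too many positive clauses (%d)" % positives
    if negatives > MAX_NEGATIVE:
        return "too many negative clauses (%d)" % negatives
    return "ok"

print(check_rule("(apple OR mac) (computer OR monitor) new -fruit"))  # ok
print(check_rule("-fruit -(green OR red)"))  # rule needs at least one positive clause
```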

You can find full details of the Gnip PowerTrack filtering changes in our online documentation.

Know of another way we can improve our filtering to meet your needs?  Let us know in the comments below.

Gnip and Automattic Make Whole New Universe of Data Available

“This new data from Automattic is a big addition and a testament to Gnip’s commitment to drive the social data economy forward. This is an important source to add to the social data mix, one that we know our customers will take full advantage of.”

- Rob Begg, VP Marketing of Radian6

As social media data becomes more and more important across a range of businesses, our customers are asking for access to more data sources to give them a more complete picture of the social media conversations that are relevant to their businesses.

Today, we’re excited to announce a major addition to our coverage of the conversations taking place on blogs around the world. We’re expanding our relationship with Automattic to make a whole new universe of blog and comment data available to the market for the first time anywhere.

For those who don’t know, Automattic is a network of web services including WordPress.com, VIP hosting and support, Polldaddy, IntenseDebate, and Jetpack. We’ve been delivering data from WordPress.com and IntenseDebate for about a year and a half and found that while our customers loved their data, they always wanted more.

As of today, we are now offering the full firehose of blog posts and comments from Jetpack-powered WordPress.org sites, as well as engagement streams of “likes” from WordPress.com and IntenseDebate. The new data from WordPress.org greatly increases the coverage available to those who are looking to do deep analysis of blog posts and comments. The new engagement streams enable companies to pull in reaction data to quickly understand sentiment, relevance and resonance. With this they can gauge the intensity of opinion around fast moving blog and comment conversations, helping prioritize critical response.

Because they are full firehoses, all of the streams from Automattic ensure 100% coverage in realtime, giving customers the peace of mind that they can keep up with the entire discussion on fast-moving threads.
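
For a sense of what consuming a firehose looks like in practice, here is a minimal sketch of a streaming client. It assumes newline-delimited JSON activities over streaming HTTP and an Activity Streams-style “verb” field; the endpoint URL and credentials are hypothetical placeholders, not actual Gnip API details.

```python
# A minimal sketch of a firehose consumer, assuming newline-delimited JSON
# activities over streaming HTTP and an Activity Streams-style "verb" field.
# The endpoint URL and credentials are hypothetical placeholders.
import json
import requests

STREAM_URL = "https://stream.example.com/accounts/ACME/streams/wordpress.json"

def handle(activity):
    """Route one activity; here we just print its verb (post, comment, like)."""
    print(activity.get("verb", "post"), activity.get("id"))

def consume(url, auth):
    with requests.get(url, auth=auth, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:  # skip keep-alive newlines
                continue
            handle(json.loads(line))

# consume(STREAM_URL, auth=("user", "password"))
```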

The scope of coverage offered by Automattic is pretty incredible.  Check out some of these stats:

We’re thrilled to be able to offer these new data streams to our customers and can’t wait to see the amazing things they’ll be able to do with them.

Updated: Coverage in GigaOM – Gnip and WordPress deepen ties, expand data partnership

Twitter Shouts: Huntsman's Out!

At Gnip, one of the most fascinating aspects of social media is ‘speed’ – specifically with regard to news stories. We continue to see a trend toward the ‘breaking’ of news stories on platforms like Twitter. Both the speed at which a story is broken and the speed at which that story catches on show the incredible power of this medium for information exchange. And as we’ve pointed out before, different social media streams offer different analytical value – Twitter versus a news feed, for example.

Last night proved a great example of this as word of Jon Huntsman’s withdrawal from the GOP presidential race crept out. Interestingly, the news was broken by Peter Hamby, a CNN political reporter, on Twitter. While CNN followed up on this news a few minutes later, it seems the reporter (or the network) realized the inherent ‘newswire’ value of breaking this news as fast as possible…and used Twitter as part of their strategy to do so!

This Tweet was followed with what we’ve begun to see as the normal ‘Twitter’ spike for breaking news – the chart below, built by our Data Scientist Scott, shows how quickly Huntsman’s withdrawal was retweeted and passed along. When looked at in comparison to an aggregate news feed (in this case, NewsGator’s Datawire Firehose, a content aggregator derived from crowdsourced RSS feeds that contains many articles from traditional media providers), some interesting comparisons come to light.
Comparing tweets of “huntsman” and news articles breaking Jon Huntsman’s withdrawal from the GOP primary race. The blue curves show the “Social Activity Pulse” that characterizes the growth and decay of media activity around this topic. By fitting the rate of articles or tweets to a function, we can compare standard measures such as time-to-peak, story half-life, etc. (More on this in a future post.) The peak in Twitter is reached about the same time as the first story arrives from NewsGator, over 10 minutes after the story broke on Twitter.

Both streams show a similar curve in story adoption, peak and tail. What’s different is the timeframe of the content. Twitter’s data spikes about 10 minutes earlier than NewsGator’s. NewsGator’s content is more in-depth, as it contains news stories and blog posts, but as we’ve seen in other cases, Twitter is the place where news breaks these days.
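
For readers curious how a pulse fit like this works mechanically, here is a minimal sketch. The scaled log-normal shape is purely an illustrative assumption (the post doesn’t specify the function actually fitted), and the data are synthetic stand-ins; the point is how time-to-peak and a half-life fall out of the fitted parameters.

```python
# Fitting a "pulse" to per-minute activity counts. The log-normal shape is an
# illustrative assumption, and the data below are synthetic stand-ins.
import numpy as np
from scipy.optimize import curve_fit

def pulse(t, amp, mu, sigma):
    """Scaled log-normal rate curve over minutes since the story broke."""
    return amp * np.exp(-((np.log(t) - mu) ** 2) / (2 * sigma ** 2)) / t

minutes = np.arange(1, 121)  # two hours of per-minute buckets
counts = np.random.poisson(pulse(minutes, 5000, 2.5, 0.6))  # synthetic data

(amp, mu, sigma), _ = curve_fit(pulse, minutes, counts, p0=(1000, 2.0, 1.0))

t_peak = np.exp(mu - sigma ** 2)  # mode of the fitted log-normal
peak_rate = pulse(t_peak, amp, mu, sigma)
after = minutes[minutes > t_peak]
t_half = after[pulse(after, amp, mu, sigma) <= peak_rate / 2][0]  # decay half-point

print("time to peak: %.1f min; rate halves by minute %d" % (t_peak, t_half))
```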


Boulder Chamber of Commerce: Why Gnip Joined

It took a while, but Gnip is now a Boulder Chamber of Commerce (@boulderchamber) member. We joined after a pattern of clear value to our particular industry emerged. In August of this year they hosted an event that put us face-to-face with the U.S. Department of Commerce Under Secretary for International Trade (Francisco Sánchez) and Colorado Congressman Jared Polis, where we discussed software patent issues as well as the immigration visa challenges the U.S. tech industry faces. Tonight I’m attending an event with Congressman Polis and a local software venture capitalist (Jason Mendelson) to talk about challenges surrounding the hiring of technical talent, locally and globally.

These are topics with significant political/legislative dynamics, and the Chamber has given us, a local software firm, access to relevant forums in which we can get our point of view on the table; thank you.

Whether or not the Chamber has been providing this kind of relevant access all along, I don’t know (my perception is otherwise). I do know that the impact they’re having on us as a local software business, as well as the channel they’re giving Gnip to get its perspective heard in the broader (national) forum, is significant. I’d encourage other Boulder software/technology firms to support their efforts, participate in their events, and help them build an agenda that, in the end, helps us be more effective software/technical businesses.

Join us, in joining the Chamber.

Simplicity Wins

It seems like every once in a while we all have to re-learn certain lessons.

As part of our daily processing, Gnip stores many terabytes of data in millions of keys on Amazon’s S3. Various aspects of serving our customers require that we pore over those keys and the data behind them, regularly.

As an example, every 24 hours we construct usage reports that provide visibility into how our customers are using our service. Are they consuming a lot or a little volume? Did their usage profile change? Are they not using us at all? So on and so on. We also have what we affectionately refer to as the “dude where’s my tweet” challenge; of the billion activities we deliver each day to our customers, inevitably someone says “hey, I didn’t receive Tweet ‘X’ what gives?” Answering that question requires that we store the ID of every Tweet a customer ever receives. Poring over all this data every 24 hours is a challenge.
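
The lookup half of that challenge is conceptually simple once every delivered ID is archived. A minimal sketch, assuming a hypothetical one-ID-per-line file per customer per day:

```python
# With every delivered ID archived (here, a hypothetical one-ID-per-line file
# per customer per day), "did we deliver Tweet X?" becomes a scan of one file.
def was_delivered(tweet_id, id_file):
    """Return True if tweet_id appears in the archived delivery log."""
    with open(id_file) as f:
        return any(line.strip() == tweet_id for line in f)

# was_delivered("161298716604645376", "deliveries/customer42/2011-11-01.ids")
```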

As we started on the project, it seemed like a good fit for Hadoop. It involves pulling in lots of small-ish files, doing some slicing, aggregating the results, and spitting them out the other end. Because we’re hosted in Amazon, it was natural to use their Elastic MapReduce (EMR) service.

Conceptually the code was straightforward and easy to understand. The logic fit the MapReduce programming model well: it requires a lot of text processing and breaks down naturally into various stages and buckets. It was up and running quickly.

As the size of the input grew it started to have various problems, many of which came down to configuration. Hadoop options, JVM options, open file limits, number and size of instances, number of reducers, etc. We went through various rounds of tweaking settings and throwing more machines in the cluster, and it would run well for a while longer.

But it still occasionally had problems. Plus there was that nagging feeling that it just shouldn’t take this much processing power to do the work. Operational costs started to pop up on the radar.

So we did a small test to check the feasibility of getting all the necessary files from S3 onto a single EC2 instance and processing them with standard old *nix tools. After promising results we decided to pull it out of EMR. It took several days to rewrite, but we’ve now got a simple Ruby script using various *nix goodies like cut, sort, grep and their friends. The script is parallelized via JRuby threads at various points that make sense (downloading multiple files at once and processing the files independently once they’ve been bucketed).
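
For flavor, here is what that shape of solution can look like. This is a Python sketch of the same pattern rather than the actual JRuby script, and the bucket name, key layout, and “customer_id tweet_id” line format are all hypothetical stand-ins.

```python
# Parallel downloads, then standard *nix tools do the heavy lifting.
import subprocess
from concurrent.futures import ThreadPoolExecutor

KEYS = ["deliveries/2011-11-01/part-%04d.txt" % i for i in range(100)]

def fetch(key):
    """Pull one object down from S3 (the aws CLI stands in for an S3 client)."""
    local = key.replace("/", "_")
    subprocess.run(["aws", "s3", "cp", "s3://example-bucket/" + key, local], check=True)
    return local

with ThreadPoolExecutor(max_workers=8) as pool:  # download multiple files at once
    files = list(pool.map(fetch, KEYS))

# Per-customer delivery counts, no cluster required: cut out the customer
# column, then let sort | uniq -c aggregate on a single box.
pipeline = "cut -d' ' -f1 %s | sort | uniq -c > usage_report.txt" % " ".join(files)
subprocess.run(pipeline, shell=True, check=True)
```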

In the end it runs in less time than it did on EMR, on a single modest instance, is much simpler to debug and maintain, and costs far less money to run.

We landed in a somewhat counter-intuitive place. There’s great technology available these days to process large amounts of data; we continue to use Hadoop for other projects. But as we bring these tools into our toolset, we have to be careful not to forget the power of straightforward, traditional tools.

Simplicity wins.

Delivering 30 Billion Social Media Activities Monthly . . . and Counting

I’m excited to announce that, as of the end of October, Gnip is delivering over 30 billion paid social media activities per month to our customers. This is the largest number of paid social media activities that has ever been distributed in a 30-day period.

Over the past year, we’ve seen extraordinary growth in the number of paid social media activities we deliver. At the start of 2011, Gnip was delivering 300 million activities per month. By May, that number was up to 3 billion activities per month. And in October, we delivered 30 billion activities. In essence, we’ve been growing by a factor of 10 every 5 months. At this rate, we’ll be delivering 300 billion activities per month by March of next year.
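
The extrapolation is simple compounding: a 10x jump every 5 months works out to a monthly growth factor of 10^(1/5), roughly 1.58. A quick sketch of the arithmetic:

```python
# The projection is simple compounding: 10x every 5 months is a monthly
# factor of 10 ** (1 / 5), about 1.58.
activities = 30e9  # October: 30 billion activities/month
for month in ["Nov", "Dec", "Jan", "Feb", "Mar"]:
    activities *= 10 ** (1 / 5)
    print("%s: %.0fB activities/month" % (month, activities / 1e9))
# Mar prints 300B, the figure quoted above.
```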

Cool numbers, but what’s driving this growth?

We’re seeing three key areas that are driving this number. First, we’re signing on new customers at an increasing rate, as more and more companies are seeing the possibilities in social media data. Second, we’re seeing increased interest in our Twitter firehose products. From hedge funds using social data to drive trading strategies to business intelligence companies layering social data onto their existing structured data sources, interest in volume products from Twitter is consistently increasing. And finally, we’re seeing a marked increase in the number of customers using multiple sources to enrich their product capabilities. From boards and forums to YouTube and Facebook, our customers are seeing the potential in the many other social data sources we offer.

So, 300 billion per month by March? It’s a big number, but the way things are going, I’ll take the over.

Why Traders Use Social Media: Speed & Amplification

Gnip’s asset and investment management clients are consistently impressed by two aspects of our social data that differentiate this data from their other sources: Speed & Amplification.

Speed

Speed relates to the ability of social media content to be ‘instant’, an ability fueled by millions of global users who can break news and sentiment more immediately than traditional media sources can.

A prime example is news of the death of Osama Bin Laden. Keith Urbahn, the former chief of staff for Donald Rumsfeld, is widely credited with breaking that story… through Twitter!

After Keith’s tweet, multiple retweets quickly followed. Within 19 tweets on this subject, a company called Dataminr had identified this as an important and breaking story. Dataminr, a “global sensor network for emerging events and consumer signals,” then issued a signal to their clients, alerting them to this important piece of information.

How does this play into the ‘speed’ characteristic? Because it would be over 20 minutes before that story appeared on traditional news sites. Access to a data stream that can beat traditional media sources by over 20 minutes requires no explanation as to its value for traders and investors.
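
The mechanics behind that kind of alert can be sketched with a toy burst detector: flag a topic when its short-window tweet rate far exceeds its trailing baseline rate. This illustrates the general idea only, not Dataminr’s (proprietary) method; the window sizes and threshold here are arbitrary assumptions.

```python
# A toy burst detector: flag a topic when its tweet rate over a short window
# far exceeds its trailing baseline rate. Window sizes and threshold are
# arbitrary assumptions; this is the general idea, not Dataminr's method.
from collections import deque

def bursts(timestamps, window=60.0, baseline=3600.0, factor=10.0, min_count=5):
    """Yield timestamps (seconds, ascending) at which a burst is detected."""
    recent, history = deque(), deque()
    for ts in timestamps:
        recent.append(ts)
        history.append(ts)
        while recent[0] < ts - window:    # drop tweets older than the window
            recent.popleft()
        while history[0] < ts - baseline:  # drop tweets older than the baseline
            history.popleft()
        recent_rate = len(recent) / window
        baseline_rate = max(len(history) / baseline, 1.0 / baseline)  # avoid zero
        if len(recent) >= min_count and recent_rate >= factor * baseline_rate:
            yield ts
```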

Amplification

Amplification speaks to the ability of social media as a ‘crowd-sourced megaphone.’ The propensity of users to like, share, and retweet content from other users gives those consuming social media data an extremely easy mechanism to measure what content is most important to the world – and compare that content against other content in real time.

A prime example is the passing of Steve Jobs. We wrote about Steve Jobs’ passing a few weeks ago – that post is here – but there’s an important item to revisit:

The impact he had on us made his death that much more profound and the reaction on Twitter was immediate and immense. Word spread rapidly, peaking at 50,000 Tweets per minute within 30 minutes. At that point, Tweets about Jobs accounted for almost 25% of all Tweets being sent globally.

Access to Gnip’s social media data stream allowed our clients to measure, in the moment, the amplification of this story and gauge the importance the world placed on this piece of news. While I doubt any of us needed to see those numbers to know Steve’s passing was an important piece of news, that’s a clear example of how ‘amplification’ works.

Our clients use amplification as a measure to weigh the importance of breaking news, upcoming events, market and product announcements, etc. against other stories. By capturing a realtime snapshot of what the market considers important – and what it doesn’t – they’re able to add an important factor to their existing algorithms.
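
As a concrete illustration, an amplification measure can be as simple as a topic’s per-minute share of the full firehose, the statistic behind the “almost 25% of all Tweets” figure above. The per-minute counts below are hypothetical stand-ins.

```python
# Amplification as a number: a topic's per-minute share of the full firehose.
# These per-minute counts are hypothetical stand-ins.
def amplification(topic_counts, total_counts):
    """Per-minute share of all activity attributable to one topic."""
    return [t / total for t, total in zip(topic_counts, total_counts)]

topic = [1200, 8000, 50000, 42000, 30000]         # tweets/min mentioning the story
total = [190000, 200000, 210000, 205000, 200000]  # all tweets/min in the firehose

for minute, share in enumerate(amplification(topic, total)):
    print("minute %d: %.1f%% of all tweets" % (minute, share * 100))
```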

None of this is to suggest that either social media data speed or amplification should be a sole factor in investing. But when the Gnip social media data stream provides clients with an additional factor to help understand or predict market fluctuations, the value is obvious.