Gnip has hit another big milestone — we’re now delivering 100 billion social data activities each month. In comparison, we were delivering 30 billion social data activities back in November. We’ve more than tripled the data delivered in a handful of short months.
What is the cause for all of this growth? Three reasons:
Enterprise providers continue to rapidly adopt social data into their offerings. As such, our growth rate for new customers continues to accelerate.
Companies are expanding their insight and analysis offerings over a broader spectrum of social conversation. We’ve added three premium full firehoses of data this year, including Tumblr, WordPress and Disqus, as well as other sought-after data sources such as Sina Weibo.
The number of supported use cases for social data continues to expand beyond traditional brand monitoring. We see the use cases for social data evolving all of the time and have seen a substantial uptick in social data being used in finance and business intelligence specifically.
Big Boulder is next week and we’re excited to add four new speakers who are using social data in amazing ways, from disaster response and epidemic tracking to predicting the stock market and monitoring political developments.
Katie Baucom, Geospatial Analyst at National Geospatial-Intelligence Agency
Big Boulder is two weeks away and everything is really coming together beautifully. The world’s first conference on social data already has top-notch speakers such as Ryan Sarver of Twitter, Joe Fernandez of Klout and Sean Bruich of Facebook. But…we’re not done yet! Today we’re excited to announce ten new speakers who are leading the world in social data innovation.
We’re excited to announce:
Yael Garten, Senior Data Scientist, Team Lead for Mobile Data Analytics at LinkedIn
It is always nice to get recognition in your own backyard and Gnip is excited and humbled to be named a Top 50 Colorado Company to Watch. The past four years have been an amazing journey and while we’re particularly excited by all we’ve recently accomplished, we’re even more excited for what’s ahead. We believe social data has unlimited potential and we are excited to be driving the adoption of this data across the world from our home in Boulder, Colorado.
Big Boulder is just over a month away, and we’re excited to announce seven incredible new speakers to the Big Boulder agenda. When we started planning the first social data conference, we wanted to put together a world class speaker list. We’ve been thrilled by the response and are excited to add speakers from companies such as Tumblr and Get Satisfaction. We’re also working on some really interesting panels so keep your eye out for more to come!
I’m thrilled to announce that the full firehose of public Tumblr posts is now available exclusively from Gnip. Tumblr is one of the fastest growing social networks in the world. Much of this growth is fueled by the enormous number of conversations that are unique to the Tumblr community. These conversations cover a huge range of subjects, from movies, TV shows and fashion to business, apparel and consumer products. Check out these stats to get a feel for the volume of discussion on Tumblr:
50 million new posts every day
15 billion page views every month
20 billion total posts
300% traffic growth last year
While some social platforms react quickly to news and other events, Tumblr conversations often spread around concepts and trends. Take the example of Urban Outfitters where a photographer posted a picture to her personal Tumblr of a piece from one of their new collections. That post received over 1,000 notes and almost no mention elsewhere. In the case of Land Rover, the company posted a picture of a dog riding in a Land Rover to their Tumblr that received more than 5,000 notes and very little mention on other networks.
It doesn’t take a large leap to see the impact this type of information can have on brand management and product development. The conversations on Tumblr are rich in images and discussion about brands and products, from simply sharing a picture of a favorite pair of shoes to reblogging news about a favorite brand. And given the highly social nature of the Tumblr community, these discussions move quickly and broadly through the community. You often see posts that are shared tens of thousands of times. For brands, every conversation matters and access to the full firehose ensures they won’t miss a thing.
We’re excited to be able to offer Tumblr to our customers and can’t wait to see what other intriguing use cases they find for this data.
Our customers tell us that getting every single Tweet that matters is one of the key reasons they work with Gnip. And sometimes getting every Tweet that matters means filtering out the Tweets you don’t want. With this in mind, I’m happy to announce the introduction of two new operators to our PowerTrack filtering suite.
The Retweet operator allows a customer to ensure only Retweets that match a rule are delivered or excluded.
To use the Retweet operator, simply add is:retweet or -is:retweet to any rule.
Receive only Retweets mentioning Apple using a rule like: apple is:retweet as a way to measure engagement of the brand’s fan base
Get only Tweets with unique content about Apple using a rule like: apple -is:retweet to monitor conversation about the brand and ignore the tremendous volume of retweets generated by the brand
The Sampling operator allows a customer to receive a random sample of Tweets that match a rule rather than the entire set of Tweets.
There are several use cases where the Sampling operator is useful. Say you want to stay within a budgeted number of Tweets each month, but you’re trending higher than that budget halfway through the month. With the Sampling operator, you can scale back your consumption without fully eliminating rules. In another use case you might want to monitor a very high-volume rule or user, but your internal systems can’t handle this volume. Sampling makes this more manageable. Finally, there are times when you simply need to know the directional volumes for things, and don’t need every tweet.
To use the Sampling operator, add sample:## to any rule with an integer value between 1 and 100. The Sampling operator applies to the entire rule and requires any “OR’d” terms be grouped.
Receive a sampling of 10% of all Tweets that contain “apple” using a rule like:
apple sample:10
Receive a sampling of 50% of all Tweets that contain “iPad” or “iPhone” using a rule like:
(ipad OR iphone) sample:50
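One way to picture the two operators is as filters applied after a rule's keywords have matched. The sketch below is only an illustration under stated assumptions, not a description of how PowerTrack works internally: the `is_retweet` field, both function names, and the idea of an independent random draw per Tweet are all hypothetical.

```python
import random


def apply_retweet_operator(tweets, operator):
    """Keep only Retweets (is:retweet) or drop them (-is:retweet).

    `is_retweet` is a hypothetical boolean field on each matched Tweet.
    """
    if operator == "is:retweet":
        return [t for t in tweets if t["is_retweet"]]
    if operator == "-is:retweet":
        return [t for t in tweets if not t["is_retweet"]]
    raise ValueError(f"unknown operator: {operator}")


def apply_sample_operator(tweets, percent, rng=None):
    """Keep roughly `percent` percent of the matched Tweets.

    Assumption: each Tweet is kept or dropped by an independent random draw.
    """
    if not (isinstance(percent, int) and 1 <= percent <= 100):
        raise ValueError("sample value must be an integer from 1 to 100")
    rng = rng or random.Random()
    return [t for t in tweets if rng.random() * 100 < percent]


# 10,000 made-up Tweets that already matched "apple"; every fourth is a Retweet.
matched = [{"text": f"apple {i}", "is_retweet": i % 4 == 0} for i in range(10_000)]

originals = apply_retweet_operator(matched, "-is:retweet")       # like: apple -is:retweet
sampled = apply_sample_operator(matched, 10, random.Random(42))  # like: apple sample:10
print(len(originals), len(sampled))
```

Because the sample is random, the sampled count lands near 10% of the matches rather than exactly on it, which is why sampling suits directional volume estimates better than exact counts.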
As always, thank you for the product feedback and keep it coming. Additional documentation of these new operators and others can be found in our online documentation.
Gnip is always looking for ways to improve its filtering capabilities and customer feedback plays a huge role in these efforts. We are excited to announce enhancements to our PowerTrack product that allow for more precise filtering of the Twitter Firehose, a feature enhancement request that came directly from you, our customers.
Gnip PowerTrack rules now support OR and Grouping using (). We have also loosened limitations on the number of characters and the number of clauses per rule. Specifically, a single rule can now include up to 10 positive clauses and up to 50 negative clauses (previously 10 total clauses). Additionally, the character limit per rule has grown from 255 characters to 1024.
With these changes, we are now able to offer our customers a much more robust and precise filtering language to ensure you receive the Tweets that matter most to you and your business. However, these improvements bring their own set of specific constraints that are important to be aware of. Examples and details on these limitations are as follows:
OR and Grouping Examples
apple OR microsoft
apple (iphone OR ipad)
apple computer -(fruit OR green)
(apple OR mac) (computer OR monitor) new -fruit
(apple OR android) (ipad OR tablet) -(fruit green microsoft)
A single rule may contain up to 1024 characters including operators and spaces.
A single rule must contain at least 1 positive clause.
A single rule supports a max of 10 positive clauses throughout the rule.
A single rule supports a max of 50 negative clauses throughout the rule.
Negated ORs are not allowed. The following are examples of invalid rules:
-iphone OR ipad
ipad OR -(iphone OR ipod)
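The limits above can be checked before a rule is submitted. The following is a rough client-side sketch, not Gnip's validator. It rests on several assumptions that the limits do not spell out: every term counts as one clause, a term is negative if it or an enclosing group is prefixed with "-", and a "negated OR" means a negated term or group sitting directly next to an OR. The function name `validate_rule` is hypothetical.

```python
import re

MAX_CHARS, MAX_POSITIVE, MAX_NEGATIVE = 1024, 10, 50

# Tokens: "(" or "-(" opens a group, ")" closes one, anything else is a term or OR.
TOKEN = re.compile(r"-?\(|\)|[^\s()]+")


def validate_rule(rule):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(rule) > MAX_CHARS:
        problems.append("more than 1024 characters")

    positive = negative = 0
    groups = []           # one flag per open group: did it open with "-("?
    prev_negated = False  # was the clause that just ended negated?
    negated_or = False
    tokens = TOKEN.findall(rule)

    for i, tok in enumerate(tokens):
        if tok in ("(", "-("):
            groups.append(tok == "-(")
        elif tok == ")":
            prev_negated = groups.pop() if groups else False
        elif tok == "OR":
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
            # Assumption: a negated clause directly beside OR is a "negated OR".
            if prev_negated or next_tok.startswith("-"):
                negated_or = True
        else:
            prev_negated = tok.startswith("-")
            # Assumption: terms inside a negated group count as negative clauses.
            if prev_negated or any(groups):
                negative += 1
            else:
                positive += 1

    if negated_or:
        problems.append("negated OR")
    if positive < 1:
        problems.append("no positive clause")
    if positive > MAX_POSITIVE:
        problems.append("more than 10 positive clauses")
    if negative > MAX_NEGATIVE:
        problems.append("more than 50 negative clauses")
    return problems


print(validate_rule("apple computer -(fruit OR green)"))
print(validate_rule("-iphone OR ipad"))
```

Under these assumptions the first rule passes and the second is flagged, matching the valid and invalid examples listed above.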
An implied “AND” takes precedence in rule evaluation over an OR
For example a rule of:
android OR iphone ipad would be evaluated as android OR (iphone ipad)
ipad iphone OR android would be evaluated as (ipad iphone) OR android
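To see the precedence at work, here is a small recursive-descent sketch that rewrites a rule with its implied grouping made explicit. It is an illustration of the precedence described above, not PowerTrack's parser; the grammar, the tuple-based tree, and the name `explicit_grouping` are assumptions made for this example.

```python
import re

# Tokens: "(" or "-(" opens a group, ")" closes one, anything else is a term or OR.
TOKEN = re.compile(r"-?\(|\)|[^\s()]+")


def parse_or(tokens, pos):
    """expr := and_expr ('OR' and_expr)*   (OR binds loosest)"""
    node, pos = parse_and(tokens, pos)
    operands = [node]
    while pos < len(tokens) and tokens[pos] == "OR":
        node, pos = parse_and(tokens, pos + 1)
        operands.append(node)
    return (operands[0] if len(operands) == 1 else ("OR", operands)), pos


def parse_and(tokens, pos):
    """and_expr := clause+   (adjacent clauses are joined by an implied AND)"""
    operands = []
    while pos < len(tokens) and tokens[pos] not in ("OR", ")"):
        tok = tokens[pos]
        if tok in ("(", "-("):
            inner, pos = parse_or(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            operands.append(("NOT", inner) if tok == "-(" else inner)
        else:
            operands.append(tok)
            pos += 1
    if not operands:
        raise ValueError("missing clause")
    return (operands[0] if len(operands) == 1 else ("AND", operands)), pos


def render(node, top=True):
    """Turn the tree back into text, parenthesizing every nested group."""
    if isinstance(node, str):
        return node
    kind, body = node
    if kind == "NOT":
        return "-(" + render(body) + ")"
    joiner = " OR " if kind == "OR" else " "
    text = joiner.join(render(child, top=False) for child in body)
    return text if top else "(" + text + ")"


def explicit_grouping(rule):
    tokens = TOKEN.findall(rule)
    node, pos = parse_or(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return render(node)


print(explicit_grouping("android OR iphone ipad"))  # android OR (iphone ipad)
print(explicit_grouping("ipad iphone OR android"))  # (ipad iphone) OR android
```

The precedence falls out of the call order: `parse_or` hands each operand to `parse_and`, so adjacent terms are bundled together before any OR is applied.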
At Gnip, one of the most fascinating aspects of social media is ‘speed’ – specifically with regard to news stories. We continue to see a trend towards the ‘breaking’ of news stories on platforms like Twitter. Both the speed at which a story is broken as well as the speed at which that story catches on show the incredible power of this medium for information exchange. And as we’ve pointed out before, different social media streams offer different analytical value – Twitter versus a news feed for example.
Last night proved a great example of this as word of Jon Huntsman’s withdrawal from the GOP presidential race crept out. Interestingly, the news was broken by Peter Hamby, a CNN Political Reporter – on Twitter. While CNN followed up on this news a few minutes later, it seems the reporter (or the network) realized the inherent ‘newswire’ value of breaking this news as fast as possible…and used Twitter as part of their strategy to do so!
This Tweet was followed with what we’ve begun to see as the normal ‘Twitter’ spike for breaking news – the chart below, built by our Data Scientist Scott, shows how quickly Huntsman’s withdrawal was retweeted and passed along. When looked at in comparison to an aggregate news feed (in this case, NewsGator’s Datawire Firehose, which is a content aggregator derived from crowdsourced RSS feeds and contains many articles from traditional media providers), some interesting comparisons are brought to light.
Comparing tweets of “huntsman” and news articles breaking Jon Huntsman’s withdrawal from the GOP primary race. The blue curves show the “Social Activity Pulse” that characterizes the growth and decay of media activity around this topic. By fitting the rate of articles or tweets to a function we can compare standard measures such as time-to-peak, story half-life, etc. (More on this in a future post.) The peak in Twitter is reached about the same time as the first story arrives from NewsGator, over 10 minutes after the story broke on Twitter.
Both streams show a similar curve in story adoption, peak and tail. What’s different is the timeframe of the content. Twitter’s data spikes about 10 minutes earlier than NewsGator’s. NewsGator’s content is more in-depth, as it contains news stories and blog posts, but as we’ve seen in other cases, Twitter is the place where news breaks these days.
I’m excited to announce that, as of the end of October, Gnip is delivering over 30 billion paid social media activities per month to our customers. This is the largest number of paid social media activities that have ever been distributed in a 30 day period. Over the past year, we’ve seen extraordinary growth in the number of paid social media activities we deliver. At the start of 2011, Gnip was delivering 300 million activities per month. By May, that number was up to 3 billion activities per month. And in October, we delivered 30 billion activities. In essence, we’ve been growing by a factor of 10 every 5 months. At this rate, we’ll be delivering 300 billion activities per month by March of next year.
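For anyone who wants to check the arithmetic, the projection is a simple exponential extrapolation. The snippet below is a back-of-the-envelope sketch that assumes the tenfold-every-five-months pace holds steady; the function name and parameters are made up for illustration.

```python
def projected_volume(current, months_ahead, factor=10.0, period_months=5.0):
    """Extrapolate monthly volume, assuming growth by `factor` every `period_months`."""
    return current * factor ** (months_ahead / period_months)


october = 30e9  # activities delivered per month in October
march = projected_volume(october, months_ahead=5)  # October + 5 months = March
print(f"{march / 1e9:.0f} billion activities per month")  # 300 billion activities per month
```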
Cool numbers, but what’s driving this growth?
We’re seeing three key areas that are driving this number. First, we’re signing on new customers at an increasing rate, as more and more companies are seeing the possibilities in social media data. Second, we’re seeing increased interest in our Twitter firehose products. From hedge funds using social data to drive trading strategies to business intelligence companies layering social data onto their existing structured data sources, interest in volume products from Twitter is consistently increasing. And finally, we’re seeing a marked increase in the number of customers using multiple sources to enrich their product capabilities. From boards and forums to YouTube and Facebook, our customers are seeing the potential in the many other social data sources we offer.
So, 300 billion per month by March? It’s a big number, but the way things are going, I’ll take the over.