Guide to the Twitter API – Part 3 of 3: An Overview of Twitter’s Streaming API

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like to use the Streaming API because the streaming nature of the data delivery means that the data is delivered closer to realtime than it is from the Search API (which I wrote about last week). But the Streaming API wasn’t designed to deliver full coverage results and so has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API.

The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer” and Twitter has it currently set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” or a randomly sampled 10% stream. Twitter used to provide some increased access levels to businesses, but announced last November that they’re not granting increased access to any new companies and gradually transitioning their current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)
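To make the sample rates concrete, here is a rough back-of-the-envelope sketch of how many Tweets each sampled stream would deliver. The rates (about 1% for Spritzer, 10% for Gardenhose) come from the paragraph above; the Firehose volume and the function name are made up purely for illustration, and a real random sample will fluctuate around these expected values.

```python
# Sample rates quoted in the text above (Spritzer is approximate).
SAMPLE_RATES = {"Spritzer": 0.01, "Gardenhose": 0.10}


def expected_tweets(firehose_tweets: int, rate: float) -> int:
    """Expected number of Tweets a random sample delivers at the given rate."""
    return round(firehose_tweets * rate)


if __name__ == "__main__":
    # Hypothetical Firehose volume for some time window (an assumption, not a real figure).
    hypothetical_firehose = 1_000_000
    for name, rate in SAMPLE_RATES.items():
        print(name, expected_tweets(hypothetical_firehose, rate))
```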

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g., keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.
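As a way to picture what a filter does, the sketch below applies the three filter types mentioned above (keywords, usernames, and a geographical bounding box) to Tweets held locally. It is only an illustration: the Tweet dictionary shape, the `StreamFilter` class, and the `matches` function are hypothetical, the substring keyword matching is a simplification, and treating the three filter types as alternatives (a Tweet passes if any one matches) is an assumption rather than a statement of how Twitter evaluates filters.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class StreamFilter:
    """Hypothetical container for the filter types described in the text."""
    keywords: Set[str] = field(default_factory=set)
    usernames: Set[str] = field(default_factory=set)
    # Bounding box as (west, south, east, north) in degrees.
    bounding_box: Optional[Tuple[float, float, float, float]] = None


def matches(tweet: dict, f: StreamFilter) -> bool:
    """Return True if the Tweet matches any part of the filter."""
    text = tweet.get("text", "").lower()
    if any(k.lower() in text for k in f.keywords):
        return True
    if tweet.get("user", "").lower() in {u.lower() for u in f.usernames}:
        return True
    coords = tweet.get("coordinates")  # assumed (longitude, latitude) or None
    if f.bounding_box and coords:
        west, south, east, north = f.bounding_box
        lon, lat = coords
        return west <= lon <= east and south <= lat <= north
    return False
```

For example, `matches({"text": "Hello Gnip", "user": "someone"}, StreamFilter(keywords={"gnip"}))` evaluates to `True`, because the keyword appears in the text regardless of case.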

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API as they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public API can be risky.
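One practical consequence of the 400-keyword limit is that it helps to check a keyword list before trying to use it. The snippet below is a minimal sketch of such a check. The limit value comes from the paragraph above and may change; the function name is hypothetical, and treating keywords as case-insensitive when removing duplicates is an assumption.

```python
MAX_KEYWORDS = 400  # limit quoted in the text; Twitter may change it


def validate_keywords(keywords):
    """Return a sorted, de-duplicated keyword list, or raise if over the limit."""
    # Assumption: keywords are case-insensitive, so duplicates differing only
    # in case or surrounding whitespace are collapsed before counting.
    unique = sorted({k.strip().lower() for k in keywords if k.strip()})
    if len(unique) > MAX_KEYWORDS:
        raise ValueError(
            f"{len(unique)} keywords requested, but the limit is {MAX_KEYWORDS}"
        )
    return unique
```

Calling `validate_keywords(["Gnip", "gnip ", "twitter"])` returns `["gnip", "twitter"]`, while a list of 401 distinct keywords raises `ValueError`.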

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: unlike premium Twitter feeds from Gnip and other resyndication partners, those APIs don’t prohibit displaying the Tweets you receive to the general public.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

Announcing Multiple Connections for Premium Twitter Feeds

A frequent request from our customers has been the ability to open multiple connections to Premium Twitter Feeds on their Gnip data collectors. Our customers have asked and we have delivered!

While multiple connections to standard data feeds have been available for quite some time, we have only allowed one connection to our Premium Twitter Feeds.  Beginning today you will be able to open multiple mirrored connections to Power Track, Decahose, Halfhose, and all of our other Premium Twitter Feeds.  This feature will be helpful when testing connections to your Gnip data collector in different environments (such as staging or review) without having an impact on your production connection.

You may be saying, “Sounds great, Gnip, but will I be charged the standard Twitter licensing fee for the same Tweet delivered across multiple connections?” The answer is no! You will pay a small flat fee per month for each additional connection. If you’re interested in adding Multiple Connections to your Premium Twitter Feed, please Contact Us.

New Gnip & Twitter Partnership

We at Gnip have been waiting a long time to write the following sentence: Gnip and Twitter have partnered to make Twitter data commercially available through Gnip’s Social Media API. I remember consuming the full firehose back in 2008 over XMPP. Twitter was breaking ground in realtime social streams at a then mind-blowing ~6 (six) Tweets per second. Today we see many more Tweets and a greater need for commercial access to higher volumes of Twitter data.

There’s enormous corporate demand for better monitoring and analytics tools, which help companies listen to their customers on Twitter and understand conversations about their brands and products. Twitter has partnered with Gnip to sublicense access to public Tweets, which is great news for developers interested in analyzing large amounts of this data. This partnership opens the door to developers who want to use Twitter streams to create monitoring and analytics tools for the non-display market.

Today, Gnip is announcing three new Twitter feeds with more on the way:

  • Twitter Halfhose. This volume-based feed consists of 50% of the full firehose.
  • Twitter Mentionhose. This coverage-based feed provides the realtime stream of all Tweets that mention a user, including @replies and retweets. We expect this to be very interesting to businesses studying the conversational graph on Twitter to determine influencers, engagement, and trending content.
  • Twitter Decahose. This volume-based product consists of 10% of the full firehose. Starting today, developers who want to access this sample rate will access it via Gnip instead of Twitter. Twitter will also begin to transition non-display developers with existing Twitter Gardenhose access over to Gnip.

We are excited about how this partnership will make realtime social media analysis more accessible, reliable, and sustainable for businesses everywhere.

To learn more about these premium Twitter products, visit http://gnip.com/twitter, send us an email at info@gnip.com, or, appropriately, find us on Twitter @gnip.