Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of the Tweets that match the search terms you specify, almost in realtime. In reality, though, the Search API is not intended for, and does not fully support, the constant repeated searches that would be required to deliver 100% coverage. Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now.

Since the Search API is a polling-based API, the rate limits that Twitter has in place limit your ability to get full-coverage streams for monitoring and analytics use cases. To get data from the Search API, your system repeatedly asks Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results (for example, the “latest 100 Tweets”). If more than 100 matching Tweets have been created since the last time you sent the request, some of them will be lost.

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit of time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests for that query alone to keep up with its stream. And this is only the beginning of the problem, as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might usually be fewer than 100 matching Tweets per second, but if there’s a spike (say the term becomes a trending topic after a Super Bowl commercial), then there will likely be more than 100 per second. If Twitter’s rate limits only allow you to send one request per second, you will miss some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.
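The arithmetic behind both examples can be sketched in a few lines of Python. This is only a rough model, not a description of how Twitter enforces its limits: it assumes one request per second and 100 results per request (the illustrative figures used above), and the Tweet rates passed in below are made-up numbers.

```python
def polling_coverage(tweets_per_second, terms_tracked,
                     requests_per_second=1.0, results_per_request=100):
    """Best-case fraction of matching Tweets a polling client can collect for one term.

    All parameters are illustrative assumptions, not published Twitter limits.
    """
    # With a shared request budget, each term is polled once every
    # (terms_tracked / requests_per_second) seconds.
    seconds_between_polls = terms_tracked / requests_per_second
    tweets_between_polls = tweets_per_second * seconds_between_polls
    if tweets_between_polls <= results_per_request:
        return 1.0
    # Only the latest `results_per_request` Tweets come back; the rest are lost.
    return results_per_request / tweets_between_polls


print(polling_coverage(50, 1))     # one term, quiet period: 1.0
print(polling_coverage(400, 1))    # one term, during a spike: 0.25
print(polling_coverage(1, 1000))   # 1,000 terms, one matching Tweet per second: 0.1
```

Even at a modest one matching Tweet per second, tracking 1,000 terms on a single request budget leaves you with at most a tenth of the Tweets for each term in this model.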

Because of these issues, data collection strategies for monitoring that rely exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned: if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. For that reason, some people opt to use the Search API to collect a sampling of Tweets that match their search terms. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full-coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

Links & The Twitter Firehose

One of the more interesting components of Twitter streams is the links within the Tweets themselves. Not only are links one way to bridge from traditional web trend analysis to social media, but they are also a window into what people are sharing.

Gnip provides three mechanisms to get at links in Tweets.

  • Link Stream. The link stream provides you with 100% of the Tweets that contain links. Furthermore, Gnip enriches the stream with unwound URLs, so you don’t have to bother with an unwind-farm on your end.
  • Power Track’s ‘has:links’ operator. Through Power Track, you can refine your complex queries (including substring matching) to collect only Tweets that contain links.
  • Power Track’s ‘url_contains:’ operator. The ‘url_contains:’ operator allows you to filter the 100% Firehose for Tweets that have links containing the substring you provide. It filters against both short and long URLs.
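To make the difference between the last two operators concrete, here is a small local sketch of the matching behavior described above. It is not Gnip's implementation: the Tweet dictionary shape (a "links" list with "short_url" and "expanded_url" keys) is hypothetical, and case-insensitive matching is an assumption.

```python
def has_links(tweet):
    """True if the Tweet carries at least one link (hypothetical 'links' field)."""
    return bool(tweet.get("links"))


def url_contains(tweet, substring):
    """True if any link contains the substring in its short or its unwound form."""
    needle = substring.lower()  # case-insensitive matching is an assumption
    for link in tweet.get("links", []):
        for form in (link.get("short_url", ""), link.get("expanded_url", "")):
            if needle in form.lower():
                return True
    return False


tweet = {
    "text": "Great menu",
    "links": [{
        "short_url": "http://bit.ly/abc123",
        "expanded_url": "http://www.example.com/coffee/menu",
    }],
}

print(has_links(tweet))                    # True
print(url_contains(tweet, "example.com"))  # True, matched on the unwound URL
print(url_contains(tweet, "bit.ly"))       # True, matched on the short URL
print(url_contains(tweet, "gnip"))         # False
```

Checking the unwound URL as well as the short one is what lets a rule keyed on a real domain catch links that were shared through a shortener.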

Happy filtering!

Geo-coded Tweet Streams

Monday’s deploy brought the ‘has:geo’ operator to Gnip’s Twitter Power Track. ‘has:geo’ gives you access to geo-coded Tweets (any Tweet with lat/lng coordinates). Geo-coded Tweets have been one of the most requested streams/substreams to date. We’re really excited to bring this feature to light.

Some usage examples:

  • “has:geo” – alone, gives you the complete stream of all geo-coded Tweets
  • “coffee has:geo” – gives you the complete stream of all geo-coded Tweets that contain the word “coffee”
  • “fire has:geo” – gives you the complete stream of all geo-coded Tweets that contain the word “fire”

For a complete listing of Power Track operators see the documentation. As with all Commercial Twitter data products brought to you by Gnip, they are only for use in non-public-display and non-programmatic resyndication use cases. If you want to do at-scale, full-coverage analysis of Twitter streams, we’re here to help. Contact us at info@gnip.com for more info.

New Power Track Features

Gnip’s Twitter Power Track feed has been a raging success! One of the fun things about Power Track is its expandability. We’ve been adding features left and right over the past few weeks to ensure you’re getting the Tweet filtering precision you need, across the 100% Twitter Firehose, with no volume limits.

As of today’s deploy, we’ve added support for the following new features:

General

Stream compression (optionally set via the standard “Accept-Encoding: gzip” client header). At volume, bandwidth costs are very real, not to mention the operational challenges of dealing with fat pipes. By enabling compression you can dramatically reduce your bandwidth costs and potentially avoid expensive connection upgrades. For more info, see the documentation.
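On the client side, consuming a compressed stream means sending that header and decompressing incrementally as bytes arrive. The sketch below uses only the Python standard library and makes several assumptions that are not from this post: the stream URL is a placeholder you would supply, authentication is omitted, and records are assumed to be newline-delimited.

```python
import gzip
import urllib.request
import zlib


def iter_decompressed_lines(chunks):
    """Yield text lines from an iterable of gzip-compressed byte chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    buffer = b""
    for chunk in chunks:
        buffer += decompressor.decompress(chunk)
        # Emit every complete line; keep any partial line for the next chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield line.decode("utf-8").strip()
    buffer += decompressor.flush()
    if buffer.strip():
        yield buffer.decode("utf-8").strip()


def open_compressed_stream(stream_url, chunk_size=4096):
    """Request gzip encoding and return an iterator of raw byte chunks.

    `stream_url` is a hypothetical placeholder; real feeds also need credentials.
    """
    request = urllib.request.Request(stream_url, headers={"Accept-Encoding": "gzip"})
    response = urllib.request.urlopen(request)
    return iter(lambda: response.read(chunk_size), b"")


# Local demonstration with no network access: compress two records, then
# feed them back in small chunks as a socket would deliver them.
payload = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
for record in iter_decompressed_lines(chunks):
    print(record)
```

Decompressing chunk by chunk, rather than waiting for the response to finish, matters here because a stream never ends on its own.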

Operators

  • contains:. Substring matching. You can now expand your scope to include substrings, e.g. “contains: bam” grabs Tweets that include “Obama”.
  • has:mentions. You can now narrow your scope to include only Tweets that include mentions of other Twitter accounts.
  • has:hashtags. You can now narrow your scope to include only Tweets that include hashtags.
  • has:links. Ensure the Tweets you’re looking for have links in them.
  • has:geo. Ensure the Tweets you’re looking for are geo-coded. We’re soon going to enrich all Tweets that aren’t natively geocoded with geocoding (when possible based on content extrapolation).
  • For more info, check out the documentation.
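As a rough illustration of how these operators narrow or widen a rule, here is a toy matcher you can run locally. It is a sketch of the behavior described in the list, not Power Track itself: the Tweet dictionary fields are hypothetical, rules are written without a space after the colon (an assumption), every clause is assumed to be required, and plain keywords are matched against whitespace-separated words with no punctuation handling.

```python
def matches(rule, tweet):
    """Return True if the Tweet satisfies every clause of a space-separated rule."""
    text = tweet.get("text", "").lower()
    words = set(text.split())
    for clause in rule.lower().split():
        if clause.startswith("contains:"):
            # Substring match anywhere in the text.
            ok = clause[len("contains:"):] in text
        elif clause.startswith("has:"):
            # has:geo, has:links, has:mentions, has:hashtags map to
            # hypothetical fields of the same name on the Tweet dict.
            ok = bool(tweet.get(clause[len("has:"):]))
        else:
            # Plain keyword: whole-word match only.
            ok = clause in words
        if not ok:
            return False
    return True


tweet = {
    "text": "Obama visits a coffee shop",
    "geo": (39.74, -104.99),
    "links": [],
    "hashtags": [],
    "mentions": [],
}

print(matches("contains:bam", tweet))      # True, "bam" is inside "obama"
print(matches("bam", tweet))               # False, no whole word "bam"
print(matches("coffee has:geo", tweet))    # True
print(matches("coffee has:links", tweet))  # False, the Tweet has no links
```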

Feel free to reach out to us at info@gnip.com or our Gnip Google Group.

Happy filtering.