Enhanced Filtering for Power Track

Gnip is always looking for ways to improve its filtering capabilities, and customer feedback plays a huge role in these efforts. We are excited to announce enhancements to our PowerTrack product that allow for more precise filtering of the Twitter Firehose, an enhancement requested directly by you, our customers.

Gnip PowerTrack rules now support OR and Grouping using ().  We have also loosened limitations on the number of characters and the number of clauses per rule. Specifically, a single rule can now include up to 10 positive clauses and up to 50 negative clauses (previously 10 total clauses).  Additionally, the character limit per rule has grown from 255 characters to 1024.

With these changes, we are now able to offer our customers a much more robust and precise filtering language to ensure you receive the Tweets that matter most to you and your business.  However, these improvements bring their own set of specific constraints that are important to be aware of.  Examples and details on these limitations are as follows:

OR and Grouping Examples

  • apple OR microsoft
  • apple (iphone OR ipad)
  • apple computer -(fruit OR green)
  • (apple OR mac) (computer OR monitor) new -fruit
  • (apple OR android) (ipad OR tablet) -(fruit green microsoft)

Character Limitations

  • A single rule may contain up to 1024 characters including operators and spaces.

Limitations

  • A single rule must contain at least 1 positive clause
  • A single rule supports a max of 10 positive clauses throughout the rule
  • A single rule supports a max of 50 negative clauses throughout the rule
  • Negated ORs are not allowed. The following are examples of invalid rules:
  • -iphone OR ipad
  • ipad OR -(iphone OR ipod)
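
As a rough illustration of the limits above, here is a minimal client-side pre-check sketched in Python. It is not Gnip's validator: the function names are hypothetical, negation is assumed to be written with a plain hyphen, and the negated-OR test is a heuristic that only looks at the term or group immediately next to each OR. It does not count positive or negative clauses, and the real service may accept or reject rules that this check does not.

```python
import re

MAX_RULE_CHARS = 1024  # per-rule character limit stated above


def tokenize(rule):
    # Split into "(", "-(", ")", and bare terms (a term may start with "-").
    return re.findall(r"-?\(|\)|[^\s()]+", rule)


def has_negated_or_operand(rule):
    """Heuristic: is the term or group right next to an OR negated?"""
    tokens = tokenize(rule)
    for i, tok in enumerate(tokens):
        if tok != "OR":
            continue
        # Right-hand neighbour: a negated term or the start of a negated group.
        if i + 1 < len(tokens) and tokens[i + 1].startswith("-"):
            return True
        # Left-hand neighbour: a term, or a group whose opener we must find.
        if i > 0:
            left = tokens[i - 1]
            if left == ")":
                depth = 0
                for j in range(i - 1, -1, -1):
                    if tokens[j] == ")":
                        depth += 1
                    elif tokens[j] in ("(", "-("):
                        depth -= 1
                        if depth == 0:
                            left = tokens[j]
                            break
            if left.startswith("-"):
                return True
    return False


def check_rule(rule):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(rule) > MAX_RULE_CHARS:
        problems.append(f"rule is {len(rule)} characters; limit is {MAX_RULE_CHARS}")
    if has_negated_or_operand(rule):
        problems.append("a negated term or group is used as an operand of OR")
    return problems


if __name__ == "__main__":
    for rule in ["apple computer -(fruit OR green)", "-iphone OR ipad"]:
        print(rule, "->", check_rule(rule) or "no problems found")
```

Running a check like this before submitting rules can catch obvious mistakes early, but the service's own response remains the authority on whether a rule is valid.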

Precedence

  • An implied “AND” takes precedence in rule evaluation over an OR

For example a rule of:

  • android OR iphone ipad would be evaluated as android OR (iphone ipad)
  • ipad iphone OR android would be evaluated as (ipad iphone) OR android
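
To make the precedence concrete, the following Python sketch rewrites a flat rule with explicit parentheses. It is only an illustration: the function name is hypothetical, and it handles rules without any parentheses of their own, which is all the two examples above need.

```python
def show_precedence(rule):
    """Add explicit parentheses to a flat rule (one with no grouping).

    Terms separated only by spaces are joined by the implied AND, which
    binds more tightly than OR, so each run of terms between ORs is a group.
    """
    groups = [[]]
    for token in rule.split():
        if token == "OR":
            groups.append([])  # start collecting the next OR operand
        else:
            groups[-1].append(token)

    parts = []
    for group in groups:
        if not group:
            raise ValueError("OR must have a term on both sides")
        # Only wrap groups that actually contain an implied AND.
        parts.append(group[0] if len(group) == 1 else "(" + " ".join(group) + ")")
    return " OR ".join(parts)


if __name__ == "__main__":
    print(show_precedence("android OR iphone ipad"))  # android OR (iphone ipad)
    print(show_precedence("ipad iphone OR android"))  # (ipad iphone) OR android
```

If you want the OR evaluated first instead, write the grouping yourself, for example (android OR iphone) ipad.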

You can find full details of the Gnip Power Track filtering changes in our online documentation.

Know of another way we can improve our filtering to meet your needs?  Let us know in the comments below.

Guide to the Twitter API – Part 3 of 3: An Overview of Twitter’s Streaming API

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like to use the Streaming API because the streaming nature of the data delivery means that the data is delivered closer to realtime than it is from the Search API (which I wrote about last week). But the Streaming API wasn’t designed to deliver full coverage results and so has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API.

The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer” and Twitter has it currently set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” or a randomly sampled 10% stream. Twitter used to provide some increased access levels to businesses, but announced last November that they’re not granting increased access to any new companies and gradually transitioning their current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g., keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API like they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public APIs can be risky.

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: those APIs don’t prohibit display of the Tweets you receive to the general public like premium Twitter feeds from Gnip and other resyndication partners do.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended for, and does not fully support, the repeated constant searches that would be required to deliver 100% coverage.

Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full coverage streams for monitoring and analytics use cases. To get data from the Search API, your system may repeatedly ask Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results (for example, the “latest 100 Tweets”). If more than 100 Tweets matching a search query have been created since the last time you sent the request, some of the matching Tweets will be lost.

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests to Twitter for that query alone to keep up with the stream. And this is only the beginning of the problem, as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might usually be fewer than 100 matching Tweets per second, but if there’s a spike (say that term becomes a trending topic after a Super Bowl commercial), then there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will have missed some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.
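
The arithmetic behind this scenario can be sketched in a few lines of Python. The one-request-per-second limit and the 100-results-per-request cap come from the examples above; the function names are made up for illustration, and the average rate of 0.5 matching Tweets per second is purely an assumed figure, not a measured one.

```python
def seconds_between_polls(num_terms, requests_per_second):
    """How long each term waits between polls when terms share one rate limit."""
    return num_terms / requests_per_second


def missed_per_poll(tweets_per_second, poll_interval_seconds, results_per_request=100):
    """Matching Tweets created between polls that a single request cannot return."""
    produced = tweets_per_second * poll_interval_seconds
    return max(0.0, produced - results_per_request)


if __name__ == "__main__":
    # 1,000 tracked terms sharing a limit of one request per second.
    interval = seconds_between_polls(num_terms=1000, requests_per_second=1.0)
    print(interval)  # 1000.0 seconds between polls for any one term

    # Assumed average of 0.5 matching Tweets per second for one term.
    print(missed_per_poll(tweets_per_second=0.5, poll_interval_seconds=interval))  # 400.0
```

Under those assumed numbers, 500 matching Tweets appear between polls but only 100 can come back, so 400 are lost on every cycle even without a spike.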

Because of these issues related to the monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned, if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Some people opt to use the Search API to collect a sampling of Tweets that match their search terms precisely because of these operators. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

Gnip Client Libraries from Our Customers

Our customers rock. When they develop code to start using Gnip, they often share their libraries with us so that they might be useful to future Gnip customers as well. Although Gnip doesn’t currently officially support any client libraries for access to our social media API, we do like to highlight and bring attention to some of our customers who choose to share their work.

In particular, here are a few Gnip client libraries that happy customers have developed and shared with us. We’ll be posting them in our Power Track documentation and you can also find them linked here:

Java
by Zauber
https://github.com/zaubersoftware/gnip4j

Python
by General Sentiment
https://github.com/vkris/gnip-python/blob/master/streamingClient.py

If you’ve developed a library for access to Gnip data and you’d like to share it with us at Gnip and other Gnip customers, then drop us a note at info@gnip.com. We’d love to hear from you.

Announcing Multiple Connections for Premium Twitter Feeds

A frequent request from our customers has been the ability to open multiple connections to Premium Twitter Feeds on their Gnip data collectors. Our customers have asked and we have delivered!

While multiple connections to standard data feeds have been available for quite some time, we have only allowed one connection to our Premium Twitter Feeds.  Beginning today you will be able to open multiple mirrored connections to Power Track, Decahose, Halfhose, and all of our other Premium Twitter Feeds.  This feature will be helpful when testing connections to your Gnip data collector in different environments (such as staging or review) without having an impact on your production connection.

You may be saying, “Sounds great, Gnip, but will I be charged the standard Twitter licensing fee for the same Tweet delivered across multiple connections?” The answer is no! You will pay a small flat fee per month for each additional connection. If you’re interested in adding Multiple Connections to your Premium Twitter Feed, please contact us.

Links & The Twitter Firehose

One of the more interesting components of Twitter streams is the links within the Tweets themselves. Not only are links one way to bridge from traditional web trend analysis to social media, but they are also a window into what people are sharing.

Gnip provides three mechanisms to get at links in Tweets.

  • Link Stream. The link stream provides you with 100% of the Tweets that contain links. Furthermore, Gnip enriches the stream with unwound URLs, so you don’t have to bother with an unwind-farm on your end.
  • Power Track’s ‘has:links’ operator. Through Power Track, you can refine your complex queries (including substring matching) to collect only Tweets that contain links.
  • Power Track’s ‘url_contains:’ operator. The ‘url_contains:’ operator allows you to filter the 100% Firehose for Tweets whose links contain the substring you provide. It filters against both short and long URLs.

Happy filtering!

Geo-coded Tweet Streams

Monday’s deploy brought the ‘has:geo’ operator to Gnip’s Twitter Power Track. ‘has:geo’ gives you access to geo-coded Tweets (any Tweet with lat/lng coordinates). Geo-coded Tweets have been one of the most demanded streams/substreams to date. We’re really excited to bring this feature to light.

Some usage examples:

  • “has:geo” – alone, gives you the complete stream of all geo-coded Tweets
  • “coffee has:geo” – gives you the complete stream of all geo-coded Tweets that contain the word “coffee”
  • “fire has:geo” – gives you the complete stream of all geo-coded Tweets that contain the word “fire”

For a complete listing of Power Track operators see the documentation. As with all Commercial Twitter data products brought to you by Gnip, they are only for use in non-public-display and non-programmatic resyndication use cases. If you want to do at-scale, full-coverage analysis of Twitter streams, we’re here to help. Contact us at info@gnip.com for more info.

New Power Track Features

Gnip’s Twitter Power Track feed has been a raging success! One of the fun things about Power Track is its expandability. We’ve been adding features left and right over the past few weeks to ensure you’re getting the Tweet filtering precision you need, across the 100% Twitter Firehose, with no volume limits.

As of today’s deploy, we’ve added support for the following new features:

General

  • Stream compression (optionally set via the typical “Accept-Encoding: gzip” client header). At volume, bandwidth costs are very real, not to mention the operational challenges of dealing with fat pipes. By enabling compression you can dramatically reduce your bandwidth costs and potentially avoid expensive connection upgrades. For more info, see the documentation.
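
For readers wiring this up, here is a minimal Python sketch of the client side: it builds a request carrying the “Accept-Encoding: gzip” header and decompresses a gzip stream chunk by chunk as it arrives. The URL is a placeholder rather than a real Gnip endpoint, the function names are hypothetical, and authentication and connection handling are left out entirely.

```python
import urllib.request
import zlib

# Placeholder only; substitute the stream URL from your own data collector.
STREAM_URL = "https://example.com/your-powertrack-stream"


def build_request(url):
    """Build a request that asks the server for a gzip-compressed response."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def iter_decompressed(chunks):
    """Incrementally decompress gzip data from an iterable of byte chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    request = build_request(STREAM_URL)
    print(request.header_items())
```

Decompressing incrementally matters for a long-lived stream, since the response never ends and cannot be read into memory in one piece before it is inflated.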

Operators

  • contains:. sub-string matching. You can now expand your scope to include sub-strings. e.g. “contains: bam” grabs Tweets that include “Obama”.
  • has:mentions. You can now narrow your scope to include only Tweets that include mentions of other Twitter accounts.
  • has:hashtags. You can now narrow your scope to include only Tweets that include hashtags.
  • has:links. Ensure the Tweets you’re looking for have links in them.
  • has:geo. Ensure the Tweets you’re looking for are geo-coded. We’re soon going to enrich all Tweets that aren’t natively geocoded with geocoding (when possible based on content extrapolation).
  • For more info, check out the documentation.
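
To illustrate why sub-string matching widens a rule's scope, the Python sketch below contrasts whole-word matching with sub-string matching on a single piece of text. It is a local approximation only: the function names are hypothetical, and how Power Track actually tokenizes Tweets is an assumption here, not something taken from the documentation.

```python
import re


def keyword_match(term, text):
    """True when the term appears as a whole word (assumed tokenization)."""
    tokens = re.findall(r"\w+", text.lower())
    return term.lower() in tokens


def contains_match(fragment, text):
    """True when the fragment appears anywhere in the text, even inside a word."""
    return fragment.lower() in text.lower()


if __name__ == "__main__":
    tweet = "Obama speaks tonight"
    print(keyword_match("bam", tweet))   # False
    print(contains_match("bam", tweet))  # True
```

The trade-off is precision: a short fragment will also match unrelated words that happen to contain it.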

Feel free to reach out to us at info@gnip.com or our Gnip Google Group.

Happy filtering.