Author: Adam Tornes, Product Manager

Adam Tornes is a Product Manager at Twitter where he focuses on Search and Historical PowerTrack for Twitter data.

Historical PowerTrack Requests, Now Faster Than Ever

The Twitter Data Product Team is excited to share an update with you around recent enhancements to our Historical PowerTrack offering. In an effort to improve our customer experience for historical data requests, we’ve made substantial technology investments to reduce processing times as well as to support future adoption and usage patterns.

Historical data jobs have always been processed as fast as our infrastructure allowed, and now they are significantly faster than ever before. You may be asking yourself, “So just what does this mean for my business?” Well, here are some data points that should help to put these improvements in perspective:

  • A 1-year historical data job that previously took in the neighborhood of 144 hours to complete was recently processed in just 5 hours.
  • A 2-year historical data job that previously took nearly 288 hours to complete was recently processed in just 8 hours.

We no longer recommend breaking historical jobs into smaller pieces to process the data faster, which affords our customers one more level of improved efficiency. Your historical jobs will now actually process faster if they remain intact as a single request, and this should make the job management process easier for you and your team.

One final area of improvement benefiting your business is our more accurate estimates around job processing times. While these predictions will certainly remain just that, “estimates”, the deviation from the eventual processing times will be greatly reduced in most cases.

If you have any further questions around these improvements, feel free to reach out to our Product Team at

The Gnip Usage API: A New Tool for Monitoring Data Consumption

At Gnip we know that reliable, sustainable, and complete data delivery products are core to enabling our customers to surface insights and value from social data. Today we’re excited to announce a new API that will make it easier for our customers to monitor and manage the volume of Gnip data they consume – the Usage API.

The Usage API is a free service that allows Gnip customers to send requests for account consumption statistics across our various data sources. This API allows for even more granular visibility into usage and enables new automated monitoring and alerting apps. Customers now have a programmatic way of understanding usage trends in addition to the monthly and daily usage reports already available in the Gnip Console.

Customer usage data is shown in aggregate and broken down by data source and product type to provide a narrow lens for studying consumption levels. There are now numerous activity update intervals throughout the day and, where applicable, monthly usage projections are also provided for insight into end-of-month usage statistics. The Usage API also includes consumption thresholds for each account to enable customers to keep track of maximum anticipated consumption levels.

The Usage API is available for use today. To learn more about the Usage API or to find instructions for getting started, please reference our support documentation.
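To make the idea concrete, here is a minimal sketch of what consuming the Usage API's output might look like. The field names and response shape below are illustrative assumptions, not the documented schema; please refer to our support documentation for the real contract.

```python
import json

# Hypothetical Usage API response -- the field names and structure here are
# illustrative assumptions, not the documented Gnip schema.
SAMPLE_RESPONSE = json.dumps({
    "account": "example-co",
    "usage": [
        {"source": "twitter", "product": "powertrack", "activities": 1200000},
        {"source": "disqus",  "product": "powertrack", "activities": 45000},
    ],
    "projected_month_end": 38000000,
})

def total_activities(raw_json):
    """Aggregate activity counts across all sources and products."""
    payload = json.loads(raw_json)
    return sum(entry["activities"] for entry in payload["usage"])

print(total_activities(SAMPLE_RESPONSE))  # 1245000
```

An alerting app could poll this endpoint on a schedule and page an admin when the aggregate crosses a consumption threshold.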

Data Monitoring and Volume Controls

The Gnip Sales and Support teams do an excellent job of helping new customers to determine the best filters for their unique business applications. Whether through live training or our extensive online support portal, we help customers create and manage source-specific rules that result in predictable volumes of data. Customers are free to set as many or as few rules as they like, but sometimes a surprise event or a lapse in planning can result in charges in excess of their data plans.

We recently developed a set of user-configurable controls that will help you to avoid these overages and prevent any surprises come billing date. They allow you to better monitor realtime data usage, set event notifications and even establish hard volume caps on your total data consumption. These new tools can all be found in your Gnip Console.

New Reports

We now offer three different data consumption reports that show your Monthly Usage, Per-Day Usage and Current Day Usage.

  1. “Monthly Report”: This report includes information such as Month-to-Date consumption, usage for the previous two months and an estimate of your End-of-Month total based on your current consumption velocity. This is a great tool for looking ahead at your anticipated total usage (and expected charges) at any point during the month. The values are broken out by product, along with a summary of total data consumption from each Source. It is important to note that we dedupe the activities on this report so it can be used to accurately estimate your data charges.

  2. “Daily Report”: This report details your usage on a day-by-day basis for the given month. The values are deduped and are again broken out by product-specific consumption rates.

  3. “Today Report”: This report includes your current day activity as well as those for the previous two days.

Event Notifications

There is a new email alert system available in the Gnip Console that allows you to create admin notifications when certain volume thresholds are reached. You can set both “Warning” and “Critical”-level email alerts, and you can designate as many recipients in your company as you’d like.

Data Capping

The last of these tools is the ability to set hard caps on your data consumption. While the idea of creating a hard-stop on data usage seems pretty basic, keep in mind that a cap will truly cut off your stream if a designated limit is met. Most of our customers prefer to pay for additional data rather than risk a less than positive customer experience should the stream be interrupted. That said, we ask that you contact your Gnip Account Executive to create a data cap.

We think that you will find these new tools to be a great way to more proactively monitor and control your data consumption. Watch our video tutorial for more details, and don’t hesitate to contact us with any additional questions.

Access to Public APIs from Instagram, bitly, Reddit, Stack Overflow, Panoramio and Plurk

Our customers care about every public conversation that happens online. Every month we deliver more than 100 billion social data activities to our clients. While much of our social data is from our premium publishers (Twitter, Tumblr, WordPress, Disqus and StockTwits), we also make a wide range of social data from public APIs readily available through our Enterprise Data Collector product. A significant part of what Gnip does is make social data easier to digest by optimizing the polling of these APIs and by enriching and normalizing the data. Because the data is normalized, social data from the public API of Instagram appears in the same format as social data from Twitter.

To that end, we’re announcing the addition of the public APIs for Instagram, bitly, Reddit, Stack Overflow, Panoramio and Plurk to the Gnip Enterprise Data Collector. While some of those might make perfect sense to you, others might make you turn your head and say, “huh.” Below we have more background on each publisher and why they’re important to the social data ecosystem.

Instagram on Enterprise Data Collector

This photo sharing app, recently acquired by Facebook, continues to be one of the fastest growing social networks out there with 90 million monthly active users. Every day there are 40 million photos uploaded, and every second users like 8,500 photos and make 1,000 comments about them. Our customers have traditionally been very interested in geotagged social data, and between 15 and 25 percent of Instagram users geotag their photographs.

Instagram has become a popular marketing tool for brands; Anthropologie, Intel, Virgin America, Taco Bell and American Express, to name a few, all have Instagram accounts. Furthermore, we’ve really started to see Instagram become a popular tool around current events and for citizen reporting. During Hurricane Sandy, many people used Instagram to document what was happening around them and to show the destruction in real time. During the recent inauguration, CNN asked users to tag their Inaugural Instagram photos with #CNN, and users submitted an average of 25 photos every few seconds.

Customers using the Enterprise Data Collector will be able to access popular posts and conduct tag searches and geosearches.

Potential Uses for the Instagram API:

  • Tracking photos around natural disasters
  • Geo use cases for a given location
  • Brand monitoring

bitly on Enterprise Data Collector

bitly is the easiest and most fun way to save, share and discover links from around the web. While commonly associated with Twitter as a link shortener, bitly is used across the web and provides great information about which social sites are driving traffic. People use bitly to share 80 million new links a day.

Gnip customers will be able to search by keyword against the destination page title and URL, as well as some of the page content and header tags.

Potential Uses for the bitly API:

  • Monitoring for brand mentions
  • Understanding trending content

Reddit on Enterprise Data Collector 

Reddit is a social news site with user-generated content covering nearly every topic in the world. One of the fastest growing sites on the web, Reddit has 50 million active users contributing links, stories, pictures and topics of discussion.

Customers will be able to search by keyword and by hot topics. Brands are often unaware of stories percolating about them on the popular site. In one recent example, a Redditor posted an Applebee’s receipt on which a pastor had refused to tip the waitress because of how much she tithed; the post ultimately became a national news story.

Potential Uses for the Reddit API:

  • Monitoring for brand mentions
  • Crisis communications warning

Stack Overflow on Enterprise Data Collector

Stack Overflow is a community edited Q&A site about computer programming, making it easy for programmers to find answers to questions they have about code. The site has more than 1.5 million registered users and 4 million questions.

Customers will have access to the entire firehose of Stack Overflow Answers and be able to search tags, reputation and comments by keyword. Programmers tag their questions, making it easy to find the content you’re looking for. Currently, the six most popular tags are C#, Java, PHP, JavaScript, jQuery, and Android.

Potential Uses for the Stack Overflow API:

  • Monitoring questions and discussion about software and technical brands
  • Monitoring bugs and outages
  • Often requested in conjunction with review sites

Panoramio on Enterprise Data Collector

Panoramio is a photo-sharing website with geotagged content that is layered upon Google Earth and Google Maps. Panoramio allows viewers to see an enhanced view of Google Earth because they can see other photos taken in the area.

Customers will be able to use a bounding box to view photos within a certain location. We have consistently found that our customers are eager for more social data with geotagged content.
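As a quick illustration of the bounding-box concept, here is a minimal sketch of the geographic-containment check involved. The coordinates and parameter shapes below are purely for demonstration and are not the actual Panoramio API request format.

```python
# Minimal bounding-box check, assuming (lat, lon) pairs and a box given as
# south-west and north-east corners -- illustrative only, not the real
# Panoramio API parameters.
def in_bounding_box(lat, lon, sw, ne):
    """Return True if (lat, lon) falls inside the box defined by sw and ne."""
    return sw[0] <= lat <= ne[0] and sw[1] <= lon <= ne[1]

# Rough box around Boulder, CO (illustrative coordinates)
boulder_sw, boulder_ne = (39.95, -105.35), (40.10, -105.15)
print(in_bounding_box(40.015, -105.27, boulder_sw, boulder_ne))  # True
print(in_bounding_box(39.74, -104.99, boulder_sw, boulder_ne))   # False (Denver)
```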

Potential Uses for the Panoramio API:

  • Monitor social activity within a certain geographic area

Plurk on Enterprise Data Collector

Plurk is a microblogging site that allows users to communicate in posts of up to 210 characters, with emoticons. Plurk has more than 1 million active users who post 3 million “Plurks” each day. Plurk is one of the more popular social networks in Taiwan and also has a strong presence in Hong Kong, Singapore, the Philippines and India. Gnip customers will be able to search for keywords within posts.

Potential Uses for the Plurk API:

  • Monitoring for brand mentions, with a particular focus on certain Asian countries
  • Understanding trending content

If you’re interested in learning more about these additional sources on Enterprise Data Collector, please contact for more information.

4 Things You Need To Know About Migrating to Version 1.1 of the Twitter API

Access to Twitter data through its API has been evolving since the API’s inception. Last September, Twitter announced its most recent changes, which take effect this coming March 5. These changes enhance feed delivery while further limiting the number of Tweets you can get from the public Twitter API.

The old API was version 1.0 and the new one is version 1.1. If your business or app relies on Twitter’s public API, you may be asking yourself “What’s new in Twitter API 1.1?” or “What changed in Twitter API 1.1?” While there’s not much new, a lot has changed and there are several steps you need to take to ensure that you’re still able to access Twitter data after March 5th.

1. OAuth Connection Required
In Twitter API 1.1, access to the API requires authentication using OAuth. To get your Twitter OAuth token, you’ll need to fill out this form. Note that rate limits are applied on a per-endpoint, per-OAuth-token basis, and distributing your requests among multiple IP addresses will no longer work as a workaround. Requests to the API without OAuth authorization will not return data and will receive an HTTP 410 Gone response.

2. 80% Less Data
In version 1.0, the rate limit on the Twitter Search API was 1 request per second. In Twitter API 1.1, that changes to 1 request per every 5 seconds. A more stark way to put this is that previously you could make 3600 requests/hour but you are now limited to 720 requests/hour for Twitter data. Combined with the existing limits to the number of results returned per request, it will be much more difficult to consume the volume or levels of data coverage you could previously through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip which isn’t subject to rate limits.
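The arithmetic behind that headline number is simple enough to sketch:

```python
SECONDS_PER_HOUR = 3600

def hourly_requests(seconds_per_request):
    """Maximum Search API requests per hour at the given request spacing."""
    return SECONDS_PER_HOUR // seconds_per_request

v1_0 = hourly_requests(1)   # v1.0: one request per second   -> 3600/hour
v1_1 = hourly_requests(5)   # v1.1: one request per 5 seconds -> 720/hour
reduction_pct = (v1_0 - v1_1) * 100 // v1_0
print(v1_0, v1_1, reduction_pct)  # 3600 720 80
```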

3. New Endpoint URLs
Twitter API 1.1 also has new endpoint URLs that you will need to point your application at in order to access the data. If you try to access the old endpoints, you won’t receive any data; instead, you will receive an HTTP 410 Gone response.

4. Hello JSON. Goodbye XML.
Twitter has changed the format in which the data is delivered. In version 1.0 of the Twitter API, data was delivered in XML format. Twitter API 1.1 delivers data in JSON format only. Twitter has been slowly transitioning away from XML, starting with the Streaming API and Trends API. Going forward, all APIs will use JSON rather than XML. This is a welcome step forward, as JSON now enjoys broader adoption than XML for web APIs.

All in all, some pretty impactful changes.  If you’re looking for more information, we’ve provided some links below with more details.  If you’re interested in getting full coverage commercial grade access to Twitter data where rate limits are a thing of the past, check out the details of Gnip’s Twitter offerings.  We have a variety of Twitter products, including realtime coverage and volume streams, as well as access to the entire archive of historical Tweets.

Update: Twitter has recently announced that the Twitter REST API v1.0 will officially retire on May 7, 2013. Between now and then they will continue to run blackout tests and those who have not migrated will see interrupted coverage so migrating as soon as possible is highly encouraged.

Helpful Links
Version 1.0 Retirement Post
Version 1.0 Retirement Final Dates
Changes coming in Twitter API 1.1
OAuth Application Form
REST API Version 1.1 Resources
Twitter API 1.1 FAQ
Twitter API 1.1 Discussion
Twitter Error Code Responses

New Twitter Filtering Options

You asked and we delivered! Based on our customers’ feedback, we’ve introduced a number of new operators to the Twitter PowerTrack stream. With these new operators you can filter more precisely on geo data and the contents of a user’s Twitter profile. Check out the details below and let us know if you have any questions.

Many tweets with geo data are tagged with a “place” and these “places” are often associated with a country code indicating where that place is located. Using our new country_code operator, you can now filter all Tweets that have a specific country code. This can be done using Alpha-2 ISO codes to create a rule operator like: country_code:gb for all Tweets that have a “place” in Great Britain.

As mentioned in the country_code description, many Tweets with geo data are tagged with a “place”. This place is a semi-normalized location determined by the Twitter app. Examples might include “Boulder, CO” or “Jimmy’s Pizza”. Because this text is at best semi-normalized, we have created a place_contains operator as a complement to our “place” operator that performs a substring match. For example, a place_contains:”Boulder” operator would match a Tweet with a place of “Boulder” and a Tweet with a place of “Boulder, CO”, whereas the place:”Boulder” operator would have matched only the former.

Twitter users can specify a location in their profile. This field is completely freeform text, and the locations are not normalized at all. To allow customers to get Tweets both from users whose location is “Boulder” and from users whose location is “Boulder, CO”, we’ve introduced a bio_location operator that performs a tokenized keyword match on the contents of the field.

In the same vein as the bio_location operator, the bio_location_contains operator offers the ability to filter the Twitter stream based on the location users have specified in their Twitter bio. However, the bio_location_contains operator performs a substring match on this field.

Gnip’s new time_zone operator allows customers to filter the Twitter firehose for Tweets that have a time_zone that exactly matches that provided in the rule. The values delivered in the time zone field of the payload are normalized, based on Twitter’s description in account settings. Note that it is a string-based filter and needs to be an exact match.  

Within a Twitter user’s profile, users are required to select a language. This setting simply changes the language in which Twitter displays its UI text (it does not translate Tweet text). THIS IS NOT A LANGUAGE CLASSIFICATION. Customers have reported that this setting is often left at its default of English even when the Tweets an account generates are in another language. We recommend using it in conjunction with Gnip’s language classification operator (lang) rather than as a standalone indicator of a user’s or Tweet’s language.

Like Klout score, a user’s followers count can be used as a proxy for influence, and we have created a similar operator to allow our customers to filter on this data. Like the klout_score operator, the followers_count operator allows filtering the Twitter firehose to include Tweets from users with a follower count in a range or greater than a given value. WARNING: Use this operator with caution, as it can easily result in the unexpected delivery of very high volumes of data.

The bio_name_contains operator allows customers to filter for only Tweets that contain a given substring in a Twitter user’s displayName entered in their profile.

The has:media operator allows for the filtering of all Tweets that contain a media URL in the Tweet body, be it an image, video, or otherwise. has:media matches based on the inclusion of the media entity in the Tweet payload delivered by Twitter; no media detection or extraction is performed by Gnip in support of this operator.
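Putting a few of these operators together, a PowerTrack rule is just a space-separated set of clauses. Here is a small sketch of composing one; the helper functions are hypothetical, for illustration only, and the quoting convention follows the one described in our documentation.

```python
def build_rule(*clauses):
    """Join PowerTrack clauses with implicit AND (space-separated).
    Hypothetical helper for illustration -- not a Gnip library function."""
    return " ".join(clauses)

def clause(op, value):
    # Values containing spaces or punctuation should be surrounded by quotes.
    needs_quotes = any(c in value for c in ' ,./#')
    return f'{op}:"{value}"' if needs_quotes else f"{op}:{value}"

rule = build_rule(
    clause("country_code", "gb"),
    clause("bio_location_contains", "London"),
    "has:media",
)
print(rule)  # country_code:gb bio_location_contains:London has:media
print(clause("place_contains", "Boulder, CO"))  # place_contains:"Boulder, CO"
```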

You can find additional documentation on each of these new operators and the rest of our Twitter filtering options at

As always, we love to hear feedback and suggestions from our customers, and much of our product roadmap and prioritization is driven by customer needs and requests.  Keep the feedback coming via your account reps and

Get the Disqus Firehose With New Filtering Options

In February, we announced that the full Disqus firehose of public comments is available through Gnip. Our customers love the conversations in Disqus, but have asked for tools to filter the stream so they receive only the conversations they want. Today, we’re announcing our new Disqus PowerTrack offering. Similar to our Twitter PowerTrack product, Disqus PowerTrack offers powerful filtering so customers can filter the full Disqus firehose of public comments to extract the specific conversations they’re looking for. With over 500,000 comments created each day on Disqus, there are a huge range of conversations taking place and you don’t want to miss the ones about your brand or products.

With Disqus PowerTrack, you have a wide array of filtering options. You can filter for specific keywords. You can constrain that filter to specific websites. Or you can look for just the mentions that have links. So, if you’re looking for brand mentions of Apple, you can track conversations about the iPhone or brand mentions in general. You can also monitor for comments mentioning the iPhone that have links in them so you can understand what online stores are being promoted along with your products. See the full list of Disqus PowerTrack Operators in our documentation.

To see the power of the full Disqus firehose, check out this graph showing all mentions of Apple on Disqus. On a normal weekday, there are almost 10,000 comments about Apple. For big events, like WWDC, you see a spike to almost 40,000 comments per day. That’s a lot of conversations.


We’re big proponents of the conversations that happen in comments, and we’re committed to making it easier for companies to understand and be able to participate. Our new Disqus PowerTrack makes it easier than ever to understand the types of conversations happening in comments.

If you have any questions about the new Disqus capabilities, please contact your sales rep or our sales team at

Filtering for Tweets by User Bio

One of the requests we often hear from customers is that they’d like to be able to filter for Tweets from users who match a specific demographic.  I’m excited to announce the addition of a new operator to our PowerTrack suite that enables you to do exactly that.

The bio_contains operator enables you to filter for Tweets from users whose freeform Twitter bio contains a specific keyword, phrase or string.  The operator does a substring match against the user bio, much like our url_contains operator matches against the contents of the URL string.  To use the bio_contains operator, simply add a bio_contains:keyword clause to any rule.

Use Cases
One great use for this operator is to filter for Tweets based on target demographic.  For example, say you’re analyzing social media for Tide laundry detergent and want to see what moms are saying about the brand following a major marketing campaign.  Using the bio_contains operator, you could create a rule to receive Tweets from Twitter users who explicitly state in their bio that they are a mom and mentioned Tide in their Tweet.

User’s Bio: “Loving Mom, Wife and Daughter”
Tweet: “I love the new Tide!”
Rule: Tide bio_contains:mom

Another use would be to see all Tweets from a competitor’s employees in hopes of gaining some competitive intelligence.  In this use case, I might want to receive ALL tweets from users whose bio mentions ABC Corp.

User’s Bio: “Product Manager at ABC Corp”
Rule: bio_contains:”ABC Corp”

These are only a few of the possible use cases and we’re sure our customers have many others that would put these to shame.  We’d love to hear about them!

Important Details
The operator does have some intricacies that it is important to be aware of.

  • Unless the bio_contains operator is combined with additional clauses and operators in a rule, the bio_contains operator will match EVERY tweet from a user whose bio contains the keyword or phrase.  Depending on the keyword or phrase, this could result in receiving A LOT of Tweets.
  • All keywords or phrases containing spaces or punctuation should be surrounded by quotes.
  • The operator performs a substring match against a user’s bio and ignores word boundaries.  As a result, if your keyword or phrase is part of another word or phrase, it will be considered a match.  For example, a keyword of “pants” would match a bio containing a term like “#TeamSpongeBobSquarePants”.  Should this be an issue, we would recommend one of two solutions:
  1. Add a negation to exclude the matches you don’t want
    i.e. bio_contains:pants -bio_contains:”#TeamSpongeBobSquarePants”
  2. Quote common word boundaries in conjunction with the OR operator
    i.e. bio_contains:” pants ”  OR bio_contains:”pants/” OR bio_contains:” pants.”
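The substring-match behavior and the boundary workaround can be sketched in a few lines. This is a rough approximation of the matching semantics for illustration, not Gnip's actual implementation:

```python
# Substring match, as bio_contains performs it -- there are no word boundaries.
def bio_contains(bio, keyword):
    return keyword.lower() in bio.lower()

print(bio_contains("#TeamSpongeBobSquarePants", "pants"))  # True -- the pitfall

# Approximating workaround 2: OR together the keyword quoted with common
# boundary characters, per the suggestion above.
def bio_contains_bounded(bio, keyword):
    text = f" {bio.lower()} "   # pad so matches at the edges of the bio work
    k = keyword.lower()
    return any(p in text for p in (f" {k} ", f"{k}/", f" {k}."))

print(bio_contains_bounded("I sell pants.", "pants"))              # True
print(bio_contains_bounded("#TeamSpongeBobSquarePants", "pants"))  # False
```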

As with most of our work, this new operator started with customer requests.  Thanks for the product feedback and keep it coming.  Additional documentation of this new operator and others can be found in our online documentation. If you’re interested in learning more about how to filter Twitter by bio, please contact

For the Times When Every Tweet is Too Many

Our customers tell us that getting every single Tweet that matters is one of the key reasons they work with Gnip. And sometimes getting every Tweet that matters means filtering out the Tweets you don’t want. With this in mind, I’m happy to announce the introduction of two new operators to our Power Track filtering suite.

Retweet Operator

The Retweet operator allows a customer to ensure only Retweets that match a rule are delivered or excluded.

To use the Retweet operator, simply add is:retweet or -is:retweet to any rule.

Examples Include:

  • Receive only Retweets mentioning Apple using a rule like: apple is:retweet as a way to measure engagement of the brand’s fan base


  • Get only Tweets with unique content about Apple using a rule like: apple -is:retweet to monitor conversation about the brand and ignore the tremendous volume of retweets generated by the brand
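Here is a rough sketch of how these two clauses partition a set of activities. The payload shape is simplified for illustration; whether an activity is a Retweet is modeled as a plain boolean flag, which is not how the real Tweet payload marks it.

```python
# Simplified activity payloads -- the is_retweet flag is an illustrative
# stand-in for how the real Tweet payload marks Retweets.
activities = [
    {"text": "I love my new Apple laptop", "is_retweet": False},
    {"text": "RT @someone: Apple event today!", "is_retweet": True},
]

def apply_retweet_clause(items, negated=False):
    """negated=False models `is:retweet`; negated=True models `-is:retweet`."""
    return [a for a in items if a["is_retweet"] != negated]

print(len(apply_retweet_clause(activities)))                # 1: Retweets only
print(len(apply_retweet_clause(activities, negated=True)))  # 1: original Tweets only
```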

Sampling Operator

The Sampling operator allows a customer to receive a random sample of Tweets that match a rule rather than the entire set of Tweets.

There are several use cases where the Sampling operator is useful.  Say you want to stay within a budgeted number of Tweets each month, but you’re trending higher than that budget halfway through the month.  With the Sampling operator, you can scale back your consumption without fully eliminating rules.  In another use case you might want to monitor a very high-volume rule or user, but your internal systems can’t handle this volume.  Sampling makes this more manageable.  Finally, there are times when you simply need to know the directional volumes for things, and don’t need every tweet.

To use the Sampling operator, add sample:## to any rule, with an integer value between 1 and 100. The Sampling operator applies to the entire rule and requires any “OR’d” terms to be grouped.

Examples Include:

  • Receive a sampling of 10% of all Tweets that contain “apple” using a rule like:

apple sample:10


  • Receive a sampling of 50% of all Tweets that contain “iPad” or “iPhone” using a rule like:

(ipad OR iphone) sample:50
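A quick sketch of what sample:10 means in practice. The sampling itself happens on Gnip's side; this only models the expected delivery rate under the assumption that each matching Tweet is delivered independently with probability N/100.

```python
import random

# Model sample:N as delivering each matching Tweet with probability N/100.
def sampled(matching_tweets, percent, rng):
    return [t for t in matching_tweets if rng.random() < percent / 100]

rng = random.Random(42)              # fixed seed for a reproducible sketch
delivered = sampled(range(10_000), 10, rng)
print(len(delivered))                # close to 1,000 of the 10,000 matches
```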

As always, thank you for the product feedback and keep it coming.  Additional documentation of these new operators and others can be found in our online documentation.