Copyright © 2010 Gnip, Inc.
Gnip makes it easy to build social media tracking tools.
888-777-7405
Gnip is always looking for ways to improve its filtering capabilities, and customer feedback plays a huge role in these efforts. We are excited to announce enhancements to our PowerTrack product that allow for more precise filtering of the Twitter Firehose, a feature enhancement request that came directly from you, our customers.
Gnip PowerTrack rules now support OR and Grouping using (). We have also loosened limitations on the number of characters and the number of clauses per rule. Specifically, a single rule can now include up to 10 positive clauses and up to 50 negative clauses (previously 10 total clauses). Additionally, the character limit per rule has grown from 255 characters to 1024.
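As a rough illustration of the new limits, the sketch below checks a rule against the 1024-character, 10-positive-clause, and 50-negative-clause ceilings. The clause counting is an assumption made for illustration, not Gnip's documented parser: it treats each whitespace-separated term or quoted phrase as one clause, treats a leading "-" as marking a negative clause, and ignores OR and parentheses. The names check_rule and count_clauses are hypothetical.

```python
import re
from typing import List, Tuple

# Limits quoted in the announcement above.
MAX_CHARS = 1024
MAX_POSITIVE_CLAUSES = 10
MAX_NEGATIVE_CLAUSES = 50

# ASSUMPTION: a clause is a quoted phrase or a bare term, optionally
# prefixed with "-" for negation. Parentheses only group and are skipped.
_CLAUSE = re.compile(r'-?"[^"]*"|[^\s()]+')


def count_clauses(rule: str) -> Tuple[int, int]:
    """Return (positive, negative) clause counts under the assumptions above."""
    clauses = [c for c in _CLAUSE.findall(rule) if c != "OR"]
    negative = sum(1 for c in clauses if c.startswith("-"))
    return len(clauses) - negative, negative


def check_rule(rule: str) -> List[str]:
    """Return a list of limit violations; an empty list means the rule fits."""
    problems = []
    if len(rule) > MAX_CHARS:
        problems.append(f"rule is {len(rule)} characters; limit is {MAX_CHARS}")
    positive, negative = count_clauses(rule)
    if positive > MAX_POSITIVE_CLAUSES:
        problems.append(
            f"{positive} positive clauses; limit is {MAX_POSITIVE_CLAUSES}"
        )
    if negative > MAX_NEGATIVE_CLAUSES:
        problems.append(
            f"{negative} negative clauses; limit is {MAX_NEGATIVE_CLAUSES}"
        )
    return problems


if __name__ == "__main__":
    # Illustrative rule only; not taken from Gnip's documentation.
    example = '(apple OR "iphone 4") -android -"windows phone"'
    print(count_clauses(example))
    print(check_rule(example) or "rule is within limits")
```

Checking a rule client-side like this is only a convenience; the service's own validation remains the authority on what counts as a clause.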
With these changes, we are now able to offer our customers a much more robust and precise filtering language to ensure you receive the Tweets that matter most to you and your business. However, these improvements bring their own set of specific constraints that are important to be aware of. Examples and details on these limitations are as follows:
OR and Grouping Examples
Character Limitations
Limitations
Precedence
For example a rule of:
You can find full details of the Gnip PowerTrack filtering changes in our online documentation.
Know of another way we can improve our filtering to meet your needs? Let us know in the comments below.
While the market has been on its roller coaster ride across the past month, Gnip has kept its collective head down and stayed busy on behalf of our Investment Management clients (hedge funds, HFTs, asset managers, etc.). That hard work has paid off and we have two exciting announcements to make today.
While the investment community has used social media data in news analysis and equity research, the primary adoption of this data over the last six months has been as a trading indicator. By combining the strengths of both the Twitter stream and the StockTwits stream, Gnip MarketStream provides investment professionals unparalleled access to relevant social data at a time when social media has become an increasingly vital channel for news and market sentiment.
For more information about Gnip MarketStream or StockTwits data, contact trading@gnip.com.

Gnip is excited to announce the addition of Google+ to its repertoire of social media data sources. Built on top of the Google+ Search API, Gnip’s stream allows its customers to consume realtime social media data from Google’s fast-growing social networking service. Using Gnip’s stream, customers can poll Google+ for public posts and comments matching the terms and phrases relevant to their business and client needs.
Google+ is an emerging player in the social networking space that pairs well with the Twitter, Facebook, and other microblog content currently offered by Gnip. If you are looking for volume, Google+ quickly became the third largest social networking platform within a week of its public launch, and some are projecting it to emerge as the world’s second largest social network within the next twelve months. Looking to consume content from social network influencers? Google+ is where they are! (Even former Facebook President Sean Parker says so.)
By working with Gnip, you get your stream of Google+ data (along with an abundance of other social data sources) in a normalized data format, with unwound URLs and data deduplication. Existing Gnip customers can seamlessly add Google+ to their Gnip Data Collectors (all you need is a Google API Key). New to Gnip? Let us help you design the right solution for your social data needs: contact sales@gnip.com.
One of the reasons that we at Gnip love working in the Boulder Valley is the incredible and talented companies that make up the local business ecosystem. Given the depth and quality of innovative organizations that make Boulder their home, we’re extremely excited and very honored to announce today that we’ve won the Boulder County Business Report (BCBR) Innovative Quotient (IQ) Award for Social Media/Mobile Applications.
Presented by the BCBR, the IQ Awards is an annual event that honors the most innovative new products and services developed by companies and organizations, with a special emphasis on advanced technologies, innovations within a particular business sector and sustainable business practices.
Congratulations to all of last night’s winners, with a big shout out to our fellow Foundry family member Standing Cloud, which won the award in the Internet/Web category. Below is a list of companies that were recognized and their respective categories:
Thank you to the Boulder County Business Report for recognizing the amazing innovation that exists in our community and congrats again to all of our fellow winners! Keep the innovation flowing, Boulder.
For more info, check out our press release.

Providing Klout Scores, a measurement of a user’s overall online influence, for every individual in the ever-growing base of Twitter users was the task at hand for Matthew Thomson, VP of Platform at Klout. With massive amounts of data flowing in by the second, Thomson and Klout’s scientists and engineers needed a fast and reliable solution for processing, filtering, and eliminating data from the Twitter Firehose that was unnecessary for calculating and assigning Twitter users’ Klout Scores.
- Matthew Thomson
VP of Platform, Klout
By selecting Gnip as their trusted premium Twitter data delivery partner, Klout tripled their API volume and increased their ability to provide influence scores by 50 percent among Twitter users in less than one month.
Get the full details: read the success story here.
For many startups seeking to enter and capitalize on the rising social media marketplace, timing is everything. MutualMind was no exception: getting their enterprise social media management product to market in a timely manner was crucial to the success of their business. MutualMind provides an enterprise social media intelligence and management system that monitors, analyzes, and promotes brands on social networks and helps increase social media ROI. The platform enables customers to listen to discussion on the social web, gauge sentiment, track competitors, identify and engage with influencers, and use resulting insights to improve their overall brand strategy.
- Babar Bhatti
CEO, MutualMind
By selecting Gnip as their data delivery partner, MutualMind was able to get their product to market six months ahead of schedule. Today, MutualMind processes tens of millions of data activities per month using multiple sources from Gnip including premium Twitter data, YouTube, Flickr, and more.
Get the full details: read the success story here.
The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer” and Twitter has it currently set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” or a randomly sampled 10% stream. Twitter used to provide some increased access levels to businesses, but announced last November that they’re not granting increased access to any new companies and gradually transitioning their current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)
The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g., keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.
Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API like they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public APIs can be risky.
The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: those APIs don’t prohibit display of the Tweets you receive to the general public like premium Twitter feeds from Gnip and other resyndication partners do.
But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.
If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.
Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full coverage streams for monitoring and analytics use cases. To get data from the Search API, your system may repeatedly ask Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results to the request (for example “latest 100 Tweets”). If there have been more than 100 Tweets created about a search query since the last time you sent the request, some of the matching Tweets will be lost.
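To make the loss mechanism concrete, here is a minimal simulation of a polling client. It is a sketch under stated assumptions, not Twitter's actual API: Tweets are modeled as increasing integer IDs, the hypothetical function poll stands in for one search request that returns only the newest 100 matches, and the arrival counts are made-up numbers.

```python
from typing import List, Tuple

PAGE_SIZE = 100  # "latest 100 Tweets" per request, as in the text above


def poll(created_ids: List[int], page_size: int = PAGE_SIZE) -> List[int]:
    """Hypothetical stand-in for one search request.

    Returns only the newest `page_size` matching Tweet IDs; anything
    older than that is invisible to this request.
    """
    return created_ids[-page_size:]


def simulate(arrivals_between_polls: List[int]) -> Tuple[int, int]:
    """Return (collected, missed) after polling once per arrival batch."""
    created: List[int] = []  # every matching Tweet ever created, oldest first
    seen = set()             # IDs the client actually received
    next_id = 1
    for new_tweets in arrivals_between_polls:
        # New matching Tweets are created between two consecutive polls.
        created.extend(range(next_id, next_id + new_tweets))
        next_id += new_tweets
        seen.update(poll(created))
    return len(seen), len(created) - len(seen)


if __name__ == "__main__":
    # Made-up traffic: quiet, quiet, spike, busy, quiet.
    collected, missed = simulate([40, 80, 350, 120, 60])
    print(f"collected={collected} missed={missed}")
```

In this model nothing is lost while fewer than 100 new Tweets arrive between polls; every Tweet beyond the first 100 in a gap is never seen by the client.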
So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests to Twitter for that query alone to keep up with this stream. And this is only the beginning of the problem, as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.
Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might usually be fewer than 100 matching Tweets per second, but if there’s a spike (say that term becomes a trending topic after a Super Bowl commercial), then there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will have missed some of the Tweets generated at the most critical moment of all.
Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.
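The arithmetic in this scenario can be written down directly. The sketch below assumes, as the example does, one request per second shared round-robin across all terms and at most 100 results per request; the function names and the sample Tweet rates are hypothetical, and the result is a best case that ignores everything except the page-size cap.

```python
def seconds_between_polls(num_terms: int, requests_per_second: float = 1.0) -> float:
    """How long each term waits between polls when requests are shared evenly."""
    return num_terms / requests_per_second


def best_case_coverage(
    tweets_per_second: float,
    num_terms: int,
    requests_per_second: float = 1.0,
    results_per_request: int = 100,
) -> float:
    """Fraction of one term's Tweets a polling client can collect, at best."""
    interval = seconds_between_polls(num_terms, requests_per_second)
    created = tweets_per_second * interval  # Tweets created between two polls
    if created <= results_per_request:
        return 1.0
    return results_per_request / created


if __name__ == "__main__":
    # 1,000 tracked terms, one request per second in total.
    print(seconds_between_polls(1000))     # seconds between polls for each term
    print(best_case_coverage(0.05, 1000))  # a quiet term
    print(best_case_coverage(0.5, 1000))   # a busier term
```

With 1,000 terms at one request per second, any term averaging more than 0.1 matching Tweets per second (100 results every 1,000 seconds) begins to lose data, and a term averaging one Tweet every two seconds would be at most 20% covered.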
Because of these issues related to the monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned, if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.
Although coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Some people opt to use the Search API to collect a sampling of Tweets that match their search terms for exactly that reason. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).
So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)
But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)
Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…
Understanding Twitter’s Public APIs . . . You Mean There is More than One?
In fact, there are three Twitter APIs: the REST API, the Streaming API, and the Search API. Within the world of social media monitoring and social media analytics, we need to focus primarily on the latter two.
Whether you get your Twitter data from the Search API, the Streaming API, or through Gnip, only public statuses are available (and NOT protected Tweets). Additionally, before Tweets are made available to both of these APIs and Gnip, Twitter applies a quality filter to weed out spam.
So now that you have a general understanding of Twitter’s APIs . . . stay tuned for Part 2, where we will take a deeper dive into understanding Twitter’s Search API, coming next week…
Gnip and Automattic Make Whole New Universe of Data Available
January 17th, 2012
Tags: automattic, blogs, comments, deep analysis, engagement, firehose, gnip, intensedebate, jetpack, likes, wordpress.com, wordpress.org
Posted by Bill Adkins, Director of Business Development, in Data, Partners, Product
“This new data from Automattic is a big addition and a testament to Gnip’s commitment to drive the social data economy forward. This is an important source to add to the social data mix, one that we know our customers will take full advantage of.”
- Rob Begg, VP Marketing of Radian6
Today, we’re excited to announce a major addition to our coverage of the conversations taking place on blogs around the world. We’re expanding our relationship with Automattic to make a whole new universe of blog and comment data available to the market for the first time anywhere.
For those who don’t know, Automattic is a network of web services including WordPress.com, VIP hosting and support, Polldaddy, IntenseDebate, and Jetpack. We’ve been delivering data from WordPress.com and IntenseDebate for about a year and a half and found that while our customers loved their data, they always wanted more.
As of today, we are now offering the full firehose of blog posts and comments from Jetpack-powered WordPress.org sites, as well as engagement streams of “likes” from WordPress.com and IntenseDebate. The new data from WordPress.org greatly increases the coverage available to those who are looking to do deep analysis of blog posts and comments. The new engagement streams enable companies to pull in reaction data to quickly understand sentiment, relevance and resonance. With this they can gauge the intensity of opinion around fast moving blog and comment conversations, helping prioritize critical response.
Being full firehoses, all of the streams from Automattic ensure 100% coverage in realtime, giving customers the peace of mind that they can keep up with the entire discussion on fast moving threads.
The scope of coverage offered by Automattic is pretty incredible. Check out some of these stats:
We’re thrilled to be able to offer these new data streams to our customers and can’t wait to see the amazing things they’ll be able to do with them.
Updated: Coverage in GigaOM – Gnip and WordPress deepen ties, expand data partnership