New Twitter Filtering Options

You asked and we delivered! Based on our customers’ feedback, we’ve introduced a number of new operators to the Twitter PowerTrack stream. With these new operators you can filter more precisely on geo data and the contents of a user’s Twitter profile. Check out the details below and let us know if you have any questions.

Many Tweets with geo data are tagged with a “place”, and these places are often associated with a country code indicating where the place is located. Using our new country_code operator, you can now filter for all Tweets that have a specific country code. Rules use ISO 3166-1 alpha-2 codes, so a rule like country_code:gb matches all Tweets that have a “place” in the United Kingdom.
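To make the behavior concrete, here is a minimal Python sketch of what a country_code rule checks. It is an illustration, not Gnip's implementation. The payloads are simplified to a "place" object with "full_name" and "country_code" fields, and the case-insensitive comparison (so that the rule value gb matches the GB delivered in the payload) is an assumption.

```python
def matches_country_code(tweet: dict, code: str) -> bool:
    """Approximate a country_code:<code> rule against a Tweet's tagged place.

    Tweets with no place, or a place without a country code, never match.
    Case-insensitive comparison is an assumption of this sketch.
    """
    place = tweet.get("place") or {}
    country = place.get("country_code")
    return country is not None and country.lower() == code.lower()


# Simplified, hypothetical payloads: real Tweets carry many more fields.
tweets = [
    {"id": 1, "place": {"full_name": "London, England", "country_code": "GB"}},
    {"id": 2, "place": {"full_name": "Boulder, CO", "country_code": "US"}},
    {"id": 3, "place": None},  # no geo data at all
]

print([t["id"] for t in tweets if matches_country_code(t, "gb")])  # [1]
```

Note that Tweet 3 is dropped simply because it has no place; country_code only sees Tweets that carry this geo tag.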

As mentioned in the country_code description, many Tweets with geo data are tagged with a “place”. This place is a semi-normalized location determined by the Twitter app; examples include “Boulder, CO” or “Jimmy’s Pizza”. Because this text is at best semi-normalized, we have created a place_contains operator as a complement to our existing place operator. Where place requires an exact match, place_contains performs a substring match. For example, place_contains:"Boulder" matches a Tweet with a place of “Boulder” as well as a Tweet with a place of “Boulder, CO”, whereas place:"Boulder" matches only the former.
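The difference between the two operators comes down to exact versus substring matching. The sketch below contrasts them on plain place names; the function names are hypothetical, and because the post does not say how case is handled, this version is simply case-sensitive.

```python
def place_exact(place_name: str, value: str) -> bool:
    # Sketch of place:"value": the whole place name must equal the value.
    return place_name == value


def place_contains(place_name: str, value: str) -> bool:
    # Sketch of place_contains:"value": the value may appear anywhere in the name.
    return value in place_name


places = ["Boulder", "Boulder, CO", "Jimmy's Pizza"]
print([p for p in places if place_exact(p, "Boulder")])     # ['Boulder']
print([p for p in places if place_contains(p, "Boulder")])  # ['Boulder', 'Boulder, CO']
```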

Users can also specify a location in their Twitter profile. This field is completely free-form text, and its contents are not normalized at all. So that customers can get Tweets from users whose location is “Boulder” as well as from users whose location is “Boulder, CO”, we’ve introduced a bio_location operator that performs a tokenized keyword match on the contents of this field.
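A tokenized keyword match splits the free-form location into words and looks for the keyword as a whole word. The Python sketch below assumes a simple tokenizer (lowercase, then split on non-word characters); Gnip's actual tokenization rules are not described in this post, so treat the details as illustrative.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def bio_location_matches(location: str, keyword: str) -> bool:
    # Tokenized keyword match: the keyword must appear as a whole token.
    return keyword.lower() in tokenize(location)


for loc in ["Boulder", "Boulder, CO", "Boulderado Hotel"]:
    print(loc, bio_location_matches(loc, "Boulder"))
```

Both “Boulder” and “Boulder, CO” match, while “Boulderado Hotel” does not, because “boulderado” is a different token.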

In the same vein, the bio_location_contains operator filters the Twitter stream on the location users have specified in their Twitter profile. However, instead of a tokenized keyword match, bio_location_contains performs a substring match on this field.
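The practical difference shows up when the value appears inside a longer word. This sketch runs both kinds of match side by side; the helper names, the lowercase normalization, and the tokenizer are assumptions made for illustration.

```python
import re


def bio_location_contains(location: str, value: str) -> bool:
    # Substring match: the value may occur anywhere, even inside a longer word.
    return value.lower() in location.lower()


def bio_location_token(location: str, value: str) -> bool:
    # Tokenized match, for comparison: the value must be a whole word.
    return value.lower() in re.findall(r"\w+", location.lower())


for loc in ["Boulder, CO", "Boulderado Hotel", "Denver"]:
    print(loc, bio_location_contains(loc, "Boulder"), bio_location_token(loc, "Boulder"))
```

Here “Boulderado Hotel” matches the substring rule but not the tokenized one, so bio_location_contains casts a wider net at the cost of some unintended matches.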

Gnip’s new time_zone operator allows customers to filter the Twitter firehose for Tweets whose author’s time zone exactly matches the value provided in the rule. The values delivered in the payload’s time zone field are normalized to the descriptions Twitter uses in its account settings. Because this is a string-based filter, the rule value must match that description exactly.
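Because time_zone is an exact string comparison, a partial or differently cased value will not match. The sketch below assumes a simplified payload with the time zone under a "user" object, and uses “Mountain Time (US & Canada)” as an example of the account-settings style of description.

```python
def time_zone_matches(tweet: dict, rule_value: str) -> bool:
    # Exact, string-based comparison against the author's time zone.
    # No case folding, trimming, or partial matching is applied.
    tz = (tweet.get("user") or {}).get("time_zone")
    return tz == rule_value


tweet = {"user": {"time_zone": "Mountain Time (US & Canada)"}}  # simplified payload
print(time_zone_matches(tweet, "Mountain Time (US & Canada)"))  # True
print(time_zone_matches(tweet, "Mountain Time"))                # False
```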

Twitter users are required to select a language in their profile. This setting simply changes the language in which Twitter displays its UI text (it does not translate Tweet text). THIS IS NOT A LANGUAGE CLASSIFICATION. Customers have reported that this setting is often left at its default, English, even when an account’s Tweets are in another language. We recommend using this operator in conjunction with Gnip’s language classification operator (lang) rather than as a standalone indicator of a user’s or Tweet’s language.
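One way to follow that recommendation is to require the profile setting and the classification of the Tweet text to agree. In this sketch the keys "profile_lang" and "classified_lang" are hypothetical stand-ins for the profile language and the lang classification; they are not actual payload field names.

```python
def language_agrees(tweet: dict, code: str) -> bool:
    # Trust a language label only when the profile (UI) setting and the
    # classification of the Tweet text both say the same thing.
    return tweet.get("profile_lang") == code and tweet.get("classified_lang") == code


tweets = [
    {"id": 1, "profile_lang": "en", "classified_lang": "en"},
    {"id": 2, "profile_lang": "en", "classified_lang": "es"},  # default UI setting, Spanish text
    {"id": 3, "profile_lang": "es", "classified_lang": "es"},
]

print([t["id"] for t in tweets if t["profile_lang"] == "en"])  # [1, 2]
print([t["id"] for t in tweets if language_agrees(t, "en")])   # [1]
```

Filtering on the profile setting alone would treat Tweet 2 as English, which is exactly the failure mode customers have reported.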

Like Klout score, a user’s follower count can be used as a proxy for influence, and we have created a similar operator to allow our customers to filter on this data. Like the klout_score operator, the followers_count operator filters the Twitter firehose for Tweets from users whose follower count falls within a given range or is greater than a given value. WARNING: Use this operator with caution, as it can easily result in the unexpected delivery of very high volumes of data.
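The sketch below shows how a range form and an open-ended form might be interpreted. The "1000..5000" range syntax and the choice to include the boundary values are assumptions for illustration; check the operator documentation for the exact syntax and boundary behavior.

```python
def parse_followers_rule(spec: str) -> tuple[int, float]:
    """Parse a hypothetical followers_count value into inclusive bounds.

    "1000..5000" -> (1000, 5000); "1000" -> (1000, inf).
    """
    if ".." in spec:
        low, high = spec.split("..", 1)
        return int(low), int(high)
    return int(spec), float("inf")


def followers_match(count: int, spec: str) -> bool:
    low, high = parse_followers_rule(spec)
    return low <= count <= high


counts = [12, 850, 1000, 4999, 250000]
print([c for c in counts if followers_match(c, "1000..5000")])  # [1000, 4999]
print([c for c in counts if followers_match(c, "1000")])        # [1000, 4999, 250000]
```

The open-ended form has no upper cap, so a low threshold admits a large share of active accounts; a bounded range is the safer starting point given the volume warning above.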

The bio_name_contains operator allows customers to filter for Tweets from users whose display name (the displayName entered in their profile) contains a given substring.

The has:media operator filters for Tweets that contain a media URL, whether it points to an image, a video, or another type of media. A Tweet matches has:media when the payload delivered by Twitter includes a media entity; Gnip performs no media detection or extraction in support of this operator.
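Since has:media depends only on the media entity in the payload, a link to a video hosted elsewhere does not count unless Twitter attached a media entity. This sketch assumes a simplified payload where media entities live in a list under "entities" → "media"; the example URL is a placeholder.

```python
def has_media(tweet: dict) -> bool:
    # True only when the delivered payload carries a non-empty media entity.
    # Nothing is fetched, downloaded, or inspected.
    entities = tweet.get("entities") or {}
    return bool(entities.get("media"))


with_photo = {"text": "Sunset over the Flatirons", "entities": {"media": [{"type": "photo"}]}}
link_only = {"text": "Watch this", "entities": {"urls": [{"expanded_url": "https://example.com/video"}]}}

print(has_media(with_photo))  # True
print(has_media(link_only))   # False
```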

You can find additional documentation on each of these new operators and the rest of our Twitter filtering options at

As always, we love to hear feedback and suggestions from our customers, and much of our product roadmap and prioritization is driven by customer needs and requests. Keep the feedback coming via your account reps and