4 Things You Need To Know About Migrating to Version 1.1 of the Twitter API

Access to Twitter data through their API has been evolving since its inception. Last September, Twitter announced their most recent changes, which will take effect this coming March 5. These changes enhance feed delivery while further limiting the number of Tweets you can get from the public Twitter API.

The old API was version 1.0 and the new one is version 1.1. If your business or app relies on Twitter’s public API, you may be asking yourself “What’s new in Twitter API 1.1?” or “What changed in Twitter API 1.1?” While there’s not much new, a lot has changed and there are several steps you need to take to ensure that you’re still able to access Twitter data after March 5th.

1. OAuth Connection Required
In Twitter API 1.1, every request to the API must be authenticated using OAuth. To get your Twitter OAuth token, you’ll need to fill out this form. Note that rate limits will be applied on a per-endpoint, per-OAuth-token basis, so distributing your requests among multiple IP addresses will no longer work as a workaround. Requests to the API without OAuth authorization will not return data and will receive an HTTP 410 Gone response.
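If you sign requests yourself rather than through a client library, the sketch below shows roughly what OAuth 1.0a HMAC-SHA1 signing (RFC 5849) involves: percent-encode and sort the parameters, build a signature base string, and sign it with the consumer and token secrets. It is a minimal illustration, not Twitter’s reference code; the function names are hypothetical and any credentials you pass in are placeholders. In production, a maintained OAuth library is the safer choice.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional
from urllib.parse import quote


def percent_encode(value: str) -> str:
    """RFC 3986 encoding: only A-Z a-z 0-9 - . _ ~ are left unencoded."""
    return quote(value, safe="~")


def signature_base_string(method: str, base_url: str, params: Dict[str, str]) -> str:
    """METHOD & encoded URL & encoded, sorted parameter string."""
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), percent_encode(base_url), percent_encode(param_string)])


def sign(base_string: str, consumer_secret: str, token_secret: str) -> str:
    """HMAC-SHA1 over the base string, keyed by both secrets, base64-encoded."""
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def authorization_header(method: str, base_url: str, request_params: Dict[str, str], *,
                         consumer_key: str, consumer_secret: str,
                         token: str, token_secret: str,
                         nonce: Optional[str] = None,
                         timestamp: Optional[str] = None) -> str:
    """Build an 'Authorization: OAuth ...' header value (hypothetical helper)."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": timestamp or str(int(time.time())),
        "oauth_token": token,
        "oauth_version": "1.0",
    }
    # The signature covers the OAuth fields plus the request's query/form parameters.
    base = signature_base_string(method, base_url, {**request_params, **oauth})
    oauth["oauth_signature"] = sign(base, consumer_secret, token_secret)
    fields = ", ".join(f'{percent_encode(k)}="{percent_encode(v)}"'
                       for k, v in sorted(oauth.items()))
    return f"OAuth {fields}"
```

Passing a fixed `nonce` and `timestamp` makes the header reproducible, which is handy when comparing your output against a known-good signature; leave them unset for real requests.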

2. 80% Less Data
In version 1.0, the rate limit on the Twitter Search API was 1 request per second. In Twitter API 1.1, that changes to 1 request every 5 seconds. Put more starkly, you could previously make 3,600 requests/hour, but you are now limited to 720 requests/hour, an 80% reduction. Combined with the existing limits on the number of results returned per request, this will make it much more difficult to consume the volume and coverage of data you previously could through the Twitter API. If the new rate limit is an issue, you can get full coverage, commercial-grade Twitter access through Gnip, which isn’t subject to rate limits.
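The arithmetic behind the 80% figure, plus one way to respect the new spacing on the client side, can be sketched as follows. The limiter is keyed by (endpoint, token) to mirror the per-endpoint, per-token limits described above. The class name and the 5-second interval are illustrative assumptions; Twitter enforces its own limits server-side, and its exact accounting may differ.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple


def requests_per_hour(seconds_per_request: float) -> int:
    """Maximum requests in an hour when calls must be spaced seconds_per_request apart."""
    return int(3600 / seconds_per_request)


class PerTokenRateLimiter:
    """Client-side spacing: at most one call per interval for each (endpoint, token) pair.

    Hypothetical helper; the server still applies its own limits regardless.
    """

    def __init__(self, seconds_per_request: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = seconds_per_request
        self.clock = clock
        self.sleep = sleep
        # Earliest time the next call is allowed, per (endpoint, token).
        self.next_allowed: DefaultDict[Tuple[str, str], float] = defaultdict(float)

    def wait(self, endpoint: str, token: str) -> None:
        """Block until a request for this endpoint/token pair is allowed."""
        key = (endpoint, token)
        now = self.clock()
        delay = self.next_allowed[key] - now
        if delay > 0:
            self.sleep(delay)
            now += delay
        self.next_allowed[key] = now + self.interval


v10 = requests_per_hour(1)   # 3600
v11 = requests_per_hour(5)   # 720
print(f"{v10} -> {v11} requests/hour ({1 - v11 / v10:.0%} fewer)")
```

Injecting `clock` and `sleep` keeps the limiter testable without real waiting; call `limiter.wait(endpoint, token)` immediately before each request.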

3. New Endpoint URLs
Twitter API 1.1 also has new endpoint URLs that you will need to point your application to in order to access the data. If you try to access the old endpoints, you won’t receive any data and will get an HTTP 410 Gone response.
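A migration often starts with rewriting hard-coded URLs. The sketch below handles only two common patterns: REST paths under api.twitter.com/1/ and the old search.twitter.com search endpoint. It assumes the resource path itself is unchanged in 1.1, which is not always true (some resources were renamed or retired, and query parameters may differ), so check each one against the REST API Version 1.1 Resources. `to_v11_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

V11_HOST = "api.twitter.com"
V1_SUFFIXES = {"json", "xml", "atom", "rss"}


def to_v11_url(v10_url: str) -> str:
    """Rewrite a v1.0 URL to its assumed v1.1 equivalent (hypothetical helper)."""
    parts = urlsplit(v10_url)
    if parts.netloc == "search.twitter.com" and parts.path in ("/search.json", "/search.atom"):
        # The standalone search host is replaced by the search/tweets resource.
        path = "/1.1/search/tweets.json"
    elif parts.netloc == "api.twitter.com" and parts.path.startswith("/1/"):
        resource = parts.path[len("/1/"):]
        base, dot, ext = resource.rpartition(".")
        if dot and ext in V1_SUFFIXES:
            resource = base  # v1.1 is JSON-only, so drop any .xml/.atom/.rss suffix
        path = f"/1.1/{resource}.json"
    else:
        raise ValueError(f"no known v1.1 mapping for {v10_url!r}")
    # Assumes HTTPS and an unchanged query string; parameter names may still need review.
    return urlunsplit(("https", V11_HOST, path, parts.query, ""))


print(to_v11_url("http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=gnip"))
```

Raising on anything unrecognized is deliberate: an unmapped URL is better surfaced during migration than silently sent to a retired endpoint.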

4. Hello JSON. Goodbye XML.
Twitter has changed the format in which data is delivered. In version 1.0 of the Twitter API, data could be delivered in XML format; Twitter API 1.1 delivers data in JSON format only. Twitter has been gradually transitioning away from XML, starting with the Streaming API and Trend API. Going forward, all APIs will use JSON rather than XML. The Twitter JSON API is a great step forward, as JSON has become the more widely adopted format for web APIs.
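If your code used to walk XML trees, the JSON equivalent is usually simpler. Here is a minimal sketch of reading a v1.1 search response with Python’s standard `json` module; the sample payload is made up and heavily trimmed, and the function name is hypothetical.

```python
import json
from typing import Dict, List


def tweets_from_search_response(body: str) -> List[Dict[str, str]]:
    """Pull a few fields out of a v1.1 search/tweets.json response body."""
    payload = json.loads(body)
    return [
        {
            # id_str is safer than the numeric id, which can lose precision in
            # languages that store numbers as 64-bit floats (e.g. JavaScript).
            "id": status["id_str"],
            "user": status["user"]["screen_name"],
            "text": status["text"],
        }
        for status in payload.get("statuses", [])
    ]


# Trimmed, made-up response; real payloads carry many more fields.
sample = """{"statuses": [{"id_str": "12345", "text": "Hello JSON",
              "user": {"screen_name": "gnip"}}],
             "search_metadata": {"count": 1}}"""
print(tweets_from_search_response(sample))
```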

All in all, some pretty impactful changes.  If you’re looking for more information, we’ve provided some links below with more details.  If you’re interested in getting full coverage commercial grade access to Twitter data where rate limits are a thing of the past, check out the details of Gnip’s Twitter offerings.  We have a variety of Twitter products, including realtime coverage and volume streams, as well as access to the entire archive of historical Tweets.

Update: Twitter has recently announced that the Twitter REST API v1.0 will officially retire on May 7, 2013. Until then, Twitter will continue to run blackout tests, and anyone who has not migrated will see interrupted coverage, so we highly encourage migrating as soon as possible.

Helpful Links
Version 1.0 Retirement Post
Version 1.0 Retirement Final Dates
Changes coming in Twitter API 1.1
OAuth Application Form
REST API Version 1.1 Resources
Twitter API 1.1 FAQ
Twitter API 1.1 Discussion
Twitter Error Code Responses

Dreamforce Hackathon Winner: Enterprise Mood Monitor

As we wrote in our last post, Gnip co-sponsored the 2011 Dreamforce Hackathon, where teams of developers from all over the world competed for the top three overall cash prizes as well as prizes in multiple categories.  Our very own Rob Johnson (@robjohnson), VP of Product and Strategy, helped judge the entries, selecting the Enterprise Mood Monitor as winner of the Gnip category.

The Enterprise Mood Monitor pulls in data from a variety of social media sources, including the Gnip API, to provide realtime and historical information about the emotional health of a company’s employees. It shows both individual and overall company emotional climate over time and can send an SMS message to a manager when the mood level drops below a threshold. In addition, HR departments can use this data to get insights into employee morale and satisfaction over time, eliminating the need to conduct standard employee satisfaction surveys. This mood analysis data can also be correlated with business metrics such as Sales and Support KPIs to identify drivers of business performance.

Pretty cool stuff.

The three developers from Comity Designs behind this idea (Shamil Arsunukayev, Ivan Melnikov and Gaziz Tazhenov) set out to create a cloud app for the social enterprise built on one of Salesforce’s platforms. They spent two days brainstorming the possibilities before diving into two days of rigorous coding. The result was the Enterprise Mood Monitor, built on the Force.com platform using Apex, Visualforce, and the following technologies: Facebook API (Graph API), Twitter API, Twitter Sentiment API, LinkedIn API, Gnip API, Twilio, Chatter, and Google Visualization API. The team entered their Enterprise Mood Monitor into the Twilio and Gnip categories. We would like to congratulate the guys on their “double-dip” win: they took third place overall and won the Gnip category prize!

Have a fun and creative way you’ve used data from Gnip? Drop us an email or give us a call at 888.777.7405 and you could be featured in our next blog post.

We're off to Dreamforce!

There’s always a lot going on here at Gnip, but this week is especially packed with the team looking to make a big splash at Salesforce.com’s annual Dreamforce event. Salesforce is obviously a huge player in the software space and the theme of this year’s Dreamforce is “Welcome to the Social Enterprise” which fits really nicely with what we do.

At the conference, we’ll be speaking at two sessions and sponsoring the Hack-a-thon. In the first presentation, Drinking from the Firehose: How Social Data is Changing Business Practices, Jud (@jvaleski) and Chris (@chrismoodycom) will discuss the ways that social data is being used to drive innovation across a variety of industries from Financial Services and Emergency Response to Local Business and Consumer Electronics. They’ll also give a glimpse into the technical challenges involved in handling the ever-increasing volume of data that’s flowing out of Twitter every day. If you’re at Dreamforce, this session is on Tuesday (8/30) from 11am to noon in the DevZone Theater on the 2nd floor of Moscone West.

In the second presentation, Your Guide to Understanding the Twitter API, Rob (@robjohnson) will talk through the best ways to get access to the Twitter data that you’re looking for, examining the pros and cons of the various methods. You can check out Rob’s session on Tuesday (8/30) from 3:00 to 3:30 in the Lightning Forum in the DevZone on the 2nd floor of Moscone West.

And finally, we’re sponsoring the Hack-a-thon where teams of developers will create cloud apps for the social enterprise using Twitter feeds from Gnip and at least one of the Salesforce platforms (Force.com, Heroku, Database.com). The winning team stands to take home at least $10,000 in prize money. We’re really excited to see the creative solutions that the teams develop! All submissions are due no later than 6am on Thursday (9/1), so sign up now and get going!

Want to meet up in person at Dreamforce? Give any of us a shout @jvaleski, @chrismoodycom, @robjohnson, @funkefred.

Guide to the Twitter API – Part 3 of 3: An Overview of Twitter’s Streaming API

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like the Streaming API because its streaming delivery means data arrives closer to realtime than it does from the Search API (which I wrote about last week). But the Streaming API wasn’t designed to deliver full coverage results and so has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API.

The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer,” and Twitter currently has it set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” a randomly sampled 10% stream. Twitter used to provide some increased access levels to businesses, but announced last November that they’re no longer granting increased access to any new companies and are gradually transitioning their current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)
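On the receiving end, streamed Tweets arrive as JSON objects, one per line, and the stream also sends periodic blank lines as keep-alives. A minimal consumer therefore reads line by line, skips blank lines, and parses the rest. In this sketch, `lines` is any iterable of byte strings (for example, lines read from an open HTTP response); connection handling, reconnect/backoff, and authentication are deliberately left out, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_stream_messages(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON message per non-blank line of a streaming response."""
    for raw in lines:
        line = raw.strip()  # drops the trailing \r\n delimiter
        if not line:
            continue  # blank keep-alive line: connection is alive, nothing to parse
        # Tweets and non-Tweet notices (e.g. deletions) share the stream;
        # classification is left to the caller.
        yield json.loads(line)


sample_lines = [b'{"id_str": "1", "text": "first"}\r\n', b"\r\n",
                b'{"id_str": "2", "text": "second"}\r\n']
for message in iter_stream_messages(sample_lines):
    print(message["id_str"], message["text"])
```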

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g., keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API like they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public APIs can be risky.
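Because of the keyword cap, it pays to validate a filter before opening a connection. The sketch below builds a comma-separated value for the filter endpoint’s `track` parameter and refuses lists over 400 entries. Treating each comma-separated entry as one “keyword” is an assumption about how the limit is counted, and `build_track_param` is a hypothetical helper.

```python
from typing import Iterable

MAX_TRACK_KEYWORDS = 400  # the limit quoted above; Twitter may change it at any time


def build_track_param(keywords: Iterable[str]) -> str:
    """Join keywords into a comma-separated track value, enforcing the keyword cap."""
    cleaned, seen = [], set()
    for keyword in keywords:
        phrase = keyword.strip()
        if not phrase or phrase in seen:
            continue  # skip blanks and exact duplicates
        if "," in phrase:
            # Commas separate entries in the track value, so they can't appear inside one.
            raise ValueError(f"keyword contains a comma: {phrase!r}")
        seen.add(phrase)
        cleaned.append(phrase)
    if len(cleaned) > MAX_TRACK_KEYWORDS:
        raise ValueError(f"{len(cleaned)} keywords exceeds the limit of {MAX_TRACK_KEYWORDS}")
    return ",".join(cleaned)


print(build_track_param(["coca cola", "pepsi", "pepsi", "  "]))
```

Deduplicating before counting matters: a list that looks too long may fit once repeated terms are removed.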

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: unlike premium Twitter feeds from Gnip and other resyndication partners, those APIs don’t prohibit displaying the Tweets you receive to the general public.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended for, and does not fully support, the repeated constant searches that would be required to deliver 100% coverage.

Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full coverage streams for monitoring and analytics use cases. To get data from the Search API, your system must repeatedly ask Twitter’s servers for the most recent results that match each of your search queries. On each request, Twitter returns a limited number of results (for example, the “latest 100 Tweets”). If more than 100 Tweets matching a search query have been created since your last request, some of the matching Tweets will be lost.
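To make the loss concrete, here is a small simulation of polling with a capped result page. It assumes each poll returns only the newest 100 matches and that you can’t afford extra requests to page back through older results; the function name and the arrival counts are hypothetical.

```python
from typing import Iterable, Tuple


def simulate_polling(arrivals_per_poll: Iterable[int], page_size: int = 100) -> Tuple[int, int]:
    """Return (captured, missed) when each poll returns only the newest page_size matches."""
    captured = missed = 0
    for arrived in arrivals_per_poll:
        got = min(arrived, page_size)  # anything beyond one page is never seen
        captured += got
        missed += arrived - got
    return captured, missed


# Matching Tweets created between consecutive polls; the third interval is a spike.
print(simulate_polling([40, 80, 350, 90]))  # (310, 250)
```

Note that the losses all come from the one spike, which is usually the interval you care about most.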

So . . . can you just make requests more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit of time is constrained by Twitter’s rate limits. Some queries are so popular (hello, “Justin Bieber”) that it can be impossible to make enough requests to Twitter for that query alone to keep up with its stream. And this is only the beginning of the problem, as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might usually be fewer than 100 matching Tweets per second, but if there’s a spike (say that term becomes a trending topic after a Super Bowl commercial), then there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will have missed some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list and can still send only one request per second, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime, because you’re only querying for them every 1,000 seconds.
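The round-robin arithmetic above generalizes easily. This sketch assumes all terms share one request budget, each request covers exactly one term, and each visit yields at most one page of 100 results; the function name is hypothetical.

```python
from typing import Tuple


def round_robin_coverage(num_terms: int, seconds_per_request: int,
                         page_size: int = 100) -> Tuple[int, float]:
    """Revisit interval per term and the most Tweets per hour one term can yield."""
    revisit_seconds = num_terms * seconds_per_request  # time until the same term comes up again
    max_per_hour = page_size * 3600 / revisit_seconds   # at most one page per visit
    return revisit_seconds, max_per_hour


print(round_robin_coverage(1000, 1))  # (1000, 360.0): the "Coca Cola" scenario above
print(round_robin_coverage(1000, 5))  # (5000, 72.0): same terms at one request every 5 s
```

In other words, with 1,000 terms at one request per second, no single term can ever yield more than 360 Tweets an hour, no matter how many are actually posted.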

Because of these issues related to the monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned, if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although its coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Some people opt to use the Search API to collect a sampling of Tweets that match their search terms precisely because it supports Boolean operators and geo parameters. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)
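How the Enterprise Data Collector deduplicates isn’t described here. As a hypothetical illustration of the general idea, the sketch below merges result pages from overlapping queries and keeps the first copy of each Tweet, keyed by its `id_str`.

```python
from typing import Dict, Iterable, List


def merge_results(*pages: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge result pages, keeping the first copy of each Tweet by id_str."""
    seen = set()
    merged = []
    for page in pages:
        for tweet in page:
            if tweet["id_str"] in seen:
                continue  # already returned by an earlier, overlapping query
            seen.add(tweet["id_str"])
            merged.append(tweet)
    return merged


coke = [{"id_str": "1", "text": "coca cola ad"}, {"id_str": "2", "text": "coke and pepsi"}]
pepsi = [{"id_str": "2", "text": "coke and pepsi"}, {"id_str": "3", "text": "pepsi ad"}]
print([t["id_str"] for t in merge_results(coke, pepsi)])  # ['1', '2', '3']
```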

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

Reminder: Gnip Platform Updates This Friday

This post is meant to provide a reminder and additional guidance for Gnip platform users as we transition to the new Twitter Streaming API at the end of the week. We have a lot going on and want to make sure companies and developers are keeping up with the moving parts.

  • Friday, June 19th: Twitter is turning off the original XMPP firehose that we have used as the default “Twitter Data Publisher” in the Community Edition of the platform.
  • Starting on Friday, June 19th, the new default “Twitter Data Publisher” in the Community Edition of the platform will be integrated with the new “spritzer” tier of the Twitter Streaming API. Spritzer is a sample of the Twitter stream, not a “firehose”. This is the default publicly available stream that Twitter is allowing Gnip to make available for anyone to integrate.
  • All Gnip users will be able to access full-data filters with the updated Twitter Data Publisher.
  • If your company has an authorized Twitter account for the gardenhose, shadow, or birddog tiers and does not want to build and maintain this integration, contact us by email at info@gnip.com or shane@gnip.com to discuss how Gnip can provide a solution.

Helpful information about the new Twitter Streaming API:

PS:  The planned Facebook integration is coming along and we have our internal prototype completed.  Driving toward the beta and should have more details in the next week or two.

PPS: We would still appreciate any feedback people can provide on their Twitter data integration needs – take the survey

Gnip: Transitioning to New Twitter Streaming API in June

When we started Gnip last year, Twitter was among the first group of companies that understood the data integration problems we were trying to solve for developers and companies. Because Gnip and Twitter were able to work together, it has been possible to access and integrate data from Twitter through the Gnip platform: since last July with Gnip Notifications, and since last September with Gnip Data Activities.

All of this data access was the result of Gnip working with the Twitter XMPP “firehose” API to provide Twitter data access for users of both the Gnip Community and Standard edition product offerings. Recently, Twitter announced a new Streaming API and began an alpha program to start making the new API available. Gnip has been testing the new Streaming API, and we are now planning to move from the current XMPP API to the new Streaming API in the middle of June. This transition will mean some changes in the default behavior and in the ability to access Twitter data, as described below.

New Streaming API Transition Highlights

  1. Gnip will now be able to provide both Gnip Notifications and Gnip Data Activities to all users of the Gnip platform.   We had stopped providing access to Data Activities to new customers last November when Twitter began working on the new API, but now all users of the Gnip platform can use either Notifications or Data Activities based on what is appropriate for their application use case.
  2. There are no changes to the Gnip API or service endpoints of Gnip Publishers and Filters due to this transition. The transition only changes the default Twitter API that we integrate with for data from Twitter. (Added about 2 hours after original post.)
  3. The Twitter Streaming API is meant to accommodate a class of applications that require near-real-time access to Twitter public statuses and is provided with several tiers of streaming API methods.  See the Twitter documentation for more information.
  4. The default Streaming API tiers that Gnip will be making available are the new “spritzer” and “follow” stream methods.   These are the only tiers which are made available publicly without requiring an end user agreement directly with Twitter at this time.
  5. The “spritzer” stream method is not a “firehose” like the XMPP stream that Gnip previously used as our default. The average number of messages per second is still being worked out by Twitter, but at this time “spritzer” runs in the ballpark of 10-20 messages per second and can vary depending on many variables being managed by Twitter.
  6. The “follow” stream method returns public statuses from a specified set of users, by ID.
  7. For more on “spritzer”, “follow”, and other methods see the Twitter Streaming API Documentation.

What About Companies and Developers Whose Use Cases Are Not Met by the Twitter “Spritzer” and “Follow” Streaming API Methods?


Gnip and Twitter realize that companies want to use Twitter data in many different ways and that new applications are being built every day. Therefore, we are exploring how companies that are authorized by Twitter for other Streaming API methods could use the Gnip platform as their integration platform of choice.

 

Twitter has several additional Streaming API methods available to approved parties that require a signed agreement to access. To better understand which developers and companies using the Gnip platform could benefit from these other Streaming API options, we encourage Gnip platform users to take this short 12-question survey: Gnip: Twitter Data Publisher Survey (URL: http://www.surveymonkey.com/s.aspx?sm=dQEkfMN15NyzWpu9sUgzhw_3d_3d)

What About the Gnip Twitter-search Data Publisher?


The Gnip Twitter-search Data Publisher is not impacted by the transition to the new Twitter Streaming API since it is implemented using the new Gnip Polling Service and provides keyword-based data integration to the search.twitter APIs.

We will provide more information shortly, once we lock down the actual date for the transition. Please take the survey and, as always, contact us directly at info@gnip.com or send me a direct email at shane@gnip.com.

That Twitter Thing

Oh, crap, Eric’s gone and written another long post…

Since we publicly launched Gnip last week, we’ve been asked numerous times if we can integrate with Twitter or somehow help Twitter with the scaling issues they are facing.  We can, but we depend on Twitter giving us access to their XMPP feed.

We are huge fans of Twitter, so we’re patiently waiting for that access. In the meantime, the questions we’ve received have prompted us to explain two things: (1) How we would benefit Twitter and anyone who wants access to Twitter data and (2) Why – if you are a web service – it’s worth integrating now with Gnip rather than waiting either for (a) Gnip to integrate with Twitter or (b) you to get as popular as Twitter and have scale issues.

Let’s address the first issue: How we would benefit Twitter and anyone that wants to integrate with Twitter data.

Twitter has found that XMPP doesn’t scale for them and as a result, people are forced to poll their API *a lot* to get updates for their users.  MyBlogLog has over 25,000 Twitter users that they throw against the Twitter API every 15 minutes.  This results in nearly 2.5 million queries against the API every day, for maybe 250K updates.  Now add millions of pings from Plaxo and SocialThing and Lijit and heaven forbid Yahoo starts beating up their API…

If Twitter starts pushing updates to us, via our dead simple API or Atom or their XMPP server, we can immediately reduce by an order of magnitude the number of requests that some very large sites are making against their API.  At the same time, we reduce the latency between when someone Tweets and when it shows up on consuming sites like Plaxo.  From 15 minutes or more to 60 seconds or less.
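Using the MyBlogLog figures from above (25,000 users polled every 15 minutes, roughly 250K actual updates a day), the poll-versus-push comparison works out as below. The sketch assumes a push model costs about one message per real update, which is an illustrative simplification; the function name is hypothetical.

```python
def poll_vs_push(users: int, poll_interval_minutes: int, daily_updates: int) -> dict:
    """Compare daily request volume and worst-case latency for polling vs. push."""
    polls_per_day = users * (24 * 60 // poll_interval_minutes)
    return {
        "polls_per_day": polls_per_day,
        "pushes_per_day": daily_updates,  # assumption: about one message per actual update
        "reduction_factor": polls_per_day / daily_updates,
        "worst_case_poll_latency_s": poll_interval_minutes * 60,  # an update can wait a full interval
    }


print(poll_vs_push(25_000, 15, 250_000))
```

That is 2.4 million polls a day against about 250,000 updates, a factor of 9.6, which is where the “order of magnitude” comes from.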

We expect that Twitter has their collective heads down and are working around the clock to buttress their infrastructure, and it’s unlikely that they’re going to do anything optional until that’s sorted out.  Unfortunately, “integrate with Gnip” probably falls into the optional category. We expect, however, that at some point Twitter will start opening up their data to more partners once they feel like they have their arms around their infrastructure.

If you run a web service and integrate with Gnip today, you’ll automatically be able to integrate with Twitter data once they give us access. Presumably you won’t have to wait in line to get direct Twitter integration. In addition, you’ll have immediate access to all of the other data providers that we integrate with, such as Delicious, Flickr, Magnolia, Get Satisfaction, Intense Debate and Six Apart. For example, it only took Brightkite 15 minutes to integrate our API and start pushing data to our partners via us.

Now for the second topic.  Why – if you are a web service – it’s worth integrating with Gnip now rather than waiting either for (a) Gnip to integrate with Twitter or (b) you to get as popular as Twitter and have scale issues.

All things considered, it’s best not to end up in Twitter’s position. They have a ton of passionate users (I’m one of them) who want reliable service and don’t have infinite patience. The old startup cliche of “these are problems we’d like to have” is crap.

You don’t want to be in the position where your business suddenly takes off and your infrastructure falls over because people are banging your APIs to death.  You don’t want your most passionate users calling for mass exodus.  It’s better to take a few minutes to start pushing notifications to Gnip now than when you’re doing 20-hour days rebooting servers.

You also don’t want to be in the position where your company takes off and you suddenly get throttled by an API provider. Nothing is worse than having to pull data sources because you’ve over-polled and the host decides to turn off the spigot. Start pulling notifications from Gnip and feel secure that you’re only asking for data when there’s something new.

I still use Twitter every day.  Don’t try to kid me; I know you still do too.  Let them get on with their work and rest assured that we’ll integrate with them the instant we get the okay from them.