4 Things You Need To Know About Migrating to Version 1.1 of the Twitter API

Access to Twitter data through their API has been evolving since its inception. Last September, Twitter announced their most recent changes which will take effect this coming March 5. These changes make enhancements to feed delivery, while further limiting the amount of Tweets you can get from the public Twitter API.

The old API was version 1.0 and the new one is version 1.1. If your business or app relies on Twitter’s public API, you may be asking yourself “What’s new in Twitter API 1.1?” or “What changed in Twitter API 1.1?” While there’s not much new, a lot has changed and there are several steps you need to take to ensure that you’re still able to access Twitter data after March 5th.

1. OAuth Connection Required
In Twitter API 1.1, access to the API requires authentication using OAuth. To get your Twitter OAuth token, you’ll need to fill out this form.  Note that rate limits will be applied on a per-endpoint, per-OAuth token basis and distributing your requests among multiple IP addresses will not work anymore as a workaround. Requests to the API without OAuth authorization will not return data and will receive a HTTP 410 Gone response.

2. 80% Less Data
In version 1.0, the rate limit on the Twitter Search API was 1 request per second. In Twitter API 1.1, that changes to 1 request per every 5 seconds. A more stark way to put this is that previously you could make 3600 requests/hour but you are now limited to 720 requests/hour for Twitter data. Combined with the existing limits to the number of results returned per request, it will be much more difficult to consume the volume or levels of data coverage you could previously through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip which isn’t subject to rate limits.

3. New Endpoint URLs
Twitter API 1.1 also has new endpoint URLs that you will need to direct your application to in order to access the data. If you try to access the old endpoints, you won’t receive any data and will receive a HTTP 410 Gone response.

4. Hello JSON. Goodbye XML.
Twitter has changed the format in which the data is delivered. In version 1.0 of the Twitter API, data was delivered in XML format. Twitter API 1.1 delivers data in JSON format only. Twitter has been slowly transitioning away from XML starting with the Streaming API and Trend API.  Going forward, all APIs will be using JSON and not XML. The Twitter JSON API is a great step forward as JSON has a much wider standardization than XML does.

All in all, some pretty impactful changes.  If you’re looking for more information, we’ve provided some links below with more details.  If you’re interested in getting full coverage commercial grade access to Twitter data where rate limits are a thing of the past, check out the details of Gnip’s Twitter offerings.  We have a variety of Twitter products, including realtime coverage and volume streams, as well as access to the entire archive of historical Tweets.

Update: Twitter has recently announced that the Twitter REST API v1.0 will officially retire on May 7, 2013. Between now and then they will continue to run blackout tests and those who have not migrated will see interrupted coverage so migrating as soon as possible is highly encouraged.

Helpful Links
Version 1.0 Retirement Post
Version 1.0 Retirement Final Dates
Changes coming in Twitter API 1.1
OAuth Application Form
REST API Version 1.1 Resources
Twitter API 1.1 FAQ
Twitter API 1.1 Discussion
Twitter Error Code Responses

Gnip Cagefight #2: Pumpkin Pie vs. Pecan Pie

Thanksgiving is a time for family gatherings, turkey with all the delicious fixings, football, and let’s not forget, pie! If your family is anything like mine, multiple pie flavors are required to satisfy the differing palates and strong opinions. So we wondered, which pies are people discussing for the holiday? What better way to celebrate and answer that question than with a Gnip Cagefight.

Welcome to the Battle of the Pies!

For those of you that have been in a pie eating contest or had a pie in the face, you know this one will be a fight all the way down to the very last crumb. In one corner (well actually it is the Gnip Octagon so can you really have corners, oh well) we have The Traditionalist, pumpkin pie and in the opposite corner, The New Comer, pecan pie. Without further ado, Ladies and Gentleman, Let’s Get Ready to Rumble, wait wrong sport. Let’s Fight!

Six Social Media Sources, Two Words, One Winner . . . And the Winner Is . . .

 

 Source  Pumpkin Pie  Pecan Pie  Winning Ratio
Pumpkin Pie to Pecan Pie
Twitter X 4:1
Facebook X 5:1
Google+ X 6:1
Newsgator X 3:1
WordPress X 5:1
WordPress Comments X 2:1
Overall +6 Winner! +0 :(

 

We looked at one week’s worth of data across six of the top social media sources and determined that pumpkin pie “takes the cake” (so to speak) across every source.

In this case, it is interesting to point out that in sources like Twitter, Facebook, Google+ and WordPress we see higher winning ratios, while sources that tend to have higher latency such as Newsgator and WordPress Comments were a little more even. Is this because, on further consideration, pecan pie sounds pretty good? Or is it that everyone will have to have two pies and, with pecan as the traditional second, it is highly discussed?

Top Pie Recipes

Even though pumpkin pie was our clear winner, we thought it would be fun to share a few of the most popular holiday pie recipes by social media source:

  1. Twitter – Cook du Jour Gluten-Free Pumpkin Pie and Pecan Pie Video Recipe from joyofcooking.com
  2. Facebook – Ben Starr’s Pumpkin Bourbon Pecan Pie Recipe
  3. Newsgator – BlogHer’s Pumpkin Pecan Roulade with Orange Mascarpone Cream Pie Recipe
  4. WordPress and WordPress Comments – Chocolate Bourbon Pecan Pie from allrecipes.com

Non-Traditional Thanksgiving Pies

Another interesting fact that came out of this Cagefight was the counts of non-traditional Thanksgiving pies that were mentioned across the social media sources we surveyed. Though we rarely find these useful for communicating numerical values effectively, you can’t not have a pie chart in this post.

Happy Thanksgiving!

Gnip Cagefight #1: Beer vs. Wine

Welcome to the very first edition of the Gnip Cagefight! Over the next couple of weeks we’ll select a common word pair to enter the Gnip Octagon to fight to the finish in a no holds barred battle of Tweets. Two words will enter. Only one will leave.

In addition to crowning the victor, we’ll also call out some of the fun, interesting, strange, and bizarre trends that we glean from the data. Leave us a comment with any contenders you’d like to see in the future.

Now without further delay, let’s dive into our first Gnip Cagefight… Put your hands together for Wine vs. Beer!

And the Winner is . . .

We looked at one week of Tweets that contained the words “beer” or “wine,” and beer was the more commonly used term, appearing in 53.1% of those tweets vs. 48.1% for wine. Now you might be saying, “Hey, that’s more than 100%!” You are correct! That’s because beer and wine appear together about 13,801 times–along with an uncomfortable hangover, we presume. (Is this an opportunity to sell aspirin?)

With beer as our victor, we wanted to answer the age old question . . .

What time is Beer Thirty?

To answer this question, we analyzed the volume of Tweets containing the term “beer” throughout each day and averaged that across the week’s worth of data we collected. Each Tweet’s time was moved into the time zone of the Tweeter and normalized against the daily cycle of Tweet volume. Based on the graph below, true beer thirty is 5pm local time. This gives great meaning to the saying “It’s 5 o’clock somewhere.”

Beer Drinkers have a Wider Vocabulary than Wine Drinkers

Another fascinating tidbit that came out of the data was that beer drinkers have a wider vocabulary than wine drinkers. Normalizing for the number of words used, we find that beer drinkers use 14% more distinct words than wine drinkers. Wine drinkers tend to use the same idioms, for example, “glass of wine” or “red wine,” more than beer drinkers use their most common phrases. Does this mean that beer drinkers are 14% smarter than wine drinkers? Or that they use very creative spelling? We won’t wade any further into that question, but you can be the judge.

That’s all for our inaugural Gnip Cagefight. Hope you enjoyed it and be sure to let us know what what words you’d like to see in the octagon in the future.

The VMAs, Lady Gaga and Data Science

Hi everyone. I’m the new Data Scientist here at Gnip. I’ll be analyzing the fascinating data that we have coming from all of our varied social data streams to pull out the stories, both impactful and trivial, that are flowing through social media conversations. I’m still getting up-to-speed but wanted to share one of the first social events that I’ve dug into, the 2011 MTV Video Music Awards.

Check out the info below and let me know in the comments what you think and what you’d like to see more of.  And now, on with the show…

3.6M Tweets Mention “VMA”

The volume of tweets containing “VMA” rose steadily from a few hours before the VMA pre-show was broadcast, up to the starting of the pre-show at 8:00 PM ET (00:00 GMT) and remained fairly strong during the event. It trailed to low volume within the hour after the VMA broadcast ended at 11:15 PM ET (03:15 GMT). Tweets mentioning “VMA” totaled 3.6M during the 7 hours surrounding and including the VMA broadcast.

 

Lady Gaga Steals the “Tweet” Show

The largest volume of tweets for an individual artist are the mentions of “gaga.” Lady Gaga performed early in the show and the surge of tweets during her performance surpassed 35k tweets per minute for about 8 minutes. Again in the second half, Lady Gaga tweet volume briefly jumped above 50k per minute. Tweets mentioning “gaga” totaled 1.8M during the 7 hours surrounding and including the VMA broadcast.

As you can see in the chart below, other artists that garnered significant tweet volumes included Beyonce’, Justin Beiber, Chris Brown, Katy Perry and Kanye West. Perry, West and Brown got a lot of attention during their appearances, while Justin Bieber and Lady Gaga lead the counts in volume by maintaining a fairly steady stream of tweets during the broadcast.

Term Representation of Tweets Sampled
VMA 44 %
Lady Gaga 21 %
Beyonce 16 %
Justin Bieber 10 %
MTV 9.2 %
Chris Brown 8.0 %
Katy Perry 5.6 %
Kanye West 4.8 %
Jonas 3.5 %
Taylor Swift 2.1 %
Rihanna 1.1 %
Eminem 0.55 %
Michael Jackson 0.18 %
Ke$ha 0.17 %
Cher 0.14 %
Paramore 0.12 %

 

 

 

Contrasting, it is interesting to note that Beyonce’ and Chris Brown gained most of their tweet attention around their performances with very larger surges in tweet volume. Beyonce’s volume–another Beyonce’ bump–continues after her performance as twitter users absorb the news of her pregnancy.

 

 

One surprise that emerges from looking for other artists connected to the VMAs was Michael Jackson’s tweet volume. While Jackson gleaned many Retweets after winning the King of the VMA poll, he also received a large number of natural tweets lamenting his passing and celebrating his past successes.

Methodology

The free-form text and limited length of twitter messages creates a number of challenges for monitoring an event via twitter comments. People refer to the event differently and focus on different parts of the event. There will be spelling variations and differences in idioms and nicknames used to describe people and performances. Do we search for “Bieber”,”Beiber” and “Justin”?  Will tweeters use “Beyonce” or Beyonce’”? Knowledge of what we are monitoring is required; preparing tools to adapt things we learn during the events is also essential to getting good results.

One effective strategy is to use one or two tokens to identify tweets related to the event. The objective is to choose terms that we know are related to the event, that won’t be widely used outside the event, and that will give a representative sample–diverse and with sufficient volume. Once we have started to collect the event-focused twitter sample, we can look for relevant terms correlated with the filter term to find out what else people are tweeting about during the event.

Hope you enjoyed this first post. Look for more to come.

 

We're off to Dreamforce!

There’s always a lot going on here at Gnip, but this week is especially packed with the team looking to make a big splash at Salesforce.com’s annual Dreamforce event. Salesforce is obviously a huge player in the software space and the theme of this year’s Dreamforce is “Welcome to the Social Enterprise” which fits really nicely with what we do.

At the conference, we’ll be speaking at two sessions and sponsoring the Hack-a-thon. In the first presentation, Drinking from the Firehose: How Social Data is Changing Business Practices, Jud (@jvaleski) and Chris (@chrismoodycom) will discuss the ways that social data is being used to drive innovation across a variety of industries from Financial Services and Emergency Response to Local Business and Consumer Electronics. They’ll also give a glimpse into the technical challenges involved in handling the ever-increasing volume of data that’s flowing out of Twitter every day. If you’re at Dreamforce, this session is on Tuesday (8/30) from 11am to noon in the DevZone Theater on the 2nd floor of Moscone West.

In the second presentation, Your Guide to Understanding the Twitter API, Rob (@robjohnson) will talk through the best ways to get access to the Twitter data that you’re looking for, examining the pros and cons of the various methods. You can check out Rob’s session on Tuesday (8/30) from 3:00 to 3:30 in the Lightning Forum in the DevZone on the 2nd floor of Moscone West.

And finally, we’re sponsoring the Hack-a-thon where teams of developers will create cloud apps for the social enterprise using Twitter feeds from Gnip and at least one of the Salesforce platforms (Force.com, Heroku, Database.com). The winning team stands to take home at least $10,000 in prize money. We’re really excited to see the creative solutions that the teams develop! All submissions are due no later than 6am on Thursday (9/1), so sign up now and get going!

Want to meet up in person at Dreamforce? Give any of us a shout @jvaleski, @chrismoodycom, @robjohnson, @funkefred.

Handling High-Volume, Realtime, Big Social Data

The social ecosystem has become the pulse of the world. From delivering breaking news like the death of Osama Bin Laden before it hit mainstream media to helping President Obama host the first Twitter Town Hall, the realtime social web is flooded with valuable information just waiting to be analyzed and acted upon. With millions of users and billions of social activities passing through the ever-growing realtime social web each day, it is no wonder that companies need to reevaluate their traditional business models to take advantage of this big social data.But with the exponentially ever-growing social web, massive amounts of data are pouring into and out of social media publishers’ websites and APIs every second. In a talk I gave at GlueCon a couple of months ago, I ran down some math to put things into perspective. The numbers are a little dated, but the impact is the same. At that time there were approximately 155,000,000 Tweets per day and the average size of a Tweet was approximately 2,500 Bytes (keep in mind this could include Retweets).

A Little Bit of Arithmetic

155,000,000 Tweets/day   2,500 Bytes = 387,500,000,000 Bytes/day

387,500,000,000 Bytes/day  24 Hours = 16,145,833,333 Bytes/hour

16,145,833,333 Bytes/hour 60 minutes = 269,097,222 Bytes/minute

269,097,222 Bytes/minute 60 second = 4,484,953 Bytes/second

4,484,953 Bytes/second  1,048,576 Bytes/megabyte = 4.2 Megabytes/second

And in terms of data transfer rates . . .

1 Megabyte/second = 8 Megabits/second

So . . .

4.2 Megabytes/second  8 Megabits/Megabyte = 33.8 Megabits/second

That’s a Lot of Data

So what does this mean for the data consumers, the companies wanting to reevaluate their traditional business models to take advantage of vast amounts of Twitter data? At Gnip we’ve learned that some of the collective industry data processing tools simply don’t work at this scale: out-of-the-box HTTP servers/configs aren’t sufficient to move the data, out-of-the-box config’d TCP stacks can’t deliver this much data, and consumption via typical synchronous GET request handling isn’t applicable. So we’ve built our own proprietary data handling mechanisms to capture and process mass amounts of realtime social data for our clients.

Twitter is just one example. We’re seeing more activity on today’s popular social media platforms and a simultaneous increase in the number of popular social media platforms. We’re dedicated to seamless social data delivery to our enterprise customer base and we’re looking forward to the next data processing challenge.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended and does not fully support the repeated constant searches that would be required to deliver 100% coverage.Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full coverage streams for monitoring and analytics use cases.  To get data from the Search API, your system may repeatedly ask Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results to the request (for example “latest 100 Tweets”). If there have been more than 100 Tweets created about a search query since the last time you sent the request, some of the matching Tweets will be lost.

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number or requests you’re allowed to make per unit time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests to Twitter for that query alone to keep up with this stream.  And this is only the beginning of the problem as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple examples to clarify.  First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might be fewer than 100 matching Tweets per second usually — but if there’s a spike (say that term becomes a trending topic after a Super Bowl commercial), then there will likely be more than 100 per second. If because of Twitter’s rate limits, you’re only allowed to send one request per second, you will have missed some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.

Because of these issues related to the monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned, if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Although the coverage is limited, some people opt to use the Search API to collect a sampling of Tweets that match their search terms because it supports Boolean operators and geo parameters. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full coverage Twitter firehose filtering. (See our Power Track feed if you’d like for more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

Guide to the Twitter API – Part 1 of 3: An Introduction to Twitter’s APIs

You may find yourself wondering . . . “What’s the best way to access the Twitter data I need?” Well the answer depends on the type and amount of data you are trying to access.  Given that there are multiple options, we have designed a three part series of blog posts that explain the differences between the coverage the general public can access and the coverage available through Twitter’s resyndication agreement with Gnip. Let’s dive in . .. 

Understanding Twitter’s Public APIs . . . You Mean There is More than One?

In fact, there are three Twitter APIs: the REST API, the Streaming API, and the Search API. Within the world of social media monitoring and social media analytics, we need to focus primarily on the latter two.

  1. Search API - The Twitter Search API is a dedicated API for running searches against the index of recent Tweets
  2. Streaming API – The Twitter Streaming API allows high-throughput, near-realtime access to various subsets of Twitter data (eg. 1% random sampling of Tweets, filtering for up to 400 keywords, etc.)

Whether you get your Twitter data from the Search API, the Streaming API, or through Gnip, only public statuses are available (and NOT protected Tweets). Additionally, before Tweets are made available to both of these APIs and Gnip, Twitter applies a quality filter to weed out spam.

So now that you have a general understanding of Twitter’s APIs . . . stay tuned for Part 2, where we will take a deeper dive into understanding Twitter’s Search API, coming next week…