Response Code Nuances

While fixing a bug yesterday, I plowed through the code that does Gnip’s HTTP response code special case handling. The scenarios we’re handling illustrate the complexities around doing integrations with many web APIs. It was a reminder of how much we all want standards to work, and how often they only partially do so. Here are a few nuances you should consider if you’re doing API integrations by hand.

“retry-after”

When doing a polling-based integration with a “real-time” API, you’re inclined to poll it a lot. That has caused some service providers to tell you to slow down using the “retry-after” HTTP header. Some providers use other, not so standard, ways to cool you down, but those are beyond the scope of this post. When you get a non-200-level response back from a server, you should consider looking for the retry-after header, whether or not the response was a 503 or a 300-level code (the cases the HTTP 1.1 specification describes). Generally, when a service sends a retry-after, the intention behind it is clear, and you should respect the value that comes back. Now, the format of that value can be either a number of seconds or a more verbose date format that tells you the time to wait “until” before trying the request again. In practice, we’ve never seen the latter; only the “seconds” version. When we see retry-after, we sleep that duration; you should probably do the same.
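As a minimal sketch of handling both value formats described above (this is not Gnip’s actual code; the function name `retry_after_seconds` and its `now` parameter are made up for illustration), the header value can be turned into a sleep duration like this:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the number of seconds to wait, or None if the header is
    missing or cannot be parsed."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: a plain number of seconds, e.g. "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without a zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (when - now).total_seconds())
```

A poller would call this with the header value from any non-200-level response and sleep for the result when it is not `None`.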

HTTP Response-code ‘999’

You can look for it in the spec, but you won’t find it. Delicious likes to send a ‘999’ back when you’re hitting them too hard. Consider backing off for several minutes if you see this from them.
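One way to fold this kind of non-standard code into a poller is a small lookup that maps a status code to a pause. This is only a sketch: the names are hypothetical, and the five-minute figure is an assumption standing in for “several minutes”, not a value documented by Delicious.

```python
# Status codes that are not in the HTTP spec but that a provider uses to
# say "slow down". 999 is the one described above.
NONSTANDARD_THROTTLE_CODES = {999}

# Assumption: "several minutes" is taken to be five minutes here.
THROTTLE_BACKOFF_SECONDS = 5 * 60


def backoff_for_status(status: int) -> float:
    """Return how many seconds to pause before the next request."""
    if status in NONSTANDARD_THROTTLE_CODES:
        return float(THROTTLE_BACKOFF_SECONDS)
    return 0.0
```

Keeping the codes in a set makes it easy to add other providers’ oddities as you discover them.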

non-200 HTTP Response Bodies

While many services don’t bother sending response bodies back for non-200s (and those that do often don’t provide anything actionable), many do. It’s a good idea to write those bodies to a log file (or at least the first n-hundred bytes) for human inspection. There can be some useful information in there to help you build a more effective and efficient integration.
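A minimal sketch of that logging, using Python’s standard `logging` module (the logger name, the 500-byte limit, and the function names are all assumptions chosen for illustration):

```python
import logging

logger = logging.getLogger("integration.responses")

# Assumption: keep only the first 500 bytes of each body.
MAX_LOGGED_BYTES = 500


def body_excerpt(body: bytes, limit: int = MAX_LOGGED_BYTES) -> str:
    """Return the first `limit` bytes of the body as text, replacing any
    bytes that are not valid UTF-8."""
    return body[:limit].decode("utf-8", errors="replace")


def log_error_response(url: str, status: int, body: bytes) -> None:
    """Log a truncated copy of the body for any non-200-level response."""
    if 200 <= status < 300:
        return
    logger.warning("non-2xx from %s: status=%d body=%r",
                   url, status, body_excerpt(body))
```

Truncating before decoding keeps the cost bounded even when a service returns a very large error page.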

The matrix of services-to-response codes, and how you should respond to them, is big. The above is just a small slice of the scenarios your integrations will encounter, and that you’ll need to solve for.

While a service’s documentation is always some degree out of date, and you can only truly learn the behavioral characteristics through long nights of debugging, here are some pointers to service specific response codes that you might find useful.

Pushing and Polling Data: Differences in Approach on the Gnip Platform

Obviously we have some understanding of the concepts of pushing and polling data from service endpoints, since we basically founded a company on the premise that the world needed a middleware push data service. Over the last year we have had a lot of success with the push model, but we also learned that, for many reasons, we need to work with services via a polling approach as well. For this reason our latest v2.1 includes the Gnip Service Polling feature, so that we can work with any service using push, poll or a mixed approach.

Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away. Every end user developer or company has the option to tell Gnip where to push the data for which they have set up filters or have a subscription. We also realize not everyone has an IT setup to handle push, so we have always provided HTTP GET support that lets people grab data for their filters from a Gnip generated URL.

One place where the way Gnip collects data can make a difference for our users, at this time, is the expected latency of data. Latency here refers to the time between the activity happening (e.g. Bob posted a photo, Susie made a comment, etc.) and the time it hits the Gnip platform to be delivered to our awaiting users. Here are some basic expectation-setting thoughts.

PUSH services: When we have push services the latency experienced is usually under 60 seconds, but we know that this is not always the case, since sometimes the services can back up during heavy usage and latency can spike to minutes or even hours. Still, when the services that push to us are running normally it is reasonable to expect 60 second latency or better, and this is consistent for both the Community and Standard Edition of the Gnip platform.

POLLED services: When Gnip is using our polling service to collect data the latency can vary from service to service based on a few factors:

a) How often we hit an endpoint (say 5 times per second)

b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)

c) How often we execute a specific rule (e.g. every 10 minutes). Right now, with the Community edition of the Gnip platform, we are setting rule execution by default at 10 minute intervals, and people need to keep this in mind when setting their expectations for data flow from any given publisher.
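To see how the three factors above interact, here is some back-of-the-envelope arithmetic. It assumes, hypothetically, one request per rule execution and a single fixed request rate; the inputs are the illustrative “say” figures from the list, not measured Gnip numbers, and a real poller may batch or prioritize rules very differently.

```python
def rules_per_interval(requests_per_second: float,
                       interval_seconds: float) -> int:
    """How many rules can each be executed once per interval, assuming
    one request per rule execution."""
    return int(requests_per_second * interval_seconds)


def full_sweep_seconds(rule_count: int, requests_per_second: float) -> float:
    """How long one pass over every rule takes at a fixed request rate."""
    return rule_count / requests_per_second


# At 5 requests per second, a 10 minute interval covers 3000 rules.
print(rules_per_interval(5, 10 * 60))

# A single pass over 70 million rules at that rate takes roughly 162 days.
print(full_sweep_seconds(70_000_000, 5) / 86_400)
```

The gap between those two results is why the execution interval for a given rule matters so much to the latency you should expect.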

Expectations for POLLING in the Community Edition: So I am sure some people who just read the above stopped and said “Why 10 minutes?” Well, we chose to focus on “breadth of data” as the initial use case for polling. Also, the 10 minute interval is for the Community edition (aka the free version). We have the complete ability to turn the dial and, using the smarts built into the polling service feature, execute the right rules faster (e.g. every 60 seconds or faster for popular terms and every 10, 20, etc. minutes or more for less popular ones). The key issue here is that for very prolific posters or very common keyword rules (e.g. “obama”, “http”, “google”) there can be more posts in the 10 minute default time-frame than we can collect in a single poll from the service endpoint.
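To make the idea of “turning the dial” concrete, here is a hypothetical sketch of one way a poller could adjust a rule’s interval: poll faster when a poll comes back full (a sign that posts may have been missed) and slower when it comes back empty. This is an illustration only, not a description of how the Gnip polling service works; the halving and doubling policy, the bounds, and all names are assumptions.

```python
MIN_INTERVAL_SECONDS = 60           # "every 60 seconds" for popular terms
MAX_INTERVAL_SECONDS = 20 * 60      # assumption: cap slow rules at 20 minutes
DEFAULT_INTERVAL_SECONDS = 10 * 60  # the Community edition default


def next_interval(current: int, items_returned: int, page_size: int) -> int:
    """Pick the next polling interval, in seconds, for one rule."""
    if items_returned >= page_size:
        # The poll was full, so more posts may exist than one poll can
        # collect: poll twice as often.
        proposed = current // 2
    elif items_returned == 0:
        # Nothing new: back off.
        proposed = current * 2
    else:
        proposed = current
    return max(MIN_INTERVAL_SECONDS, min(MAX_INTERVAL_SECONDS, proposed))
```

Starting every rule at `DEFAULT_INTERVAL_SECONDS` and applying `next_interval` after each poll would let a rule like “obama” drift toward the 60 second floor while quiet rules drift toward the ceiling.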

For now the default expectation for our Community edition platform users should be a 10 minute execution interval for all rules when using any data publisher that is polled, which is consistent with the experience during our v2.1 Beta. If your project or company needs something a bit more snappy from the polled data publishers, then contact us at info@gnip.com or contact me directly at shane@gnip.com, as these use cases require the Standard Edition of the Gnip platform.

Current pushed services on the platform include: WordPress, Identi.ca, Intense Debate, Twitter, Seesmic, Digg, and Delicious.

Current polled services on the platform include: Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube.

New Publishers in demo.gnip.com

With our schema now finalized and in beta at http://demo.gnip.com, and the crowd-sourcing application launched to help us prioritize our publisher integration schedule, the team is now heads down building out more publishers on the Gnip platform.

Today we put ten new publishers into demo.gnip.com. All of these are using the updated schema and provide support for notifications and activities with full-data. Have fun integrating some data!

  1. Delicious
  2. Fotolog
  3. Plurk
  4. Reddit
  5. Slideshare (added after original blog post)
  6. Stumbleupon
  7. Tumblr
  8. Twitter-search
  9. Vimeo
  10. Webshots

Solution Spotlight: Storytlr Using Gnip for Real-Time Social Data Integration

Who is Storytlr?
Storytlr provides a life streaming service that allows people to bring together their entire web 2.0 life and assemble their content to tell stories in a whole new way.  Learn more at their website, http://storytlr.com/, or their blog, http://blog.storytlr.com/.

Real-world results Storytlr says they are realizing from using Gnip
Storytlr is using Gnip to provide real-time data integration to Twitter, Digg, Delicious and Seesmic. Since Storytlr started using Gnip they have seen a reduction in the latency of the data integration for these social media activity streams (i.e. a tweet, digg, or event notice from a third party now shows up in the Storytlr service in real time). Read more on how Storytlr added real-time integration using Gnip in their recent blog post.

We are looking forward to working more with the Storytlr team as we roll out more publishers that they can take advantage of in their business. 

Solution Spotlight: Soup.io is Now Using Gnip

Soup.io is now using the Gnip messaging platform for their web API data integration needs. Welcome Soup.io!

Who is Soup.io?
Soup.io provides an easy-to-use micro-blogging and lifestream service that serves as an aggregator for your public social media feeds. Visit their website at http://www.soup.io/ or their blog at http://updates.soup.io/ to learn more.

Real-world results Soup.io says they are realizing from using Gnip
Soup.io is using Gnip to provide data integration to Twitter, and they have seen a reduction in the latency for their Twitter integration (i.e. the time elapsed for a tweet to show up in the Soup.io service) since moving to Gnip. Now Soup.io users should see their Twitter notices show up within a minute of them being sent on the Twitter service. Since Gnip also provides data streams from many other providers (Flickr, Delicious, etc.), Soup.io is working to use Gnip as the way to access and integrate with these services in the future.

Solution Spotlight: Strands Now Using Gnip

Strands is the newest company using the Gnip messaging platform for their web API data integration needs. Welcome Strands and thank you to Aaron for sharing what the team is doing!

Who is Strands?
Strands develops technologies to better understand people’s taste and help them discover things they like and didn’t know about. Strands has created a social recommendation engine that is able to provide real-time recommendations of products and services through computers, mobile phones and other Internet-connected devices. This enables users to discover new things, based on their online, offline and mobile activities. The Strands.com website helps people discover new things from other people. Visit http://www.strands.com to learn more.

Real-world results Strands says they are realizing from using Gnip
Strands.com is now able to give people updates faster and more reliably. In addition, Strands has seen reduced load on their system by not having to poll for updates on sites like Twitter, Flickr, Delicious, and Digg. Gnip allows Strands to receive push data from several of these sites, and at a minimum receive notifications when a user on these sites has made an update.

More Examples of How Companies are Using Gnip

We have noticed that we are interacting with two distinct groups of companies: those that instantly understand what Gnip does and those that struggle with what we do. So we decided to provide a few detailed real-world examples of the companies we are actively working with to provide data integration and messaging services today.

First, we are not an end-user facing social aggregation application. (We repeat this often.) We see a lot of people wanting to put Gnip in that bucket along with social content aggregators like FriendFeed, Plaxo and many others. These content aggregators are destination web sites that provide utility to end users by giving them flexibility to bring their social graph or part of their graph together in one place. Also, many of these services are now providing web APIs that allow people to use an alternative client to interact with their core services around status updates and conversations, as well as other features specific to the service.

Gnip is an infrastructure service; specifically, we provide an extensible messaging system that allows companies to more easily access, filter and integrate data from web based APIs. While someone could use Gnip as a way to bring content into a personal social media client they want to write for a specific social aggregator, it is not something we are focused on. Below are the company use cases we are focused on:

  1. Social content aggregators: One of the main reasons we started Gnip was to solve the problems being caused by the point-to-point integration issues that were springing up with the increase of user generated content and corresponding open web APIs. We believe that any developer who has written a poller once, twice, or for their nth API will tell you how unproductive it is to write and maintain this code. However, writing one-off pollers has become a necessary evil for many companies, since the content aggregators need to provide access to as many external services as possible for their end users. Plaxo, which recently integrated with Gnip as a way to support their Plaxo Pulse feature, is a perfect example, as are several other companies.
  2. Business specific applications: Another main reason we started Gnip was that we believe more and more companies are seeing the value of integrating business and social data as a way to add compelling value to their own applications. There is a very wide set of examples, such as how Eventvue uses Gnip as a way to integrate Twitter streams into their online conference community solution, and the companies we have talked to about using Gnip to integrate web-based data to power everything from sales dashboards to customer service portals.
  3. Content producers: Today, Gnip offers value to content producers by providing developers an alternative tool that can be used to integrate with their web APIs. We are working with many producers, such as Digg, Delicious, Identi.ca, and Twitter, and plan to continue to aggressively grow the set of producers available. The benefits that producers see from working with Gnip include off-loading direct traffic to their web APIs as well as providing another channel to make their content available. We are also working very hard to add new capabilities for producers, which includes plans to provide more detailed analytics on how their data is consumed and evaluating publishing features that could allow producers to define their own filters and target the service endpoints and web sites where they want to push relevant data for their own business needs.
  4. Market and brand research companies: We are working with several companies that provide market research and brand analysis. These companies see Gnip as an easy way to aggregate social media data to be included in their brand and market analysis client services.

Hopefully this set of company profiles helps provide more context on the areas we are focused on and the typical companies we are working with every day. If your company does something that does not fit in these four areas and is using our services, please send me a note.

Web APIs of All Shapes and Sizes

Not all APIs have the same capabilities, and therefore they provide different levels of access to events, procedures and data. This seems obvious, but you would not think so based on the questions we normally see from people. In fact we have found that APIs can be a lot like apples and oranges. So, with the number of available APIs growing at a rate that can be more than 60 per month, we thought people would benefit from a simple way to categorize APIs based on how they expose events and data.

We work with a large variety of APIs from a variety of service providers and have noticed that most APIs fall into a few descriptive types based on how they expose events and data. The following are the main ways we are starting to look at APIs.

  • Fire hose or “full stream”: Identi.ca and Twitter are two examples, but Flickr also has a fire hose.
  • User-based stream: These services do not directly expose a full stream, but instead give people a way to assemble an aggregate stream based on a list of users. Flickr again is a good example and there are many others.
  • Activity-based, tag-based and “other”: The main way to work with these services is usually through some defined activity (tag, bookmark, etc.) or through pre-defined streams based on feeds. An example would be Delicious, which allows multiple methods to access information by APIs and feeds.

This bifurcation in API types is something people should keep in mind when they want to access a service for some specific need. If you need to get events and data for a specific purpose, then the behavior of the API is going to shape your approach. And of course here at Gnip we are hard at work trying to provide consistent approaches across all types of APIs, so back to work!

What We Are Up to At Gnip

As the newest member of the Gnip team I have noticed that people are asking a lot of the same questions about what we are doing at Gnip and how they can use our services in their business.

What we do

Gnip provides an extensible messaging platform that allows for the publishing or subscribing of events and data from across the Internet, which makes data portability exponentially less painful and more automatic once it is set up. Because Gnip is being built as a platform of capabilities and not a web application the core services are instantly useful for multiple scenarios, including data producers, data consumers and any custom web applications. Gnip already is being used with many of the most popular Internet data sources, including Twitter, Delicious, Flickr, Digg, and Plaxo.

How to use Gnip

So, who is the target user of Gnip? It is a developer, as the platform is not a consumer-oriented web application, but a set of services meant to be used by a developer or an IT department for a set of core use cases.

  • Data Consumers: You’ve built your pollers, let us tell you when and where to fire them. Avoid throttling and decrease latency from hours to seconds.
  • Data Producers: Push your data to us and reduce API traffic by an order of magnitude while increasing distribution through aggregators.
  • Custom web applications: You want to embed or publish content to be used in your own application or for a third-party application. Decide who, or what, you care about for any Publisher, give us an end-point, and we push the data to you so you can solve your business use cases, such as customer service websites, corporate websites, blogs, or any web application.

Get started now

By leveraging the Gnip APIs, developers can easily design reusable services, such as push-based notifications, smart filters and data streams, that can be used to make all of their web applications better. Are you a developer? Give the new 2.0 version a try!