Pushing and Polling Data Differences in Approach on the Gnip platform

Obviously we have some understanding on the concepts of pushing and polling of data from service endpoints since we basically founded a company on the premise that the world needed a middleware push data service.    Over the last year we have had a lot of success with the push model, but we also learned that for many reasons we also need to work with services via a polling approach.   For this reason our latest v2.1 includes the Gnip Service Polling feature so that we can work with any service using push, poll or a mixed approach.

Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away.   Every end user developer or company has the option to tell Gnip where to push data that you have set up filters or have a subscription.   We also realize not everyone has an IT setup to handle push so we have always provided the option for HTTP GET support that lets people grab data from a Gnip generated URL for your filters.

One place where the way Gnip collects data can make a difference, at this time, for our users is the expected latency of data.  Latency here refers to the time between the activity happening (i.e. Bob posted a photo, Susie made a comment, etc) and the time it hits the Gnip platform to be delivered to our awaiting users.     Here are some basic expectation setting thoughts.

PUSH services: When we have push services the latency experience is usually under 60 seconds, but we know that this is not always the case sense sometimes the services can back-up during heavy usage and latency can spike to minutes or even hours.   Still, when the services that push to us are running normal it is reasonable to expect 60 second latency or better and this is consistent for both the Community and Standard Edition of the Gnip platform.

POLLED services:   When Gnip is using our polling service to collect data the latency can vary from service to service based on a few factors

a) How often we hit an endpoint (say 5 times per second)

b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)

c) How often we execute a specific rule (i.e. every 10 minutes).     Right now with the Community edition of the Gnip platform we are setting rule execution by default at 10 minute intervals and people need to have this in mind with their expectation for data flow from any given publisher.

Expectations for POLLING in the Community Edition: So I am sure some people who just read the above stopped and said “Why 10 minutes?”  Well we chose to focus on “breadth of data ” as the initial use case for polling.   Also, the 10 minute interval is for the Community edition (aka: the free version).   We have the complete ability to turn the dial and use the smarts built into the polling service feature we can execute the right rules faster (i.e. every 60 seconds or faster for popular terms and every 10, 20, etc minutes or more for less popular ones).    The key issue here is that for very prolific posting people or very common keyword rules (i.e. “obama”, “http”, “google”) there can be more posts that exist in the 10 minute default time-frame then we can collect in a single poll from the service endpoint.

For now the default expectation for our Community edition platform users should be a 10 minute execution interval for all rules when using any data publisher that is polled, which is consistent with the experience during our v2.1 Beta.    If your project or company needs something a bit more snappy with the data publishers that are polled then contact us at info@gnip.com or contact me directly at shane@gnip.com as these use cases require the Standard Edition of the Gnip platform.

Current pushed services on the platform include:  WordPress, Identi.ca, Intense Debate, Twitter, Seesmic,  Digg, and Delicious

Current polled services on the platform include:   Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube

New Gnip schema live on demo.gnip.com

We have flipped the switch to allow people to start working with our new schema at http://demo.gnip.com. In addition to standing up the site with the updated schema we have moved over all the existing accounts from the current system, so your existing gnipcentral.com user and password also get you access to the demo system.

The following publishers are in the demo system and we plan to add more over the course of the next month during the beta period.

  • Digg
  • Identi.ca
  • Seesmic
  • Six Apart
  • Twitter

We will be posting additional examples of how we mapped these social media services to the updated Gnip Schema in the Gnip Community and will link to those examples from the blog as well as point to them in our standard release newsletter that will go out later today.

Based on the feedback we have received there is a lot of interest in the enhanced metadata in the new schema that can be used to support additional types of  URLs, multiple tags, geo data, and rich media. Now, go grab some data and do something cool with it.

Solution Spotlight: Storytlr Using Gnip for Real-Time Social Data Integration


Who is Storytlr?
Storytlr provides a life streaming service that allows people to bring together their entire web 2.0 life and assemble their content to tell stories in a whole new way.  Learn more at their website, http://storytlr.com/, or their blog, http://blog.storytlr.com/.

Real-world results Storytlr says they are realizing from using Gnip
Storytlr is using Gnip to provide real-time data integration to Twitter, Digg, Delicious and Seesmic.  Since Storytrl starting using Gnip they have seen a reduction in the latency for the data integration of these social media activity streams (i.e. the time elapsed for a tweet, digg, or event notice to show up in the Storytlr service from a third-party is now real-time). Read more on how Storytlr added real-time integration using Gnip in their recent blog post.

We are looking forward to working more with the Storytlr team as we roll out more publishers that they can take advantage of in their business. 

Newest Publisher on Gnip: Seesmic

Seesmic is the latest publisher to be added to the Gnip Platform.

The initial integration between Gnip and Seesmic allows people to easily filter and integrate the Seesmic firehose or specific user activities into third-party applications and websites using either Gnip Notifications or Gnip Data Streams with the “Actor” rule type. Seesmic pushes to Gnip using an XMPP implementation and we are excited to work with them as they continue to evolve their service.   The Seesmic publisher just went live last week and we already have people integrating real-time notifications and data streams via Gnip, so go grab some Seesmic to join or create your own video conversation!

Learn more about Seesmic from their blog or their website.