Posts Tagged ‘data streams’

Posted in APIs, Customers, Release by Eric

The Gnip platform was originally built to support access to public services and data. In response to customer requests, we soft-launched support for authenticated data services over the summer, and we have now fully rolled out the new service. The difference between public and authenticated data services may seem trivial, but in practice it matters a great deal, since authenticated services represent either business-level arrangements between companies or access to private data. The new Gnip capabilities support both of these scenarios.

As part of the new service, Gnip also provides dedicated integration capacity for companies, since we are now able to segment individually managed nodes on our platform for specific company accounts. This means that a company with a developer key on Flickr, a whitelisted account on Twitter, an application key on Facebook, and a developer key on YouTube receives dedicated capacity on the Gnip platform to support all of its data integration requirements.

Gnip will also continue to maintain the existing public data integration services, which do not require authentication for access and distribution, and we expect most companies will use a blend of our data integration services.

To use the new authenticated data services, contact us at sales@gnip.com so we can enable your account. Please contact us today to leverage your existing whitelisted or authenticated account on Flickr, YouTube, Twitter, or other APIs and feeds.

Posted in APIs, Customers, Partners, Publishers by Eric

When we started Gnip last year, Twitter was among the first companies that understood the data integration problems we were trying to solve for developers and companies. Because Gnip and Twitter were able to work together, it has been possible to access and integrate Twitter data through the Gnip platform since last July using Gnip Notifications, and since last September using Gnip Data Activities.

All of this data access was the result of Gnip working with the Twitter XMPP “firehose” API to provide Twitter data to users of both the Gnip Community and Standard Edition product offerings. Recently Twitter announced a new Streaming API and began an alpha program to make the new API available. Gnip has been testing the new Streaming API, and we now plan to move from the current XMPP API to the new Streaming API in mid-June. This transition will change some of the default behavior and ability to access Twitter data, as described below:

New Streaming API Transition Highlights

  1. Gnip will now be able to provide both Gnip Notifications and Gnip Data Activities to all users of the Gnip platform. We stopped providing Data Activities to new customers last November, when Twitter began working on the new API, but now all users of the Gnip platform can use either Notifications or Data Activities, whichever suits their application's use case.
  2. This transition does not change the Gnip API or the service endpoints of Gnip Publishers and Filters; it changes only the default Twitter API that Gnip integrates with to collect Twitter data. (Added about 2 hours after the original post.)
  3. The Twitter Streaming API is meant to accommodate a class of applications that require near-real-time access to Twitter public statuses, and it is provided with several tiers of streaming API methods. See the Twitter documentation for more information.
  4. The default Streaming API tiers that Gnip will make available are the new “spritzer” and “follow” stream methods. At this time, these are the only tiers that are publicly available without an end-user agreement signed directly with Twitter.
  5. Unlike the XMPP stream that Gnip previously used as its default, the “spritzer” stream method is not a “firehose”. Twitter is still working out the average message rate, but at this time “spritzer” runs in the ballpark of 10-20 messages per second and can vary with many variables that Twitter manages.
  6. The “follow” stream method returns public statuses from a specified set of users, by ID.
  7. For more on “spritzer”, “follow”, and other methods see the Twitter Streaming API Documentation.
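The “follow” method's behavior, returning only the public statuses posted by a specified set of user IDs, can be illustrated locally. The sketch below is hypothetical: it does not call Twitter's API, and the message layout it assumes (one JSON status per line, with the author's numeric ID in a nested `user.id` field) is an assumption made for illustration only.

```python
import json
from typing import Iterable, Iterator


def follow_filter(lines: Iterable[str], user_ids: set[int]) -> Iterator[dict]:
    """Yield only statuses whose author ID is in user_ids.

    Hypothetical input format: one JSON status per line, each with a
    nested "user" object carrying a numeric "id".
    """
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines (e.g. keep-alive newlines, if a stream sends any).
            continue
        status = json.loads(line)
        if status.get("user", {}).get("id") in user_ids:
            yield status


# Usage with a small in-memory sample instead of a live connection.
sample = [
    '{"id": 1, "user": {"id": 42}, "text": "hello"}',
    "",
    '{"id": 2, "user": {"id": 7}, "text": "not followed"}',
]
print([s["id"] for s in follow_filter(sample, {42})])  # [1]
```

Taking any iterable of lines keeps the filtering logic separate from how the stream is actually read, so the same function works on a file, a test list, or a network response.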
What About Companies and Developers Whose Use Cases Are Not Met by the Twitter “Spritzer” and “Follow” Streaming API Methods?
Gnip and Twitter realize that companies want to use Twitter data in many different ways and that new applications are being built every day. Therefore we are exploring how companies that are authorized by Twitter for other Streaming API methods could use the Gnip platform as their integration platform of choice.

Twitter has several additional Streaming API methods that are available to approved parties and require a signed agreement to access. To better understand which developers and companies using the Gnip platform could benefit from these other Streaming API options, we encourage Gnip platform users to take this short 12-question survey: Gnip: Twitter Data Publisher Survey (URL: http://www.surveymonkey.com/s.aspx?sm=dQEkfMN15NyzWpu9sUgzhw_3d_3d)

What About the Gnip Twitter-search Data Publisher?
The Gnip Twitter-search Data Publisher is not impacted by the transition to the new Twitter Streaming API since it is implemented using the new Gnip Polling Service and provides keyword-based data integration to the search.twitter APIs.

We will provide more information shortly, once we lock down the actual date for the transition. Please take the survey, and as always, contact us directly at info@gnip.com or email me directly at shane@gnip.com.

Posted in APIs, Customers, Publishers, Strategy, solutions by Eric

Obviously we have some understanding of pushing and polling data from service endpoints, since we founded a company on the premise that the world needed a middleware push data service. Over the last year we have had a lot of success with the push model, but we also learned that, for many reasons, we need to work with some services via a polling approach. For this reason our latest release, v2.1, includes the Gnip Service Polling feature, so that we can work with any service using push, poll, or a mixed approach.

Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away. Every developer or company using Gnip can tell Gnip where to push the data for the filters or subscriptions they have set up. We also realize that not everyone has an IT setup that can handle push, so we have always provided HTTP GET support that lets people grab the data for their filters from a Gnip-generated URL.
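For readers who use the HTTP GET option instead of push, a client-side loop can be as simple as fetching the filter URL on a schedule and discarding activities it has already seen. This is a minimal sketch under stated assumptions: the URL is a placeholder, and the response format (a JSON list of objects with an `id` field) is hypothetical, not Gnip's documented format. The fetch function is injectable so the logic can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable


def http_get(url: str) -> bytes:
    """Fetch a URL with a plain HTTP GET and return the raw body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def poll_once(url: str, seen: set[str],
              fetch: Callable[[str], bytes] = http_get) -> list[dict]:
    """GET the filter URL once and return only activities not seen before.

    Hypothetical body format: a JSON list of objects, each with an "id" key.
    """
    fresh = []
    for activity in json.loads(fetch(url)):
        if activity["id"] not in seen:
            seen.add(activity["id"])
            fresh.append(activity)
    return fresh


# Usage with a fake fetcher standing in for the real HTTP call.
def fake_fetch(url: str) -> bytes:
    return b'[{"id": "a"}, {"id": "b"}]'


seen: set[str] = set()
print(len(poll_once("https://example.com/my-filter", seen, fetch=fake_fetch)))  # 2
print(len(poll_once("https://example.com/my-filter", seen, fetch=fake_fetch)))  # 0
```

Deduplicating on the client matters with pull: consecutive GETs can return overlapping windows of data, whereas a push delivery normally hands over each batch once.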

One place where the way Gnip collects data can currently make a difference for our users is the expected latency of the data. Latency here refers to the time between when an activity happens (e.g., Bob posted a photo, Susie made a comment) and when it reaches the Gnip platform for delivery to our waiting users. Here are some basic expectations.

PUSH services: With push services, latency is usually under 60 seconds, but we know this is not always the case, since services can back up during heavy usage and latency can spike to minutes or even hours. Still, when the services that push to us are running normally, it is reasonable to expect 60-second latency or better, and this is consistent for both the Community and Standard Editions of the Gnip platform.

POLLED services: When Gnip uses our polling service to collect data, latency can vary from service to service based on a few factors:

a) How often we hit an endpoint (say 5 times per second)

b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)

c) How often we execute a specific rule (e.g., every 10 minutes). Right now, with the Community edition of the Gnip platform, we set rule execution by default to 10-minute intervals, and people need to keep this in mind when setting expectations for data flow from any given publisher.
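The three factors constrain each other: if an endpoint can be hit r times per second and every rule must run once per interval T, then at most r × T rules fit in that interval. A small worked sketch using the example figures above (5 requests per second, 10-minute interval), assuming, as a simplification the post does not state, that each rule execution costs exactly one request:

```python
def max_rules_per_interval(requests_per_second: float, interval_seconds: float) -> int:
    """Upper bound on rule executions against one endpoint per interval,
    assuming each rule execution costs exactly one request."""
    return int(requests_per_second * interval_seconds)


def min_interval_seconds(num_rules: int, requests_per_second: float) -> float:
    """Shortest interval at which every rule can run once, under the same assumption."""
    return num_rules / requests_per_second


print(max_rules_per_interval(5, 600))  # 3000 rules per 10-minute interval
print(min_interval_seconds(3000, 5))   # 600.0 seconds
```

Raising the number of rules without raising the request rate therefore lengthens the interval, which is why the execution interval is the dial that governs latency for polled publishers.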

Expectations for POLLING in the Community Edition: I am sure some people who just read the above stopped and asked, “Why 10 minutes?” Well, we chose to focus on “breadth of data” as the initial use case for polling. Also, the 10-minute interval is for the Community edition (aka the free version). We have full ability to turn the dial, and using the smarts built into the polling service, we can execute the right rules faster (e.g., every 60 seconds or faster for popular terms, and every 10, 20, or more minutes for less popular ones). The key issue is that for very prolific posters or very common keyword rules (e.g., “obama”, “http”, “google”), more posts can appear in the 10-minute default window than we can collect in a single poll of the service endpoint.
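One way to picture “turning the dial” is a per-rule interval that shrinks when a poll comes back full (a sign that more posts existed than one poll could return) and grows when a rule is quiet. The sketch below is one possible heuristic, not a description of how the Gnip polling service actually decides; the 60-second floor, the 1-hour ceiling, the halve/double policy, and the `page_size` parameter are all assumptions for illustration.

```python
from dataclasses import dataclass

MIN_INTERVAL = 60        # assumed floor: popular rules polled every 60 s
DEFAULT_INTERVAL = 600   # the 10-minute Community-edition default
MAX_INTERVAL = 3600      # assumed ceiling for rarely matching rules


@dataclass
class Rule:
    term: str
    interval: int = DEFAULT_INTERVAL


def adjust_interval(rule: Rule, results: int, page_size: int) -> int:
    """Adapt a rule's polling interval after one poll (hypothetical heuristic).

    A full page suggests posts may have been missed, so poll sooner;
    an empty result suggests the rule is quiet, so back off.
    """
    if results >= page_size:
        rule.interval = max(MIN_INTERVAL, rule.interval // 2)
    elif results == 0:
        rule.interval = min(MAX_INTERVAL, rule.interval * 2)
    return rule.interval


busy = Rule("obama")
print([adjust_interval(busy, 100, 100) for _ in range(4)])  # [300, 150, 75, 60]
quiet = Rule("some-rare-tag")
print([adjust_interval(quiet, 0, 100) for _ in range(3)])   # [1200, 2400, 3600]
```

Clamping both ends keeps a burst from driving one rule's request rate without limit and keeps quiet rules from being forgotten entirely.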

For now, Community edition users should expect a 10-minute execution interval for all rules on any polled data publisher, which is consistent with the experience during our v2.1 Beta. If your project or company needs something snappier from the polled data publishers, contact us at info@gnip.com or contact me directly at shane@gnip.com, as these use cases require the Standard Edition of the Gnip platform.

Current pushed services on the platform include: WordPress, Identi.ca, Intense Debate, Twitter, Seesmic, Digg, and Delicious

Current polled services on the platform include: Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube

Posted in APIs, Customers, Long, Publishers, Strategy, solutions by Eric

This is one people have asked about a lot: we just pushed out a new publisher for Flickr today.

The new Flickr Publisher supports the Gnip TAG rule-type and makes it easy to integrate data from the Flickr API using the Gnip platform. In the near future we plan to add support for the Gnip ACTOR rule-type, so stay tuned. In the meantime, it is very easy to define the tags that match your interests. Not sure what tags to use? Just check out some of the most popular tags on Flickr.

Check it out on http://api.gnip.com and go use Gnip to integrate some data from Flickr!

Posted in APIs by Eric

Not all APIs have the same capabilities, and therefore they provide different levels of access to events, procedures, and data. This seems obvious, but you would not think so based on the questions we typically see. In fact, we have found that APIs can be a lot like apples and oranges. So, with the number of available APIs growing at a rate that can exceed 60 per month, we thought people would benefit from a simple way to categorize APIs based on how they expose events and data.

We work with a wide range of APIs from many service providers and have noticed that most fall into a few descriptive types based on how they expose events and data. These are the main categories we are starting to use:

  • Fire hose or “full stream”: Identi.ca and Twitter are two examples, and Flickr also has a fire hose.
  • User-based stream: These services do not directly expose a full stream, but instead give people a way to assemble an aggregate stream based on a list of users. Flickr again is a good example, and there are many others.
  • Activity-based, tag-based, and “other”: The main way to work with these services is usually through access based on some defined activity (tag, bookmark, etc.) or through pre-defined streams based on feeds. An example is Delicious, which offers multiple ways to access information through APIs and feeds.
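For the user-based category, the client does the assembling: it fetches each user's events and merges them into one chronological stream. The sketch below shows only the merge step and assumes each user's events have already been fetched and sorted by timestamp; the `(timestamp, user, text)` tuple shape and the sample data are hypothetical.

```python
import heapq
from typing import Iterable, Iterator

# Hypothetical event shape: (timestamp, user, text)
Event = tuple[int, str, str]


def aggregate_stream(per_user: dict[str, Iterable[Event]]) -> Iterator[Event]:
    """Merge per-user event sequences, each already sorted by timestamp,
    into one chronologically ordered stream."""
    return heapq.merge(*per_user.values(), key=lambda e: e[0])


feeds = {
    "alice": [(1, "alice", "photo"), (5, "alice", "comment")],
    "bob": [(2, "bob", "bookmark"), (3, "bob", "photo")],
}
print([e[0] for e in aggregate_stream(feeds)])  # [1, 2, 3, 5]
```

`heapq.merge` consumes its inputs lazily, so the same approach works when each user's events arrive as a generator rather than a list; with a fire hose API, by contrast, no client-side merge is needed at all.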

These differences between API types are something people should keep in mind when they want to access a service for a specific need. If you need events and data for a particular purpose, the behavior of the API is obviously going to affect your approach. And of course, here at Gnip we are hard at work trying to provide consistent approaches across all types of APIs, so back to work!