Archive for May, 2009

Posted in APIs,Customers,Publishers,Strategy,solutions by Eric No Comments

We obviously have some understanding of the concepts of pushing and polling data from service endpoints, since we basically founded a company on the premise that the world needed a middleware push data service. Over the last year we have had a lot of success with the push model, but we have also learned that, for many reasons, we need to work with some services via a polling approach. For this reason our latest v2.1 release includes the Gnip Service Polling feature, so that we can work with any service using push, poll or a mixed approach.

Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away. Every end-user developer or company has the option to tell Gnip where to push the data for which they have set up filters or a subscription. We also realize that not everyone has an IT setup that can handle push, so we have always provided HTTP GET support that lets people grab the data for their filters from a Gnip-generated URL.

One place where the way Gnip collects data can make a difference for our users, at this time, is the expected latency of data. Latency here refers to the time between the activity happening (e.g. Bob posted a photo, Susie made a comment, etc.) and the time it hits the Gnip platform to be delivered to our awaiting users. Here are some basic thoughts to help set expectations.

PUSH services: When a service pushes data to us, latency is usually under 60 seconds, but we know this is not always the case, since services can sometimes back up during heavy usage and latency can spike to minutes or even hours. Still, when the services that push to us are running normally it is reasonable to expect latency of 60 seconds or better, and this is consistent for both the Community and Standard Editions of the Gnip platform.
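To make the latency definition concrete, here is a minimal Python sketch that measures the gap between an activity's timestamp and the moment it is received, then compares it with the 60 second expectation. The function names and the receive time are illustrative assumptions, not part of the Gnip platform; the timestamp format matches the `<at>` values used in Gnip activities.

```python
from datetime import datetime, timezone

# Illustrative threshold: the "60 seconds or better" expectation for pushed services.
EXPECTED_PUSH_LATENCY_SECONDS = 60.0


def latency_seconds(activity_at: str, received_at: datetime) -> float:
    """Seconds between when an activity happened and when it was received."""
    # Older Python versions do not parse a trailing "Z", so normalize it to an offset.
    happened = datetime.fromisoformat(activity_at.replace("Z", "+00:00"))
    return (received_at - happened).total_seconds()


def within_expectation(latency: float) -> bool:
    """True when the measured latency meets the 60 second expectation."""
    return latency <= EXPECTED_PUSH_LATENCY_SECONDS


# Hypothetical receive time, 45 seconds after the activity happened.
received = datetime(2009, 5, 16, 14, 8, 10, tzinfo=timezone.utc)
latency = latency_seconds("2009-05-16T14:07:25.000Z", received)
print(latency, within_expectation(latency))
```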

POLLED services: When Gnip is using our polling service to collect data, the latency can vary from service to service based on a few factors:

a) How often we hit an endpoint (say 5 times per second)

b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)

c) How often we execute a specific rule (e.g. every 10 minutes). Right now, with the Community Edition of the Gnip platform, we set rule execution by default at 10 minute intervals, and people need to keep this in mind when setting their expectations for data flow from any given publisher.
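The three factors interact through simple arithmetic. The sketch below uses the illustrative figures from the list (5 requests per second, 70 million rules, a 10 minute interval) and assumes, purely for illustration, that executing one rule costs exactly one request against the endpoint. It is not a description of how the Gnip scheduler actually works.

```python
SECONDS_PER_DAY = 86_400


def full_cycle_seconds(rule_count: int, requests_per_second: float) -> float:
    """Time needed to execute every rule once at a fixed request rate."""
    return rule_count / requests_per_second


def max_rules_per_interval(requests_per_second: float, interval_seconds: int) -> int:
    """How many rules can be executed once each within one interval."""
    return int(requests_per_second * interval_seconds)


# Figures taken from the "say ..." examples above.
cycle = full_cycle_seconds(70_000_000, 5)
print(round(cycle / SECONDS_PER_DAY, 2))   # days to touch every rule once
print(max_rules_per_interval(5, 600))      # rules that fit in a 10 minute interval
```

Under these assumptions, a single pass over 70 million rules at 5 requests per second takes about 162 days, while only 3,000 rules can be executed every 10 minutes. That gap is why rule scheduling matters so much for polled publishers.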

Expectations for POLLING in the Community Edition: I am sure some people who just read the above stopped and said “Why 10 minutes?” Well, we chose to focus on “breadth of data” as the initial use case for polling. Also, the 10 minute interval is for the Community Edition (aka the free version). We have the complete ability to turn the dial and, using the smarts built into the polling service feature, execute the right rules faster (e.g. every 60 seconds or faster for popular terms, and every 10, 20, etc. minutes or more for less popular ones). The key issue here is that for very prolific posters or very common keyword rules (e.g. “obama”, “http”, “google”) there can be more posts in the 10 minute default time frame than we can collect in a single poll of the service endpoint.
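One way to picture "turning the dial" is a per-rule interval that shrinks when a poll comes back full (a sign that posts were probably missed) and grows when a poll comes back empty. The following Python sketch is only an illustration of that idea: the halving and doubling policy, the `page_size` parameter and the 60 second and 20 minute bounds are assumptions, not the actual logic of the Gnip polling service.

```python
def next_interval(current: int, results: int, page_size: int,
                  min_interval: int = 60, max_interval: int = 1200) -> int:
    """Pick the next polling interval, in seconds, for a single rule.

    current   -- interval used for the poll that just finished
    results   -- number of items that poll returned
    page_size -- most items one poll of the endpoint can return (assumed)
    """
    if results >= page_size:
        # The poll came back full, so more posts probably exist than we
        # collected. Poll this rule sooner next time.
        return max(min_interval, current // 2)
    if results == 0:
        # Nothing new: back off so the request budget goes to busier rules.
        return min(max_interval, current * 2)
    # Some results, but not a full page: the current interval is keeping up.
    return current


print(next_interval(600, 100, 100))  # popular rule, full page
print(next_interval(600, 0, 100))    # quiet rule, empty poll
```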

For now the default expectation for our Community edition platform users should be a 10 minute execution interval for all rules when using any data publisher that is polled, which is consistent with the experience during our v2.1 Beta.    If your project or company needs something a bit more snappy with the data publishers that are polled then contact us at info@gnip.com or contact me directly at shane@gnip.com as these use cases require the Standard Edition of the Gnip platform.

Current pushed services on the platform include:  WordPress, Identi.ca, Intense Debate, Twitter, Seesmic,  Digg, and Delicious

Current polled services on the platform include:   Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube

Posted in Uncategorized by Jud No Comments

Jeremy Hinegardner has written a super cool utility (he calls it Snipe) in Ruby that uses Gnip Notifications to optimize your data collection needs. In a nutshell, it digests Gnip Notifications for the Twitter Publisher (though it could obviously be re-purposed for any Publisher) and pings Twitter to retrieve the tweets associated with said Notifications; rounding out Gnip <activity>s. Enjoy, and hats off to Jeremy; well done.
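The general pattern, which is to take a lightweight notification and then fetch the full item from the original service, can be sketched in a few lines. This is a Python illustration of the idea only, not Snipe's code (Snipe is written in Ruby); the `url` key and the `fetch_full` callable are hypothetical stand-ins for whatever a notification carries and whatever client retrieves the tweet.

```python
from typing import Callable, Dict, Iterable, List


def round_out(notifications: Iterable[dict],
              fetch_full: Callable[[str], dict]) -> List[dict]:
    """Attach full data to each notification, fetching each URL only once."""
    cache: Dict[str, dict] = {}
    activities = []
    for note in notifications:
        url = note["url"]  # hypothetical key naming the item to retrieve
        if url not in cache:
            cache[url] = fetch_full(url)
        # Keep the notification fields and add the retrieved payload.
        activities.append({**note, "payload": cache[url]})
    return activities


def fake_fetch(url: str) -> dict:
    """Stand-in for a real HTTP client, so the sketch runs offline."""
    return {"body": "full content for " + url}


print(round_out([{"url": "status/1"}, {"url": "status/2"}], fake_fetch))
```

Caching by URL keeps the number of calls to the original service down when the same item shows up in more than one notification.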

Posted in APIs,Industry,Partners,Publishers,Release by Eric No Comments

We are pleased to announce an early access program for a new Gnip data publisher to access and integrate data from the Facebook Platform Open Streams API.

Companies will realize immediate benefits from choosing to use the Gnip Platform for integrating data from Facebook.

  • Choose the specific Facebook users from among those that have authorized your applications, and Gnip will immediately begin collecting the relevant data, normalizing it and delivering it in real-time to your applications.
  • Simplify the integration and data retention requirements for integrating the Facebook Platform with your applications by using Gnip Notifications and Gnip Data Streams to work with and store either event meta-data or full data, based on the appropriate use case as defined by the Facebook Platform terms of use (e.g. the 24 hour rule, etc.)

Developers and companies can sign up right now to be notified when the early access program is launched by sending an email to info@gnip.com with the subject: Facebook. Any company signing up for the early access program will be eligible for three free months of subscription service to the Gnip data publisher for the Facebook Platform once it is generally released. At this time the early access program is planned to launch in the summer.

And to provide a small taste of the upcoming integration here are two examples of what common Newsfeed actions on Facebook will look like when accessed via the planned Gnip data publisher.

1) Status update example (fbids in this example were changed from the actual ones in my stream item)

<activities publisher="facebook">
<activity>
<at>2009-05-16T14:07:25.000Z</at>
<action>post</action>
<activityID>http://www.facebook.com/profile.php?aid=6&amp;id=12345&amp;ref=at</activityID>
<actor metaURL="http://www.facebook.com/people/Shane-Pearson/12345">Shane Pearson</actor>
<destinationURL>http://www.facebook.com/profile.php?id=12345&amp;story_fbid=12345</destinationURL>
<payload>
<body>It must be spring as my weekly trip to Lowes/Home Depot is back on the schedule</body>
</payload>
</activity>
</activities>

2) Upload photo example (the below Gnip data schema maps to a Facebook activity stream example)

<activities publisher="facebook">
<activity>
<at>2009-04-06T21:23:00-07:00</at>
<action>upload</action>
<activityID>http://www.facebook.com/album.php?aid=6&amp;id=499225643&amp;ref=at</activityID>
<actor metaURL="http://www.facebook.com/people/Snapshot-Smith/499225643">Snapshot Smith</actor>
<destinationURL>http://www.facebook.com/people/Snapshot-Smith/499225643</destinationURL>
<payload>
<title>Snapshot Smith uploaded a photo.</title>
<body><p><a href="http://www.facebook.com/photo.php?pid=28&amp;id=499225643&amp;ref=at" caption="A very attractive wall, indeed"></a></p>
</body>
<mediaURL type="thumbnail">http://photos-e.ak.fbcdn.net/photos-ak-snc1/v2692/195/117/499225643/s499225643_28_6861716.jpg</mediaURL>
<mediaURL type="content">http://www.facebook.com/photo.php?pid=28&amp;id=499225643&amp;ref=at</mediaURL>
</payload>
</activity>
</activities>
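
For anyone planning to consume activities shaped like the examples above, here is a minimal Python sketch that reads one with the standard library's ElementTree. It assumes well-formed XML with straight quotes, escaped ampersands and no XML namespace, as in the excerpts; the final schema of the planned publisher may differ, and the `parse_activities` helper is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the status update example, used as test input.
SAMPLE = """<activities publisher="facebook">
  <activity>
    <at>2009-05-16T14:07:25.000Z</at>
    <action>post</action>
    <actor metaURL="http://www.facebook.com/people/Shane-Pearson/12345">Shane Pearson</actor>
    <destinationURL>http://www.facebook.com/profile.php?id=12345&amp;story_fbid=12345</destinationURL>
    <payload>
      <body>It must be spring as my weekly trip to Lowes/Home Depot is back on the schedule</body>
    </payload>
  </activity>
</activities>"""


def parse_activities(xml_text: str) -> list:
    """Turn an <activities> document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    publisher = root.get("publisher")
    parsed = []
    for activity in root.findall("activity"):
        actor = activity.find("actor")
        parsed.append({
            "publisher": publisher,
            "at": activity.findtext("at"),
            "action": activity.findtext("action"),
            "actor": actor.text if actor is not None else None,
            "actor_url": actor.get("metaURL") if actor is not None else None,
            "destination_url": activity.findtext("destinationURL"),
            # findtext accepts a simple path, so nested payload fields are easy to reach.
            "body": activity.findtext("payload/body"),
        })
    return parsed


for item in parse_activities(SAMPLE):
    print(item["actor"], item["action"], item["at"])
```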

Posted in APIs,Partners,Publishers by Eric No Comments

We are pleased to announce an agreement with Automattic, Inc. that allows us to add WordPress.com as our newest data publisher in the Standard Edition of the Gnip platform.

Gnip now provides access to the WordPress XMPP firehose for posts and comments. The WordPress.com firehose is designed for companies who would like to ingest a real-time stream of new WordPress.com posts and comments the second they get published. Access is via subscription only. For more information contact Gnip at info@gnip.com.

Gravitational Shift

11 MAY
2009

Posted in Uncategorized by Jud No Comments

Gnip’s approach to getting more Publishers into the system has evolved. Over the past year we’ve learned a lot about the data delivery business and the state of its technological art. While our core infrastructure remains a highly performant data delivery bus, the way data arrives at Gnip’s front door is shifting.

We set out assuming the industry, at large (both Publishers and Consumers), was tired of highly latent data access. What we’ve learned is that data Consumers (e.g. life-stream aggregators) are indeed weary of the latency, but that many Publishers aren’t as interested in distributing their data in real-time as we initially estimated. So, in order to meet intense Consumer demand to have data delivered in a normalized, minimal latency (not necessarily “real-time”), manner, Gnip is adding many new polled Publishers to its offering.

Check out http://api.gnip.com and see how many Publishers we have to offer as a result of walking down the polling path.

Our goal remains to “deliver the web’s data,” and while the core Gnip delivery model remains the same, polling has allowed us to greatly expand the list of available Publishers in the system.

Tell us which Publishers/data sources you want Gnip to deliver for you! http://gnip.uservoice.com/

We have a long way to go, but we’re stoked at the rate we’re able to widen our Publisher offering now that our polling infrastructure is coming online.

Posted in APIs,Publishers,Release by Eric No Comments

After running what we believe has been a very complete beta program for the last three months we are ready to officially launch our 2.1 version next week at the end of the day Tuesday, May 12th.

What will happen on May 12th

  1. v2.1 of the Gnip platform available at http://api.gnip.com will become the officially supported version.   Existing customers of the standard version of our product are all being contacted directly via email.   Community version users are being notified by our official newsletter, this blog post and our standard practice of posting to our Twitter account @gnipsupport.
  2. Version 2.0 will be deprecated and continue to be available for 30 days.  Existing users of the http://prod.gnipcentral.com version of the service are encouraged to move to the new version as soon as possible.   The point of the 3 month beta program was to provide time to upgrade to the new Gnip v2.1 data schema.  Read up on the new version at http://www.gnip.com/docs

Posted in APIs,Publishers,Release by Eric No Comments

We just finished making a major upgrade to the beta api.gnip.com environment. First, thank you to everyone for their patience during our middle-of-the-day upgrade. We normally schedule upgrades off hours or do a rolling upgrade, but tonight the entire team of ten is going to the Star Trek premiere. Anyway, we made the “management” decision to do the upgrade earlier in the day so that it did not run into our company/family event tonight.

What’s new? Up until now the data publishers in api.gnip.com have been doing lazy scheduling, which means they would pull data, but with a small number of rules it was actually easy to miss data or to have a filter never get a hit in our scheduling. Yeah, that is a beta thing, as the primary reason for having the beta out for so long was to give users a chance to move all their existing integrations to the new schema. With Beta 3 we have made some major enhancements to the system that, from what we see in our tests, greatly improve the data flow across all our data publishers using the new polling services.

From a timeline standpoint we want to let the new features soak a bit and then we will lock down a date to take the system to production as early as the end of May.     In the next few days we are running diagnostics and scaling out the system by putting it through the paces, so feel free to do the same or just work with our notification streams on any given publisher.

Live long and prosper.  (sorry, just could not resist the Star Trek line)

Posted in Industry,Strategy by Eric No Comments

We have been thinking a lot about user-generated content over the last few weeks and months as we reflect on how developers are using the Gnip platform to build solutions we never imagined when the company was started.

One of the great things about getting out and talking to lots of people is that we are always learning more about how we fit, or how other people think we fit, into the ecosystems that make up the Internet. Recently we realized that we really only exist because of the entire phenomenon of user-generated content. This is not the core idea with which we started the company. The core idea was really techy (see our FAQ). In addition to that original techy idea, it is now obvious to us that the primary Gnip mission also has to focus on making user-generated content universally accessible and useful, as it was intended by the original author who shared the content.

We still see Gnip providing innovative and bleeding-edge technology solutions for social and business data integration, but realizing that our mission must also include thinking about the people sharing the content in the first place changes how we prioritize everything we do.

Do not worry, we are not going to start building web apps and become another social media aggregation solution (those are some of our partners and customers, and we love them all). Instead we are that much more excited to focus on the underlying platform for anyone who wants to integrate user-generated content.

What this does most for our team is help us understand that, in addition to providing a great platform for developers to do data integration, we also have to help developers using the Gnip platform access and integrate data in a way that upholds the original intent of the user who shared the content in the first place. Where something was originally shared and how it was shared does matter, and this is why our new schema goes so far with the inclusion of original destination URLs, author profile information, regarding URLs and other pertinent information that can be lost when people are just passing links, comments or re-tweets around or hacking together social data.

So, now when we say Gnip is focused on Delivering the Web’s Data we will be thinking about developers and the people everywhere who are using the Internet to just tell the world something in their own way.