PubSubHubbub (PuSH), Google and Buzz

February 18th, 2010
  • Posted by Jud Valeski, Co-Founder and CEO
6 Comments

Setting the quality, validity, and longevity of Google Buzz as a product aside, here’s a first reaction to its PubSubHubbub-based API.

I love the pubsub model, because driving applications via events, vs. polling, is almost always advantageous, and certainly more efficient. Gnip has a chapter in O’Reilly’s Beautiful Data wherein we go deeper into why the world should be event driven rather than founded on incessant polling. bslatkin also has a good post on the topic (Why Polling Sucks).

Over the past few days we’ve built Google Buzz support into the Gnip offering, which has allowed me to finally dig into PuSH Subscription at the implementation level. Mike Barinek, previously with Gnip, built a Ruby PuSH hub, but I haven’t gone that deep yet.

Some PuSH Subscriber thoughts…

  • PuSH lacks support for batch topic subscription requests. This is a bummer when your customers want to subscribe to large numbers of topics, as you have to one-off each subscription request. Unfortunately, I don’t see an easy way to extend the protocol to allow for batching, as the request acknowledgment semantics are baked into the HTTP response code itself, rather than a more verbose HTTP body.
  • Simple and lightweight. As far as pubsub protocols go, PuSH is nice and neat. Good leverage, and definition, of how HTTP should be used to communicate the bare minimum. While in the bullet above I complain that I want some expandability on this front, which would pollute things a bit, the simplicity of the protocol can’t be argued with.
  • Google’s Hub
    • Happily accepts, and returns success for, batch topic subscription requests, when in fact not all of the topics are actually subscribed. Bug.
    • Is the most consistent app I’ve seen WRT predictable HTTP interaction patterns. Respectfully sends back 503/retry-afters when it needs to, and honors them. I wish I could say this about a dozen other HTTP interfaces I have to interact with.
    • Is fast to field subscription requests. However, the queue on the back that shuffles events through the system has proven inconsistent and flaky. I don’t think I’ve lost any data, but the latency and order in which events move through it aren’t as consistent as I’d like. In order for event driven architectures to work, this needs to be tightened up.
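
For the curious, here is a rough sketch of what one-off subscription looks like from the subscriber side. It is not Gnip's implementation. It assumes the request parameters from the PubSubHubbub 0.3 spec (hub.mode, hub.topic, hub.callback, hub.verify) and that spec's success codes (204 for a verified subscription, 202 for one still awaiting verification). The function names, the 60 second fallback delay, and the worker count are made up for illustration, and the transport is injectable so the logic can be exercised without touching a real hub.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_subscribe_body(topic, callback, verify="async"):
    """Form-encode a single subscription request (one topic per request)."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": verify,
    })


def classify(status, retry_after=None):
    """Turn the hub's HTTP status into an (outcome, delay_seconds) pair."""
    if status == 204:
        return ("subscribed", None)   # verified and active
    if status == 202:
        return ("pending", None)      # accepted, verification still to come
    if status == 503:
        # Retry-After can also be an HTTP date; this sketch only parses
        # plain seconds and otherwise falls back to an assumed 60 seconds.
        delay = int(retry_after) if retry_after and retry_after.isdigit() else 60
        return ("retry", delay)
    return ("failed", None)


def http_send(hub_url, body):
    """POST one request; returns (status, Retry-After header value or None)."""
    request = Request(
        hub_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    try:
        with urlopen(request, timeout=10) as response:
            return response.status, response.headers.get("Retry-After")
    except HTTPError as err:
        # urllib raises on 4xx/5xx; the status and headers live on the error.
        return err.code, err.headers.get("Retry-After")


def subscribe_all(hub_url, topics, callback, send=http_send, workers=8):
    """Issue one request per topic in parallel; returns {topic: outcome}."""
    def subscribe_one(topic):
        status, retry_after = send(hub_url, build_subscribe_body(topic, callback))
        return topic, classify(status, retry_after)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(subscribe_one, topics))
```

Because the acknowledgment is the status code itself, each topic gets its own answer only when each topic gets its own request, which is why the fan-out happens on the subscriber side here. Topics that come back as "retry" would be resubmitted after the indicated delay.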

Here’s to event driven systems!

6 Comments

  • Josh Fraser said:

    February 18, 2010 at 12:00 pm

    awesome. i’m stoked to see you guys poking around with this. it’s a great direction for gnip. i like it!

    yeah it would be nice if PuSH supported bulk subscriptions, but from what i understand they decided to leave this out of the core spec because it’s actually kinda tricky to implement on the hub (although i can’t say for sure). at eventvue we used multi-curl to send the requests in parallel. it may not be ideal, but it worked pretty well.

    have you filed an official bug yet about the returned success code for batch requests? if not, i’m happy to add it.

    thanks for posting this. hopefully someone else can chip in about the reason multiple subscriptions aren’t supported right now.

  • Jud said:

    February 18, 2010 at 12:10 pm

    thanks for the feedback! I took a quick glance at where to submit a bug, but didn’t find the right place. pls either fwd me the URL, or submit; either way’s great for me.

  • Josh Fraser said:

    February 18, 2010 at 12:14 pm

    http://code.google.com/p/pubsubhubbub/issues/entry

    i’ll leave you to add it since i’ve not taken the time to duplicate the issue myself.

  • Brett said:

    February 18, 2010 at 12:28 pm

    Hey Jud,

    Thanks for the post and the feedback!

    Batch subscription is something people have requested before; we’ve left it out because we’re trying to keep the core spec as simple as possible (do one thing, do it well). As Josh said, it’s easy to issue lots of subscription requests in parallel.

    Going forward I think the ideal solution to this is some kind of “firehose” subscription convention in the Hubbub spec. I know that Ilya (http://igvita.com/) and others really want this as well, and overall it would be cheaper/easier for hubs to maintain (as long as the data is 1. all public, and 2. the subscriber is fast). There’s been a bit of discussion on the PubSubHubbub mailing list about this and hopefully it’s something we’ll iron out in the next couple months.

    Otherwise, could you describe further the latency inconsistencies you’re seeing for event delivery? Was this for all feed sources or only Buzz? We have end-to-end probes for verifying publishing/delivery latency in the hub over time and it’s been pretty solid. Let me know how I can help track down what you’re seeing.

    -Brett

  • Eric Marcoullier said:

    February 18, 2010 at 10:33 pm

    Glad to see that there’s talk about a “firehose” subscription on the list.

    +1 to that idea.

  • Jud said:

    February 19, 2010 at 9:36 am

    Thanks for the perspective on the batch stuff. Yea, I’m of mixed mind on it. It’s convenient and expected on the PuSH subscriber front, *but* it breaks some of the nice and flat elegance of the hub impl for handling subscriptions. One of the nice things about PuSH is that it distributes that load out to the subscriber.

    As for the inconsistencies in event delivery: stoked you guys have introspection into the pipeline; nicely done. They’re Buzz only, and I suspect they’re PuSH Publisher (e.g. Buzz) related. I can only speak on a macro level at the moment, but I’m seeing traffic patterns (event delivery) that aren’t what I’d expect. Put another way, there are spikes at times that don’t appear to follow “normal” social messaging application usage patterns; thus, delivery feels off. When I get to more honed analysis and can point to a specific, I’ll let you know what I find.
