PubSubHubbub (PuSH), Google and Buzz

Setting the quality, validity, and longevity of Google Buzz as a product aside, here’s a first reaction to its PubSubHubbub-based API.

I love the pubsub model, because driving applications via events, vs. polling, is almost always advantageous, and certainly more efficient. Gnip has a chapter in O’Reilly’s Beautiful Data wherein we go deeper into why the world should be event driven rather than founded on incessant polling. bslatkin also has a good post on the topic (Why Polling Sucks).

Over the past few days we’ve built Google Buzz support into the Gnip offering, which has allowed me to finally dig into PuSH Subscription at the implementation level. Mike Barinek, previously with Gnip, built a Ruby PuSH hub, but I haven’t gone that deep yet.

Some PuSH Subscriber thoughts…

  • PuSH lacks support for batch topic subscription requests. This is a bummer when your customers want to subscribe to large numbers of topics, as you have to issue a one-off request per topic (a sketch of the resulting one-at-a-time loop follows this list). Unfortunately, I don’t see an easy way to extend the protocol to allow for batching, as the request acknowledgment semantics are baked into the HTTP response code itself, rather than a more verbose HTTP body.
  • Simple and lightweight. As far as pubsub protocols go, PuSH is nice and neat. It leverages, and defines, how HTTP should be used to communicate the bare minimum; a sketch of how little a subscriber callback has to do also follows this list. While in the bullet above I complain that I want some expandability on this front, which would pollute things a bit, the simplicity of the protocol is hard to argue with.
  • Google’s Hub
    • Happily accepts, and returns success for, batch topic subscription requests, when in fact not all of the topics actually get subscribed. Bug.
    • Is the most consistent app I’ve seen WRT predictable HTTP interaction patterns. Respectfully sends back 503/Retry-After responses when it needs to, and honors them. I wish I could say this about a dozen other HTTP interfaces I have to interact with.
    • Is fast to field subscription requests. However, the queue on the back end that shuffles events through the system has proven inconsistent and flaky. I don’t think I’ve lost any data, but the latency and order in which events move through it aren’t as consistent as I’d like. For event driven architectures to work, this needs to be tightened up.
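
To make the subscriber side concrete, here’s a minimal sketch of the one-at-a-time subscription loop in Python (using the requests library). The hub URL points at the public reference hub; the callback and topic URLs are placeholders, and the retry cap is my own convention rather than anything in the spec. Treat it as an illustration, not a drop-in client.

```python
import time

import requests

# Assumed values: the public reference hub, plus placeholder callback and
# topic URLs standing in for a real subscriber endpoint and real feeds.
HUB_URL = "https://pubsubhubbub.appspot.com/"
CALLBACK_URL = "https://subscriber.example.com/push/callback"
TOPICS = [
    "https://buzz.example.com/feeds/user-1/public.atom",
    "https://buzz.example.com/feeds/user-2/public.atom",
]


def subscribe(topic, max_attempts=5):
    """Issue one PuSH 0.3 subscription request for a single topic."""
    for _ in range(max_attempts):
        resp = requests.post(
            HUB_URL,
            data={
                "hub.mode": "subscribe",
                "hub.topic": topic,
                "hub.callback": CALLBACK_URL,
                "hub.verify": "async",  # hub verifies via a GET to the callback
            },
        )
        # The acknowledgment lives entirely in the status code:
        # 202 = accepted, verification pending; 204 = verified synchronously.
        if resp.status_code in (202, 204):
            return True
        # Honor 503 + Retry-After, exactly as the hub honors ours.
        if resp.status_code == 503:
            time.sleep(int(resp.headers.get("Retry-After", "30")))
            continue
        return False
    return False


# No batch support, so each topic costs its own round trip.
for topic in TOPICS:
    subscribe(topic)
```

Note that success or failure comes back purely as a status code (202 accepted/pending verification, 204 verified), which is exactly why bolting batch semantics onto the protocol would be awkward.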
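
The flip side is just as small. Here’s a rough sketch of a subscriber callback, assuming Flask and a hypothetical handle_entries hook; it only has to echo hub.challenge on the hub’s verification GET and return a 2xx when the hub POSTs new entries.

```python
from flask import Flask, request

app = Flask(__name__)


def handle_entries(raw_atom):
    """Placeholder: hand the pushed Atom payload off to real processing."""
    pass


@app.route("/push/callback", methods=["GET", "POST"])
def push_callback():
    if request.method == "GET":
        # Subscription verification: the hub asks whether we really wanted
        # hub.topic, and we prove it by echoing hub.challenge back verbatim.
        return request.args.get("hub.challenge", ""), 200
    # Content delivery: the hub POSTs an Atom fragment with the new entries.
    # Acknowledge with a 2xx quickly; do the heavy lifting out of band.
    handle_entries(request.data)
    return "", 204
```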

Here’s to event driven systems!

  • http://www.onlineaspect.com Josh Fraser

    awesome. i’m stoked to see you guys poking around with this. it’s a great direction for gnip. i like it!

    yeah it would be nice if PuSH supported bulk subscriptions, but from what i understand they decided to leave this out of the core spec because it’s actually kinda tricky to implement on the hub (although i can’t say for sure). at eventvue we used multi-curl to send the requests in parallel. it may not be ideal, but it worked pretty well.

    have you filed an official bug yet about the returned success code for batch requests? if not, i’m happy to add it.

    thanks for posting this. hopefully someone else can chip in about the reason multiple subscriptions aren’t supported right now.

  • http://one.valeski.org Jud

    thanks for the feedback! I took a quick glance at where to submit a bug, but didn’t find the right place. pls either fwd me the URL, or submit; either way’s great for me.

  • http://www.onlineaspect.com Josh Fraser

    http://code.google.com/p/pubsubhubbub/issues/entry

    i’ll leave you to add it since i’ve not taken the time to duplicate the issue myself.

  • http://onebigfluke.com Brett Slatkin

    Hey Jud,

    Thanks for the post and the feedback!

    Batch subscription is something people have requested before; we’ve left it out because we’re trying to keep the core spec as simple as possible (do one thing, do it well). As Josh said, it’s easy to issue lots of subscription requests in parallel.

    Going forward I think the ideal solution to this is some kind of “firehose” subscription convention in the Hubbub spec. I know that Ilya (http://igvita.com/) and others really want this as well, and overall it would be cheaper/easier for hubs to maintain (as long as the data is 1. all public, and 2. the subscriber is fast). There’s been a bit of discussion on the PubSubHubbub mailing list about this and hopefully it’s something we’ll iron out in the next couple months.

    Otherwise, could you describe further the latency inconsistencies you’re seeing for event delivery? Was this for all feed sources or only Buzz? We have end-to-end probes for verifying publishing/delivery latency in the hub over time and it’s been pretty solid. Let me know how I can help track down what you’re seeing.

    -Brett

    • Eric Marcoullier

      Glad to see that there’s talk about a “firehose” subscription on the list.

      +1 to that idea.

    • http://one.valeski.org Jud

      Thanks for the perspective on the batch stuff. Yeah, I’m of mixed mind on it. It’s convenient and expected on the PuSH subscriber front, *but* it breaks some of the nice and flat elegance of the hub impl for handling subscriptions. One of the nice things about PuSH is that it distributes that load out to the subscriber.

      As for the inconsistencies in event delivery: stoked you guys have introspection into the pipeline; nicely done. It’s Buzz only, and I suspect the issues are PuSH Publisher (i.e. Buzz) related. I can only speak at a macro level at the moment, but I’m seeing traffic patterns (event delivery) that aren’t what I’d expect. Put another way, spikes at times that don’t appear to follow “normal” social messaging application usage patterns; thus, delivery feels off. When I get to more honed analysis and can point to something specific, I’ll let you know what I find.