PubSubHubbub (PuSH), Google and Buzz

Setting the quality, validity, and longevity of Google Buzz as a product aside, here’s a first reaction to its PubSubHubbub based API.

I love the pubsub model, because driving applications via events, vs. polling, is almost always advantageous, and certainly more efficient. Gnip has a chapter in O’Reilly’s Beautiful Data wherein we go deeper into why the world should be event driven rather than founded on incessant polling.. bslatkin, also has a good post on the topic (Why Polling Sucks).

Over the past few days we’ve built Google Buzz support into the Gnip offering, which has allowed me to finally dig into PuSH Subscription at the implementation level. Mike Barinek, previously with Gnip, built a Ruby PuSH hub, but I haven’t gone that deep yet.

Some PuSH Subscriber thoughts…

  • PuSH lacks support for batch topic subscription requests. This is a bummer when your customers want to subscribe to large numbers of topics, as you have to one-off each subscription request. Unfortunately, I don’t see an easy way to extend the protocol to allow for batching, as the request acknowledgment semantics are baked into the HTTP response code itself, rather than a more verbose HTTP body.
  • Simple and lightweight. As far as pubsub protocols go, PuSH is nice and neat. Good leverage, and definition, of how HTTP should be used to communicate the bare minimum. While in the bullet above I complain that I want some expandability on this front, which would pollute things a bit, the simplicity of the protocol can’t be reckoned with.
  • Google’s Hub
    • Happily accepts, and returns success for, batch topic subscription requests, when in fact all topics aren’t actually subscribed. Bug.
    • Is the most consistent app I’ve seen WRT predicable HTTP interaction patterns. Respectfully sends back 503/retry-afters when it needs to, and honors them. I wish I could say this about a dozen other HTTP interfaces I have to interact with.
    • Is fast to field subscription requests. However, the queue on the back that shuffles events through the system has proven inconsistent and flaky. I don’t think I’ve lost any data, but the latency and order in which events move through it isn’t as consistent as I’d like. In order for event driven architectures to work, this needs to be tightened up.

Here’s to event driven systems!