That Twitter Thing

Oh, crap, Eric’s gone and written another long post…

Since we publicly launched Gnip last week, we’ve been asked numerous times if we can integrate with Twitter or somehow help Twitter with the scaling issues they are facing.  We can, but we depend on Twitter giving us access to their XMPP feed.

We are huge fans of Twitter, so we’re patiently waiting for that access.  In the meantime, the questions we’ve received have prompted us to explain two things: (1) How we would benefit Twitter and anyone who wants access to Twitter data and (2) Why – if you are a web service – it’s worth integrating now with Gnip rather than waiting either for (a) Gnip to integrate with Twitter or (b) you to get as popular as Twitter and have scale issues.

Let’s address the first issue: How we would benefit Twitter and anyone who wants to integrate with Twitter data.

Twitter has found that XMPP doesn’t scale for them and as a result, people are forced to poll their API *a lot* to get updates for their users.  MyBlogLog has over 25,000 Twitter users that they throw against the Twitter API every 15 minutes.  This results in nearly 2.5 million queries against the API every day, for maybe 250K updates.  Now add millions of pings from Plaxo and SocialThing and Lijit and heaven forbid Yahoo starts beating up their API…
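For the curious, here is the back-of-the-envelope math behind those numbers as a quick Python sketch. The function name is made up for illustration, and it assumes one API query per user per polling interval; the 250K figure is the rough estimate from above, not a measured value.

```python
def polls_per_day(users: int, interval_minutes: int) -> int:
    """Queries per day if every user is polled once per interval."""
    polls_per_user = (24 * 60) // interval_minutes  # 96 polls for a 15-minute interval
    return users * polls_per_user


# Figures quoted above: 25,000 users polled every 15 minutes.
queries = polls_per_day(25_000, 15)
updates = 250_000  # rough estimate of real updates per day

print(queries)            # 2400000
print(queries / updates)  # roughly 9.6 queries for every actual update
```

In other words, under these assumptions about nine out of ten queries come back with nothing new.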

If Twitter starts pushing updates to us, via our dead simple API or Atom or their XMPP server, we can immediately reduce by an order of magnitude the number of requests that some very large sites are making against their API.  At the same time, we reduce the latency between when someone Tweets and when it shows up on consuming sites like Plaxo, from 15 minutes or more to 60 seconds or less.

We expect that the folks at Twitter have their collective heads down and are working around the clock to buttress their infrastructure, and it’s unlikely that they’re going to do anything optional until that’s sorted out.  Unfortunately, “integrate with Gnip” probably falls into the optional category.  We expect, however, that at some point Twitter will start opening up their data to more partners once they feel like they have their arms around their infrastructure.

If you run a web service and integrate with Gnip today, you’ll automatically be able to integrate with Twitter data once they give us access.  Presumably you won’t have to wait in line to get direct Twitter integration.  In addition, you’ll have immediate access to all of the other data providers that we integrate with, such as Delicious, Flickr, Magnolia, Get Satisfaction, Intense Debate and Six Apart.  For example, it only took Brightkite 15 minutes to integrate our API and start pushing data to our partners via us.

Now for the second topic.  Why – if you are a web service – it’s worth integrating with Gnip now rather than waiting either for (a) Gnip to integrate with Twitter or (b) you to get as popular as Twitter and have scale issues.

All things considered, it’s best not to end up in Twitter’s position.  They have a ton of passionate users (I’m one of them) who want reliable service and don’t have infinite patience.  The old startup cliche of “these are problems we’d like to have” is crap.

You don’t want to be in the position where your business suddenly takes off and your infrastructure falls over because people are banging your APIs to death.  You don’t want your most passionate users calling for mass exodus.  It’s better to take a few minutes to start pushing notifications to Gnip now than when you’re doing 20-hour days rebooting servers.

You also don’t want to be in the position where your company takes off and you suddenly get throttled by an API provider.  Nothing is worse than having to pull data sources because you’ve over-polled and the host decides to turn off the spigot.  Start pulling notifications from Gnip and feel secure that you’re only asking for data when there’s something new.

I still use Twitter every day.  Don’t try to kid me; I know you still do too.  Let them get on with their work and rest assured that we’ll integrate with them the instant we get the okay from them.

The WHY of Gnip: Stop Building What Everyone Else is Building

Let me say this up front:

I have a tendency to ramble. Why use a sentence when a paragraph will suffice, right? As a result, I limit myself to 100-word posts on my sporadically updated personal blog. I’ll follow suit here, with only occasional excursions into longer territory. This is one such post.

I’ll try not to ramble too much…

Data portability, the ability to create content on one web site and derive value from it on other sites and applications, has become one of the defining characteristics of what is commonly referred to as “Web 2.0”. An emerging class of services is taking advantage of this data to create entirely new products, including social aggregators (Plaxo Pulse, MyBlogLog, FriendFeed), social search (Lijit, Delver) and communications dashboards (Fuser, Orgoo, Digsby). Each of these services is predicated on the belief that user-generated content is the raw material upon which great companies can be built.

Data portability, via RSS or Atom or XMPP or open APIs, is neither difficult nor complex. These are known problems with straightforward solutions and open standards. But each connection between two services (e.g. MyBlogLog and Flickr or Plaxo and Digg) is a custom integration, requiring at least one of the parties to set up a custom channel to access, process and ultimately make use of the transferred data. As companies seek to create robust solutions built upon dozens or even hundreds of data feeds, engineers face a rapidly growing problem of building and maintaining these custom communication channels, since every new service potentially needs a channel to every service already out there. Simply put, data portability is a big hassle.
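To get a feel for how quickly those custom channels pile up, here is a rough Python sketch. The function names and the service counts are made up for illustration. The comparison assumes the worst case where every consumer connects to every provider, and a hypothetical shared intermediary that each service integrates with exactly once.

```python
def direct_integrations(consumers: int, providers: int) -> int:
    """Custom channels needed if every consumer connects to every provider."""
    return consumers * providers


def hub_integrations(consumers: int, providers: int) -> int:
    """Channels needed if each service integrates once with a shared intermediary."""
    return consumers + providers


# Illustrative sizes only: n consumers and n providers.
for n in (5, 20, 100):
    print(n, direct_integrations(n, n), hub_integrations(n, n))
```

With 20 services on each side, that is 400 point-to-point channels versus 40 connections to the intermediary, and the gap keeps widening as more services show up.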

Crucially, data portability has become the cost of entry for these services. It is not enough for a social aggregator to claim the most sources or a social search company the biggest pool of data. The leaders in this space are focused on filtering and presenting data in useful ways; out of a billion pieces of data, they seek to connect you with the appropriate information at the appropriate time. All of the work building and maintaining back-end data portability services comes at the cost of building better front-end features that draw and satisfy users.

That’s where Gnip comes in. We’re dedicated to making data portability suck less, by reducing the effort required to collect and manage the data upon which these awesome new services are being created. Gnip aims to simplify the process of aggregating, standardizing and maintaining large pools of data, ultimately making the process as simple as uploading a list of your users.

Our first service is a solution to a key problem facing data portability implementations (Jud will give you the details in just a moment). We at Gnip believe in direct solutions to painful problems, and as a result, our first service isn’t fancy. But it’s quick to integrate, it scales like a monster and it uses a variety of web standards; we believe we’ve solved this particular problem pretty well. Over the coming months we’ll roll out additional direct solutions to painful problems, and before long we’ll have a bona fide platform for pushing data around the web.

We’re incredibly excited by the bounty that Web 2.0 has created. We are living with an embarrassment of riches in terms of shared information and experiences. But it’s overwhelming. I personally believe that Web 3.0 will herald a return to the individual — story, picture, friend, experience — because in aggregate, that which has great meaning often becomes meaningless. So it’s up to these awesome new services to take the Web 2.0 bounty and find for each of us those few things that will fundamentally enhance our lives. To give us something meaningful.

I hope that we at Gnip can build a foundation that enables these awesome new services to focus all of their attention on making great things. We’ll happily lay plumbing, mix concrete and smelt tin to see that happen.