Social Data in a Marketplace

Gnip; shipping & handling for data. Since our inception a couple of years ago, this is one of the ways we’ve described ourselves. What many folks in the social data space (publishers and consumers alike) surprisingly don’t understand however is that such a thing is necessary. Several times we’ve come up against folks who indicate that either a) “our (random publisher X) data’s already freely available through an API” or b) “I (random consumer Y) have free access to their data through their API.” While both statements are often true, they’re shortsighted.

If you’re a “web engineer” versed in HTTP and XHR with time on your hands, then accessing data from a social media publisher (e.g. Twitter, Facebook, MySpace, Digg…. etc) may be relatively straightforward. However, while API integration might be “easy” for you, keep in mind that you’re in the minority. Thousands of companies, either not financially able to afford a “web engineer” or simply technically focused elsewhere (if at all), need help accessing the data they need to make business decisions. Furthermore, while you may do your own integrations, how robust is your error reporting, monitoring, and management of your overall strategy? Odds are that you have not given those areas the attention they require. Did your stream of data stop because of a bug in your code, or because the service you were integrated with went down? Could you more efficiently receive the same data from a publisher, while relieving load from your (and the publisher’s) system? Do you have live charts that depict how data is moving through the system (not just the publisher’s side of the house)? This is where Gnip Data Collection as a Service steps in.

As the social media/data space has evolved over the past couple of years, the necessity of a managed/solution-as-a-service has become clear. As expected, the number of data consumers continues to explode, while the number of consumers with technical capability to reliably integrate with the publishers, as a ratio to total, is shrinking.

Finally some good technical/formatting standards are catching on (PubSubHubbub, WebHooks, HTTP-long-polling/streaming/Comet (thanks Twitter), ActivityStreams), which is giving everyone a vocabulary and common conceptual understanding to use when discussing how/when real-time data is produced/consumed.

In 2010 we’re going to see the beginnings of maturation in the otherwise Wild-West of social data. As things evolve I hope innovation doesn’t suffer (mass availability of data has done wonderful things), but I do look forward to giving other, less inclined, players in the marketplace access to the data they need. As a highly focused example of this kind of maturation happening before our eyes, checkout SimpleGeo. Can I do geo stuff as an engineer, yes. Do I want to collect the thousand sources of light to build what I want to build around/with geo; no. I prefer a one-stop-shop.