Garbage In, Garbage Out

Gnip is an intermediary service for message flow across disparate network endpoints. Standing in the middle allows for a variety of value-adds (Data Producers can "publish once, distribute to many"; Data Consumers can enjoy a single service interaction rather than one-off'ing over and over again), but the quality of the data that Data Producers push into the system is fundamental.

Only As Good As The Sum Of Our Parts

Gnip doesn’t control the quality of the data being published to it. Whether it arrives as XMPP messages, RSS, or ATOM, many issues can come into play that affect the data a Data Consumer receives.

  • Bad transport/delivery – The source XMPP, RSS, ATOM, or REST feed can go down. When this happens for a given Publisher, that source has vanished and Gnip doesn’t receive messages for that Publisher. We’re only as good as the data coming in. While Gnip can consume data from XMPP, RSS, ATOM, and other sources, our preferred inbound message delivery method is via our REST API. Firing off messages to Gnip directly, rather than through yet another layer, minimizes delivery issues.
  • Bad data – As any aggregator (FriendFeed, Social Thing, Movable Type Activity Streams…) can attest, the data coming across XMPP, RSS, and ATOM feeds today is a mess. From bad/illegal formatting to bad/illegal data escaping, nearly every activity feed has unique issues that have to be handled on a case-by-case basis. There will be bugs. We will fix them as they arise. Once again, these issues can be minimized if Data Producers deliver messages directly to Gnip via our REST API.
  • Bad policy – This one’s interesting. Gnip makes certain assumptions about the kind of data it receives. In our current implementation, we advertise to Data Consumers that Data Producers push all public, per-user change notifications generated within their systems to Gnip. This usually corresponds to the Data Producers’ existing public API policies. We will eventually offer finely tuned, Data Producer–controlled data policies, but for today’s public-facing Gnip service, we do not want to see Data Producers creating publishing policies specific to Gnip. Doing so confuses the middleware dynamic we’re trying to create with our current product, and subsequently muddies the water for everyone. Imagine a Data Consumer interacting with a Data Producer directly under one policy, then interacting with Gnip under another: confusing. Again, we will cater to unique data policies on a per-Data Producer basis, perhaps earlier than we think, but we’re not there yet.
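To make the first two points concrete, here is a minimal sketch of what direct REST delivery to Gnip might look like from a Data Producer’s side. The endpoint URL, publisher name, and payload fields below are hypothetical stand-ins for illustration only (Gnip’s actual API contract may differ); the `escape()` calls show the kind of data escaping whose absence causes so many of the feed problems described above.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

def build_activity_xml(actor: str, action: str, url: str) -> str:
    """Build a minimal activity payload, escaping user-supplied fields
    so the XML stays well-formed even when input contains & or <.
    (Field names here are illustrative, not Gnip's actual schema.)"""
    return (
        "<activity>"
        f"<actor>{escape(actor)}</actor>"
        f"<action>{escape(action)}</action>"
        f"<url>{escape(url)}</url>"
        "</activity>"
    )

def build_publish_request(publisher: str, activity_xml: str) -> Request:
    # Hypothetical endpoint path, shown only to illustrate pushing
    # directly to a REST API instead of exposing a feed to be polled.
    endpoint = f"https://api.example-gnip.com/publishers/{publisher}/activity"
    return Request(
        endpoint,
        data=activity_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )

xml = build_activity_xml("alice & bob", "posted", "http://example.com/?a=1&b=2")
req = build_publish_request("example-producer", xml)
```

The request is constructed but never sent here; the point is simply that a producer pushing well-escaped payloads straight to the aggregator removes the flaky transport layer (and much of the bad-data cleanup) from the path entirely.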

While addressing all of these issues is part of our vision, they’re not all resolved out of the gate.