Backfill: Eliminating Data Loss from Brief Disconnects

Complete coverage of social data is one of the key principles behind Gnip’s products. Our customers’ enterprise applications depend on this and trust Gnip to provide access to every activity that matters. Delivery of streaming data, however, has its challenges. In particular, anything that results in a disconnect causes the flow of data to stop.

In light of this, we are excited to announce Backfill, a new product feature for all sources with full firehose access. Backfill simplifies and automates the process of collecting data that would otherwise be missed during brief disconnects.

Streaming connections can be disrupted for a variety of reasons, some of which are unavoidable. For example, a client application may need to be redeployed with updated code, resulting in a brief window where there is no connection. Changes to a network setting can also inadvertently time out the connection even though data is still being delivered. And even if there are no problems with the client or server, long-lived HTTP connections are always at risk of disconnect due to general internet instability.

With Backfill, you can seamlessly receive data that would otherwise have been missed during any of these disconnect scenarios. We designed Backfill for ease of use. For any disconnect shorter than 5 minutes, the missed data is sent through the main connection automatically upon reconnect. Once the buffered data has been consumed, the stream continues to deliver realtime data. Because the data arrives through the same connection, you don’t need to manage a secondary process or merge in data. There’s no need to identify the exact timeframe of the disconnect or the filtering rules that were in place at the time, and implementation requires only a small change to the connection URL.
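As a rough illustration of what this could look like on the client side, here is a minimal Python sketch. It is not Gnip's documented API: the query parameter name `backfill`, its value, and the example URL are hypothetical placeholders for whatever URL change your account manager describes. The only detail taken from this post is the 5-minute window.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# From the post: Backfill covers disconnects shorter than 5 minutes.
BACKFILL_WINDOW_SECONDS = 5 * 60


def with_backfill(url: str, param: str = "backfill", value: str = "true") -> str:
    """Return `url` with a backfill query parameter added.

    ASSUMPTION: `param` and `value` are hypothetical; the real change to the
    connection URL is not specified in this post.
    """
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query[param] = value
    return urlunparse(parts._replace(query=urlencode(query)))


def recovery_path(disconnected_at: float, reconnected_at: float) -> str:
    """Classify a gap (timestamps in seconds) by how it could be recovered.

    Gaps shorter than the window are covered automatically on reconnect;
    anything longer would need a separate recovery step such as Replay.
    """
    gap = reconnected_at - disconnected_at
    if gap < 0:
        raise ValueError("reconnected_at must not precede disconnected_at")
    return "backfill" if gap < BACKFILL_WINDOW_SECONDS else "replay"
```

A client could record the time it lost the connection, reconnect to the URL returned by `with_backfill(...)`, and use `recovery_path(...)` only to decide whether an additional recovery step is needed for longer outages.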

Backfill complements our other reliability features, giving you the flexibility to implement a system that meets your needs. Here is how Backfill compares with some of them:

  • Backfill: Automatically receive all data that would have been missed during a brief disconnect. Best for eliminating data gaps from brief disconnects.

  • Replay: Recover larger blocks of data missed in your realtime stream within the last few days. Time periods for recovery can be customized. Best for recovery of data missed due to extended disconnections or outages.

  • Redundant connections: Consuming data from multiple connections can prevent the need for data recovery in some situations. Best as a proactive measure to prevent data loss.
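If you consume the same stream over redundant connections, each activity will typically arrive more than once, so the client needs to drop duplicates. The sketch below shows one way to do that in Python. It assumes each activity is a dict with a unique `id` field and that a bounded memory of recent ids is enough. Both are assumptions made for illustration, not details from this post.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Iterator


def dedupe(activities: Iterable[Dict], max_seen: int = 100_000) -> Iterator[Dict]:
    """Yield each activity once, in arrival order.

    ASSUMPTION: every activity carries a unique "id" key. Only the most
    recent `max_seen` ids are remembered, so memory use stays bounded.
    """
    seen: "OrderedDict[str, None]" = OrderedDict()
    for activity in activities:
        activity_id = activity["id"]
        if activity_id in seen:
            continue  # already delivered by another connection
        seen[activity_id] = None
        if len(seen) > max_seen:
            seen.popitem(last=False)  # forget the oldest id
        yield activity
```

Feeding the interleaved output of both connections through `dedupe(...)` gives downstream code a single stream, and a brief drop on one connection goes unnoticed as long as the other stays up.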

Getting started with Backfill is easy. Send a note to info@gnip.com or reach out to your Gnip account manager to learn more.