Archive for December, 2008

Posted in APIs,Uncategorized by Eric No Comments

Storytlr

Who is Storytlr?
Storytlr provides a life streaming service that allows people to bring together their entire web 2.0 life and assemble their content to tell stories in a whole new way.  Learn more at their website, http://storytlr.com/, or their blog, http://blog.storytlr.com/.

Real-world results Storytlr says they are realizing from using Gnip
Storytlr is using Gnip to provide real-time data integration to Twitter, Digg, Delicious and Seesmic.  Since Storytrl starting using Gnip they have seen a reduction in the latency for the data integration of these social media activity streams (i.e. the time elapsed for a tweet, digg, or event notice to show up in the Storytlr service from a third-party is now real-time). Read more on how Storytlr added real-time integration using Gnip in their recent blog post.

We are looking forward to working more with the Storytlr team as we roll out more publishers that they can take advantage of in their business. 

Posted in Uncategorized by Eric No Comments

Seesmic is the latest publisher to be added to the Gnip Platform.

The initial integration between Gnip and Seesmic allows people to easily filter and integrate the Seesmic firehose or specific user activities into third-party applications and websites using either Gnip Notifications or Gnip Data Streams with the “Actor” rule type. Seesmic pushes to Gnip using an XMPP implementation and we are excited to work with them as they continue to evolve their service.   The Seesmic publisher just went live last week and we already have people integrating real-time notifications and data streams via Gnip, so go grab some Seesmic to join or create your own video conversation!

Learn more about Seesmic from their blog or their website.

Posted in Uncategorized by Jud 1 Comment

We’ve been busy over the past several months working hard on what we consider a fundamental piece of infrastructure that the network has been lacking for quite some time. From “ping server for APIs” to “message bus”, we’ve been called a lot of things; and we are actually all of them rolled into one. I want to provide some insight into what our backend architecture looks like as systems like this generally don’t get a lot of fanfare, they just have to “work.” Another title for this blog post could have been “The Glamorous Life of a Plumbing Company.”

First, some production numbers.

  • 99.9%: the Gnip service has 99.9% up-time.
  • 0: we have had zero Amazon Ec2 instances fail.
  • 10: ten Ec2 instances, of various sizes, run the core, redundant, message bus infrastructure.
  • 2.5m: 2.5 million unique activities are HTTP POSTed (pushed) into Gnip’s Publisher front door each day.
  • 2.8m: 2.8 million activities are HTTP POSTed (pushed) out Gnip’s Consumer back door each day.
  • 2.4m: 2.4 million activities are HTTP GETed (polled) from Gnip’s Consumer back door each day.
  • $0: no money has been spent on framework licenses (unless you include “AWS”).

Second, our approach.

Simplicity wins. These production transaction rate numbers, while solid, are not earth shattering. We have however, achieved much higher rates in load tests. We optimized for activity retrieval (outbound) as opposed to delivery into Gnip (inbound). That means every outbound POST/GET, is moving static data off of disk; no math gets done. Every inbound activity results in processing to ensure proper Filtration and distribution; we do the “hard” work on delivery.

We view our core system as handling ephemeral data. This has allowed us, thus far, to avoid having a database in the environment. That means we don’t have to deal with traditional database bottlenecks. To be sure, we have other challenges as a result, but we decided to take on those as opposed to have the “database maintenance and administration” ball and chain perpetually attached. So, in order to share contentious state across multiple VMs, across multiple machine instances, we use shared memory in the form of TerraCotta. I’d say TerraCotta is “easy” for “simple” apps, but challenges emerge when you start dealing with very large data sets in memory (multiple giga-bytes). We’re investing real energy in tuning our object graph, access patterns, and object types to keep things working as Gnip usage increases. For example, we’re in the midst of experimenting with pageable TerraCotta structures that ensure smaller chunks of memory can be paged into “cold” nodes.

When I look at the architecture we started with, compared to where we are now, there are no radical changes. We chose to start clustered, so we could easily add capacity later, and that has worked really well. We’ve had to tune things along the way (split various processes to their own nodes when CPU contention got too high, adjust object graphs to optimize for shared memory models, adjust HTTP timeout settings, and the like), but our core has held strong.

Our Stack

  • nginx – HTTP server, load balancing
  • JRE 1.6 – Core logic, REST Interface
  • TerraCotta – shared memory for clustering/redundancy
  • ejabberd – inbound XMPP server
  • Ruby – data importing, cluster management
  • Python – data importing

High-Level Core Diagram

Gnip Core Architecture Diagram

Gnip owes all of this to our team & our customers; thanks!

Posted in Long,Publishers,Release,solutions by Eric No Comments

We just pushed out a new release this week that includes new publishers and capabilities. Here is a summary of the release highlights. Enjoy!

  • New YouTube publisher: Do you need an easy way to access, filter and integrate YouTube content to your web application or website? Gnip now provides a YouTube publisher so go create some new filters and start integrating YouTube based content.
  • New Flickr publisher: Our first Flickr publisher had some issues with data consistency and could almost be described as broken. We built a brand new Flickr publisher to provide better access to content from Flickr. Creating filters is a snap so go grab some Flickr content.
  • Now publisher information can be shared across accounts: When multiple developers are using Gnip to integrate web APIs and feeds it sometimes is useful to see other filters as examples. Sharing allows a user to see publisher activity and statistics, but does grant the ability to edit or delete.
  • New Data Producer Analytics Dashboard: If your company is pushing content through Gnip we understand it is important to see how, where and who is accessing the content using our platform and with this release we have added a web-based data producer analytics dashboard. This is a beta feature, not where we want it yet, and we have some incomplete data issues. However, we wanted to get something available and then iterate based on feedback. If you are a data producer let us know how to take this forward. The current version provides access to the complete list of filters created against a publisher and the information can be downloaded in XML or CSV format

Also, we have a few things we are working on for upcoming releases:

  • Gnip Polling: Our new Flickr and YouTube publishers both leverage our new Gnip Polling service, which we have started using internally for access to content that is not available via our push infrastructure. We plan to make this feature available externally to customers in the future, so stay tuned or contact us if you want to learn more.
  • User generated publishers from RSS Feeds: We are going to open up the system so anyone can create new publishers from RSS Feeds. This new feature makes it easy to access, filter and integrate tons of web based content.
  • Field level mapping on RSS feeds: A lot of times the field naming of RSS feeds across different endpoints does not map to the way the field is named in your company. This new feature will allow the editing and mapping at the individual field level to support normalization across multiple feeds.
  • Filter rule batch updates: When your filters start to get big adding lots of new rules can be a challenge. Based on direct customer feedback it will soon be possible to batch upload filter rules.

Posted in Uncategorized by Eric No Comments

Are you using Gnip and love getting real-time access to data from various web APIs across the Internet? Show us some love and a vote or two in the TechCrunch 2008 Crunchies

Best New Startup of 2008

and

Best Technology Innovation/Achievement

Thanks and look for information on our next release in a few days.