Gnip: An Update

Gnip moved into our new office yesterday (other end of the block from our old office). The transition provided an opportunity for me to think about where we’ve been, and where we’re going.

Team

We continue to grow, primarily on the engineering side. Check out our jobs page if you’re interested in working on a hard problem, with smart people, in a beautiful place (Boulder, CO).

Technology

We’ve built a serious chunk of back-end infrastructure that I’d break into two general pieces: “the bus”, and “the pollers.”

“The Bus”

Our back-end moves large volumes of relatively small (usually <~3k bytes) chunks of data from A to B in a hurry. Data is “published” into Gnip, we do some backflips with it, and then we spit it out the other side to consumers.
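The sketch below is only a toy illustration of that publish/transform/deliver flow; it is not Gnip’s implementation, and every name in it is made up.

```python
# Toy illustration of a publish -> transform -> deliver bus.
# Not Gnip's implementation; names and structure are hypothetical.
from collections import defaultdict

class ToyBus:
    def __init__(self):
        self.subscribers = defaultdict(list)   # publisher name -> consumer callbacks

    def subscribe(self, publisher, callback):
        """Register a consumer callback for a publisher's stream."""
        self.subscribers[publisher].append(callback)

    def publish(self, publisher, activity):
        """Accept a small chunk of data, normalize it, and fan it out."""
        normalized = {"publisher": publisher, **activity}   # the "backflips"
        for deliver in self.subscribers[publisher]:
            deliver(normalized)

bus = ToyBus()
bus.subscribe("twitter", lambda a: print("consumer got:", a))
bus.publish("twitter", {"actor": "jane", "action": "post", "payload": "hello"})
```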

“The Pollers”

Our efforts to get Publishers to push directly into Gnip didn’t pan out the way we initially planned, so we had to change course and acquire data ourselves. The bummer here was that we set out on an altruistic mission to relieve the polling pain the industry has been suffering from, but were met with such inertia that we didn’t get the coverage we wanted. The upside is that building polling infrastructure has allowed us to control more of our business destiny. We’ve gone through a few iterations on our approach to polling, from complex job scheduling and systems that “learn” and “adapt” to their surroundings, to dirt-simple, mindless grinders that ignorantly eat APIs/endpoints all day long. We’re currently slanting heavily toward simplicity in the model. The idea is to take learnings from the simple model over time and feed them into abstractions/refactorings that make the system smarter.
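Here is a minimal sketch of the “dirt simple, mindless grinder” end of that spectrum: no learning, no scheduling smarts, just a fixed list of endpoints walked on a fixed interval. The URLs and interval are placeholders, not real Gnip publisher endpoints.

```python
# Minimal sketch of a "mindless grinder" poller: no scheduling smarts,
# just walk a fixed list of endpoints on a fixed interval.
# Endpoint URLs and the interval are placeholders for illustration.
import time
import urllib.request

ENDPOINTS = [
    "https://example.com/feed1.xml",
    "https://example.com/feed2.xml",
]
POLL_INTERVAL_SECONDS = 60

def poll_once(url):
    """Fetch one endpoint and hand the raw body off (stubbed here as a print)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    print(f"fetched {len(body)} bytes from {url}")

while True:
    for url in ENDPOINTS:
        try:
            poll_once(url)
        except Exception as exc:            # keep grinding even when a fetch fails
            print(f"poll failed for {url}: {exc}")
    time.sleep(POLL_INTERVAL_SECONDS)
```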

Deployment

We’re still in the cloud. Amazon’s EC2/S3 products have been a solid, highly flexible framework for us (albeit not necessarily the most cost-effective when your CPU utilization isn’t in the 90%+ range per box); hats off to those guys.

Industry

“The Polling Problem”

It’s been great to see the industry wake up and acknowledge “the polling problem” over the past year. SUP (Simple Update Protocol) popped up to provide more efficient polling for systems that couldn’t, or wouldn’t, move to an event-driven model. It provides a compact change-log for pollers: you poll the change-log, then do heavier polls only for the stuff that has changed. PubSubHubbub popped up to provide the framework for a distributed Gnip (though lacking inherent normalization); a combination of polling and events spread across nodes allows for a more decentralized approach.
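As a rough sketch of the SUP model described above: poll one compact change-log, then re-fetch only the feeds whose SUP-IDs appear in it. The “updates” field (a list of [SUP-ID, update-id] pairs) follows the SUP draft as we understand it; the URLs and IDs are illustrative assumptions.

```python
# Rough sketch of the SUP flow: poll the provider's compact change-log, then
# do the heavier polls only for feeds that actually changed. The "updates"
# field (a list of [SUP-ID, update-id] pairs) follows the SUP draft as we
# understand it; the URLs and IDs are illustrative assumptions.
import json
import urllib.request

SUP_URL = "https://example.com/sup.json"            # provider's change-log
FEEDS_BY_SUP_ID = {                                  # SUP-IDs advertised by each feed
    "1a2b3c": "https://example.com/users/jane/feed.xml",
    "4d5e6f": "https://example.com/users/mike/feed.xml",
}

def changed_feed_urls():
    with urllib.request.urlopen(SUP_URL, timeout=10) as resp:
        sup = json.load(resp)
    changed_ids = {sup_id for sup_id, _update_id in sup.get("updates", [])}
    return [url for sup_id, url in FEEDS_BY_SUP_ID.items() if sup_id in changed_ids]

for url in changed_feed_urls():
    print("re-poll:", url)   # only the feeds the change-log says have new data
```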

“Normalization”

The Activity Streams initiative grew legs and is walking. As with any “standards” (or “standards-like”) initiative, things are only as good as adoption. Building ideas in a silo without users makes for a fun exercise, but not much else. Uptake matters, and MySpace and Facebook (among many other, smaller initiatives) have bitten off chunks of Activity Streams; that’s a very big, good sign for the industry. Structural and semantic consistency matters for applications digesting a lot of information. Gnip provides highly structured, consistent data to its consumers via gnip.xsd.
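To make the normalization point concrete, here is a toy sketch of mapping two differently shaped source payloads into one consistent activity shape. The field names are illustrative placeholders only; they are not drawn from gnip.xsd.

```python
# Toy illustration of why normalization matters: two sources describe similar
# events with different field names, and a consumer digesting both wants one
# consistent shape. Field names are placeholders, not the gnip.xsd schema.
def normalize_source_a(raw):
    return {"actor": raw["user"], "action": raw["verb"], "at": raw["created_at"]}

def normalize_source_b(raw):
    return {"actor": raw["screen_name"], "action": raw["type"], "at": raw["timestamp"]}

activities = [
    normalize_source_a({"user": "jane", "verb": "post", "created_at": "2009-03-01T12:00:00Z"}),
    normalize_source_b({"screen_name": "mike", "type": "bookmark", "timestamp": "2009-03-01T12:01:00Z"}),
]

for activity in activities:      # downstream code only ever sees one shape
    print(activity["actor"], activity["action"], activity["at"])
```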

In order to meet its business needs, and to adapt to the constantly moving industry around it, Gnip has adjusted its approach on several fronts. We moved to incorporate polling. We understand that there is more than one way of doing things, and we will incorporate SUP and PubSubHubbub into our framework. Doing so will make our own polling efforts more effective and give us more flexibility in how we provide data to our consumers. While normalized data is nice for a large category of consumers, there is a large tier of customers that doesn’t need, or want, heavy normalization. Opaque message flow has significant value as well.

We set out to move mind-boggling amounts of information from A to B, and we’re doing that. Some of the nodes in the graph are shifting, but the model is sound. We’ve found there are primarily two types of data consumers: high-coverage of a small number of sources (“I need 100% of Joe, Jane, and Mike’s activity”), and “as high as you can get it”-coverage of a large number of sources (“I don’t need 100%, but I want very broad coverage”). Gnip’s adjusted to accommodate both.

Business

We’ve had to shift our resources to better focus on the paying segments of our audience. We initially thought “life-stream aggregators” would be our biggest paying customer segment; however, data/media analytics firms have proven significant. Catering to the customers who tell you “we have budget for that!” makes good business sense, and we’re attacking those opportunities.

Gravitational Shift

Gnip’s approach to getting more Publishers into the system has evolved. Over the past year we’ve learned a lot about the data delivery business and the state of its technological art. While our core infrastructure remains a highly performant data delivery bus, the way data arrives at Gnip’s front door is shifting.

We set out assuming the industry at large (both Publishers and Consumers) was tired of highly latent data access. What we’ve learned is that data Consumers (e.g. life-stream aggregators) are indeed weary of the latency, but many Publishers aren’t as interested in distributing their data in real-time as we initially estimated. So, in order to meet intense Consumer demand to have data delivered in a normalized, minimal-latency (not necessarily “real-time”) manner, Gnip is adding many new polled Publishers to its offering.

Check out http://api.gnip.com and see how many Publishers we have to offer as a result of walking down the polling path.

Our goal remains to “deliver the web’s data,” and while the core Gnip delivery model remains the same, polling has allowed us to greatly expand the list of available Publishers in the system.

Tell us which Publishers/data sources you want Gnip to deliver for you! http://gnip.uservoice.com/

We have a long way to go, but we’re stoked at the rate we’re able to widen our Publisher offering now that our polling infrastructure is coming online.

New Gnip.com is live

We pushed out our new corporate website just now along with a whole new brand for the company.  The new site is located at http://www.gnip.com.

We are putting the www.gnipcentral.com domain to rest, and all gnipcentral.com URLs will begin to redirect to the correct gnip.com address as the DNS entries propagate.

While a new website and domain are a small thing in our overall plans, we hope people find the new site more informative about the business value the Gnip platform provides and the great companies and customers we work with every day.

Thanks to everyone in the community for helping us reach this milestone, which coincides with our 1 year anniversary this week.  The last year was great, and we still have a lot of $h*t to pop!

Gnip Beta 2 Launching Today, New www.gnip.com coming

Several new updates to Gnip websites are being released this week that will impact companies and people using the Gnip developer site and the Gnip corporate website.

1) We just started a maintenance window on the demo system, announced with a post on the forum and to the @gnipsupport Twitter account.

  • The reason for this maintenance period is that we are moving to the next stage of this release and entering Beta 2. Once we come back online, the Beta developer site currently hosted at demo.gnip.com will be available at a new location: http://api.gnip.com.
  • We are migrating all the account information to the new site, so if you have an existing account on the prior demo site, it will be waiting for you at http://api.gnip.com.
  • If your integration with the demo Gnip API uses our convenience libraries, you will need to manually update your hostname to https://api-v21.gnip.com (see the sketch after this list).
  • Documentation is available for the new Gnip version 2.1 API and schema at http://gnip.com/docs.
  • The new URL, http://api.gnip.com, will be the final destination and future home for version 2.1 of the Gnip platform after we conclude Beta 2. We are making this update now so that integrations completed against api.gnip.com will continue to work once we flip the switch and make it generally available sometime in the next few weeks.
  • For people using the current version of the Gnip platform (v2.0), we are continuing to provide support and will do so through the Beta 2 period and for 30 days after the general release of version 2.1.
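For anyone updating by hand, here is a minimal sketch of pointing an existing integration at the new host. The old and new hostnames come from this post; the /publishers.xml path and the basic-auth setup are illustrative assumptions, not the definitive API contract.

```python
# Minimal sketch of pointing an existing integration at the new Beta 2 host.
# The old/new hostnames come from this post; the /publishers.xml path and the
# basic-auth setup are illustrative assumptions, not the definitive API contract.
import base64
import urllib.request

OLD_BASE_URL = "https://demo.gnip.com"
NEW_BASE_URL = "https://api-v21.gnip.com"   # update your convenience-library config to this

def fetch(base_url, path, username, password):
    request = urllib.request.Request(base_url + path)
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.read()

# After swapping the base URL, existing calls should work unchanged, e.g.:
# fetch(NEW_BASE_URL, "/publishers.xml", "you@example.com", "secret")
```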

2) We are also doing an update to our corporate website. Today the website is located at www.gnipcentral.com, and because we were able to acquire the gnip.com domain, we are moving! Everyone going to any gnipcentral.com-based URL will be redirected to the appropriate gnip.com URL. So, look for the new website before the end of the week.

Here is a glimpse of the new home page for the curious…

[Screenshot: the new gnip.com home page]

More Examples of How Companies are Using Gnip

We have noticed that we are interacting with two distinct groups of companies: those who instantly understand what Gnip does and those who struggle with what we do. So we decided to provide a few detailed, real-world examples of the companies we are actively working with today to provide data integration and messaging services.

First, we are not an end-user-facing social aggregation application. (We repeat this often.) We see a lot of people wanting to put Gnip in that bucket along with social content aggregators like FriendFeed, Plaxo and many others. These content aggregators are destination web sites that provide utility to end users by giving them the flexibility to bring their social graph, or part of their graph, together in one place. Also, many of these services now provide web APIs that allow people to use an alternative client to interact with their core services around status updates and conversations, as well as other features specific to the service.

Gnip is an infrastructure service; specifically, we provide an extensible messaging system that allows companies to more easily access, filter, and integrate data from web-based APIs. While someone could use Gnip as a way to bring content into a personal social media client written against a specific social aggregator, that is not something we are focused on. Below are the company use cases we are focused on:

  1. Social content aggregators: One of the main reasons we started Gnip was to solve the problems caused by the point-to-point integration issues that were springing up with the increase of user-generated content and the corresponding open web APIs. We believe that any developer who has written pollers for their first, second, or nth API will tell you how unproductive it is to write and maintain this code. However, writing one-off pollers has become a necessary evil for many companies, since content aggregators need to provide access to as many external services as possible for their end users. Plaxo, which recently integrated with Gnip to support its Plaxo Pulse feature, is a perfect example, as are several other companies.
  2. Business-specific applications: Another main reason we started Gnip is that we believe more and more companies see the value of integrating business and social data as a way to add compelling value to their own applications. There is a very wide set of examples, such as how Eventvue uses Gnip to integrate Twitter streams into their online conference community solution, and the companies we have talked to about using Gnip to integrate web-based data to power everything from sales dashboards to customer service portals.
  3. Content producers: Today, Gnip offers value to content producers by giving developers an alternative tool for integrating with their web APIs. We are working with many producers, such as Digg, Delicious, Identi.ca, and Twitter, and plan to aggressively grow the list of available producers. The benefits producers see from working with Gnip include off-loading direct traffic to their web APIs and gaining another channel to make their content available. We are also working very hard to add new capabilities for producers, including plans to provide more detailed analytics on how their data is consumed, and we are evaluating publishing features that would allow producers to define their own filters and target the service endpoints and web sites where they want to push relevant data for their own business needs.
  4. Market and brand research companies: We are working with several companies that provide market research and brand analysis. These companies see Gnip as an easy way to aggregate social media data to be included in their brand and market analysis client services.

Hopefully this set of company profiles provides more context on the areas we are focused on and the typical companies we work with every day. If your company does something that does not fit in these four areas and is using our services, please send me a note.

What We Are Up to At Gnip

As the newest member of the Gnip team, I have noticed that people are asking a lot of the same questions about what we are doing at Gnip and how they can use our services in their business.

What we do

Gnip provides an extensible messaging platform that allows for the publishing or subscribing of events and data from across the Internet, which makes data portability exponentially less painful and more automatic once it is set up. Because Gnip is built as a platform of capabilities and not a web application, the core services are instantly useful for multiple scenarios, including data producers, data consumers, and custom web applications. Gnip is already being used with many of the most popular Internet data sources, including Twitter, Delicious, Flickr, Digg, and Plaxo.

How to use Gnip

So, who is the target user of Gnip? It is a developer, as the platform is not a consumer-oriented web application, but a set of services meant to be used by a developer or an IT department for a set of core use cases.

  • Data Consumers: You’ve built your pollers, let us tell you when and where to fire them. Avoid throttling and decrease latency from hours to seconds.
  • Data Producers: Push your data to us and reduce API traffic by an order of magnitude while increasing distribution through aggregators.
  • Custom web applications: You want to embed or publish content in your own application or a third-party application. Decide who, or what, you care about for any Publisher, give us an end-point, and we push the data to you so you can solve your business use cases, whether that’s a customer service website, corporate website, blog, or any other web application. (A minimal receiver sketch follows this list.)
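To picture the “give us an end-point” model, here is a minimal sketch of an HTTP endpoint that accepts pushed activity data. The port and payload handling are placeholders for illustration, not a prescribed Gnip contract.

```python
# Minimal sketch of the "give us an end-point" model: a tiny HTTP server that
# accepts pushed activity data. The port and payload handling are placeholders
# for illustration, not a prescribed Gnip contract.
from http.server import BaseHTTPRequestHandler, HTTPServer

class ActivityReceiver(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(f"received {length} bytes of pushed activity data")
        # ... hand `body` off to your own application here ...
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ActivityReceiver).serve_forever()
```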

Get started now

By leveraging the Gnip APIs, developers can easily design reusable services, such as push-based notifications, smart filters, and data streams, that can be used across your web applications to make them better. Are you a developer? Give the new 2.0 version a try!
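To make the “smart filter” idea concrete, here is a toy sketch that keeps only the activities from actors you care about. It is purely conceptual; it is not the Gnip Filter API.

```python
# Toy sketch of the "smart filter" idea: keep only the activities whose actors
# you care about. Purely conceptual -- this is not the Gnip Filter API.
ACTORS_OF_INTEREST = {"joe", "jane", "mike"}

def matches(activity):
    return activity.get("actor", "").lower() in ACTORS_OF_INTEREST

stream = [
    {"actor": "jane", "action": "post"},
    {"actor": "bot42", "action": "post"},
    {"actor": "mike", "action": "bookmark"},
]

for activity in filter(matches, stream):
    print(activity)   # only jane's and mike's activities get through
```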

We're Taking Part in the Boulder Job Fair — Would You Like a Free Trip to Check Out Boulder (and Gnip)?

Boulder has the most programmers per capita in the country.  It also has the healthiest people on the planet (I can’t back this one up with stats, just anecdotal evidence of 60-year-old grandmothers zooming up and down the mountains).  Basically, Boulder is the land of the mathlete, and it’s awesome!

A ton of local startups are competing for the best developers in Boulder and we’ve come to a common conclusion — we need to expand the pool of applicants.  It’s time to give developers living in the Bay Area, Boston and Bentonville a taste of the Boulder lifestyle and simultaneously introduce them to some of the coolest companies Boulder has to offer.

Are you a badass developer?  Do you code PHP or Java or C++ in your sleep?  Can you denormalize a database with your eyes closed or create elegant streams of CSS?  We’d like to meet you.  In fact, we’d like to fly you out to Boulder, all expenses paid, for a couple of days to meet some awesome companies, including Gnip, to see if there’s a love connection.  You’ll fly out on day one and spend time checking out the town, spend day two meeting with 20 killer tech companies, and then have a third day to follow up with the companies you like best before flying home.  Not a bad way to spend the last week of October.

If you’d like to know more, check out the additional details at Boulder.Me and then click the button to apply.

We’re looking forward to meeting you in Boulder next month.  We think you’ll dig the town as much as we do, and the companies are pretty rad, too.

Three (Six?) Week Software Retrospective

I had to go back into older blog posts to remind myself when we launched: July 1st. It feels like we’ve been live since June 1st.

Looking Back

Things have gone incredibly well from an infrastructure standpoint. We’ve had to add/adjust some system monitoring parameters to accommodate the variety of Data Producers publishing into the system; different frequencies/volumes call for specialized treatment. We weren’t expecting the rate, or volume, of Collection creation we wound up with. Within three hours of going live, we had enough Collections in the system to adversely impact node startup/sync times. We patiently tuned our data model and our Terracotta locks to get things back to normal. It’s looking like we’ll be in bed with Terracotta for the long haul.

Amazon

I’m not sure I could be any more pleased with AWS. Our core service is heavily dependent on EC2, and that’s been running sans issues. We’re working on non-Amazon failover solutions that ensure uninterrupted service even if all of EC2 dies. Our backups are S3-dependent, so we had some behind-the-scenes issues last weekend when S3 was flaky; see my previous post on this issue. We haven’t had our day in the sun with outages, and I obviously hope we never do, but so far I’m walking around with a big “I <3 AWS” t-shirt on.

Other

On the convenience library front, we (Gnip + community) have made all of our code available on GitHub. We’ve had tremendous community support and contribution on this front; so cool to see; thanks everyone!

Collections are by far the primary data access pattern (as opposed to raw public activity stream polling); not really a surprise.

Summize/Twitter has been a totally cool way to track discussion about Gnip out in the ether. When we notice folks talking about Gnip, positive or negative, we can reach out in “real-time” and strike up a conversation.

That’s all for now.

Thanks to all the Data Producers and Consumers that have integrated with Gnip thus far!