Simplicity Wins

It seems like every once in a while we all have to re-learn certain lessons.

As part of our daily processing, Gnip stores many terabytes of data in millions of keys on Amazon’s S3. Various aspects of serving our customers require that we regularly pore over those keys and the data behind them.

As an example, every 24 hours we construct usage reports that provide visibility into how our customers are using our service. Are they consuming a lot of volume or a little? Did their usage profile change? Are they not using us at all? And so on. We also have what we affectionately refer to as the “dude where’s my tweet” challenge: of the billion activities we deliver to our customers each day, inevitably someone says, “Hey, I didn’t receive Tweet ‘X’. What gives?” Answering that question requires that we store the ID of every Tweet a customer ever receives. Poring over all this data every 24 hours is a challenge.

As we started on the project, it seemed like a good fit for Hadoop. It involves pulling in lots of small-ish files, doing some slicing, aggregating the results, and spitting them out the other end. Because we’re hosted on Amazon, it was natural to use their Elastic MapReduce service (EMR).

Conceptually the code was straightforward and easy to understand. The logic fit the MapReduce programming model well: it requires a lot of text processing and sorts well into various stages and buckets. It was up and running quickly.
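To make that fit concrete, here is a minimal single-process sketch of the map, shuffle, and reduce stages in Python. It is an illustration only, not the job we ran on EMR. The input format (one tab-separated line per delivered activity, with the customer ID first and the Tweet ID second) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (customer_id, 1) pair for every delivered activity."""
    for line in lines:
        # Assumed line format: "<customer_id>\t<tweet_id>"
        customer_id, _tweet_id = line.rstrip("\n").split("\t", 1)
        yield customer_id, 1


def shuffle_phase(pairs):
    """Shuffle: bucket the emitted values by key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def reduce_phase(buckets):
    """Reduce: collapse each bucket into a single count."""
    return {key: sum(values) for key, values in buckets.items()}


def usage_report(lines):
    """Count delivered activities per customer."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Calling usage_report on the lines of a delivery log returns a dictionary of activity counts keyed by customer ID. A real Hadoop job distributes each of these stages across machines, which is where the configuration surface comes from.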

As the size of the input grew, the job started to have various problems, many of which came down to configuration: Hadoop options, JVM options, open file limits, number and size of instances, number of reducers, and so on. We went through various rounds of tweaking settings and throwing more machines at the cluster, and it would run well for a while longer.

But it still occasionally had problems. Plus there was that nagging feeling that it just shouldn’t take this much processing power to do the work. Operational costs started to pop up on the radar.

So we did a small test to check the feasibility of getting all the necessary files from S3 onto a single EC2 instance and processing them with plain old *nix tools. After promising results we decided to pull the job out of EMR. It took several days to rewrite, but we’ve now got a simple Ruby script using various *nix goodies like cut, sort, grep and their friends. The script is parallelized via JRuby threads at the points where it makes sense (downloading multiple files at once and processing the files independently once they’ve been bucketed).
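The general shape of that approach can be sketched roughly as follows. This is an illustration in Python rather than the Ruby/JRuby original, and it rests on several assumptions: the files are already on local disk (the S3 download step is left out), each line is tab-separated with the customer ID in the first field, the cut, sort, and uniq commands are available on the PATH, and the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def count_per_customer(path):
    """Equivalent of `cut -f1 FILE | sort | uniq -c`, parsed into a dict."""
    cut = subprocess.run(["cut", "-f1", str(path)],
                         capture_output=True, text=True, check=True)
    srt = subprocess.run(["sort"], input=cut.stdout,
                         capture_output=True, text=True, check=True)
    uniq = subprocess.run(["uniq", "-c"], input=srt.stdout,
                          capture_output=True, text=True, check=True)
    counts = {}
    for line in uniq.stdout.splitlines():
        # `uniq -c` prints "<count> <value>" with leading padding.
        count, customer_id = line.split(None, 1)
        counts[customer_id] = int(count)
    return counts


def process_buckets(paths, workers=4):
    """Process each bucketed file in its own thread and merge the counts."""
    totals = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(count_per_customer, paths):
            for customer_id, n in counts.items():
                totals[customer_id] = totals.get(customer_id, 0) + n
    return totals
```

Threads are enough here because the heavy lifting happens in the external processes; the Python side only waits on them and merges small dictionaries.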

In the end it runs in less time than it did on EMR, on a single modest instance, is much simpler to debug and maintain, and costs far less money to run.

We landed in a somewhat counter-intuitive place. There’s great technology available these days to process large amounts of data; we continue to use Hadoop for other projects. But as we bring these technologies into our tool set, we have to be careful not to forget the power of straightforward, traditional tools.

Simplicity wins.

Customer Spotlight – MutualMind

 
For many startups seeking to enter and capitalize on the rising social media marketplace, timing is everything. MutualMind was no exception: getting their enterprise social media management product to market in a timely manner was crucial to the success of their business. MutualMind provides an enterprise social media intelligence and management system that monitors, analyzes, and promotes brands on social networks and helps increase social media ROI. The platform enables customers to listen to discussion on the social web, gauge sentiment, track competitors, identify and engage with influencers, and use the resulting insights to improve their overall brand strategy.

“Through their social media API, Gnip helped us push our product to market six months ahead of schedule, enabling us to capitalize on the social media intelligence space. This allowed MutualMind to focus on the core value it adds by providing advanced analytics, seamless engagement, and enterprise-grade social management capabilities.”

- Babar Bhatti
CEO, MutualMind

By selecting Gnip as their data delivery partner, MutualMind was able to get their product to market six months ahead of schedule. Today, MutualMind processes tens of millions of data activities per month using multiple sources from Gnip including premium Twitter data, YouTube, Flickr, and more.
 
For the full details, read the success story here.

Letter From The New Guy

Not too long ago Gnip celebrated its third birthday. I am celebrating my one-week anniversary with the company today. To say a lot happened before my time at Gnip would be the ultimate understatement, and yet it is easy for me to see the results produced by those three years of effort. Some of those results include:

The Product

Gnip’s social media API offering is the clear leader in the industry. Gnip is delivering over half a billion social media activities daily from dozens of sources. That certainly sounds impressive, but how can I be so confident Gnip is the leader? Because the most important social media monitoring companies rely on our services to deliver results to their customers every single day. For example, Gnip currently works with 8 of the top 9 enterprise social media monitoring companies, and the rate at which we are adding enterprise-focused companies is accelerating.

The Partners

Another obvious result is the strong partnerships that have been cultivated. Some of our partnerships, such as those with Twitter and Klout, were well publicized when the agreements were put in place. However, having strong strategic partners takes a lot more than just a signed agreement. It takes a lot of dedication, investment, and hard work by both parties to deliver on the full promise of the agreement. It is obvious to me that Gnip has amazing partnerships that run deep and are built upon a foundation of mutual trust and respect.

The People

The talent level at Gnip is mind-blowing, but it isn’t the skills of the people that have stood out the most for me so far. It is the dedication of each individual to doing the right thing for our customers and our partners that has made the biggest impression. When it comes to gathering and delivering social media data, there are a lot of shortcuts that can be taken to save time, money, and effort. Unfortunately, these shortcuts often come at the expense of publishers, customers, or both. The team at Gnip has no interest in shortcuts, and that comes across in every individual discussion and in every meeting. If I were going to describe this value in one word, the word would be “integrity”.

In my new role as President & COO, I’m responsible for helping the company grow quickly and smoothly while maintaining the great values that have been established since the company’s inception. The growth has already started, and I couldn’t be more pleased with the talent of the people who have recently joined the organization, including Bill Adkins, Seth McGuire, Charles Ince, and Brad Bokal, who have all joined Gnip within the last week. And we are hiring more! In fact, it is worth highlighting one particular open position for a Customer Support Engineer. I’m hard-pressed to think of a higher-impact role at our company, because we consider supporting our customers to be such an important priority. If you have 2+ years of coding experience, including working with RESTful Web APIs, and you love delivering over-the-top customer service, Gnip offers a rare opportunity to work in an environment where your skills will be truly appreciated. Apply today!

I look forward to helping Gnip grow on top of a strong foundation of product, partners, and people.  If you have any questions, I can be reached at chris [at] gnip.com.