Simplicity Wins

It seems like every once in a while we all have to re-learn certain lessons.

As part of our daily processing, Gnip stores many terabytes of data in millions of keys on Amazon’s S3. Various aspects of serving our customers require that we regularly pore over those keys and the data behind them.

As an example, every 24 hours we construct usage reports that provide visibility into how our customers are using our service. Are they consuming a lot or a little volume? Did their usage profile change? Are they not using us at all? So on and so on. We also have what we affectionately refer to as the “dude, where’s my tweet” challenge: of the billion activities we deliver each day to our customers, inevitably someone says “hey, I didn’t receive Tweet ‘X’; what gives?” Answering that question requires that we store the ID of every Tweet a customer ever receives. Poring over all this data every 24 hours is a challenge.

As we started on the project, it seemed like a good fit for Hadoop. It involves pulling in lots of small-ish files, doing some slicing, aggregating the results, and spitting them out the other end. Because we’re hosted on Amazon, it was natural to use their Elastic MapReduce (EMR) service.

Conceptually the code was straightforward and easy to understand. The logic fit the MapReduce programming model well: it requires a lot of text processing and sorts neatly into various stages and buckets. It was up and running quickly.

As the size of the input grew, the job started to have various problems, many of which came down to configuration: Hadoop options, JVM options, open file limits, number and size of instances, number of reducers, etc. We went through various rounds of tweaking settings and throwing more machines into the cluster, and it would run well for a while longer.

But it still occasionally had problems. Plus there was that nagging feeling that it just shouldn’t take this much processing power to do the work. Operational costs started to pop up on the radar.

So we did a small test to check the feasibility of getting all the necessary files from S3 onto a single EC2 instance and processing them with standard old *nix tools. After promising results, we decided to pull the job out of EMR. It took several days to rewrite, but we’ve now got a simple Ruby script using various *nix goodies like cut, sort, grep, and their friends. The script is parallelized via JRuby threads at the points where that makes sense (downloading multiple files at once, and processing the files independently once they’ve been bucketed).
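A minimal sketch of that shape of script (the method name and the counting step are illustrative, not Gnip’s actual code): fetch inputs on parallel threads, then let sort and uniq do the aggregation that a reduce phase would otherwise handle. Under JRuby these threads map to real OS threads, so IO-bound downloads genuinely overlap; here the S3 fetch is stubbed with local file reads.

```ruby
require "tmpdir"

# Hypothetical sketch: read each input file on its own thread (the real
# script downloads from S3 here), merge the results, and shell out to
# classic *nix tools to count how often each ID appears.
def tally_ids(files)
  # Thread#value joins the thread and returns the block's result.
  contents = files.map { |f| Thread.new { File.read(f) } }.map(&:value)
  Dir.mktmpdir do |dir|
    merged = File.join(dir, "merged.txt")
    File.write(merged, contents.join)
    # sort | uniq -c | sort -rn: one ID per line in, "count id" out,
    # most frequent first -- the whole "reduce" in a single pipeline.
    `sort #{merged} | uniq -c | sort -rn`
  end
end
```

The appeal is that each pipeline stage is a battle-tested tool with predictable memory behavior, so there is almost nothing to configure.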

In the end it runs in less time than it did on EMR, on a single modest instance, is much simpler to debug and maintain, and costs far less money to run.

We landed in a somewhat counter-intuitive place. There’s great technology available these days for processing large amounts of data; we continue to use Hadoop for other projects. But as we bring these technologies into our toolset, we have to be careful not to forget the power of straightforward, traditional tools.

Simplicity wins.

Customer Spotlight – MutualMind

 
For many startups seeking to enter and capitalize on the rising social media marketplace, timing is everything, and MutualMind was no exception: getting their enterprise social media management product to market in a timely manner was crucial to the success of their business. MutualMind provides an enterprise social media intelligence and management system that monitors, analyzes, and promotes brands on social networks and helps increase social media ROI. The platform enables customers to listen to discussion on the social web, gauge sentiment, track competitors, identify and engage with influencers, and use resulting insights to improve their overall brand strategy.

“Through their social media API, Gnip helped us push our product to market six months ahead of schedule, enabling us to capitalize on the social media intelligence space. This allowed MutualMind to focus on the core value it adds by providing advanced analytics, seamless engagement, and enterprise-grade social management capabilities.”

- Babar Bhatti
CEO, MutualMind

By selecting Gnip as their data delivery partner, MutualMind was able to get their product to market six months ahead of schedule. Today, MutualMind processes tens of millions of data activities per month using multiple sources from Gnip, including premium Twitter data, YouTube, Flickr, and more.
 
For the full details, read the success story here.

Handling High-Volume, Realtime, Big Social Data

The social ecosystem has become the pulse of the world. From delivering breaking news like the death of Osama Bin Laden before it hit mainstream media to helping President Obama host the first Twitter Town Hall, the realtime social web is flooded with valuable information just waiting to be analyzed and acted upon. With millions of users and billions of social activities passing through the ever-growing realtime social web each day, it is no wonder that companies need to reevaluate their traditional business models to take advantage of this big social data.

But with the ever-growing social web, massive amounts of data are pouring into and out of social media publishers’ websites and APIs every second. In a talk I gave at GlueCon a couple of months ago, I ran down some math to put things into perspective. The numbers are a little dated, but the impact is the same. At that time there were approximately 155,000,000 Tweets per day, and the average size of a Tweet was approximately 2,500 Bytes (keep in mind this could include Retweets).

A Little Bit of Arithmetic

155,000,000 Tweets/day × 2,500 Bytes/Tweet = 387,500,000,000 Bytes/day

387,500,000,000 Bytes/day ÷ 24 hours = 16,145,833,333 Bytes/hour

16,145,833,333 Bytes/hour ÷ 60 minutes = 269,097,222 Bytes/minute

269,097,222 Bytes/minute ÷ 60 seconds = 4,484,953 Bytes/second

4,484,953 Bytes/second ÷ 1,048,576 Bytes/Megabyte = 4.2 Megabytes/second

And in terms of data transfer rates . . .

1 Megabyte/second = 8 Megabits/second

So . . .

4.2 Megabytes/second × 8 Megabits/Megabyte ≈ 34 Megabits/second
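As a sanity check, the arithmetic above can be reproduced in a few lines of Ruby, using the same rough estimates:

```ruby
# Back-of-the-envelope Twitter throughput, from the rough figures above.
tweets_per_day  = 155_000_000
bytes_per_tweet = 2_500

bytes_per_day        = tweets_per_day * bytes_per_tweet  # 387.5 GB/day
bytes_per_second     = bytes_per_day / (24 * 60 * 60)    # ~4.5 million Bytes/s
megabytes_per_second = bytes_per_second / 1_048_576.0    # binary megabytes
megabits_per_second  = megabytes_per_second * 8

puts format("%.1f Megabytes/second = %.1f Megabits/second",
            megabytes_per_second, megabits_per_second)
```

Kept unrounded until the end, the figure comes out at just over 34 Megabits/second, sustained, around the clock, for Twitter alone.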

That’s a Lot of Data

So what does this mean for the data consumers, the companies wanting to reevaluate their traditional business models to take advantage of vast amounts of Twitter data? At Gnip we’ve learned that some of the collective industry data processing tools simply don’t work at this scale: out-of-the-box HTTP servers/configs aren’t sufficient to move the data, out-of-the-box config’d TCP stacks can’t deliver this much data, and consumption via typical synchronous GET request handling isn’t applicable. So we’ve built our own proprietary data handling mechanisms to capture and process mass amounts of realtime social data for our clients.

Twitter is just one example. We’re seeing more activity on today’s popular social media platforms and a simultaneous increase in the number of popular social media platforms. We’re dedicated to seamless social data delivery to our enterprise customer base and we’re looking forward to the next data processing challenge.

Get your Hack On! Gnip Helps Power an App Developed at the 2011 TechCrunch Disrupt Hackathon

Over 500 individuals recently gathered in New York City for this year’s TechCrunch Disrupt Hackathon. This annual event, fueled by pizza, beer, and Red Bull, features teams of die-hard techies that spend 20 hours, many without sleep (hence the Red Bull), developing and coding the next big idea. Participants compete in a lightning round of pitches in front of a panel of judges with the winners receiving an opportunity to pitch on the main stage at the TechCrunch Disrupt Conference in front of more than 1,000 venture capitalists and industry insiders.

We are excited that one of the apps that was developed at the 2011 Hackathon was powered by Gnip data! We love it when our customers find new and creative ways to use the data we provide.

Edward Kim (@edwkim) and Eric Lubow (@elubow) from SimpleReach (@SimpleReach), which provides next generation social advertising for brands, put a team together to develop LinkCurrent, an app powered by Gnip data and designed to measure the current and future social value of a specific URL. When fully developed, the LinkCurrent app will provide the user with a realtime dashboard illustrating various measures of a URL’s worth — featuring an overall social score, statistics on the Klout Scores of people who have Tweeted the URL, how many times the URL has been Liked on Facebook and posted on Twitter, and geo-location information to provide insight into the content’s reach. Call it influence-scoring for web content.

The hackathon team also included Russ Bradberry (@devdazed) and Carlos Zendejas (@CLZen), also of SimpleReach, Jeff Boulet (@properslang) of EastMedia/Boxcar (@eastmedia/@boxcar), Ryan Witt (@onecreativenerd) of Opani (@TheOpanis), and Michael Nutt (@michaeln3) of Movable Ink (@movableink). Congratulations to everyone who participated! You created an amazing app in less than 20 hours and developed a creative new use for Gnip data. I highly encourage all of you to check it out: www.linkcurrent.co

Have a fun and creative way you’ve used data delivered by Gnip? We would love to hear about it, and you could be featured in our next blog post. Drop us an email or give us a call at 888.777.7405.

Letter From The New Guy

Not too long ago Gnip celebrated its third birthday.  I am celebrating my one-week anniversary with the company today.  To say a lot happened before my time at Gnip would be the ultimate understatement, and yet it is easy for me to see the results produced from those three years of effort.  Some of those results include:

The Product

Gnip’s social media API offering is the clear leader in the industry.  Gnip is delivering over half a billion social media activities daily from dozens of sources.  That certainly sounds impressive, but how can I be so confident Gnip is the leader?  Because the most important social media monitoring companies rely on our services to deliver results to their customers every single day.  For example, Gnip currently works with 8 of the top 9 enterprise social media monitoring companies, and the rate at which we are adding enterprise-focused companies is accelerating.

The Partners

Another obvious result is the strong partnerships that have been cultivated.  Some of our partnerships such as Twitter and Klout were well publicized when the agreements were put in place.  However, having strong strategic partners takes a lot more than just a signed agreement.  It takes a lot of dedication, investment, and hard work by both parties in order to deliver on the full promise of the agreement.  It is obvious to me that Gnip has amazing partnerships that run deep and are built upon a foundation of mutual trust and respect.

The People

The talent level at Gnip is mind blowing, but it isn’t the skills of the people that have stood out the most for me so far.  It is the dedication of each individual to doing the right thing for our customers and our partners that has made the biggest impression.  When it comes to gathering and delivering social media data, there are a lot of shortcuts that can be taken in order to save time, money, and effort.  Unfortunately, these shortcuts can often come at the expense of publishers, customers, or both.  The team at Gnip has no interest in shortcuts and that comes across in every individual discussion and in every meeting.  If I were going to describe this value in one word, the word would be “integrity”.

In my new role as President & COO, I’m responsible for helping the company grow quickly and smoothly while maintaining the great values that have been established since the company’s inception.  The growth has already started, and I couldn’t be more pleased with the talent of the people who have recently joined the organization, including Bill Adkins, Seth McGuire, Charles Ince, and Brad Bokal, all within the last week.  And we are hiring more!  In fact, it is worth highlighting one particular open position: Customer Support Engineer.  I’m hard-pressed to think of a higher-impact role at our company because we consider supporting our customers such an important priority.  If you have 2+ years of coding experience, including working with RESTful Web APIs, and you love delivering over-the-top customer service, Gnip offers a rare opportunity to work in an environment where your skills will be truly appreciated.  Apply today!

I look forward to helping Gnip grow on top of a strong foundation of product, partners, and people.  If you have any questions, I can be reached at chris [at] gnip.com.

Announcing Multiple Connections for Premium Twitter Feeds

A frequent request from our customers has been the ability to open multiple connections to Premium Twitter Feeds on their Gnip data collectors. Our customers have asked and we have delivered!

While multiple connections to standard data feeds have been available for quite some time, we have only allowed one connection to our Premium Twitter Feeds.  Beginning today you will be able to open multiple mirrored connections to Power Track, Decahose, Halfhose, and all of our other Premium Twitter Feeds.  This feature will be helpful when testing connections to your Gnip data collector in different environments (such as staging or review) without having an impact on your production connection.

You may be saying, “Sounds great, Gnip, but will I be charged the standard Twitter licensing fee for the same Tweet delivered across multiple connections?”  The answer is no!  You will pay a small flat fee per month for each additional connection.  If you’re interested in adding Multiple Connections to your Premium Twitter Feed, please Contact Us.

Social Media in Natural Disasters

Gnip is located in Boulder, CO, and we’re unfortunately experiencing a spate of serious wildfires as we wind Summer down. Social media has been a crucial source of information for the community here over the past week as we have collectively Tweeted, Flickred, YouTubed and Facebooked our experiences. Mashups depicting the fires and associated social media quickly started emerging after the fires started. VisionLink (a Gnip customer) produced the most useful aggregated map of official boundary & placemark data, coupled with social media delivered by Gnip (click the “Feeds” section along the left-side to toggle social media); screenshot below.

Visionlink Gnip Social Media Map

With Gnip, they started displaying geo-located Tweets, then added Flickr photos with the flip of a switch. No new messy integrations that required learning a new API with all of its rate limiting, formatting, and delivery protocol nuances. Simple selection of the data sources they deemed relevant to informing a community reacting, in realtime, to a disaster.

It was great to see them keep a firm focus on their core value proposition (official disaster relief data) and quickly integrate relevant social media without all the fuss.

Our thoughts are with everyone who was impacted by the fires.

Gnip License Changes this Friday, Aug 28th

As we posted last month there are some changes coming to the way we license use of the Gnip platform.  See: Gnip Licensing Changes Coming in August.

These updates will be put in place this Friday, August 28th.  For our existing users, the impact of the new licensing will be the following:

  1. The Gnip Community Edition license will be disabled, as it is no longer being offered.  Accounts that were created before August 1st will be set to inactive and will no longer be able to access the Gnip API or Developer Website.  If your company is in the process of evaluating Gnip for a commercial project and needs more time to complete your evaluation, please contact us at info@gnip.com and we can extend your trial.
  2. Gnip Standard Edition user accounts using the Commercial, Non-profit, and Startup partner license options will continue to be available, as they are not impacted by Friday’s change.  If you are a Standard Edition user and we accidentally disable your account on Friday, please contact us at info@gnip.com and we will reactivate it.
  3. New users who created an account after August 1st will receive an email notification on the day their 30-day trial expires, informing them that they need to contact Gnip to obtain the appropriate license for their commercial, non-profit, or partner use case.

We appreciate all the companies and developers who have built solutions using Gnip and look forward to continuing to deliver real-time data to power these solutions.   By making these adjustments in our licensing we will be able to focus on innovating the Gnip Platform and supporting the many companies and partners we are fortunate to work with every day.