Streaming Data Just Got Easier: Announcing Gnip’s New Connector for Amazon Kinesis

I’m happy to announce a new solution we’ve built to make it simple to get massive amounts of social data into the AWS cloud environment. I’m here in London for the AWS Summit where Stephen E. Schmidt, Vice President of Amazon Web Services, just announced that Gnip’s new Kinesis Connector is available as a free AMI starting today in the AWS Marketplace. This new application takes care of ingesting streaming social data from Gnip into Amazon Kinesis. Spinning up a new instance of the Gnip Kinesis Connector takes about five minutes, and once you’re done, you can focus on writing your own applications that make use of social data instead of spending time writing code to consume it.

 


Amazon Kinesis is AWS’s managed service for processing streaming data. It has its own client libraries that enable developers to build streaming data processing applications and get data into AWS services like Amazon DynamoDB, Amazon S3 and Amazon Redshift for use in analytics and business intelligence applications. You can read an in-depth description of Amazon Kinesis and its benefits on the AWS blog.

We were excited when Amazon Kinesis launched last November because it helps solve key challenges that we know our customers face. At Gnip, we understand the challenges of streaming massive amounts of data much better than most. Some of the biggest hurdles – especially for high-volume streams – include maintaining a consistent connection, recovering data after a dropped connection, and keeping up with reading from a stream during large spikes of inbound data. The combination of Gnip’s Kinesis Connector and Amazon Kinesis provides a “best practice” solution for social data integration with Gnip’s streaming APIs that helps address all of these hurdles.

Gnip’s Kinesis Connector and the high-availability Amazon AWS environment provide a seamless “out-of-the-box” solution to maintain full fidelity data without worrying about HTTP streaming connections. If and when connections do drop (it’s impossible to maintain an HTTP streaming connection forever), Gnip’s Kinesis Connector automatically reconnects as quickly as possible and uses Gnip’s Backfill feature to ingest data you would have otherwise missed. And due to the durable nature of data in Amazon Kinesis, you can pick right back up where you left off reading from Amazon Kinesis if your consumer application needs to restart.
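The reconnect-and-backfill behavior described above can be illustrated with a minimal sketch. This is not the connector's actual code: `stream_with_reconnect` and the `connect` callable are hypothetical names, and how backfill is really requested from Gnip is left to whatever `connect` does with its flag.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_reconnect(
    connect: Callable[[bool], Iterable[str]],
    max_attempts: int = 5,
    base_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield messages from a stream, reconnecting when the connection drops.

    `connect(request_backfill)` is a hypothetical callable that opens the
    stream and returns an iterable of messages. After a drop it is called
    with request_backfill=True so it can ask for the data that was missed.
    """
    failures = 0
    request_backfill = False
    while failures < max_attempts:
        try:
            for message in connect(request_backfill):
                failures = 0  # a delivered message resets the failure count
                yield message
            return  # the stream ended cleanly
        except ConnectionError:
            failures += 1
            request_backfill = True
            # First retry is immediate when base_delay is 0; later retries
            # back off exponentially when a positive base_delay is given.
            sleep(base_delay * 2 ** (failures - 1))
    raise ConnectionError("gave up after repeated connection failures")
```

A caller would pass in its own function for opening the stream and simply iterate over the result; the consumer never sees the dropped connection, only a continuous sequence of messages.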

In addition to these features, one of the biggest benefits of Amazon Kinesis is its low cost. To give you a sense of what that low cost looks like, a Twitter Decahose stream delivers about 50 million messages in a day. Between Amazon Kinesis shard costs and HTTP PUT costs, it would cost about $2.12 per day to put all this data into Amazon Kinesis (plus Amazon EC2 costs for the instance).
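The daily figure can be reproduced with a quick calculation. Treat this as a sketch under assumed inputs: the per-shard-hour price, the per-million-PUT price, and the two-shard count are not stated in this post. They are values consistent with the pricing published around the Amazon Kinesis launch, and current prices may differ. The sketch also assumes one PUT per message.

```python
# Assumed inputs (not from this post): launch-era list prices and shard count.
SHARD_HOUR_PRICE = 0.015        # USD per shard per hour (assumption)
PUT_PRICE_PER_MILLION = 0.028   # USD per 1,000,000 PUT records (assumption)
SHARDS = 2                      # assumed number of shards for the stream
MESSAGES_PER_DAY = 50_000_000   # Decahose volume quoted in the post


def daily_kinesis_cost(shards: int, messages_per_day: int) -> float:
    """Return the estimated daily cost in USD, excluding EC2."""
    shard_cost = shards * 24 * SHARD_HOUR_PRICE
    put_cost = messages_per_day / 1_000_000 * PUT_PRICE_PER_MILLION
    return shard_cost + put_cost


cost = daily_kinesis_cost(SHARDS, MESSAGES_PER_DAY)
print(f"${cost:.2f} per day")
```

Under these assumptions the shards account for $0.72 and the PUTs for $1.40, which together match the $2.12 quoted above.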

Gnip’s Kinesis Connector is ready to use starting today for any Twitter PowerTrack or Decahose stream. We’re excited about the many new, different applications this will make possible for our customers. We hope you’ll take it for a test drive and share feedback with us about how it helps you and your business do more with social data.


Data Stories: Gabriel Banos of Zauber Labs on Predicting the Election With Social Data

Can social data predict the 2012 Presidential election? We decided to ask Gabriel Banos, CEO and Founder of Gnip client Zauber Labs, about their work with social data and the elections. They’ve been following the 2012 Presidential Election social data closely and have interesting findings on share of voice and the differences between the candidates’ Twitter followers. Zauber Labs is the company behind Tribatics, which offers powerful insights into the demographics and behaviors of online followers, and Flowics, which can create live infographics from live Twitter data. As a side note, they are also the authors of Gnip4j, the open source Java library for accessing Gnip Twitter feeds.

1. Twitter is considered to have a bigger pre-existing user base of Democrats. How do you think political bias plays into Twitter and affects the results of the Flowics charts?

Obama vs. Romney Twitter Share of Voice

    Buzz Volume measured by Flowics during last month comparing Obama vs Romney

In general, we haven’t seen such an impact on the daily buzz volume charts: the difference in the number of mentions between the two candidates usually stays within its average range. We did see an exception during the days of the Democratic National Convention, when Obama registered his highest buzz of the last month, far beyond Romney’s previous record during the Republican National Convention. This could be interpreted as confirmation that Democrats are more active on Twitter. Last but not least, Romney’s own record of tweets during the Republican Convention was later surpassed by the buzz generated by the release of the video of him talking to millionaires at a private fundraising event.

2. Do you think social data can help predict the election?

Of course. We’ve seen how share of voice among different candidates can be used as a valid data input for models predicting election results. We were able to confirm it last year, for example, during Argentina’s presidential elections (share of voice figures for the main candidates clearly resembled the final election result). But that was not the case when we ran an experiment in 2010 for the Brazilian elections. So market researchers who might be interested in using social data to predict this kind of phenomenon need to understand that a network like Twitter still has an important bias in some countries and does not represent the composition of a whole society. For example, we still see more males than females being active on Twitter in most countries, although we know that the female population usually surpasses the male population in most countries. So social data used as a predictor or sensor for any market research or public opinion study needs to be adjusted if we want to use it as a snapshot of what a complete society thinks or feels.

3. What have you learned with Tribatics about the different demographic information about the Twitter followers of Obama and Romney?

While Obama has more than 20 million followers and Romney is approaching 1.2 million, only 415,000 people follow both candidates. That’s 35% of Romney’s followers but only 2% of Obama’s followers.

This is mostly due to the fact that Obama is a political figure widely recognized outside of the US, whereas Romney is still only relevant to American Twitter users. We were able to identify the location of 12.5% of Obama’s followers, and only 29% of them are in the US. In the case of Romney, we could successfully identify the location of nearly 30% of his followers, and 66% of them are in the US.

The fact that Obama is widely recognized and followed beyond American citizens is also confirmed by the countries coming after the US that contribute to Obama’s followers:

  • UK, Brazil, Indonesia, India, Mexico

In the case of Romney, the list is as follows:

  • UK, Canada, Mexico, Cuba, Brazil

Obama Twitter Follower Demographic Breakdown

Distribution of Obama’s followers based on successful location inference for 12.5% of his followers

 

Romney Twitter Followers Demographics

Distribution of Romney’s followers based on successful location inference for 30% of his followers

4. What can a political candidate learn by looking at social data?

There’s so much to learn from social data that it would be easier to answer what you cannot learn! :)

With the current offering of Social Intelligence and Social Media Monitoring tools (with Tribatics being a product in these categories), a political candidate could learn:

  • Which topics are most relevant among the people talking about him or her
  • Who the most influential authors behind these conversations are
  • The demographics of the authors engaging in these conversations
  • How his or her social media performance compares to that of competitors
  • Who his or her most relevant followers are, and how that audience compares to a competitor’s audience

A fun fact you could also discover with Tribatics by comparing Obama’s most relevant followers with Romney’s: celebrities such as Lady Gaga, Justin Bieber, Shakira and Ashton Kutcher follow the US president but do not follow Romney. Does this fact indicate how they would vote?

Top Obama Twitter Followers

List of Obama’s top followers by number of followers

 

Romney's Top Twitter Followers

List of Romney’s top followers by number of followers
