What's In The [Social] Data?

I was reading about the Higgs boson this morning and came across this cartoon explanation of matter, particle acceleration, and the Higgs boson. The very last pic in the cartoon (below) reminded me of the cartoon that’s been in my head for years, the one that pops into my head when I go to work. It drives what we do at Gnip. All of our energy is focused on helping our customers answer that question: “What’s in the data?” We do this by reliably collecting, filtering, enriching, and delivering billions of public social activities (social data) to our customers with business-critical data needs, every day.

What's In the Data

What's In The Data slide from PHD Comics - http://www.phdcomics.com/comics.php?f=1489

Tebow Time Scores in Social Media

Wow, what a game!  If you missed the instant classic that was the Broncos/Steelers overtime game tonight, check out the recap.

When Tim Tebow connected with Demaryius Thomas on an 80-yard touchdown pass on the first play of overtime, we saw a noticeable spike in the overall volume of social media messages flowing through the Gnip platform.

Tebow Time!

Spike in Social Media Mentions when Tim Tebow Throws Winning Touchdown against the Steelers

Gnip Cagefight #2: Pumpkin Pie vs. Pecan Pie

Thanksgiving is a time for family gatherings, turkey with all the delicious fixings, football, and let’s not forget, pie! If your family is anything like mine, multiple pie flavors are required to satisfy the differing palates and strong opinions. So we wondered, which pies are people discussing for the holiday? What better way to celebrate and answer that question than with a Gnip Cagefight.

Welcome to the Battle of the Pies!

For those of you who have been in a pie eating contest or had a pie in the face, you know this one will be a fight all the way down to the very last crumb. In one corner (well, actually it is the Gnip Octagon, so can you really have corners? Oh well) we have The Traditionalist, pumpkin pie, and in the opposite corner, The Newcomer, pecan pie. Without further ado, Ladies and Gentlemen, Let’s Get Ready to Rumble! Wait, wrong sport. Let’s Fight!

Six Social Media Sources, Two Words, One Winner . . . And the Winner Is . . .

 

Source              Pumpkin Pie   Pecan Pie   Winning Ratio (Pumpkin Pie to Pecan Pie)
Twitter             X                         4:1
Facebook            X                         5:1
Google+             X                         6:1
Newsgator           X                         3:1
WordPress           X                         5:1
WordPress Comments  X                         2:1
Overall             +6 Winner!    +0 :(

 

We looked at one week’s worth of data across six of the top social media sources and determined that pumpkin pie “takes the cake” (so to speak) across every source.

In this case, it is interesting to point out that in sources like Twitter, Facebook, Google+ and WordPress we see higher winning ratios, while sources that tend to have higher latency such as Newsgator and WordPress Comments were a little more even. Is this because, on further consideration, pecan pie sounds pretty good? Or is it that everyone will have to have two pies and, with pecan as the traditional second, it is highly discussed?

Top Pie Recipes

Even though pumpkin pie was our clear winner, we thought it would be fun to share a few of the most popular holiday pie recipes by social media source:

  1. Twitter – Cook du Jour Gluten-Free Pumpkin Pie and Pecan Pie Video Recipe from joyofcooking.com
  2. Facebook – Ben Starr’s Pumpkin Bourbon Pecan Pie Recipe
  3. Newsgator – BlogHer’s Pumpkin Pecan Roulade with Orange Mascarpone Cream Pie Recipe
  4. WordPress and WordPress Comments – Chocolate Bourbon Pecan Pie from allrecipes.com

Non-Traditional Thanksgiving Pies

Another interesting fact that came out of this Cagefight was the count of non-traditional Thanksgiving pies mentioned across the social media sources we surveyed. Though we rarely find pie charts useful for communicating numerical values effectively, you can’t not have one in this post.

Happy Thanksgiving!

Gnip Cagefight #1: Beer vs. Wine

Welcome to the very first edition of the Gnip Cagefight! Over the next couple of weeks we’ll select a common word pair to enter the Gnip Octagon to fight to the finish in a no holds barred battle of Tweets. Two words will enter. Only one will leave.

In addition to crowning the victor, we’ll also call out some of the fun, interesting, strange, and bizarre trends that we glean from the data. Leave us a comment with any contenders you’d like to see in the future.

Now without further delay, let’s dive into our first Gnip Cagefight… Put your hands together for Wine vs. Beer!

And the Winner is . . .

We looked at one week of Tweets that contained the words “beer” or “wine,” and beer was the more commonly used term, appearing in 53.1% of those Tweets vs. 48.1% for wine. Now you might be saying, “Hey, that’s more than 100%!” You are correct! That’s because beer and wine appear together in about 13,801 of those Tweets, along with an uncomfortable hangover, we presume. (Is this an opportunity to sell aspirin?)
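For the curious, here is a minimal sketch of how shares like these can add up to more than 100%. It is not our production pipeline: the function name `term_shares` and the four sample Tweets are made up for illustration, and it assumes every Tweet in the sample already contains at least one of the two words.

```python
import re


def term_shares(tweets, term_a, term_b):
    """Return (% of tweets with term_a, % with term_b, count with both)."""
    # Whole-word, case-insensitive matching (an assumption of this sketch).
    pat_a = re.compile(rf"\b{re.escape(term_a)}\b", re.IGNORECASE)
    pat_b = re.compile(rf"\b{re.escape(term_b)}\b", re.IGNORECASE)
    has_a = has_b = both = 0
    for text in tweets:
        a = bool(pat_a.search(text))
        b = bool(pat_b.search(text))
        has_a += a
        has_b += b
        both += a and b  # counted once in each share, hence the overlap
    total = len(tweets)
    return 100.0 * has_a / total, 100.0 * has_b / total, both


# Hypothetical sample, not real data.
sample = [
    "Cold beer after work",
    "A glass of red wine tonight",
    "Beer and wine at the same party",
    "craft BEER festival",
]
beer_pct, wine_pct, both = term_shares(sample, "beer", "wine")
print(beer_pct, wine_pct, both)
```

In this toy sample the two shares sum to 125%, and the extra 25% is exactly the one Tweet in four that mentions both words.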

With beer as our victor, we wanted to answer the age old question . . .

What time is Beer Thirty?

To answer this question, we analyzed the volume of Tweets containing the term “beer” throughout each day and averaged that across the week’s worth of data we collected. Each Tweet’s time was moved into the time zone of the Tweeter and normalized against the daily cycle of Tweet volume. Based on the graph below, true beer thirty is 5pm local time. This gives great meaning to the saying “It’s 5 o’clock somewhere.”
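A rough sketch of that calculation, under some loud assumptions: each Tweet is represented as a (UTC timestamp, UTC offset in hours) pair, "normalized" is read as dividing the beer share of each local hour by the share of all Tweets in that hour, and the function names and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hourly_share(tweets):
    """tweets: iterable of (utc_datetime, utc_offset_hours) pairs.

    Returns {local_hour: fraction of tweets posted in that hour}.
    """
    counts = Counter()
    for created_utc, offset_hours in tweets:
        # Shift the timestamp into the Tweeter's local time.
        local = created_utc + timedelta(hours=offset_hours)
        counts[local.hour] += 1
    total = sum(counts.values())
    return {hour: counts[hour] / total for hour in range(24)}


def normalized_profile(term_tweets, all_tweets):
    """Share of term tweets per local hour relative to the overall daily cycle."""
    term = hourly_share(term_tweets)
    base = hourly_share(all_tweets)
    return {h: term[h] / base[h] for h in range(24) if base[h] > 0}


# Hypothetical sample: two tweets at 17:xx local time, two at 12:xx local time.
utc = timezone.utc
all_tweets = [
    (datetime(2011, 10, 3, 23, 0, tzinfo=utc), -6),
    (datetime(2011, 10, 3, 23, 30, tzinfo=utc), -6),
    (datetime(2011, 10, 3, 18, 0, tzinfo=utc), -6),
    (datetime(2011, 10, 3, 18, 15, tzinfo=utc), -6),
]
beer_tweets = [all_tweets[0], all_tweets[1], all_tweets[2]]

profile = normalized_profile(beer_tweets, all_tweets)
peak = max(profile, key=profile.get)
print(peak, profile)
```

Dividing by the baseline matters because people simply tweet more at some hours than others; without it, the peak for almost any word would land at the busiest hour of the day.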

Beer Drinkers have a Wider Vocabulary than Wine Drinkers

Another fascinating tidbit that came out of the data was that beer drinkers have a wider vocabulary than wine drinkers. Normalizing for the number of words used, we find that beer drinkers use 14% more distinct words than wine drinkers. Wine drinkers tend to use the same idioms, for example, “glass of wine” or “red wine,” more than beer drinkers use their most common phrases. Does this mean that beer drinkers are 14% smarter than wine drinkers? Or that they use very creative spelling? We won’t wade any further into that question, but you can be the judge.
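One simple way to make a comparison like this is to count distinct words over the same number of words from each group. The sketch below does that; it is an illustration, not the exact method behind the 14% figure, and the function name, tokenizer, and sample Tweets are all hypothetical.

```python
import re


def distinct_word_ratio(tweets, sample_size):
    """Distinct words divided by total words, over the first sample_size words."""
    words = []
    for text in tweets:
        # Very crude tokenizer: lowercase runs of letters and apostrophes.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    words = words[:sample_size]  # same word budget for every group
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Hypothetical samples, not real data.
beer_sample = ["cold beer and hot wings", "beer pong tonight"]
wine_sample = ["glass of red wine", "glass of red wine please"]

beer_ratio = distinct_word_ratio(beer_sample, sample_size=8)
wine_ratio = distinct_word_ratio(wine_sample, sample_size=8)
print(beer_ratio, wine_ratio)
```

Fixing the word budget is the important part: longer samples naturally repeat more words, so comparing raw distinct-word counts across groups of different sizes would be misleading.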

That’s all for our inaugural Gnip Cagefight. Hope you enjoyed it, and be sure to let us know what words you’d like to see in the octagon in the future.

The VMAs, Lady Gaga and Data Science

Hi everyone. I’m the new Data Scientist here at Gnip. I’ll be analyzing the fascinating data that we have coming from all of our varied social data streams to pull out the stories, both impactful and trivial, that are flowing through social media conversations. I’m still getting up to speed but wanted to share one of the first social events that I’ve dug into, the 2011 MTV Video Music Awards.

Check out the info below and let me know in the comments what you think and what you’d like to see more of.  And now, on with the show…

3.6M Tweets Mention “VMA”

The volume of tweets containing “VMA” rose steadily from a few hours before the VMA pre-show was broadcast until the start of the pre-show at 8:00 PM ET (00:00 GMT) and remained fairly strong during the event. It trailed off to low volume within the hour after the VMA broadcast ended at 11:15 PM ET (03:15 GMT). Tweets mentioning “VMA” totaled 3.6M during the 7 hours surrounding and including the VMA broadcast.

Lady Gaga Steals the “Tweet” Show

The largest volume of tweets for an individual artist is the mentions of “gaga.” Lady Gaga performed early in the show and the surge of tweets during her performance surpassed 35k tweets per minute for about 8 minutes. Again in the second half, Lady Gaga tweet volume briefly jumped above 50k per minute. Tweets mentioning “gaga” totaled 1.8M during the 7 hours surrounding and including the VMA broadcast.

As you can see in the chart below, other artists that garnered significant tweet volumes included Beyoncé, Justin Bieber, Chris Brown, Katy Perry and Kanye West. Perry, West and Brown got a lot of attention during their appearances, while Justin Bieber and Lady Gaga led the counts in volume by maintaining a fairly steady stream of tweets during the broadcast.

Term             Representation of Tweets Sampled
VMA              44%
Lady Gaga        21%
Beyonce          16%
Justin Bieber    10%
MTV              9.2%
Chris Brown      8.0%
Katy Perry       5.6%
Kanye West       4.8%
Jonas            3.5%
Taylor Swift     2.1%
Rihanna          1.1%
Eminem           0.55%
Michael Jackson  0.18%
Ke$ha            0.17%
Cher             0.14%
Paramore         0.12%

In contrast, it is interesting to note that Beyoncé and Chris Brown gained most of their tweet attention around their performances, with very large surges in tweet volume. Beyoncé’s volume (another Beyoncé bump) continued after her performance as Twitter users absorbed the news of her pregnancy.

One surprise that emerged from looking for other artists connected to the VMAs was Michael Jackson’s tweet volume. While Jackson garnered many Retweets after winning the King of the VMA poll, he also received a large number of natural tweets lamenting his passing and celebrating his past successes.

Methodology

The free-form text and limited length of Twitter messages create a number of challenges for monitoring an event via Twitter comments. People refer to the event differently and focus on different parts of the event. There will be spelling variations and differences in idioms and nicknames used to describe people and performances. Do we search for “Bieber,” “Beiber” and “Justin”? Will tweeters use “Beyonce” or “Beyonce’”? Knowledge of what we are monitoring is required; preparing tools to adapt to things we learn during the event is also essential to getting good results.
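To make the spelling problem concrete, here is a small sketch that folds a few known variants into one canonical name per artist. The pattern table and the function name `artists_mentioned` are hypothetical, and the variants listed are only the ones mentioned above; a real list would grow as the event unfolds.

```python
import re

# Hypothetical variant table: canonical name -> pattern covering known spellings.
ARTIST_PATTERNS = {
    "Justin Bieber": re.compile(r"\b(?:bieber|beiber)\b", re.IGNORECASE),
    "Beyonce": re.compile(r"\bbeyonc[eé]\b", re.IGNORECASE),
}


def artists_mentioned(text):
    """Return the set of canonical artist names whose variants appear in text."""
    return {name for name, pattern in ARTIST_PATTERNS.items() if pattern.search(text)}


print(artists_mentioned("OMG Beiber at the VMA"))
print(artists_mentioned("Beyonce’ is pregnant!"))
print(artists_mentioned("beyoncé and bieber on one stage"))
```

Note that a bare “Justin” is deliberately left out of the table: it would also match tweets about other people named Justin, which is exactly the kind of judgment call that requires knowing the event.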

One effective strategy is to use one or two tokens to identify tweets related to the event. The objective is to choose terms that we know are related to the event, that won’t be widely used outside the event, and that will give a representative sample, one that is diverse and has sufficient volume. Once we have started to collect the event-focused Twitter sample, we can look for relevant terms correlated with the filter term to find out what else people are tweeting about during the event.
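As a sketch of that second step, the snippet below keeps only tweets containing a filter token and counts which other words show up alongside it. Simple co-occurrence counting stands in here for "correlated"; the function name, stopword list, and sample tweets are all hypothetical.

```python
import re
from collections import Counter


def cooccurring_terms(tweets, filter_token, top_n=3, stopwords=frozenset()):
    """Count words that appear in the same tweet as filter_token."""
    counts = Counter()
    for text in tweets:
        tokens = set(re.findall(r"\w+", text.lower()))
        if filter_token in tokens:
            # Count each word at most once per tweet.
            counts.update(tokens - {filter_token} - stopwords)
    return counts.most_common(top_n)


# Hypothetical sample, not real data.
sample = [
    "VMA gaga was amazing",
    "gaga at the VMA",
    "watching football",
    "VMA beyonce bump",
]
top = cooccurring_terms(sample, "vma", top_n=1, stopwords=frozenset({"the", "at", "was"}))
print(top)
```

Terms that surface this way can then be fed back into the filter or the variant list, which is how the monitoring adapts during the event.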

Hope you enjoyed this first post. Look for more to come.

 

We’ve Got Scooters, Yes We Do! We’ve Got Scooters, How About You?

The competition for talent in Boulder is heating up. Yes folks, Gnip is hiring. We are seeking a Marketing Director to lead Gnip’s marketing operations and activities, working closely with our sales team and COO to drive brand and industry recognition as well as product adoption.

As if the fantastic start-up environment, great team, downtown Boulder location, breakfast at work daily (be sure to follow us @breakfastatgnip), tabs at nearby coffee shops, Eldora ski passes, and Boulder recreation center passes weren’t enough, we are now offering a Honda Metropolitan Scooter as a signing bonus for the Marketing Director position! What more could you ask for?

So if you think you have what it takes to be part of our awesome team, we would love to hear from you! Also, be sure to check out our other open positions, including Sales Engineer, Sales Executive, Senior Software Engineer, Social Media Data Analyst, Software Engineer, and two Customer Support Engineers.

 

Gnip. The Story Behind the Name

Have you ever thought “Gnip”. . . well that is a strange name for a company, what does it mean? As one of the newest members of the Gnip team I found myself thinking that very same thing. And as I began telling my friends about this amazing new start-up that I was going to be working for in Boulder, Colorado they too began to inquire as to the meaning behind the name.

Gnip, pronounced (guh’nip), got its name from the very heart of what we do, realtime social media data collection and delivery. So let’s dive in to . . .

Data Collection 101

There are two general methods for data collection: pull technology and push technology. Pull technology is best described as a data transfer in which the request is initiated by the data consumer and responded to by the data publisher’s server. In contrast, push technology refers to a transfer that is initiated by the data publisher’s server and sent to the data consumer.

So why does this matter . . .

Well, most social media publishers use the pull method. This means that the data consumer’s system must constantly go out and “ping” the data publisher’s server asking, “do you have any new data now?” . . . “how about now?” . . . “and now?” And this can cause a few issues:

  1. Deduplication – If you ping the social media server one second and then ping it again a second later and there were no new results, you will receive the same results you got one second ago. This would then require deduplication of the data.
  2. Rate Limiting – Every social media data publisher’s server sets its own rate limit, a cap on the number of times you can ping the server in a given time frame. These rate limits are constantly changing and typically don’t get published. As such, if your server is set to ping the publisher’s server above the rate limit, it could result in a complete shutdown of your data collection, leaving you to determine why the connection is broken. (Is it the API? Is it the rate limit? What is the rate limit?)
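Both issues show up in even the smallest polling client. The sketch below is an illustration only: `fetch` stands in for whatever call retrieves the latest results from a publisher, each item is assumed to carry a unique `id`, and `min_interval_s` is a guess at a safe polling interval, since the real rate limit is often unpublished.

```python
import time


def poll_new_items(fetch, seen_ids, min_interval_s, polls, sleep=time.sleep):
    """Poll fetch() a fixed number of times and return only unseen items."""
    new_items = []
    for i in range(polls):
        for item in fetch():
            # Deduplication: repeated polls return overlapping results.
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                new_items.append(item)
        if i < polls - 1:
            # Rate limiting: wait between pings to stay under the assumed limit.
            sleep(min_interval_s)
    return new_items


# Hypothetical publisher responses: the second poll has nothing new,
# the third overlaps with the earlier ones.
responses = iter([
    [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
])
items = poll_new_items(lambda: next(responses), set(), 1.0, polls=3, sleep=lambda s: None)
print([item["id"] for item in items])
```

The `sleep` parameter is injectable only so the example runs instantly; in a real client the default `time.sleep` would enforce the wait between pings.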

So as you can see, pull technology can be a tricky beast.

Enter Gnip

Gnip sought to provide our customers with the option to receive data in either the push model or the pull model, regardless of the native delivery method of the data publisher’s server. In other words, we wanted to reverse the “ping” process for our customers. Hence, we reversed the word “ping” to get the name Gnip. And there you have it, the story behind the name!

30 Social Data Applications that Will Change Our World

Social media is popular — no surprise there. And as a result, there’s a huge amount of social media data in the world and every day the pool of data grows… not just a little bit, but enormously. For instance, just recently our partner Twitter blogged about their business growth and the numbers are staggering.

This social conversation data is valuable. Someday it will yield insights worth many millions, perhaps billions, of dollars for businesses. But the analyses and insights are only barely beginning to take shape. We hear from social media analytics companies every day and we see lots of interesting applications of this data. So… how can social media data be used? Here’s a partial list of social data applications that I believe will begin to take shape over the next decade or so:

  1. Product development direction
  2. Product feedback
  3. Customer service performance feedback
  4. Customer communications
  5. Stock market prediction
  6. Domestic/political mood analysis
  7. Societal trend analysis
  8. Offline marketing campaign impact measurement
  9. Word-of-mouth marketing campaign analysis
  10. URL virality analysis
  11. News virality analysis
  12. Domestic economic health indicator
  13. Linguistic analysis
  14. Educational achievement metric by time and locale
  15. Personal scheduling: see when your friends are busy
  16. Event planning: see when big events will happen in your community
  17. Online marketing
  18. Sales mapping & identification
  19. Consumer behavior analysis
  20. Internet safety implementation
  21. Counter-terrorism probabilistic analysis
  22. Disaster relief communication, mapping, and analysis
  23. Product development opportunity identification
  24. Competitive analysis
  25. Recruiting tools
  26. Connector, Maven, and Salesperson identification (to borrow Malcolm Gladwell’s terms)
  27. Cross-platform consumer alerting services
  28. Brand monitoring
  29. Business accountability ratings
  30. Product and service reviews

All of these projects can be built on public social media conversation data that’s legally and practically accessible. All of the necessary data is (or is on the roadmap to be) accessible via Gnip. But access to the data is only step one — the next step is building great algorithms and applications to draw insights from that data. We leave that part to our customers.

So, here’s to the analysts who are working with huge social data sets to bring social data analyses and insights to fruition and ultimately make the barrage of public data that surrounds us increasingly useful. Here at Gnip we’re grateful for your efforts and eager to find out what you learn.

Our Poem for Mountain.rb

Hello and Greetings, Our Ruby Dev Friends,
Mountain.rb we were pleased to attend.

Perhaps we did meet you! Perhaps we did not.
We hope, either way, you’ll give our tools a shot.

What do we do? Manage API feeds.
We fight the rate limits, dedupe all those tweets.

Need to know where those bit.ly’s point to?
Want to choose polling or streaming, do you?

We do those things, and on top of all that,
We put all your results in just one format.

You write only one parser for all of our feeds.
(We’ve got over 100 to meet your needs.)

The Facebook, The Twitter, The YouTube and More
If mass data collection makes your head sore…

Do not curse publishers, don’t make a fuss.
Just go to the Internet and visit us.

We’re not the best poets. Data’s more our thing.
So when you face APIs… give us a ring.

New Office Toy: The Parrot Quadricopter Hovercraft

We spend a lot of time making it as easy as possible for our customers to get the social data they need. Usually. But last week, we had a new addition to the Gnip team: the Parrot AR Drone. 

It’s a 2’ by 2’ hovercraft with a camera attached that you control in real time via an iPhone app. In keeping with its “Parrot” name, it is the loudest thing around our office by a long shot. So, if you talked with any of us on the phone last week… we apologize. But the best part is, the Parrot’s a 3D video game. While we’re flying him around our office, our iPhones are showing a virtual universe that Parrot has been tasked with conquering.

So far we’ve taken him for a spin around the office, down the hall to the next startups (NBS and Everlater loved him), and even outside for a quick spin on the streets of Boulder. We’ve also narrowly avoided chopping off a few of our fingers and setting off our office sprinkler system. He’s a little hard to control at first, okay? Thanks to Natty for the video:

The Gnip Parrot Flies, Crashes and Burns

Any Parrot fans around Boulder? Stop by for a game! Who knows what next week will bring… but for now, as we go back to building out our API aggregation software… it’s a darn good thing we do still have all our fingers.