• Posted by Seth McGuire, Director of Asset Management & Financial Technology
No Comments

While the market has been on its roller coaster ride across the past month, Gnip has kept its collective head down and stayed busy on behalf of our Investment Management clients (hedge funds, HFTs, asset managers, etc.). That hard work has paid off and we have two exciting announcements to make today.

  • Launch of Gnip MarketStream: Our hedge fund clients have been quite vocal in their desire for a package incorporating the most relevant social media data streams into a single low-latency, high-volume solution. We’re proud to answer their needs with the launch of Gnip MarketStream, a realtime data solution that packages the incredibly rich and broad “voice of the market” Twitter stream with the uniquely deep and targeted “voice of the trader” StockTwits stream.
  • Premium Partnership with StockTwits: An integral component of the Gnip MarketStream is StockTwits social media data. We’re thrilled to announce this partnership with StockTwits, the leading realtime financial platform for the investment community and creator of the $(TICKER) tag. The StockTwits stream is a curated, defined-demographic, realtime social data stream focused on investment decisions and analysis. Gnip now provides streaming access to the full StockTwits firehose of social data, and offers access to historical content as far back as 2009.

While the investment community has used social media data in news analysis and equity research, the primary adoption of this data across the last six months has been as a trading indicator. By combining the strengths of both the Twitter stream and the StockTwits stream, Gnip MarketStream provides investment professionals unparalleled access to relevant social data at a time when social media has become an increasingly vital channel for news and market sentiment.

For more information about Gnip MarketStream or StockTwits data, contact trading@gnip.com.

  • Posted by Jud Valeski, Co-Founder and CEO
2 Comments

It took a while, but Gnip is now a Boulder Chamber of Commerce (@boulderchamber) member. We joined after a pattern of value to our particular industry became clear. In August of this year they hosted an event that put us face-to-face with the U.S. Department of Commerce Under Secretary for International Trade (Francisco Sánchez) and Colorado Congressman Jared Polis, where we discussed software patent issues as well as the immigration visa challenges the U.S. tech industry faces. Tonight I’m attending an event with Congressman Polis and a local software venture capitalist (Jason Mendelson) to talk about the challenges of hiring technical talent, locally and globally.

These are topics with significant political/legislative dynamics, and the Chamber has given us, a local software firm, access to relevant forums in which we can get our point of view on the table; thank you.

Whether or not the Chamber has been providing this kind of relevant access all along, I don’t know (my perception is otherwise). I do know that the impact they’re having on us as a local software business, as well as the channel they’re giving Gnip to get its perspective heard in the broader (national) forum, is significant. I’d encourage other Boulder software/technology firms to support their efforts, contribute to their events, and help them build an agenda that, in the end, helps us be more effective software/technical businesses.

Join us, in joining the Chamber.

Simplicity Wins

November 10th, 2011
  • Posted by Chris Hogue, Development
No Comments

It seems like every once in a while we all have to re-learn certain lessons.

As part of our daily processing, Gnip stores many terabytes of data in millions of keys on Amazon’s S3. Various aspects of serving our customers require that we pore over those keys and the data behind them, regularly.

As an example, every 24 hours we construct usage reports that provide visibility into how our customers are using our service. Are they consuming a lot or a little volume? Did their usage profile change? Are they not using us at all? So on and so on. We also have what we affectionately refer to as the “dude where’s my tweet” challenge; of the billion activities we deliver each day to our customers, inevitably someone says “hey, I didn’t receive Tweet ‘X’, what gives?” Answering that question requires that we store the ID of every Tweet a customer ever receives. Poring over all this data every 24 hours is a challenge.

As we started on the project, it seemed like a good fit for Hadoop. It involves pulling in lots of small-ish files, doing some slicing, aggregating the results, and spitting them out the other end. Because we’re hosted in Amazon it was natural to use their Elastic MapReduce service (EMR).

Conceptually the code was straightforward and easy to understand. The logic fit the MapReduce programming model well. It requires a lot of text processing and sorts well into various stages and buckets. It was up and running quickly.

As the size of the input grew it started to have various problems, many of which came down to configuration. Hadoop options, JVM options, open file limits, number and size of instances, number of reducers, etc. We went through various rounds of tweaking settings and throwing more machines in the cluster, and it would run well for a while longer.

But it still occasionally had problems. Plus there was that nagging feeling that it just shouldn’t take this much processing power to do the work. Operational costs started to pop up on the radar.

So we did a small test to check the feasibility of getting all the necessary files from S3 onto a single EC2 instance and processing them with standard old *nix tools. After promising results we decided to pull it out of EMR. It took several days to rewrite, but we’ve now got a simple Ruby script using various *nix goodies like cut, sort, grep and their friends. The script is parallelized via JRuby threads at the points where it makes sense (downloading multiple files at once and processing the files independently once they’ve been bucketed).

In the end it runs in less time than it did on EMR, on a single modest instance, is much simpler to debug and maintain, and costs far less money to run.
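To make the shape of that approach concrete, here is a minimal sketch, written in Python rather than the Ruby/JRuby of our actual script. It assumes a hypothetical input layout (tab-separated files with a customer ID in the first column and a Tweet ID in the second, and no whitespace inside customer IDs), and the function names are made up for illustration. Each file is handed to a plain `cut | sort | uniq -c` pipeline, and a thread pool runs several pipelines at once.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def count_ids_per_customer(path):
    """Count lines per customer in one file using cut, sort and uniq.

    Assumes (hypothetically) tab-separated lines: customer_id<TAB>tweet_id.
    """
    cmd = f"cut -f1 {shlex.quote(str(path))} | sort | uniq -c"
    out = subprocess.run(
        ["sh", "-c", cmd], capture_output=True, text=True, check=True
    ).stdout
    counts = {}
    for line in out.splitlines():
        n, customer = line.split()  # uniq -c prints "<count> <value>"
        counts[customer] = int(n)
    return counts


def usage_report(paths, workers=4):
    """Process files independently in threads, then merge the per-file counts."""
    totals = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(count_ids_per_customer, paths):
            for customer, n in counts.items():
                totals[customer] = totals.get(customer, 0) + n
    return totals
```

Threads are enough here because the heavy lifting happens in the child processes; the script itself mostly waits on them.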

We landed in a somewhat counter-intuitive place. There’s great technology available these days to process large amounts of data; we continue to use Hadoop for other projects. But as we start to bring these technologies into our tool-set we have to be careful not to forget the power of straightforward, traditional tools.

Simplicity wins.

  • Posted by Jud Valeski, Co-Founder and CEO
No Comments

I’m excited to announce that, as of the end of October, Gnip is delivering over 30 billion paid social media activities per month to our customers. This is the largest number of paid social media activities that have ever been distributed in a 30-day period.

Over the past year, we’ve seen extraordinary growth in the number of paid social media activities we deliver. At the start of 2011, Gnip was delivering 300 million activities per month. By May, that number was up to 3 billion activities per month. And in October, we delivered 30 billion activities. In essence, we’ve been growing by a factor of 10 every 5 months. At this rate, we’ll be delivering 300 billion activities per month by March of next year.
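For anyone who wants to check the arithmetic, here is a small sketch of the projection. It assumes steady compound growth between the May and October figures quoted above, which is a simplification; the function names are just for illustration.

```python
def monthly_growth_factor(start, end, months):
    """Constant month-over-month factor that turns `start` into `end`."""
    return (end / start) ** (1 / months)


def project(current, factor, months):
    """Volume after `months` more months of growth at the same factor."""
    return current * factor ** months


# May 2011 (3 billion) to October 2011 (30 billion) is 5 months.
factor = monthly_growth_factor(3e9, 30e9, 5)
print(f"Implied growth: {(factor - 1) * 100:.1f}% per month")
print(f"Projected volume 5 months after October: {project(30e9, factor, 5):.3g}")
```

A factor of 10 every 5 months works out to roughly 58% growth month over month.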

Cool numbers, but what’s driving this growth?

We’re seeing three key areas that are driving this number. First, we’re signing on new customers at an increasing rate, as more and more companies are seeing the possibilities in social media data. Second, we’re seeing increased interest in our Twitter firehose products. From hedge funds using social data to drive trading strategies to business intelligence companies layering social data onto their existing structured data sources, interest in volume products from Twitter is consistently increasing.  And finally, we’re seeing a marked increase in the number of customers using multiple sources to enrich their product capabilities.  From boards and forums to YouTube and Facebook, our customers are seeing the potential in the many other social media sources we offer.

So, 300 billion per month by March? It’s a big number, but the way things are going, I’ll take the over.

  • Posted by Seth McGuire, Director of Asset Management & Financial Technology
No Comments

Gnip’s asset and investment management clients are consistently impressed by two aspects of our social media data that differentiate this data from their other sources: Speed & Amplification.

Speed

Speed relates to the ability of social media content to be ‘instant’, an ability fueled by millions of global users who can often break news and sentiment faster than traditional media sources can.

A prime example is news of the death of Osama Bin Laden. Keith Urbahn, the former chief of staff for Don Rumsfeld, is widely credited with breaking that story… through Twitter!

After Keith’s tweet, multiple retweets quickly followed. Within 19 tweets on this subject, a company called DataMinr had identified this as an important and breaking story. DataMinr, a “global sensor network for emerging events and consumer signals,” then issued a signal to their clients, alerting them to this important piece of information.

How does this play into the ‘speed’ characteristic? Because it would be over 20 minutes before that story appeared on traditional news sites. Access to a data stream that can beat traditional media sources by over 20 minutes requires no explanation as to its value for traders and investors.

Amplification

Amplification speaks to the ability of social media as a ‘crowd-sourced megaphone.’ The propensity of users to like, share, and retweet content from other users gives those consuming social media data an extremely easy mechanism to measure what content is most important to the world – and compare that content against other content in real time.

A prime example is the passing of Steve Jobs. We wrote about Steve Jobs’ passing a few weeks ago – that post is here – but there’s an important item to revisit:

The impact he had on us made his death that much more profound and the reaction on Twitter was immediate and immense. Word spread rapidly, peaking at 50,000 Tweets per minute within 30 minutes. At that point, Tweets about Jobs accounted for almost 25% of all Tweets being sent globally.

Access to Gnip’s social media data stream allowed our clients to measure, in the moment, the amplification of this story and so gauge the importance the world placed on this piece of news. While I doubt any of us needed to see those numbers to know Steve’s passing was an important piece of news, that’s a clear example of how ‘amplification’ works.

Our clients use amplification as a measure to weigh the importance of breaking news, upcoming events, market and product announcements, etc. against other stories. By capturing a realtime snapshot of what the market considers important – and what it doesn’t – they’re able to add an important factor to their existing algorithms.

None of this is to suggest that either social media data speed or amplification should be a sole factor in investing. But when the Gnip social media data stream provides clients with an additional factor to help understand or predict market fluctuations, the value is obvious.

Google+ Now Available from Gnip

October 27th, 2011
  • Posted by Adam Tornes, Product
No Comments

Gnip is excited to announce the addition of Google+ to its repertoire of social media data sources. Built on top of the Google+ Search API, Gnip’s stream allows its customers to consume realtime social media data from Google’s fast-growing social networking service. Using Gnip’s stream, customers can poll Google+ for public posts and comments matching the terms and phrases relevant to their business and client needs.

Google+ is an emerging player in the social networking space that is a great pairing with the Twitter, Facebook, and other microblog content currently offered by Gnip. If you are looking for volume, Google+ quickly became the third largest social networking platform within a week of its public launch and some are projecting it to emerge as the world’s second largest social network within the next twelve months. Looking to consume content from social network influencers? Google+ is where they are! (even former Facebook President Sean Parker says so).

By working with Gnip for a stream of Google+ data (alongside an abundance of other social data sources), you’ll have access to a normalized data format, unwound URLs, and data deduplication. Existing Gnip customers can seamlessly add Google+ to their Gnip Data Collectors (all you need is a Google API Key). New to Gnip? Let us help you design the right solution for your social data needs: contact sales@gnip.com.

Gnip is Headed to Defrag

October 20th, 2011
  • Posted by Bre Zigich, Marketing
No Comments


Just wanted to give everyone an update on the Gnip events front. We are gearing up for our appearance at the upcoming Defrag Conference taking place in our neighboring town of Broomfield, Colorado at the Omni Interlocken Resort on November 9th and 10th. We are excited to be a sponsor of this year’s event that will focus on the exploration of the tools and technologies that intersect around the data deluge.

At the conference, Gnip’s very own Chris Moody (@chrismoodycom) will be featured as a keynote speaker, where he will discuss Emerging Use Cases for Big Social Data. During this presentation Chris will highlight the ways that social data is being used to drive innovation across a variety of industries from Financial Services and Emergency Response to Local Business and Social CRM. Chris will also unveil the importance of having the right data for the right need with the debut of Gnip’s Social Data Analysis Grid. Be sure to catch Chris’ presentation on November 9th from 11:30am to 11:45am.

With the conference just around the corner, we wanted to extend everyone a discounted registration code on behalf of Gnip. To receive a 20% discount on your registration, contact sales@gnip.com by November 1st.

Finally, Gnip’s Director of Sales, Fred Funke (@funkefred), and Data Scientist, Scott Hendrickson (@drskippy27), will also be at the conference.

Let us know if you will be attending; we’d love to chat. See you at Defrag!

  • Posted by Scott Hendrickson, Data Science
4 Comments

Welcome to the very first edition of the Gnip Cagefight! Over the next couple of weeks we’ll select a common word pair to enter the Gnip Octagon to fight to the finish in a no holds barred battle of Tweets. Two words will enter. Only one will leave.

In addition to crowning the victor, we’ll also call out some of the fun, interesting, strange, and bizarre trends that we glean from the data. Leave us a comment with any contenders you’d like to see in the future.

Now without further delay, let’s dive into our first Gnip Cagefight… Put your hands together for Wine vs. Beer!

And the Winner is . . .

We looked at one week of Tweets that contained the words “beer” or “wine,” and beer was the more commonly used term, appearing in 53.1% of those tweets vs. 48.1% for wine. Now you might be saying, “Hey, that’s more than 100%!” You are correct! That’s because beer and wine appear together about 13,801 times–along with an uncomfortable hangover, we presume. (Is this an opportunity to sell aspirin?)
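If you want to see how the overlap falls out of those percentages, here is a back-of-the-envelope sketch. It assumes every Tweet in the sample contains at least one of the two words (which is how the sample was built), so the shares overlap by exactly the share of Tweets containing both. The function name is ours, and because the published percentages are rounded, the result is only a rough estimate of the sample size.

```python
def estimate_total(beer_share, wine_share, both_count):
    """Estimate sample size via inclusion-exclusion.

    share(beer) + share(wine) - share(both) = 1 when every Tweet contains
    at least one term, so share(both) = share(beer) + share(wine) - 1.
    """
    both_share = beer_share + wine_share - 1.0
    return both_count / both_share


total = estimate_total(0.531, 0.481, 13_801)
print(f"Estimated Tweets in the sample: about {total:,.0f}")
```

The overlap share comes to about 1.2%, which puts the week's sample somewhere around 1.1 to 1.2 million Tweets, give or take the rounding in the percentages.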

With beer as our victor, we wanted to answer the age old question . . .

What time is Beer Thirty?

To answer this question, we analyzed the volume of Tweets containing the term “beer” throughout each day and averaged that across the week’s worth of data we collected. Each Tweet’s time was moved into the time zone of the Tweeter and normalized against the daily cycle of Tweet volume. Based on the graph below, true beer thirty is 5pm local time. This gives great meaning to the saying “It’s 5 o’clock somewhere.”
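For the curious, the normalization described above can be sketched as follows. This is a simplified illustration, not our production analysis: it assumes each Tweet comes with a UTC timestamp and a whole-hour UTC offset for the Tweeter (a hypothetical input format), and the function names are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def local_hour(utc_time, utc_offset_hours):
    """Hour of day in the Tweeter's own time zone."""
    return (utc_time + timedelta(hours=utc_offset_hours)).hour


def hourly_share(term_tweets, all_tweets):
    """Fraction of all Tweets in each local hour that mention the term.

    Both arguments are iterables of (utc_time, utc_offset_hours) pairs.
    Dividing by the overall hourly volume removes the normal daily cycle.
    """
    term_counts = Counter(local_hour(t, off) for t, off in term_tweets)
    all_counts = Counter(local_hour(t, off) for t, off in all_tweets)
    return {h: term_counts[h] / all_counts[h] for h in range(24) if all_counts[h]}


def peak_hour(shares):
    """Local hour with the highest normalized share."""
    return max(shares, key=shares.get)
```

Without the division by overall volume, the peak would mostly reflect when people tweet the most, not when they tweet about beer the most.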

Beer Drinkers have a Wider Vocabulary than Wine Drinkers

Another fascinating tidbit that came out of the data was that beer drinkers have a wider vocabulary than wine drinkers. Normalizing for the number of words used, we find that beer drinkers use 14% more distinct words than wine drinkers. Wine drinkers tend to use the same idioms, for example, “glass of wine” or “red wine,” more than beer drinkers use their most common phrases. Does this mean that beer drinkers are 14% smarter than wine drinkers? Or that they use very creative spelling? We won’t wade any further into that question, but you can be the judge.
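One way to make a comparison like this is sketched below. It is an assumption on our part that equal-sized word samples are a fair stand-in for "normalizing for the number of words used"; the tokenizer and function names are purely illustrative, and both inputs are assumed to be non-empty.

```python
import re


def tokens(tweets):
    """Lowercase word tokens from a list of Tweet texts."""
    words = []
    for text in tweets:
        words.extend(re.findall(r"[a-z0-9']+", text.lower()))
    return words


def distinct_words(tweets, sample_size):
    """Number of distinct words among the first `sample_size` tokens."""
    return len(set(tokens(tweets)[:sample_size]))


def vocabulary_ratio(tweets_a, tweets_b):
    """Distinct-word ratio of A to B, compared over equal-sized samples."""
    n = min(len(tokens(tweets_a)), len(tokens(tweets_b)))
    return distinct_words(tweets_a, n) / distinct_words(tweets_b, n)
```

A ratio of 1.14 from `vocabulary_ratio(beer_tweets, wine_tweets)` would correspond to the 14% figure above. Comparing equal-sized samples matters because larger samples naturally contain more distinct words.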

That’s all for our inaugural Gnip Cagefight. Hope you enjoyed it and be sure to let us know what words you’d like to see in the octagon in the future.

Steve Jobs – Rest in Peace

October 7th, 2011
  • Posted by Randy Almond, Marketing
No Comments

Steve Jobs was an innovator, entrepreneur and visionary leader who had an enormous impact on every one of us. He brought warmth and humanity to the world of technology and in the process changed the entire way we as humans interact with each other. The path he blazed was quickly followed by others and even if you don’t own an Apple product, the computer/tablet/phone you are using is better because of him.

The impact he had on us made his death that much more profound and the reaction on Twitter was immediate and immense.  Word spread rapidly, peaking at 50,000 Tweets per minute within 30 minutes.  At that point, Tweets about Jobs accounted for almost 25% of all Tweets being sent globally.

Tweets per Minute

Looking at the content of those Tweets, you see expressions of sadness and loss, thanks for everything he did, and a celebration of his genius and talent.  All sentiments we felt here at Gnip.

Top Terms

Thank you for everything Steve.  The world is a poorer place without you.  Rest in peace.

  • Posted by Scott Hendrickson, Data Science
4 Comments

Hi everyone. I’m the new Data Scientist here at Gnip. I’ll be analyzing the fascinating data that we have coming from all of our varied social data streams to pull out the stories, both impactful and trivial, that are flowing through social media conversations. I’m still getting up-to-speed but wanted to share one of the first social events that I’ve dug into, the 2011 MTV Video Music Awards.

Check out the info below and let me know in the comments what you think and what you’d like to see more of.  And now, on with the show…

3.6M Tweets Mention “VMA”

The volume of tweets containing “VMA” rose steadily from a few hours before the VMA pre-show was broadcast, up to the start of the pre-show at 8:00 PM ET (00:00 GMT), and remained fairly strong during the event. It trailed off to low volume within the hour after the VMA broadcast ended at 11:15 PM ET (03:15 GMT). Tweets mentioning “VMA” totaled 3.6M during the 7 hours surrounding and including the VMA broadcast.

Lady Gaga Steals the “Tweet” Show

The largest volume of tweets for an individual artist is the mentions of “gaga.” Lady Gaga performed early in the show and the surge of tweets during her performance surpassed 35k tweets per minute for about 8 minutes. Again in the second half, Lady Gaga tweet volume briefly jumped above 50k per minute. Tweets mentioning “gaga” totaled 1.8M during the 7 hours surrounding and including the VMA broadcast.

As you can see in the chart below, other artists that garnered significant tweet volumes included Beyonce’, Justin Bieber, Chris Brown, Katy Perry and Kanye West. Perry, West and Brown got a lot of attention during their appearances, while Justin Bieber and Lady Gaga led the counts in volume by maintaining a fairly steady stream of tweets during the broadcast.

Term              Representation of Tweets Sampled
VMA               44%
Lady Gaga         21%
Beyonce           16%
Justin Bieber     10%
MTV               9.2%
Chris Brown       8.0%
Katy Perry        5.6%
Kanye West        4.8%
Jonas             3.5%
Taylor Swift      2.1%
Rihanna           1.1%
Eminem            0.55%
Michael Jackson   0.18%
Ke$ha             0.17%
Cher              0.14%
Paramore          0.12%

In contrast, it is interesting to note that Beyonce’ and Chris Brown gained most of their tweet attention around their performances, with very large surges in tweet volume. Beyonce’s volume–another Beyonce’ bump–continues after her performance as Twitter users absorb the news of her pregnancy.

One surprise that emerged from looking for other artists connected to the VMAs was Michael Jackson’s tweet volume. While Jackson garnered many Retweets after winning the King of the VMA poll, he also received a large number of natural tweets lamenting his passing and celebrating his past successes.

Methodology

The free-form text and limited length of Twitter messages create a number of challenges for monitoring an event via Twitter comments. People refer to the event differently and focus on different parts of the event. There will be spelling variations and differences in idioms and nicknames used to describe people and performances. Do we search for “Bieber”, “Beiber” and “Justin”? Will tweeters use “Beyonce” or “Beyonce’”? Knowledge of what we are monitoring is required; preparing tools to adapt to things we learn during the events is also essential to getting good results.

One effective strategy is to use one or two tokens to identify tweets related to the event. The objective is to choose terms that we know are related to the event, that won’t be widely used outside the event, and that will give a representative sample–diverse and with sufficient volume. Once we have started to collect the event-focused twitter sample, we can look for relevant terms correlated with the filter term to find out what else people are tweeting about during the event.
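As a rough illustration of that strategy, the sketch below filters a list of tweet texts on a single token and counts which other terms show up alongside it. The tokenizer, the stopword handling, and the function name are assumptions made for the example, not a description of the tooling used for this analysis.

```python
import re
from collections import Counter


def cooccurring_terms(tweets, filter_term, top_n=10, stopwords=frozenset()):
    """Count terms that appear in tweets containing `filter_term`.

    Each term is counted at most once per tweet, so the counts read as
    "number of event tweets mentioning this term".
    """
    filter_term = filter_term.lower()
    counts = Counter()
    for text in tweets:
        words = set(re.findall(r"[a-z0-9$']+", text.lower()))
        if filter_term in words:
            counts.update(words - {filter_term} - stopwords)
    return counts.most_common(top_n)
```

Calling `cooccurring_terms(tweets, "vma", stopwords={"the", "at"})` on a sample would surface names like “gaga” near the top, which can then be added to the list of terms being tracked, spelling variants included.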

Hope you enjoyed this first post. Look for more to come.
