4 Things You Need To Know About Migrating to Version 1.1 of the Twitter API

Access to Twitter data through its API has been evolving since the API’s inception. Last September, Twitter announced its most recent changes, which take effect this coming March 5. These changes enhance feed delivery while further limiting the number of Tweets you can get from the public Twitter API.

The old API was version 1.0, and the new one is version 1.1. If your business or app relies on Twitter’s public API, you may be asking yourself, “What’s new in Twitter API 1.1?” or “What changed in Twitter API 1.1?” While there’s not much new, a lot has changed, and there are several steps you need to take to ensure that you’re still able to access Twitter data after March 5th.

1. OAuth Connection Required
In Twitter API 1.1, every request to the API must be authenticated using OAuth. To get your Twitter OAuth credentials, you’ll need to register an application through the OAuth Application Form listed in the Helpful Links below. Note that rate limits are applied per endpoint and per OAuth token, so distributing your requests across multiple IP addresses no longer works as a workaround. Requests to the API without OAuth authorization will be rejected with an error and will not return data.
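For the curious, here is roughly what “authenticating with OAuth” involves on every request. This is a minimal Python sketch, assuming the OAuth 1.0a HMAC-SHA1 signing scheme described in RFC 5849, which API 1.1 uses for user-context requests. The keys, nonce and timestamp in the usage example are placeholders, not real credentials, and in production you would normally rely on a maintained OAuth library rather than hand-rolling the signature.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    """RFC 3986 encoding as OAuth 1.0a requires: only A-Z a-z 0-9 - . _ ~ stay as-is."""
    return quote(value, safe="~")


def signature_base_string(method: str, base_url: str, params: dict) -> str:
    """METHOD&encoded-url&encoded-sorted-params (base_url excludes the query string).

    Simplification: params is a dict, so repeated parameter names are not supported.
    """
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), percent_encode(base_url), percent_encode(param_string)])


def hmac_sha1_signature(base_string: str, consumer_secret: str, token_secret: str = "") -> str:
    """Sign with HMAC-SHA1 using the key 'consumer_secret&token_secret', base64-encoded."""
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def oauth_authorization_header(method, base_url, query, consumer_key, consumer_secret,
                               token, token_secret, nonce, timestamp):
    """Build the Authorization header for one request (nonce and timestamp supplied by caller)."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": timestamp,
        "oauth_token": token,
        "oauth_version": "1.0",
    }
    # Query (and form-encoded body) parameters are signed together with the oauth_* ones.
    base = signature_base_string(method, base_url, {**query, **oauth})
    oauth["oauth_signature"] = hmac_sha1_signature(base, consumer_secret, token_secret)
    fields = ", ".join(f'{percent_encode(k)}="{percent_encode(v)}"' for k, v in sorted(oauth.items()))
    return "OAuth " + fields


if __name__ == "__main__":
    # Placeholder credentials; real code would use secrets.token_hex() and int(time.time()).
    print(oauth_authorization_header(
        "GET", "https://api.twitter.com/1.1/statuses/user_timeline.json",
        {"screen_name": "gnip"}, "CONSUMER_KEY", "CONSUMER_SECRET",
        "ACCESS_TOKEN", "ACCESS_SECRET", "nonce123", "1360000000"))
```

The nonce and timestamp are passed in rather than generated inside the function so the signature is reproducible and easy to test.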

2. 80% Less Data
In version 1.0, the rate limit on the Twitter Search API was 1 request per second. In Twitter API 1.1, that changes to 1 request every 5 seconds. Put more starkly, you could previously make 3,600 requests/hour, but you are now limited to 720 requests/hour. Combined with the existing limits on the number of results returned per request, this will make it much harder to reach the data volume or coverage you could previously get through the Twitter API. If the new rate limit is an issue, you can get full-coverage, commercial-grade Twitter access through Gnip, which isn’t subject to rate limits.
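To stay under the new limit, a client can pace its own calls. The sketch below assumes a simple fixed minimum interval between requests (5 seconds, per the figure above). Twitter enforces its limits on its own servers, so this only keeps a well-behaved client from running into them; the `MinIntervalThrottle` class is our own illustration, not part of any Twitter library.

```python
import time


class MinIntervalThrottle:
    """Client-side pacing: wait() blocks until `interval` seconds have passed since the last call."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock    # injectable so the pacing can be tested without real waiting
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def requests_per_hour(interval_seconds):
    """Upper bound on requests per hour at one request per `interval_seconds`."""
    return int(3600 // interval_seconds)


# 1.0 Search: one request per second; 1.1: one request every five seconds.
old, new = requests_per_hour(1), requests_per_hour(5)
print(old, new, f"{1 - new / old:.0%} fewer requests")  # 3600 720 80% fewer requests
```

In a polling loop you would call `throttle.wait()` immediately before each request; the last line reproduces the 3,600 versus 720 arithmetic behind the “80% Less Data” heading.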

3. New Endpoint URLs
Twitter API 1.1 also uses new endpoint URLs, and you will need to point your application at them to access the data. Requests to the old endpoints won’t return any data; they will receive an HTTP 410 Gone response instead.
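For many REST resources the URL change is mechanical. The helper below is a hypothetical sketch that assumes v1.0 URLs of the form `https://api.twitter.com/1/...` and the standalone v1.0 Search URL `http://search.twitter.com/search.json`, rewrites them to `/1.1/` paths, and flags 410 responses. Some resources were renamed or dropped in 1.1, and query parameters are passed through unchanged, so check the REST API Version 1.1 Resources link below before relying on any rewritten URL.

```python
from urllib.parse import urlsplit, urlunsplit

# v1.0 Search lived on its own host; in v1.1 it sits under api.twitter.com.
SEARCH_V10 = ("search.twitter.com", "/search.json")
SEARCH_V11 = ("api.twitter.com", "/1.1/search/tweets.json")


def migrate_url(url: str) -> str:
    """Rewrite a v1.0 REST URL to its likely v1.1 equivalent (sketch only)."""
    parts = urlsplit(url)
    if (parts.netloc, parts.path) == SEARCH_V10:
        host, path = SEARCH_V11
    elif parts.netloc == "api.twitter.com" and parts.path.startswith("/1/"):
        host, path = parts.netloc, "/1.1/" + parts.path[len("/1/"):]
        if path.endswith(".xml"):
            # v1.1 serves JSON only, so XML-suffixed resources become .json.
            path = path[: -len(".xml")] + ".json"
    else:
        return url  # not a recognizable v1.0 URL; leave it for manual review
    return urlunsplit(("https", host, path, parts.query, parts.fragment))


def is_retired_endpoint_response(status_code: int) -> bool:
    """Retired v1.0 endpoints answer with HTTP 410 Gone."""
    return status_code == 410


print(migrate_url("http://search.twitter.com/search.json?q=sxsw"))
```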

4. Hello JSON. Goodbye XML.
Twitter has changed the format in which data is delivered. Version 1.0 of the Twitter API could deliver data as XML as well as JSON; Twitter API 1.1 delivers data in JSON format only. Twitter has been gradually transitioning away from XML, starting with the Streaming API and Trend API. Going forward, all APIs will use JSON rather than XML. The move to JSON is a great step forward, as JSON is lighter-weight than XML and easy to parse in virtually every modern programming language.
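In practice, the switch mostly means parsing responses with a JSON parser instead of an XML one. Here is a minimal sketch using Python’s standard `json` module; the payload is a hypothetical, heavily trimmed list of Tweets that keeps only the `id_str`, `text` and `user.screen_name` fields.

```python
import json
from typing import List, Tuple

# Hypothetical, heavily trimmed Tweet payload; real v1.1 Tweets carry many more fields.
SAMPLE_RESPONSE = """
[
  {"id_str": "1", "text": "Hello JSON", "user": {"screen_name": "example_user"}},
  {"id_str": "2", "text": "Goodbye XML", "user": {"screen_name": "another_user"}}
]
"""


def summarize(payload: str) -> List[Tuple[str, str]]:
    """Return (screen_name, text) for each Tweet in a JSON array response."""
    tweets = json.loads(payload)
    return [(tweet["user"]["screen_name"], tweet["text"]) for tweet in tweets]


for screen_name, text in summarize(SAMPLE_RESPONSE):
    print(f"@{screen_name}: {text}")
```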

All in all, these are some pretty impactful changes. If you’re looking for more information, we’ve provided some links below with more details. If you’re interested in full-coverage, commercial-grade access to Twitter data where rate limits are a thing of the past, check out the details of Gnip’s Twitter offerings. We have a variety of Twitter products, including realtime coverage and volume streams, as well as access to the entire archive of historical Tweets.

Update: Twitter has recently announced that the Twitter REST API v1.0 will officially retire on May 7, 2013. Until then, Twitter will continue to run blackout tests, and applications that have not migrated will see interrupted coverage, so we highly encourage migrating as soon as possible.

Helpful Links
Version 1.0 Retirement Post
Version 1.0 Retirement Final Dates
Changes coming in Twitter API 1.1
OAuth Application Form
REST API Version 1.1 Resources
Twitter API 1.1 FAQ
Twitter API 1.1 Discussion
Twitter Error Code Responses

Twitter and SXSW: Barometer of Trends

Last week we talked about tracking SXSW from 2007 to 2012 using Gnip’s Historical PowerTrack for Twitter. That post gave us insight into year-over-year volumes of SXSW Tweets; now we’re going to look at how the topics trending at SXSW have changed over time.

With every square inch of Austin packed with social media influencers, SXSW provides an interesting way to examine trends, big and small, and see what people are talking about on Twitter. Now that companies can use Gnip’s Historical PowerTrack for Twitter to baseline events, they have a whole new avenue for identifying trends.

Party vs. Panel
People have a love/hate relationship with SXSW. Some love it for its networking opportunities and great sessions, while others decry it as one giant party. Letting the data speak for itself, it seems that in the conference’s earlier years, people came for the panels and, hopefully, to learn something from their peers. But by 2011, Tweets mentioning the word “party” outnumbered those mentioning “panel” by more than 10,000. People were talking about the best places to meet people rather than the best places to learn. That same year, SXSW Tweets mentioned the word “RSVP” 13,072 times as people made plans to find the best parties, likely indulging in the practice of RSVPing for 136 events and actually attending 12 of them.
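Comparisons like “party” versus “panel” boil down to counting keyword mentions per year. The sketch below assumes the Tweets have already been pulled (for example, through Historical PowerTrack) into simple `(year, text)` pairs. The sample Tweets are made up, and whole-word, case-insensitive matching is our simplifying assumption, so variants such as “parties” would be missed.

```python
import re
from collections import Counter


def count_mentions(tweets, terms):
    """Count, per (year, term), how many Tweets mention the term as a whole word."""
    patterns = {term: re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE) for term in terms}
    counts = Counter()
    for year, text in tweets:
        for term, pattern in patterns.items():
            if pattern.search(text):   # each Tweet counts at most once per term
                counts[(year, term)] += 1
    return counts


sample = [  # made-up Tweets
    (2010, "Great panel on APIs this morning #sxsw"),
    (2011, "Best party tonight #sxsw"),
    (2011, "Skipping the panel for a PARTY"),
    (2011, "party party party"),
]
counts = count_mentions(sample, ["party", "panel"])
print(counts[(2011, "party")] - counts[(2011, "panel")])  # 2
```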

Geo-location Wars
While Twitter is useful for understanding how cultural events are changing, its use cases extend to understanding the rise and fall of startups. The launch of Foursquare and Gowalla at SXSW in 2009 marked the beginning of the so-called geo-location wars. Many people have wondered how Foursquare ended up the winner, and SXSW provides interesting insight into how it came out on top. Back in 2009, SXSW Tweets suggested it was anyone’s game: surprisingly, Foursquare received only slightly over 100 more Tweets than Gowalla. By 2010, Foursquare was more clearly the winner, receiving nearly double the Tweets that Gowalla did. At that point, everyone was still writing posts weighing the pros and cons of each service, but the social data was clear: Foursquare had the buzz that year, thanks in part to its ability to easily publish updates, badges and mayorships to Twitter, and perhaps even to its rogue game of Foursquare outside the Convention Center. By 2011, Foursquare had completely sucker-punched Gowalla, taking the lion’s share of the conversation with nearly 65,000 Tweets to Gowalla’s nearly 8,000. By the end of 2011, Facebook had acqui-hired the Gowalla team.

BBQ vs. Tacos
This next trend might seem silly. Who cares whether more people are interested in BBQ or tacos? What significant impact can this social data have? But if you’re a restaurant chain or looking to start a new franchise, it would be valuable to know about cultural food trends, such as the rise of cupcakes, as they are happening.

While many have long suspected that Austin was a BBQ kind of town, the social data shows that at the last SXSW, tacos overtook BBQ as the most talked-about grub to grab. More data science would be needed to determine whether tacos are becoming a more widespread cultural trend, but while all other Tweet volumes were falling in 2012, mentions of tacos were charging full-steam ahead.

This is just the beginning of what social data can tell companies about trends and market research. Given the sheer volume of conversations happening on Twitter, we think historical social data will prove invaluable to market research.

Stocktoberfest: Social Finance Conversation on Tap

Last year, Gnip and StockTwits partnered to bring this incredible social finance conversation to market, in both real-time and historical products. Given that relationship, we’ve worked closely with StockTwits and have been consistently impressed with how the trading community platform they’ve been building has progressed.

So last week it was incredibly exciting to attend Stocktoberfest on Coronado Island and to watch other attendees see just how far this awesome product has come. What really sank in for me personally was the extent to which the platform has evolved and how widely it is now leveraged.

For a platform to actually work, you need a few things: community critical mass (check!), producers (check!), and consumers (check!). StockTwits has done an incredible job engaging the community directly on-site, and the latest API allows app developers to weave highly focused (by equity or currency) conversations, bi-directionally, into their apps in real-time.

From a community standpoint, check out what StockTwits has pulled together:

  • 50+ activities (messages) posted per minute during market hours
  • 70+% of platform activities are “native” to the StockTwits platform
  • 600+ charts shared each day
  • 3,000+ symbols covered each day
  • 6 million+ investors and traders read StockTwits content each month across the distribution network and API partners
  • The average stocktwits.com user spends 26 minutes on-site per visit
  • 200K+ registered users
  • 400K+ messages a month
  • 70+ million API requests per month
  • 26% of StockTwits users access it via mobile

The financial charting and equity analysis ecosystem is vast, and because of the work StockTwits has done, that ecosystem can incorporate relevant social conversation directly into platforms and products. By doing so, players in this space can take advantage of the community StockTwits has already built, which lifts their tools, products, and platforms into social. Building community, whether focused or broad, is obviously a hard nut to crack. The more integrations you have, the more the overall community benefits; everybody wins. ChartIQ is one beautiful example of a product leveraging the StockTwits platform that proves this point.

I left Stocktoberfest giddy over where this stream is now. Seeing, and talking to, so many of the StockTwits platform partners was just awesome!

The impact “social” is having across the finance space is tremendous, as is StockTwits’ role in it. We’re excited for the next few months and what our continuing work with them will bring.

Stage for Stocktoberfest

Tumblr Analytics: It’s a Whole New World

Union Metrics has been with Gnip since the early days, using our social data in their flagship product, TweetReach. Earlier this year, when we announced the availability of social data from Tumblr, we were excited that Union Metrics moved quickly to start building a new product based on that data. Last week, Union Metrics launched Union Metrics for Tumblr and was named Tumblr’s preferred analytics provider.

We’re big believers in Tumblr and the value of the conversations taking place there. As we’ve talked about in the social cocktail, Tumblr content has unique properties. Our data science shows that Tumblr content is inherently viral – able to amplify conversations about any topic – and even more than that, the content on Tumblr has incredible staying power.

And we’re not the only believers in Tumblr. Brands like Adidas and Coca-Cola have been actively engaging and advertising on Tumblr since the launch of Tumblr’s advertising platform earlier this year.

Congrats to the team at Union Metrics! This is exciting news and we’re only at the beginning.

You can read more in AdWeek, The Next Web and GigaOm.

A Moment in History: Access the Full Archive of Public Tweets

We are proud to announce that, for the first time, access to the entire historical archive of public Tweets, dating back to @Jack’s very first Tweet more than 7 years ago, is now available via our new product, Historical PowerTrack for Twitter. This product has been years in the making, and we can’t wait to see what the world will build with this data.

We believe that social data has unlimited value and near limitless application. The nature (fast and viral) and newness of social conversations have naturally directed focus to realtime applications. However, as the world becomes more reliant on realtime social data and the amount of social data created grows exponentially, putting this information into historical context has become increasingly important. Companies often look at the realtime reaction in social data and ask, “Is this good or bad?” This is one of the main questions historical data can answer. For example, if an auto manufacturer launches a new model and 25% of the social conversation is determined to be negative, is that healthy? Knowing that the last model launched to record sales and had 40% negativity helps put the new realtime data into context.
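The auto-launch example is simple arithmetic once each post carries a sentiment label. This sketch assumes the labels come from some upstream sentiment classifier (not shown) and expresses the comparison as a difference in percentage points against the historical baseline; the labels in the usage example are made up.

```python
def negative_share(labels):
    """Fraction of sentiment-labelled posts marked 'negative'."""
    if not labels:
        raise ValueError("no labelled posts")
    return sum(1 for label in labels if label == "negative") / len(labels)


def points_vs_baseline(current_share, baseline_share):
    """Percentage-point difference; negative means less negative than the baseline launch."""
    return round((current_share - baseline_share) * 100, 1)


# Made-up labels for the new launch: 25 negative posts out of 100.
new_launch = ["negative"] * 25 + ["positive"] * 60 + ["neutral"] * 15
print(points_vs_baseline(negative_share(new_launch), 0.40))  # -15.0
```

With the figures from the example, the new model’s conversation comes out 15 points less negative than that of a launch that went on to record sales.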

Historical data can also inform predictions about the future. Researchers have suggested to us that they can predict the outcome of a revolution by studying how past revolutions, such as the “Arab Spring”, unfolded online. Likewise, we’re seeing hedge funds make a real commitment to incorporating social data into their trading algorithms. It is critical for these funds to be able to refine their predictive trading models by studying vast quantities of historical data.

Until now, all this promise of social data has had a foundational limitation: very little reliable and complete historical data has been available. And as we know, historical analysis is only as good as the quality of the underlying data. You can’t provide complete context if you only have part of the data. That’s why we are so excited to be the first company to offer complete coverage of all public Tweets, all the way back to the very first one.

We’re able to deliver the full historical corpus via our long-standing partnership with Twitter. We helped Twitter deliver the full archive of Tweets to the Library of Congress. That was a massive effort that took a long time. The rest of the social data ecosystem can benefit from that effort starting today.

This level of access has never been available and we know it is really going to accelerate the rate of innovation going forward. We think there are new products and businesses that will now be possible with access to a “social layer” of historical data. We frequently ask ourselves “If you could know what the world was saying at any moment in time about any topic, what could you build?”

We’ve already been working with companies like Esri, Union Metrics, Brandwatch, Waggener Edstrom Worldwide, and Texifter during our early access period and it’s been incredible to see how fast they are innovating with this new data.

Gnip aspires to be the source of record for all public conversation. That’s a lofty goal. We’re taking a major step forward with today’s announcement.

Want to learn more about Historical PowerTrack for Twitter?  Email info@gnip.com.

Data Stories: Interview with Lada Adamic of University of Michigan

While looking at the speakers for the International Conference on Weblogs and Social Media, the premier academic conference for social media, I stumbled across the research of Lada Adamic. Not only was Lada one of the keynote speakers for the conference, her research at the University of Michigan was just plain awesome. Lada’s research included understanding commonly used ingredient substitutions from 40,000 recipes on Allrecipes.com, understanding how peers rate each other on Couchsurfing, studying Facebook memes, and more. You can check out all of her research on ladamic.com, follow her on Twitter at @ladamic, and be sure to check out her hilarious blog.

Lada Adamic of the University of Michigan

1. Your background focuses on networks and how information spreads. You’ve done multiple projects with different data sources; what are some of the overarching trends you’ve seen?

The only sure thing is the unpredictability of information in a network. Sure, in aggregate some information will go viral, while most will not, but predicting what will go where, that’s not so simple. To complicate matters further, information is not only diffusing, but also evolving, while concurrently spurring changes in the social network itself. One trend I do keep seeing is that social networks’ greatest boosting effect is in the niche. There are lots of ways to find out about something widely popular, but information about that curious interest that you and your friends share — that is more likely to come through your friends.

2. What information do you get from looking at networks vs all the other sources you use?

I think it’s more a question of whether there are any data that I don’t try to represent as networks! All I have to do is identify connections between entities in the data, and presto, I have a network. It’s the structure of these connections that can turn up fascinating results: identifying experts from their online interactions, predicting which recipe is going to be rated more highly, or understanding the structure of federal law from the way it’s strung together.
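Lada’s “identify connections, and presto, a network” can be made concrete in a few lines. A minimal sketch, assuming the connections are co-occurrences (for instance, ingredients appearing in the same recipe); the recipes are made up, and this illustrates the general idea rather than the method used in her papers.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_network(groups):
    """Nodes are entities; an edge's weight counts the groups two entities share."""
    graph = defaultdict(Counter)
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            graph[a][b] += 1
            graph[b][a] += 1   # undirected: store both directions
    return graph


recipes = [  # made-up recipes
    ["flour", "butter", "sugar"],
    ["flour", "butter", "garlic"],
    ["garlic", "butter"],
]
network = cooccurrence_network(recipes)
print(network["butter"].most_common())
```

A dictionary of counters keeps the sketch dependency-free; for real analysis a graph library would add centrality measures, community detection and so on.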

3. What is useful, difficult and unique about connections found in social data?

Well, you’re dealing with data by and about humans. Humans are difficult. Humans interacting with other humans, that’s complicated… but also highly informative, because a lot of human interaction is about informing one another. And as they inform one another about what’s worthwhile, their location, their mood, etc., that data can be harnessed to detect trends and patterns in human behavior. And perhaps precisely because this data is so rich and powerful, it is important to be mindful of privacy.

4. You were able to determine commonly used ingredient substitutions by looking at 40,000 recipes from Allrecipes.com. How much did the comments in the recipes help determine substitutions and what other insights do you think could be pulled from recipe comments?

In the research paper we relied entirely on the comments in the user-supplied recipe reviews to figure out how often cooks substituted one ingredient for another in a recipe, whether ingredients can be cut or omitted, and, crucially, whether the recipe needs more or less garlic (our data showed, usually, more). Untapped information in the reviews includes who the recipe was a hit with (the kids, the husband, etc.) and vetted improvements, e.g. “I put the dough in the fridge for 2 hours as the other reviewer suggested…”. I think this is a really fun example of harnessing our collective intelligence. Instead of each cook tweaking recipes in their own kitchen and sharing their recommendations with a few friends, now we can gather millions of tweaks and start to understand food and cooking systematically.
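As a rough illustration of mining substitutions from review text, the sketch below matches a few hypothetical phrasings (“X instead of Y”, “substituted X for Y”, “replaced Y with X”) and tallies `(original, substitute)` pairs. The patterns and sample reviews are our own assumptions, not the paper’s actual extraction rules, and they only capture single-word ingredients.

```python
import re
from collections import Counter

# Hypothetical phrasings; the flag says whether the substitute is the first capture group.
SUBSTITUTION_PATTERNS = [
    (re.compile(r"\b(\w+) instead of (\w+)", re.IGNORECASE), True),        # "applesauce instead of oil"
    (re.compile(r"\bsubstitut\w* (\w+) for (\w+)", re.IGNORECASE), True),  # "substituted honey for sugar"
    (re.compile(r"\breplac\w* (?:the )?(\w+) with (\w+)", re.IGNORECASE), False),  # "replaced the butter with margarine"
]


def extract_substitutions(reviews):
    """Tally (original_ingredient, substitute) pairs mentioned in review text."""
    pairs = Counter()
    for review in reviews:
        for pattern, substitute_first in SUBSTITUTION_PATTERNS:
            for first, second in pattern.findall(review):
                original, substitute = (second, first) if substitute_first else (first, second)
                pairs[(original.lower(), substitute.lower())] += 1
    return pairs


reviews = [  # made-up reviews
    "I used applesauce instead of oil and it was great",
    "Substituted honey for sugar.",
    "I replaced the butter with margarine",
    "Applesauce instead of oil again!",
]
print(extract_substitutions(reviews).most_common())
```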

5. You’ve used data from a wide variety of sources including Couchsurfing.com, Allrecipes.com, Facebook, etc. What do you look for in a data source?

I’m not too discriminating about data, though sometimes I have a question that only certain data can answer. For example, when my husband and I first started dating, I defended my reluctance to watch Sci-Fi movies by pulling their ratings distribution from the IMDb. On an only slightly more serious note, I turned to online recipes because they comprised lots of data about something that I had no clue about: cooking.

Other times you just know the data is good even if your questions about it are not (yet). Such was the case with the CouchSurfing dataset, which encompassed anonymized user-to-user trust and friendship ratings. The data was so rich that even our initial stumbling steps led to some interesting results about rating human relationships. But it wasn’t until the second and third papers that we really got a handle on how the visibility of the ratings skews them, along with some more fundamental insights about the relationship between friendship and trust that are rendered beautifully evident in such a large data set.

6. What study have you done that has surprised you the most? What projects do you see in the future that you think academia should focus on to better understand social data?

Some nice surprises actually came up as I was gathering data for my statistics class. When the Economist published an article about the U-curve of happiness vs. age, I thought, wait a sec, we see the same curve in CouchSurfing ratings: people in their 30s & 40s rate and are rated less enthusiastically than those either younger or older. Then my statistics class used the American Time Use Survey to see how much sleep people were getting, and it was the same curve. Coincidence? I think not!

Another happiness vs. age trend came up in the Adolescent Health data, also analyzed in my stats class. Teens having sex in 8th and 9th grade were less happy on average than their peers who were abstaining, but by senior year, the relationship was reversed. It goes to show that you never know which underused columns in existing data sets hold fun statistics (we also explored the “cheerleading”, “math team” and “wears braces” columns…).

To answer the second question: researchers have only started to take advantage of the abundance of social data. There are many long-standing questions in sociology that were previously studied in small groups. Now these questions can be tested on very large data, just at the time when we really do need to understand how they pertain to changing social interactions as they shift online. Among the questions I’m personally interested in are how online social networks shape media consumption, and how information evolves in social networks.

I should mention that the crucial bottleneck for academics doing this kind of research is access to the data. GNIP is certainly part of the solution (you guys have academic discounts, right?). To anyone else who has interesting data, please consider sharing it with data-starved academics.

Thanks to Lada for her interview (and yes, we’re looking at partner programs for academic researchers!). If you have any other suggestions for Data Stories, please leave a comment. 


Data Stories: Sherry Emery on Social Data and Smoking Cessation

Sherry Emery is a Senior Research Scientist at the UIC Institute for Health Research and Policy, where she focuses on understanding how both traditional and new media influence health behavior. Sherry’s research has focused on social data and smoking cessation, looking at how people talk about smoking, their behaviors, and their reactions to smoking cessation campaigns on social media. Sherry works with Gnip’s client DiscoverText to access the Twitter firehose.

Sherry Emery of UIC

1. You’ve been studying the media and smoking for the past 15 years, what caused you to be interested in social data?


For a long time my research focused on TV advertising, but a few years ago I began to worry that our work was going to be less and less relevant unless we started to understand new media, including social.

2. How has your research with social data compared with previous research on other media, especially TV?

Researching social media and using social data is much harder: there’s more of it, and it’s way more complex. In the past, we were just worried about exposure to ads, and the measures were developed and widely accepted decades ago. Now we’re still worried about exposure, but also about searching for information and sharing information on social media; and with social data, it’s still the wild west for measurements. How do you measure exposure, search and exchange across social platforms, and how are these behaviors related to health behavior? In addition, with TV advertising, there was only an anti-tobacco message to measure. With social media, we need to figure out who’s talking about smoking cigarettes, and how to distinguish them from people talking about smoking ribs or smoking hot girls. And then we need to figure out if the information they are promoting/sharing is pro- or anti-smoking.
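The ribs-versus-cigarettes problem Sherry describes is where keyword filtering starts. Below is a deliberately naive first-pass filter, assuming hypothetical term lists: it keeps text with unambiguous tobacco vocabulary, drops a few known false positives such as “smoking ribs” or “smoking hot”, and keeps the remaining “smoking” mentions for human coding. It is not the team’s actual keyword set.

```python
import re

# Hypothetical term lists, for illustration only.
TOBACCO = re.compile(r"\b(?:cigarettes?|cigs?|tobacco|nicotine)\b", re.IGNORECASE)
SMOKING = re.compile(r"\bsmok(?:e|es|ed|er|ers|ing)\b", re.IGNORECASE)
FALSE_POSITIVES = re.compile(
    r"\bsmok(?:ed|ing) (?:ribs|brisket|pork|salmon|meat)\b|\bsmoking hot\b", re.IGNORECASE
)


def keep_for_coding(text):
    """First-pass relevance filter: True means 'send to human coders'."""
    if TOBACCO.search(text):        # unambiguous tobacco vocabulary
        return True
    if not SMOKING.search(text):    # no smoking vocabulary at all
        return False
    return not FALSE_POSITIVES.search(text)  # drop known non-tobacco uses


for tweet in ["Trying to quit cigarettes again", "Smoking ribs all weekend",
              "she is smoking hot", "Smoking in the car with the kids"]:
    print(keep_for_coding(tweet), tweet)
```

As the interview points out, no fixed list anticipates every relevant term, which is why a filter like this can only ever be a starting point for human review.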

3. What are insights from your research on smoking cessation and social data?

We’ve learned so much! First, lots of people who are talking about smoking are not talking about cigarettes! One of our biggest challenges has been to refine our keywords and develop techniques to code Tweets and other content as tobacco-relevant. Early on in our process, Gnip’s own Charles Ince had the brilliant insight to introduce us to Stu Shulman, who developed DiscoverText, which is an invaluable tool that we rely upon for our data cleaning and analysis process. DiscoverText allows us to sort through and code the millions of tweets that contain some reference to ‘smoking’. Using DiscoverText gives us both the transparency and control that we need to make sure that the tweets we analyze are the tweets that are relevant to our research questions. We can use humans to code for tobacco relevance, and then a Boolean language recognition algorithm in DiscoverText can learn from the human coders and code literally tens of thousands of tweets, actually more accurately than humans could at that scale!

As part of this process, we’ve also learned that there are lots and lots of words people use to talk about cigarettes and smoking tobacco. That’s an obvious statement, but one that has really important implications for searching for and measuring the content we’re interested in. No matter how thorough, broad and prospective we try to be, we cannot anticipate all the terms and keywords that turn out to be relevant. The ability to go back and look for content once we’ve identified key ideas will be critically important to our work.

Now that we’re getting a handle on how to deal with this massive and very complex data, we’re also learning a ton about how people are talking and thinking about smoking. In simplest terms, smoking weed is discussed much more favorably than smoking cigarettes. In the world of media campaign evaluation, we learned that the recent CDC anti-smoking media campaign really struck a chord with people: the effect of the graphic images was broad and deep. This was an important observation because the graphic approach of these ads was very controversial. By looking at the social media reaction, we could see that the ads achieved substantial engagement, rather than the rejection of their message that had been a concern.
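The interview doesn’t spell out how DiscoverText’s classifier learns from human coders, so as a stand-in here is a textbook technique for the same job: a multinomial naive Bayes classifier trained on human-coded examples. This is a minimal sketch with made-up training data; it illustrates the idea of a model learning from coders and is not DiscoverText’s algorithm.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)   # class prior
            denom = self.totals[c] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:                      # ignore words never seen in training
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


coded = [  # made-up examples, as if labelled by human coders
    ("trying to quit cigarettes", "tobacco"),
    ("craving a cigarette after lunch", "tobacco"),
    ("nicotine patch day three", "tobacco"),
    ("smoking ribs in the backyard", "other"),
    ("smoking brisket all night", "other"),
    ("the bbq ribs are smoking", "other"),
]
model = NaiveBayes().fit([t for t, _ in coded], [l for _, l in coded])
print(model.predict("cigarettes and nicotine"), model.predict("smoking ribs tonight"))  # tobacco other
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.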

4. People are less likely to be honest about bad habits on surveys. What are some of the advantages and disadvantages of using social data to capture life habits?

Social data reflects such spontaneous and generally unfiltered responses. It’s great to see and analyze what people are saying and claiming as their own. I think that surveys still have their place: there is a lot of individual-level information that is important, and which social data doesn’t reveal well. But it’s now critically important to understand what, and how much, people are saying, searching for, and passing along on social platforms. These data can give context to traditional survey data and can also guide the development of better, more relevant surveys.

5. Several years ago danah boyd talked about the class divisions between MySpace and Facebook, and how Facebook was for the “good” kids and MySpace was for the burnouts. How do the segments you’re trying to study match up with the audiences of the social data sources you’re using?

That’s an interesting question. So far, we’re pretty focused on Twitter data. We’re just beginning to explore Disqus and other social platforms, so I can’t really compare across platforms. We do see that there is particular language/words used on Twitter that seems to characterize different populations such as the slang words for cigarettes. By understanding the slang, we can see regional differences, as well as cultural differences, in attitudes about tobacco.

6. How is the health world starting to use social data, and what are some of the struggles it’s seeing?

The health world seems to be just starting to use social data. There’s some super cool work on developing social networks/data to monitor health conditions. I haven’t seen many other projects that are trying to wrangle massive social data similar to what we are doing. It’s hard to get your head around the variety and complexity of the data that is now available. We have been obsessed with data management and measure development. I think that’s the missing link for the public health world, and it’s one of our biggest challenges — translating how these data can answer questions the public health community is interested in.


Twist, Lick, Dunk: A Tumblr Story

Oreo Showing Pride

Tumblr won’t soon forget the day America’s favorite cookie came out.

On June 25th, to promote the year of Oreo’s 100th birthday, Nabisco lent its cookie some currency: The company tweeted the image of a six-layered cookie, with crèmes the color of the rainbow, above a simple caption – “Pride.”

“We feel the Oreo ad is a fun reflection of our values,” a Kraft spokesman later told reporters. The cookie, the company said, illustrated ‘in a fun and playful way’ an issue that was making history.

The image lit up the social web. This post, and two that follow, explore conversations on Tumblr through the lens of Oreo. Part Two looks at how the episode touched other brands on the network. Part Three dives into the dynamics of Tumblr conversations and how they diverge from other platforms.

The image itself struck a nerve. Opponents of marriage equality took to Oreo’s accounts on Facebook and Twitter to slam Nabisco and threaten a boycott.

“[U]nliking oreo, cleaning out cupboard, changing buying habits, no more Oreo’s, and it’s parent company,” one user wrote.

“I will never eat an oreo again! ew!” said another.

Those comments, and others, drew counter-protests, among them:

“[W]onderful job Oreo on supporting equal rights, just for that, now I’ll buy a pack today.”

“I believe I’m going to go buy every package of Oreos I see when I go grocery shopping. Kudos!!”

Within hours, Oreo found itself the subject of some 7,500 tweets. The conversation built until around midnight EST, when the brand was drawing some 2,000 tweets per hour.
Figure 1 shows hourly Twitter volumes around Oreo between June 18 and July 2.

Tumblr followed on the 26th. In three hours that night, the brand drew more than 300 text posts on the network, double its daily volume during the week before.

The talk stayed political: “Way to go Kraft!,” one post read, “However it is also eye-opening to see how many people are proud to show their hate, or belief that all Americans do not deserve equal rights.”

Figure 2 shows hourly Tumblr volumes around Oreo between June 18 and July 2.

By then, the story had spilled. ABC, NBC, Reuters and the Washington Post amplified news of the flap. A conservative family group urged supporters to look elsewhere for cookies. Meanwhile, the image was slowly amassing more than 60,000 Facebook comments and close to 300,000 likes. Two social analytics companies would later call that conversation overwhelmingly positive – for Oreo.

For days on Tumblr, the story echoed. Median hourly Twitter volumes had returned to normal by the fracas’ fourth day. But on Tumblr, a full week after Oreo’s image went live, chatter remained triple the cookie’s prior volume.

In that way, the image marked a breakthrough for Oreo on Tumblr. At peak, the pride cookie generated 2.6 times Oreo’s median Twitter volume from the week prior. For Tumblr, that figure was 19.8.
Figure 3 shows the ratio between hourly platform volume around Oreo and typical hourly platform volumes between June 18 and July 2.
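The 2.6 and 19.8 figures compare an event’s peak hourly volume with the median hourly volume from the week before. A minimal sketch of that calculation, assuming hourly counts have already been bucketed; the numbers in the usage example are made up.

```python
from statistics import median


def hourly_ratios(baseline_hourly, event_hourly):
    """Each event hour's volume divided by the median hourly volume of the baseline period."""
    base = median(baseline_hourly)
    if base == 0:
        raise ValueError("baseline median is zero; ratios are undefined")
    return [volume / base for volume in event_hourly]


def peak_ratio(baseline_hourly, event_hourly):
    """Peak event-hour volume relative to the baseline median (e.g. 2.6 on Twitter)."""
    return max(hourly_ratios(baseline_hourly, event_hourly))


baseline = [2, 3, 3, 4, 10]   # made-up hourly counts from the week before
event = [5, 9, 6]             # made-up hourly counts after the post
print(peak_ratio(baseline, event))  # 3.0
```

Using the median rather than the mean keeps a single unusually busy hour in the baseline week (the 10 above) from inflating the denominator.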

Oreo had long been a social brand. Before the pride cookie, it counted 26 million Facebook fans and tens of thousands of Twitter followers. On Tumblr, the cookie already outstripped its rivals. And in a move that may help the company retain that lead, Oreo can rely on oreodailytwist.tumblr.com, the brand’s official Tumblr presence. Its first posted image? June 25 – the pride cookie.

Figure 4 shows Oreo’s Tumblr lead over major cookie brands in the United States between June 18 and July 2.

But Oreo’s Tumblr story rippled beyond the cookie alone. That broadening – a central quality of the Tumblr platform – has implications for brands linked by product, demographic or, in this case, ideology. Return for more in Part Two.

Seth McGuire on Social Media and the Stock Market

Gnip’s Seth McGuire appeared on CNBC’s Squawk Box, speaking with Andrew Ross Sorkin about social media and how investors can use data from social networks as part of their strategies. Gnip has been providing social data to the financial industry for more than a year, with clients including hedge funds, banks and signal/data providers. Specifically, Seth spoke about how hedge funds and other traders use social data as a variable in their algorithms, as well as a research product for deeper analysis of an equity.

Andrew and Seth also talked about how news frequently breaks on Twitter (the most famous example being the death of Osama Bin Laden). This type of breaking news on StockTwits and Twitter provides a valuable signal that frequently arrives ahead of mainstream news. (As we’ve blogged before, natural disasters are often reported on Twitter before anywhere else.) Seth also discussed yesterday’s blog post from our data scientist, Scott Hendrickson, on JPMorgan’s $2 billion trading loss and how the news traveled through different social media publishers.

Gnip has also seen that while false stories may be shared on Twitter, Twitter users are quick to suppress them through crowdsourced responses that question their accuracy.

Squawk Box guest host Doug Dachille posed an interesting question: have any financial regulators reached out to use Gnip? While Gnip serves government agencies in areas like disaster relief, right now it’s the compliance and data management departments at banks and funds that are more concerned about social media. Most firms lock down the ability to post content on social networks, given SEC and FINRA restrictions, but when compliance officers walk the floor, they see traders peeking at their iPhones or iPads for breaking news and analysis on Twitter and StockTwits. From a compliance perspective, that’s dangerous, but they know the data is valuable, so they’re seeking new ways (like Gnip) to bring that data in-house for controlled analysis.

Interested in learning more on social data and the stock market? Email info at gnip.com.

The Social Cocktail, Part 3: Many Publishers Build One Story

In the first post, we looked at high-level attributes of the social media publishers. Then, we looked at social media responses to expected and unexpected events. To end this series, let’s dive into an example of how a single story evolves across a mix of publishers. This will provide some intuition into how the social cocktail works for a real-world event: the JPMorgan-Chase $2+ billion loss announcement on May 10, 2012.

JPMorgan-Chase Trading Loss

Twitter: Fast and Concise
On May 10, 2012, immediately after the market closed, JPMorgan-Chase CEO Jamie Dimon held a shareholder call to announce a $2 billion trading loss. While traditional news agencies reported the announcement late in the afternoon, Twitter led the way with reports from call participants, who began tweeting a few minutes into the call.

Figure 1 shows how the volume on Twitter evolved. In each case, the points represent activity volumes on the topic of JPMorgan and “loss,” while the lines represent fits to either the Social Media Pulse or a Gaussian curve (a simple approximation for expected-event traffic when averaging over the daily cycle).
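As a rough illustration of what fitting a Gaussian curve to hourly counts can involve, here is a minimal sketch using the classic log-parabola shortcut: if volume(t) = A·exp(-(t - mu)²/(2·sigma²)), then ln(volume) is a quadratic in t, so an ordinary least-squares parabola recovers A, mu and sigma. This is an assumed method, not necessarily how the fits in these figures were produced, and it does not model the Social Media Pulse; the function name and the synthetic data are hypothetical.

```python
import math


def fit_gaussian(times, volumes):
    """Fit volume(t) = A * exp(-(t - mu)**2 / (2 * sigma**2)).

    Log-parabola trick: ln(volume) = a + b*t + c*t**2, solved by ordinary
    least squares. All volumes must be strictly positive.
    """
    if any(v <= 0 for v in volumes):
        raise ValueError("the log fit needs strictly positive volumes")
    logs = [math.log(v) for v in volumes]
    # Normal equations: sum(t^(i+j)) * coeff_j = sum(log * t^i), i, j in 0..2.
    s = [sum(t ** k for t in times) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(times, logs)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    if c >= 0:
        raise ValueError("the data is not peaked; no Gaussian fits")
    sigma = math.sqrt(-1 / (2 * c))
    mu = -b / (2 * c)
    amplitude = math.exp(a - b * b / (4 * c))
    return amplitude, mu, sigma


# Synthetic, noise-free hourly data generated from A=100, mu=12, sigma=3.
hours = list(range(24))
counts = [100 * math.exp(-(h - 12) ** 2 / (2 * 3 ** 2)) for h in hours]
print(fit_gaussian(hours, counts))  # recovers roughly A=100, mu=12, sigma=3
```

Taking logs gives extra weight to low-count hours, so on noisy real data a nonlinear least-squares routine such as SciPy’s scipy.optimize.curve_fit is usually the better choice; the shortcut is mainly useful as a quick, dependency-free estimate.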

As Reuters and others released news stories and Europe started to wake up, a second Twitter pulse appeared. Toward the right-hand side of the graph, the daily cycle of Tweets dominates the conversation about JPMorgan and “loss,” with a curve more characteristic of broadly reported, expected events.

Figure 1. Twitter and StockTwits audiences comment on JPMorgan and “loss” after the announcement of a $2B trading loss on the evening of May 10, 2012. Volumes are normalized so that peak volume = 1 for each publisher.

StockTwits: Fast and Concise, Focused

Much of the analysis that applies to Twitter also applies to StockTwits; the major exceptions are the expertise of the users and the focus of the content. StockTwits serves traders, and its participants are mostly professional investors. Because the audience and the content are curated, there is very little off-topic chatter. Further, much of the content is specific analysis of JPMorgan’s loss, analysis of the stock price movement following the announcement, and information about after-hours price indicators.
On Friday (May 11th), discussion of the loss reaches only about 40% of the peak of the night before. This is likely due to the message rapidly saturating the highly connected community on StockTwits.

Comments: Both Fast and Slow, Concise

Because the story drew a lot of attention in the financial news, news stories started to appear soon after the call, and these attracted comments immediately (this was the fast response). The data shown in Figure 2 includes comments from both Automattic and Disqus. These comment platforms are used on personal blogs and on news stories posted online by news organizations, so the data mixes comments on news stories with personal analysis.

Figure 2. Commenters on blogs and news stories react to the announcement of a $2B trading loss on the evening of 10 May 2012, and an even stronger contingent reacts early on 11 May. Volumes are normalized so that peak volume = 1.

More-considered news and blog stories appeared on Friday morning, May 11th, and these spurred a second (slower) pulse of comment responses.

Comments often show an additional pattern: people tend to read blogs at certain times of day (e.g., morning or evening) out of habit. Because of this, we sometimes see comment volumes spike at the start or end of the day in very active time zones.

Tumblr: Medium and Very Rich

The Tumblr audience reacted to the news as if the story had broken on Tumblr rather than in traditional news, a pattern unique among the publishers studied here. The slowly growing traffic during the first few hours after the shareholder call may reflect the nature of the conversation on Tumblr. Rather than an event-response reaction, as on Twitter, or a considered reaction, as with blogs, the Tumblr audience’s reaction accelerates as the kind of content Tumblr users reblog appears in the network. While the initial posts on Tumblr refer to news stories, the story spreads through reblogging, ramping up to the peak over a few hours.

The following day, the Tumblr story evolves like an expected event.

Not only is the timeline unique, but Tumblr content is also unique. Early posts include rich media, such as political cartoons, and more right-brained political commentary and humor than the text-comment crowd produces. Adding Tumblr to your social media mix may present additional challenges in evaluating and analyzing the content, but the sensibilities and activity of this audience add a dimension not found in content from the other publishers.

Blogs: Medium and Rich

A few quick, factual reports from the call were published as blog posts, as the slight “heaviness” in the curve at the end of the day (May 10th) shows. However, the large majority of blog traffic consists of traditional, considered and refined reactions published throughout the following day. The traffic on May 11th follows the pattern of an event everyone already knows about. The discussion here is analysis and commentary as people explore the implications of the story.

The large majority of the blog content is text or text with a picture of Mr. Dimon. Stories vary from dozens of words to a few thousand.

Figure 3. Content-rich and text-rich reactions to the announcement of a $2B trading loss on the evening of 10 May 2012. In-depth analysis continues with heavy posting during the day on 11 May. Volumes are normalized so that peak volume = 1 for each publisher.

Finally, Figure 4 shows these timelines together. This view gives a clear indication of the relative timing of each publisher’s reaction.

Figure 4. The points show the normalized volume of activities about “JPMorgan” and “loss” following the May 10th announcement from Jamie Dimon. Lines represent fits to models of typical social media reactions. Volumes are normalized so that peak volume = 1 for each publisher.
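Normalizing each publisher’s series so that its peak equals 1, as the figure captions describe, puts very different volumes on one scale so that only the shape and timing of each reaction differ. A minimal sketch follows, assuming hourly counts per publisher are available as lists; the function names and sample counts are hypothetical and are not the JPMorgan data.

```python
def normalize_to_peak(volumes):
    """Scale a volume series so that its maximum becomes 1."""
    peak = max(volumes)
    if peak <= 0:
        raise ValueError("the series has no positive volume")
    return [v / peak for v in volumes]


def peak_hours(series_by_publisher):
    """Return {publisher: index of the hour with the highest volume}."""
    return {
        name: max(range(len(vols)), key=vols.__getitem__)
        for name, vols in series_by_publisher.items()
    }


# Hypothetical hourly counts for illustration only.
series = {
    "twitter": [5, 80, 40, 20, 10, 8],
    "tumblr": [1, 4, 9, 15, 6, 3],
}
normalized = {name: normalize_to_peak(vols) for name, vols in series.items()}
print(peak_hours(series))  # {'twitter': 1, 'tumblr': 3}
```

Because normalization divides by a positive constant, it never moves a series’ peak, so peak timing can be compared on either the raw or the normalized series.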

Conclusion
This example story demonstrates the potential of mixing perspectives, audiences and styles of conversation to create a full description of the social media response to events. With the right mix, we can identify stories and emerging topics within minutes and quickly characterize the relative size and speed of a story. We can gauge user engagement, dig into deeper analysis, and track the rate and focus of content sharing. With this mix of social data, we might be getting close to the perfect cocktail.