Twitter Shouts: Huntsman's Out!

At Gnip, one of the most fascinating aspects of social media is ‘speed’ – specifically in regard to news stories. We continue to see a trend toward news ‘breaking’ on platforms like Twitter. Both the speed at which a story breaks and the speed at which it catches on show the incredible power of this medium for information exchange. And as we’ve pointed out before, different social media streams offer different analytical value – Twitter versus a news feed, for example.

Last night proved a great example of this as word of Jon Huntsman’s withdrawal from the GOP presidential race crept out. Interestingly, the news was broken by Peter Hamby, a CNN political reporter – on Twitter. While CNN followed up on the story a few minutes later, it seems the reporter (or the network) realized the inherent ‘newswire’ value of breaking the news as fast as possible…and used Twitter as part of their strategy to do so!

This Tweet was followed by what we’ve begun to see as the normal ‘Twitter’ spike for breaking news – the chart below, built by our data scientist Scott, shows how quickly news of Huntsman’s withdrawal was retweeted and passed along. Compared with an aggregate news feed (in this case, NewsGator’s Datawire Firehose, a content aggregator built from crowdsourced RSS feeds that includes many articles from traditional media providers), some interesting differences come to light.
Comparing tweets of “huntsman” and news articles breaking Jon Huntsman’s withdrawal from the GOP primary race. The blue curves show the “Social Activity Pulse” that characterizes the growth and decay of media activity around a topic. By fitting the rate of articles or tweets to a function, we can compare standard measures such as time-to-peak, story half-life, etc. (More on this in a future post.) The peak on Twitter is reached at about the same time the first story arrives from NewsGator – more than 10 minutes after the story broke on Twitter.

Both streams show a similar curve in story adoption, peak and tail. What’s different is the timeframe of the content. Twitter’s data spikes about 10 minutes earlier than NewsGator’s. NewsGator’s content is more in-depth, as it contains news stories and blog posts, but as we’ve seen in other cases, Twitter is the place where news breaks these days.
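
As a teaser for that future post, here is a minimal sketch of how one might fit a “Social Activity Pulse” to per-minute tweet counts. The gamma-shaped form f(t) = A·t·exp(−t/τ) is an assumption we use for illustration (not necessarily the model behind the chart above); conveniently, it peaks at t = τ, so time-to-peak falls straight out of the fit.

    # Sketch: fit a pulse curve to per-minute tweet counts (illustrative model).
    import numpy as np
    from scipy.optimize import curve_fit

    def pulse(t, A, tau):
        # Gamma-shaped pulse: rises, peaks at t = tau, then decays.
        return A * t * np.exp(-t / tau)

    # Hypothetical per-minute counts of matching tweets after a story breaks.
    minutes = np.arange(1, 61, dtype=float)
    counts = pulse(minutes, 50.0, 8.0) + np.random.poisson(5.0, size=minutes.size)

    (A_fit, tau_fit), _ = curve_fit(pulse, minutes, counts, p0=(10.0, 5.0))
    print(f"time to peak: {tau_fit:.1f} minutes")  # d/dt pulse = 0 at t = tau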

 

Can Social Media Data Offset Market Volatility?

It’s been a volatile time in the markets these last few weeks. Today especially – the Dow closed down 635 points; S&P, -80; NASDAQ, -175. While there’s no shortage of opinions on how/why the market will/will not recover, one thing is for certain – having the right data to make decisions is more important than ever.

Part of the reason for this is that the markets are clamoring for trends – definitive information on stock movements and market sentiment. Which is why it’s exciting to see how our finance clients are using the Gnip realtime social media data feeds. In a time of increased volatility, our hedge fund (and other buy-side) clients are leveraging social media data as a new source of analysis and trend identification. With an ever-growing number of users and Tweets per day, Twitter is exploding, and market-leading funds are looking at the data we provide as a way to more accurately tap into the voice of the market. They’re looking at overall trend data from millions of Tweets to gauge consumer sentiment, and they’re researching specific securities based on what’s being said about them online.

While the early-adopters of this data have been funds, this type of analysis is available to individuals as well. Check out some start-ups doing incredible things at the intersection of finance and social media:

  • Centigage provides analytics and intelligence designed to enable financial market participants to use social media in their investment decision-making process
  • SNTMNT offers an online tool that gives daily insights into online consumer sentiment surrounding 25 AEX funds and the index itself

For the first time in history, access to (literally) millions of voices is at our fingertips. As the market continues its volatility, those voices gain resonance as a source of pertinent information.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended to support, and does not fully support, the repeated constant searches that would be required to deliver 100% coverage. Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full-coverage streams for monitoring and analytics use cases. To get data from the Search API, your system must repeatedly ask Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results (for example, the latest 100 Tweets). If more than 100 matching Tweets have been created since your last request, some of them will be lost.
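
In code, that polling pattern looks something like the sketch below. The endpoint and parameter names (q, rpp, since_id) reflect the public Search API as documented at the time of writing; treat the details as illustrative rather than definitive.

    # Sketch of the Search API polling loop described above.
    import time
    import requests

    SEARCH_URL = "http://search.twitter.com/search.json"
    since_id = 0  # highest tweet id we've seen so far

    while True:
        resp = requests.get(SEARCH_URL, params={
            "q": "coca cola",     # the search query
            "rpp": 100,           # results per page: the per-request cap
            "since_id": since_id, # only return tweets newer than this id
        })
        results = resp.json().get("results", [])
        if results:
            since_id = max(r["id"] for r in results)
        # If more than 100 matching tweets appeared since the last poll, the
        # overflow is simply never returned: that's the coverage gap.
        time.sleep(5)  # in practice the floor here is set by rate limits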

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests for that query alone to keep up with the stream. And this is only the beginning of the problem: no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. Usually there might be fewer than 100 matching Tweets per second – but if there’s a spike (say the term becomes a trending topic after a Super Bowl commercial), there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will have missed some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.
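
The arithmetic in that scenario is worth spelling out (the one-request-per-second limit is the hypothetical figure from the example above):

    # Back-of-the-envelope coverage math for the scenario above.
    requests_per_second = 1        # hypothetical rate limit
    tracked_terms = 1_000          # "Coca Cola" plus 999 other terms
    results_per_request = 100      # per-request result cap

    seconds_between_polls = tracked_terms / requests_per_second   # 1,000 s/term
    max_rate = results_per_request / seconds_between_polls        # tweets/s
    print(seconds_between_polls, max_rate)  # 1000.0 0.1
    # Any term averaging more than 0.1 matching tweets/second loses data.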

Because of these issues, data collection strategies that rely exclusively on the Search API will frequently deliver poor coverage of Twitter data for monitoring use cases. Also, be forewarned: if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

One great thing about the Twitter Search API is the complex operator support it offers, such as Boolean queries and geo filtering. Although the coverage is limited, some people opt to use the Search API to collect a sampling of Tweets that match their search terms precisely because of those operators. And because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).
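
For a flavor of those operators, here are a couple of example queries (the syntax shown is illustrative of the Search API operators documented at the time):

    # Illustrative Search API queries using Boolean and geo operators.
    queries = [
        '"coca cola" OR pepsi -diet',            # exact phrase, OR, negation
        'superbowl geocode:40.01,-105.27,25mi',  # within 25 miles of Boulder, CO
    ]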

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full-coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

Guide to the Twitter API – Part 1 of 3: An Introduction to Twitter’s APIs

You may find yourself wondering . . . “What’s the best way to access the Twitter data I need?” Well, the answer depends on the type and amount of data you are trying to access. Given that there are multiple options, we have designed a three-part series of blog posts that explains the differences between the coverage the general public can access and the coverage available through Twitter’s resyndication agreement with Gnip. Let’s dive in . . .

Understanding Twitter’s Public APIs . . . You Mean There is More than One?

In fact, there are three Twitter APIs: the REST API, the Streaming API, and the Search API. Within the world of social media monitoring and social media analytics, we need to focus primarily on the latter two.

  1. Search API – The Twitter Search API is a dedicated API for running searches against the index of recent Tweets
  2. Streaming API – The Twitter Streaming API allows high-throughput, near-realtime access to various subsets of Twitter data (e.g., a 1% random sample of Tweets, filtering for up to 400 keywords; a minimal connection sketch follows this list)
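
As a quick taste of that second option, here is a minimal sketch of a Streaming API filter connection. The endpoint and parameters reflect the streaming documentation of the era; treat the details as illustrative:

    # Sketch: consume a filtered stream of public statuses.
    import json
    import requests

    stream = requests.post(
        "https://stream.twitter.com/1/statuses/filter.json",
        data={"track": "gnip,twitter"},   # comma-separated keywords (up to 400)
        auth=("username", "password"),    # basic auth, as supported at the time
        stream=True,
    )
    for line in stream.iter_lines():
        if line:  # skip keep-alive newlines
            status = json.loads(line)
            print(status.get("text", ""))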

Whether you get your Twitter data from the Search API, the Streaming API, or through Gnip, only public statuses are available (protected Tweets are NOT). Additionally, before Tweets are made available through these APIs or Gnip, Twitter applies a quality filter to weed out spam.

So now that you have a general understanding of Twitter’s APIs . . . stay tuned for Part 2, where we will take a deeper dive into understanding Twitter’s Search API, coming next week…

 

Financial Markets in the Age of Social Media

When you think about it, the stock market is a pretty inspiring thing.

Over the past several centuries, humans have actually created an infrastructure that lets people put their money where their mouth is; an infrastructure that provides a mechanism for daily valuation of companies, currencies and commodities. It’s just unbelievable how far we’ve come, and reflecting on the innovation that’s led us here brings to light a common but powerful denominator: information.

  • When traders began gathering under a buttonwood tree at the foot of Wall Street in the late 1700s, it was because proximity allowed them to gossip about companies.
  • When Charles Dow began averaging “peaks and flows” of multiple stocks in 1883, his ‘index’ became a new type of data with which to make decisions.
  • In 1975, when the sheer volume of paper necessary for trades became unmanageable, the SEC changed rules to permit electronic trading, allowing for an entirely new infrastructure.
  • And in the 1980s, when Michael Bloomberg and his partners began building machines (the now-ubiquitous Bloomberg Terminals), they tapped into an ever-growing need for more data.

This history is exciting for us @Gnip because of the powerful signal the market is sending us about social media. Here are some of the more recent signals we’ve seen:

  • The Bank of England announcing they were using Google search results as a means of informing their “nowcasts” detailing the state of the economy.
  • Derwent Capital Markets launching the first social-media based hedge fund this year.
  • The dedication of an entire panel to Social Media Hedge Fund Strategies at the Battle of the Quants conference in London last week.
  • Weekly news articles that describe how traders are using social data as a trading indicator (here’s one as an example).
  • Incorporation of social data into the algorithms of established hedge funds.

In other words, the market is tapping into a new and unique source of information as a means of making trading decisions. And the reason social media data is so exciting is that it offers an unparalleled view into the emotions, opinions and choices of millions of users. A stream of data this size, with this depth and range, has never existed before in a format this immediate and accessible. And that access is changing how our clients analyze the world and make trades.

We’ve been privileged to see these use cases as we continue to serve a growing number of financial clients. Most exciting to us, as we respond to the market’s demand for our services, is understanding our pivotal place in this innovation. As the premier source of legal, reliable, realtime data feeds from more than 30 social media sources (including our exclusive agreement with Twitter), we’re at the center of how firms are integrating this data as an input. And that’s incredible stuff.

Are you in the financial market looking for a social media data provider? Contact us today to learn more! You can reach us at 888.777.7405 or by email.

30 Social Data Applications that Will Change Our World

Social media is popular — no surprise there. And as a result, there’s a huge amount of social media data in the world and every day the pool of data grows… not just a little bit, but enormously. For instance, just recently our partner Twitter blogged about their business growth and the numbers are staggering.

This social conversation data is valuable. Someday it will yield insights worth many millions, perhaps billions, of dollars for businesses. But the analyses and insights are only barely beginning to take shape. We hear from social media analytics companies every day and we see lots of interesting applications of this data. So… how can social media data be used? Here’s a partial list of social data applications that I believe will begin to take shape over the next decade or so:

  1. Product development direction
  2. Product feedback
  3. Customer service performance feedback
  4. Customer communications
  5. Stock market prediction
  6. Domestic/political mood analysis
  7. Societal trend analysis
  8. Offline marketing campaign impact measurement
  9. Word-of-mouth marketing campaign analysis
  10. URL virality analysis
  11. News virality analysis
  12. Domestic economic health indicator
  13. Linguistic analysis
  14. Educational achievement metric by time and locale
  15. Personal scheduling: see when your friends are busy
  16. Event planning: see when big events will happen in your community
  17. Online marketing
  18. Sales mapping & identification
  19. Consumer behavior analysis
  20. Internet safety implementation
  21. Counter-terrorism probabilistic analysis
  22. Disaster relief communication, mapping, and analysis
  23. Product development opportunity identification
  24. Competitive analysis
  25. Recruiting tools
  26. Connector, Maven, and Salesperson identification (to borrow Malcolm Gladwell’s terms)
  27. Cross-platform consumer alerting services
  28. Brand monitoring
  29. Business accountability ratings
  30. Product and service reviews

All of these projects can be built on public social media conversation data that’s legally and practically accessible. All of the necessary data is (or is on the roadmap to be) accessible via Gnip. But access to the data is only step one — the next step is building great algorithms and applications to draw insights from that data. We leave that part to our customers.

So, here’s to the analysts who are working with huge social data sets to bring social data analyses and insights to fruition and ultimately make the barrage of public data that surrounds us increasingly useful. Here at Gnip we’re grateful for your efforts and eager to find out what you learn.

How to Select a Social Media Data Provider

If you’re looking for social media data, you’ve got a lot of options: social media monitoring companies provide end-user brand tracking tools, some businesses provide deep-dive analyses of social data, other companies provide reputation scores for individual users, and still other services specialize in geographic social media display, to name just a few.

Some organizations ultimately decide to build internal tools for social media data analysis. Then they must decide between outsourcing the social data collection bit so they can focus their efforts on analyzing and visualizing the data, or building everything — including API connections to each individual publisher — internally. Establishing and maintaining those API connections over time can be costly. If your team has the money and resources to build your own social media integrations, then go for it!

But if you’re shopping for raw social media data, you should consider a social media API – that is, a single API that aggregates raw data from dozens of different social media publishers – instead of making connections to each one of those dozens of social media APIs individually. And in the social media API market, there is only a small handful of companies for you to choose from. We are one of them and we would love to work with you. But we know that you’ll probably want to shop your options before making a decision, so we’d like to offer our advice to help you understand some of the most important factors in selecting a social media API provider.

Here are some good questions for you to ask every social media API solution you consider (including your own internal engineers, if you’re considering hiring them for the job):

Are your data collection methods in compliance with all social media publishers’ terms of use?

–> Here’s why it matters: by working with a company that violates any publisher’s terms of use, you risk unstable access (or sudden loss of access) to the violated publisher’s data – not to mention the potential legal consequences of using black-market data in your product. Conversely, if you work with a company that has a strong relationship with the social media publishers, our experience shows that you not only get stable, reliable data access, but you just might get rewarded with *extra* data access every now and then. (In case you’re wondering, Gnip’s methods are in compliance with each of our social media publishers’ terms of use.)

Do you provide results and allow parameter modifications via API, and do you maintain those API connections over time?

–> In our experience, establishing a single API connection to collect data from a single publisher isn’t hard. But! Establishing many API connections to various social media publishers and – this is key – maintaining those connections over time is really quite a chore. So much so that we made a whole long list of API-related difficulties associated with that integration work, based on our own experiences. Make sure that whoever you work with understands the ongoing work involved and is prepared to maintain your access to all of the social media APIs you care about over time.

How many data sources do you provide access to?

–> Even if you only want access to Twitter and Facebook today, it’s a good idea to think ahead. How much incremental work will be involved for you to integrate additional sources a few months down the line? Our own answer to this question is this: using Gnip’s social media API, once you’re set up to receive your first feed from Gnip via API, it takes about 1 minute for you to configure Gnip to send you data from a 2nd feed. Ten minutes later, you’re collecting data from 10 different feeds, all at no extra charge. Since you can configure Gnip to send all of your data in one format, you only need to create one parser and all the data you want gets streamed into your product. You can even start getting data from a new social media source, decide it’s not useful for your product, and replace it with a different feed from a different source, all in a matter of seconds. We’re pretty proud that we’ve made it so fast and simple for you to receive data from new sources… (blush)… and we hope you’ll find it to be useful, too.

What format is your data delivered in?

–> Ten different social media sources might provide data in 10 different formats. And that means you have to write 10 different parsers to get all the data into your product. Gnip allows you to normalize all the social media data you want into one single format — Activity Streams — so you can collect all your results via one API and feed them into your product with just one parser.
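
To make “one parser” concrete, here is a minimal sketch of consuming normalized activities. The field names follow the general Activity Streams shape (actor, verb, object) but are our simplified assumptions, not a precise schema:

    # Sketch: one parser for every normalized source (field names simplified).
    import json

    def parse_activity(raw):
        activity = json.loads(raw)
        return {
            "who":  activity["actor"]["displayName"],
            "did":  activity["verb"],                  # e.g. "post", "share"
            "what": activity["object"].get("content", ""),
            "when": activity["postedTime"],
        }

    # The same function handles a tweet, a Flickr photo, or a blog post,
    # because the data has already been normalized upstream.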

Hope this helps! If you’ve got additional questions to suggest for our list, don’t hesitate to drop us a note. We’d love to hear from you.

Marketing is from Mars, Business Intelligence is from… Betelgeuse?

Beetlejuice! John Battelle wrote a great post last week titled “What Marketers Want from Twitter Metrics” in which he recounts a conversation with Twitter COO Dick Costolo and lists some data he hopes we’ll soon see from Twitter.  These metrics include:

  • How many people *really* see a tweet.  Even though @gnipsupport has 150 followers, it’s unlikely that they all saw our tweet about this post.
  • Better information around engagement, such as retweets and co-incidence data.  There’s a classic VC saying: “the first time I hear about something I don’t notice; the second time, I take an interest and the third time I take action.”

For me, marketing is about sending a signal into the marketplace and then measuring how effectively it is received.  For instance, Gnip is trying to better engage with companies that use third-party APIs, and since we’re a startup, low cost matters.  One mechanism is this blog and the article you’re reading now.  That’s the “sending a signal” part.  While you’re reading this, I’m likely logged into Google Analytics, monitoring how people find this article, and watching Twitter to see if anyone mentions this post.  That’s the “measuring effectiveness” part.  And this isn’t a static, one-time cycle.  Based upon the feedback I get (some direct, some inferred), I’ll write and promote future posts a little differently.

I am positive that Twitter and other forms of social media will be hugely beneficial to marketing and the surrounding fields of sales, advertising and customer service. Highly measurable, disintermediated, low-friction customer interactions with the marketplace are a wonderful thing. However, if five years from now we’re still primarily talking about social media in terms of marketing, then an opportunity has been squandered.

If marketing is a company sending a signal to the marketplace and measuring how it is received, then business intelligence (from a product perspective) is the process of measuring and acting on the signal that the marketplace itself is sending.  For instance, last holiday season, a major discount chain wanted to know why, in the midst of a recession, many of their traditional customers were opting to shop at more expensive competitors.  After examining Twitter, Facebook and other social services, they discovered that customers were unhappy with their stores’ lack of parking and cashiers.  Apparently, even in a financial crunch, convenience trumps price.  The store took steps to increase the number of cashiers, and sales immediately increased.  THIS is where I’d like to see more emphasis in social media.

It’s a function of magnitude

With marketing, the product or service has already been created and success is now predicated on engaging as many people as possible with your pitch.  The primary question is “How do we take this product and make it sound as appealing as possible to the market?”  Great marketing can create far greater demand than shoddy marketing, but by that point the product is fairly static.  Sales is plotted on a continuum defined as “customer need multiplied by customer awareness,” where need is static and awareness is a variable.  What if you could change the scale of customer need?

When the product or service is still being defined, the size of the opportunity is extremely fluid.  A product that doesn’t address a customer need isn’t going to sell a ton, regardless of how well it’s marketed.  A product that addresses a massive customer need can still fail with poor marketing, but it will be a game changer with the right guidance.  Business intelligence is crucial to the process of identifying the biggest need in a market and building the appropriate solution.

Steve Ballmer is very vocal about how he only cares about ideas that will move his stock price a dollar.  But to move his stock price by even $0.10 at today’s P/E is to increase earnings (earnings!) by almost $100MM annually.  In other words, if you’re a startup whose product can’t generate a billion dollars, then it’s not worth Microsoft’s time to talk to you.  And if you’re a MS product manager who isn’t working on a billion dollar product, you might want to put in a transfer request.  Or better yet, listen to the market and retool what you’re currently building, because no amount of marketing is going to save you.

Yeah, “billion” with a “b”

Typically, entrepreneurs use personal experience and anecdotal evidence to design their offering.  Larger companies may conduct market research panels or send out surveys to better understand a market.  We are now blessed with the ability to directly interact with the marketplace at a scale never previously imagined.  The market is broadcasting desire and intent through a billion antennae every day, yet product managers are still turning a deaf ear.  Maybe we need better tools and data so that the business world can start tuning in.

First off, when you’re launching a product, you ought to know what the market looks like.  We need better access to user demographics, both at the service level (who uses Twitter) and at the individual level (who just tweeted X).  A number of companies are starting to serve this need (folks like Klout, which offers reputation data for Twitter users, and Rapleaf, which offers social and demographic data based on email address), but there is still a long way to go.  I would kill for the ability to derive aggregated demographics: tell me about all the people who tweeted Y in the last year.

Secondly, access to historical data is critical.  When deciding whether to even begin planning a new product, it’s important to know whether the marketplace’s need is acute or a long-standing problem.  Right now, it’s nearly impossible to access data about something from before the moment you realize you should be tracking it.   This has led to all sorts of “data hoarding” as social media monitoring services attempt to squirrel away as much data as possible just in case they should need it in the future.  The world would be so much better with mature search interfaces.  Think about your average OLAP interface and then think about Facebook Search.  Twitter has already said that they are taking steps to increase the size of their search corpus; let’s make sure they know this is important and let’s encourage other social services to make historical data available as well.

The best part of all this is that marketers and product managers need many of the same things – they’re in the same universe, you might say.  The best companies engage marketing as the product is being defined, and as a result, a lot of these metrics will benefit product managers and marketers alike.

Dell selling $6 million of computers on Twitter?  That’s pretty great.  Dell identifying a new $600M market because of signals sent on Twitter… that’s simply amazing.  And that’s the level of impact I hope to see social media have in the next few years.  Got your own ideas on how we can get there from here?  Post ‘em in the comments.

(Thanks to Brad Feld, Eric Norlin and Om Malik for helping me edit this post into something more readable and accurate.)

Gnip Pushed a New Platform Release This Week

We just pushed out a new release this week that includes new publishers and capabilities. Here is a summary of the release highlights. Enjoy!

  • New YouTube publisher: Do you need an easy way to access, filter and integrate YouTube content into your web application or website? Gnip now provides a YouTube publisher, so go create some new filters and start integrating YouTube-based content.
  • New Flickr publisher: Our first Flickr publisher had data consistency issues and could almost be described as broken. We built a brand-new Flickr publisher to provide better access to content from Flickr. Creating filters is a snap, so go grab some Flickr content.
  • Publisher information can now be shared across accounts: When multiple developers are using Gnip to integrate web APIs and feeds, it is sometimes useful to see other filters as examples. Sharing allows a user to see publisher activity and statistics, but does not grant the ability to edit or delete.
  • New Data Producer Analytics Dashboard: If your company is pushing content through Gnip, we understand it is important to see how, where, and by whom the content is being accessed on our platform, so with this release we have added a web-based data producer analytics dashboard. This is a beta feature – it is not where we want it yet, and we have some incomplete-data issues – but we wanted to get something available and then iterate based on feedback. If you are a data producer, let us know how to take this forward. The current version provides access to the complete list of filters created against a publisher, and the information can be downloaded in XML or CSV format.

Also, we have a few things we are working on for upcoming releases:

  • Gnip Polling: Our new Flickr and YouTube publishers both leverage our new Gnip Polling service, which we have started using internally for access to content that is not available via our push infrastructure. We plan to make this feature available externally to customers in the future, so stay tuned or contact us if you want to learn more.
  • User generated publishers from RSS Feeds: We are going to open up the system so anyone can create new publishers from RSS Feeds. This new feature makes it easy to access, filter and integrate tons of web based content.
  • Field-level mapping on RSS feeds: Often the field names in RSS feeds from different endpoints do not match the names your company uses. This new feature will allow editing and mapping at the individual field level to support normalization across multiple feeds (a sketch of the idea follows this list).
  • Filter rule batch updates: When your filters start to get big, adding lots of new rules can be a challenge. Based on direct customer feedback, it will soon be possible to batch-upload filter rules.
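
Here is a rough sketch of what that field-level mapping could look like from the consumer’s side; the feed ids and field names are hypothetical:

    # Sketch: normalize differently-named RSS fields into one in-house schema.
    FIELD_MAPS = {
        "feed_a": {"title": "headline", "dc:creator": "author", "pubDate": "published"},
        "feed_b": {"title": "headline", "author": "author", "updated": "published"},
    }

    def normalize(feed_id, entry):
        mapping = FIELD_MAPS[feed_id]
        return {target: entry.get(source) for source, target in mapping.items()}

    entry = {"title": "Gnip ships a release", "author": "gnip", "updated": "2008-10-01"}
    print(normalize("feed_b", entry))
    # {'headline': 'Gnip ships a release', 'author': 'gnip', 'published': '2008-10-01'}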

Preview: Gnip Publisher Analytics

With everything going on here at Gnip, we want to regularly preview some of the new features we are working on so people can send us feedback and plan ahead. One feature that we know a lot of people want us to deliver is usage and operational reporting and analytics. There are many reasons to add an analytics dashboard, but the primary one is that we believe it will help companies and developers better understand the richness and variability of the data streams they care about.

Below is one example of the analytics features that we are planning to provide in the near future. This image shows the Digg Data Stream summary view with individual diggs, comments and submissions per second being streamed by the Gnip platform.

Figure: Gnip — Digg Data Stream Activity View

 

Obviously we could pivot on the summary view to show different types of details, depending on any number of variables that Gnip partners and customers find interesting. If your company has specific requests for analytics and reporting, please let us know.