Guide to the Twitter API – Part 3 of 3: An Overview of Twitter’s Streaming API

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like the Streaming API because its streaming delivery gets data to you closer to realtime than the Search API does (which I wrote about last week). But the Streaming API wasn’t designed to deliver full-coverage results, so it has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API.

The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer,” and Twitter currently has it set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” a randomly sampled 10% stream. Twitter used to provide such increased access levels to businesses, but announced last November that it is not granting increased access to any new companies and is gradually transitioning current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g., keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API like they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public APIs can be risky.
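To make those mechanics concrete, here is a minimal sketch of consuming a filtered stream in Python. The endpoint path, the basic-auth style, and the response fields are illustrative assumptions based on how the Streaming API has commonly been documented; check Twitter’s docs before relying on any of them.

```python
# Minimal sketch: consuming Twitter's filtered stream.
# Endpoint path, auth style, and field names are illustrative assumptions.
import json
import requests

STREAM_URL = "https://stream.twitter.com/1/statuses/filter.json"  # assumed path
KEYWORDS = ["coca cola", "pepsi"]  # subject to the ~400-keyword cap noted above

def consume_filtered_stream(username, password):
    # The filter endpoint takes a comma-separated "track" parameter and holds
    # the HTTP connection open, emitting one JSON object per line.
    resp = requests.post(
        STREAM_URL,
        data={"track": ",".join(KEYWORDS)},
        auth=(username, password),
        stream=True,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive newlines
        tweet = json.loads(line)
        print(tweet.get("text", ""))
```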

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: unlike premium Twitter feeds from Gnip and other resyndication partners, they don’t prohibit you from displaying the Tweets you receive to the general public.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify, almost in realtime. But in reality, the Search API is not intended for, and does not fully support, the repeated constant searches that would be required to deliver 100% coverage.

Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits Twitter has in place impact the ability to get full-coverage streams for monitoring and analytics use cases. To get data from the Search API, your system repeatedly asks Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results (for example, the latest 100 Tweets). If more than 100 matching Tweets have been created since your last request, some of them will be lost.
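Here’s a rough sketch of what that polling loop looks like, assuming the historical search.twitter.com endpoint; the parameter and field names (“rpp”, “since_id”, “results”) are illustrative of that era’s interface, not a current reference.

```python
# Sketch of Search API polling; endpoint and field names are assumptions
# based on the historical search.twitter.com interface.
import time
import requests

SEARCH_URL = "http://search.twitter.com/search.json"

def poll(query, interval_seconds=1.0):
    since_id = None
    while True:
        params = {"q": query, "rpp": 100}  # rpp: results per page (max ~100)
        if since_id:
            params["since_id"] = since_id  # only Tweets newer than last seen
        results = requests.get(SEARCH_URL, params=params).json()["results"]
        if results:
            since_id = results[0]["id"]  # newest result comes first
        # If more than 100 matching Tweets appeared since the last request,
        # the overflow is never returned: that coverage is simply lost.
        for tweet in results:
            print(tweet["text"])
        time.sleep(interval_seconds)  # rate limits bound how small this can be
```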

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit of time is constrained by Twitter’s rate limits. Some queries are so popular (hello, “Justin Bieber”) that it can be impossible to make enough requests for that query alone to keep up with its stream. And that’s only the beginning of the problem: no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. On a typical day there might be fewer than 100 matching Tweets per second, but if there’s a spike (say the term becomes a trending topic after a Super Bowl commercial), there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will miss some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So in this scenario, you could easily miss Tweets when using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime, because you’re only querying for them every 1,000 seconds.
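The back-of-the-envelope math behind that scenario is worth spelling out. A tiny sketch (every number here is a hypothetical drawn from the example above, not a published Twitter limit):

```python
# Hypothetical coverage math for Search API polling; every number here
# is an illustrative assumption, not a published Twitter limit.
TERMS = 1000               # terms you need to monitor
REQUESTS_PER_SECOND = 1    # what rate limits allow in this hypothetical
RESULTS_PER_REQUEST = 100  # max Tweets returned per request

seconds_between_checks = TERMS / REQUESTS_PER_SECOND  # 1,000 s per term
tweets_per_second_before_loss = RESULTS_PER_REQUEST / seconds_between_checks

# Any term generating more than 0.1 matching Tweets per second (100 Tweets
# per 1,000-second polling cycle) will lose data under this scheme.
print(seconds_between_checks, tweets_per_second_before_loss)
```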

Because of these issues, data collection strategies that rely exclusively on the Search API will frequently deliver poor coverage of Twitter data for monitoring use cases. And be forewarned: if a monitoring or analytics vendor claims full Twitter coverage but uses the Search API exclusively, you’re being misled.

Although its coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Because of those operators, some people opt to use the Search API to collect a sampling of Tweets that match their search terms despite the limited coverage. These filtering features have been so well liked that Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).
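For a flavor of those operators, here’s a hedged sketch of a single request, again assuming the historical search.twitter.com interface; the exact query syntax and response fields are assumptions from that era.

```python
# Sketch of the Search API's operator support; query syntax, parameters,
# and response fields are assumptions from the historical interface.
import requests

params = {
    "q": '"coca cola" OR pepsi -fanta',  # Boolean operators live in the query
    "geocode": "40.714,-74.006,25mi",    # geo filter: lat, long, radius
    "rpp": 100,
}
response = requests.get("http://search.twitter.com/search.json", params=params)
for tweet in response.json().get("results", []):
    print(tweet["from_user"], tweet["text"])
```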

So, to recap: the Twitter Search API offers great operator support, but you’ll generally see only a portion of the total Tweets that match your keywords, and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full-coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

Financial Markets in the Age of Social Media

When you think about it, the stock market is a pretty inspiring thing.

Over the past several centuries, humans have actually created an infrastructure that lets people put their money where their mouth is; an infrastructure that provides a mechanism for daily valuation of companies, currencies, and commodities. It’s just unbelievable how far we’ve come, and reflecting on the innovation that’s led us here brings to light a common but powerful denominator: Information.

  • When traders began gathering under a buttonwood tree at the foot of Wall Street in the late 1700s, it was because proximity allowed them to gossip about companies.
  • When Charles Dow began averaging “peaks and flows” of multiple stocks in 1883, his ‘index’ became a new type of data with which to make decisions.
  • In 1975, when the sheer volume of paper necessary for trades became unmanageable, the SEC changed rules to permit electronic trading, allowing for an entirely new infrastructure.
  • And in the 1980s, when Michael Bloomberg and his partners began building machines (the now ubiquitous Bloomberg Terminals), they tapped into an ever-growing need for more data.

Those are just a few examples from a history that excites us @Gnip because of the powerful signal the market is sending about social media. Here are some of the more recent signals we’ve seen:

  • The Bank of England announcing they were using Google search results as a means of informing their “nowcasts” detailing the state of the economy.
  • Derwent Capital Markets launching the first social-media based hedge fund this year.
  • The dedication of an entire panel to Social Media Hedge Fund Strategies at the Battle of the Quants conference in London last week.
  • Weekly news articles that describe how traders are using social data as a trading indicator (here’s one as an example).
  • Incorporation of social data into the algorithms of established hedge funds.

In other words, the market is tapping into a new and unique source of information as a means of making trading decisions. And the reason social media data is so exciting is that it offers an unparalleled view into the emotions, opinions, and choices of millions of users. A stream of data this size, with this depth and range, has never existed before in a format this immediate and accessible. And that access is changing how our clients analyze the world and make trades.

We’ve been privileged to see these use cases as we continue to serve a growing number of financial clients. Most exciting to us, as we respond to the market’s demand for our services, is understanding our pivotal place in this innovation. As the premier source of legal, reliable, realtime data feeds from more than 30 social media sources (including our exclusive agreement with Twitter), we’re at the center of how firms are integrating this data as an input. And that’s incredible stuff.

Are you in the financial markets and looking for a social media data provider? Contact us today to learn more! You can reach us at 888.777.7405 or by email.

25 Million Free Tweets on Power Track

Last week we announced Twitter firehose filtering. This week we’re celebrating the news with Free Tweets for all. Sign up by February 28th and pay no licensing fees on your first 25 million Tweets during your first 60 days using Power Track.

Power Track offers powerful filtering of the Twitter firehose, guaranteeing 100% Tweet delivery. For instance, filter by keyword or username to access all Tweets that match the criteria you care about and have all of the matching results delivered to you in realtime via API. Power Track supports Boolean operators, can match your filtering criteria even within expanded URLs, and has no query volume or traffic limitations, helping you access all of the data you want. And it’s only available from Gnip, currently the only authorized distributor of Twitter data via API.

The licensing fee for Power Track is $0.10 per 1,000 Tweets, but we’re waiving that fee for the first 25 million Tweets in the first 60 days for Power Track customers who sign up by February 28th. A 1-year agreement and the Gnip data collector fee are still required.

Learn More or Contact Us to start testing Power Track for firehose filtering. Cheers!

Gov 2.0 & Social Media

Gnip is doing great in the SMM (Social Media Monitoring) marketplace. However, we want more. We attended the Gov 2.0 Expo a few months ago, and we’ll also be at the upcoming Gov 2.0 Summit in September. Watching markets evolve their understanding of new technologies, concepts, and solutions is always fascinating. The world of government projects, technologies, contracts, and vendors is vastly different from the world we tend to work in day-to-day. Adoption and understanding take a lot longer than those of us in the “web space” are used to, and policy often has a significant impact on how and when something can be incorporated. Yet there is an incredible market opportunity in front of social media-related firms.

Government spending is obviously a tremendous force, and while sales and adoption cycles are long, it is a market that needs to be tapped. Thankfully, government agency awareness of social media is rising. From understanding the technology stack to absorbing communication paradigm shifts (e.g., Twitter and Facebook), government firms and teams are realizing the need to integrate and use these channels. Whether it’s the Defense Department needing to apply predictive algorithms to new communication streams, or disaster recovery organizations needing to tap into crowdsourcing when catastrophe strikes, a vast array of teams are engaging at an increasing rate. A friend of mine lit up a room at the recent American Red Cross Emergency Social Data Summit when he showed how communication (messaging and photos) can be mashed up onto a map in real-time (via Gnip, by the way); highly relevant when considering disaster situations. “Who’s there?” and “What’s the situation?” are questions easily answered when social data streams are tapped and blended.

The social media echo chamber we live in is broadening to include significant government agencies, and the fruits falling from today’s social applications are landing in good places. I’m looking forward to participating in the burgeoning conversation around social media and government’s digestion of it. I encourage you to dive in as well, though be prepared for a relatively slow pace. Don’t expect the turnaround times we’ve become accustomed to; instead, consider investing some background time in the space and treating it as an investment with a longer-term payoff.

How to Select a Social Media Data Provider

If you’re looking for social media data, you’ve got a lot of options: social media monitoring companies provide end-user brand tracking tools, some businesses provide deep-dive analyses of social data, other companies provide reputation scores for individual users, and still other services specialize in geographic social media display, to name just a few.

Some organizations ultimately decide to build internal tools for social media data analysis. Then they must decide between outsourcing the social data collection bit so they can focus their efforts on analyzing and visualizing the data, or building everything — including API connections to each individual publisher — internally. Establishing and maintaining those API connections over time can be costly. If your team has the money and resources to build your own social media integrations, then go for it!

But if you’re shopping for raw social media data, you should consider a social media API – that is, a single API that aggregates raw data from dozens of different social media publishers – instead of making connections to each one of those dozens of social media APIs individually. And in the social media API market, there is only a small handful of companies for you to choose from. We are one of them and we would love to work with you. But we know that you’ll probably want to shop your options before making a decision, so we’d like to offer our advice to help you understand some of the most important factors in selecting a social media API provider.

Here are some good questions for you to ask every social media API solution you consider (including your own internal engineers, if you’re considering hiring them for the job):

Are your data collection methods in compliance with all social media publishers’ terms of use?

–> Here’s why it matters: by working with a company that violates any publisher’s terms of use, you risk unstable access to (or sudden loss of) the violated publisher’s data, not to mention the potential legal consequences of using black-market data in your product. Conversely, if you work with a company that has strong relationships with the social media publishers, our experience shows that you not only get stable, reliable data access, but you just might get rewarded with *extra* data access every now and then. (In case you’re wondering, Gnip’s methods are in compliance with each of our social media publishers’ terms of use.)

Do you provide results and allow parameter modifications via API, and do you maintain those API connections over time?

–> In our experience, establishing a single API connection to collect data from a single publisher isn’t hard. But! Establishing many API connections to various social media publishers and – this is key – maintaining those connections over time is really quite a chore. So much so that we made a whole long list of API-related difficulties associated with that integration work, based on our own experiences. Make sure that whoever you work with understands the ongoing work involved and is prepared to maintain your access to all of the social media APIs you care about over time.

How many data sources do you provide access to?

–> Even if you only want access to Twitter and Facebook today, it’s a good idea to think ahead. How much incremental work will be involved for you to integrate additional sources a few months down the line? Our own answer to this question is this: using Gnip’s social media API, once you’re set up to receive your first feed from Gnip via API, it takes about 1 minute for you to configure Gnip to send you data from a 2nd feed. Ten minutes later, you’re collecting data from 10 different feeds, all at no extra charge. Since you can configure Gnip to send all of your data in one format, you only need to create one parser and all the data you want gets streamed into your product. You can even start getting data from a new social media source, decide it’s not useful for your product, and replace it with a different feed from a different source, all in a matter of seconds. We’re pretty proud that we’ve made it so fast and simple for you to receive data from new sources… (blush)… and we hope you’ll find it to be useful, too.

What format is your data delivered in?

–> Ten different social media sources might provide data in 10 different formats. And that means you have to write 10 different parsers to get all the data into your product. Gnip allows you to normalize all the social media data you want into one single format — Activity Streams — so you can collect all your results via one API and feed them into your product with just one parser.
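To illustrate the “one parser” point, here’s a minimal sketch that assumes a simplified Activity Streams-style payload (actor / verb / object); the exact field names Gnip delivers may differ, so treat these as illustrative.

```python
# One parser for every source: a sketch assuming a simplified Activity
# Streams-style payload. Field names are illustrative assumptions.
import json

def parse_activity(raw):
    activity = json.loads(raw)
    return {
        "who": activity["actor"]["displayName"],
        "did": activity["verb"],
        "what": activity["object"].get("content", ""),
        "when": activity.get("postedTime"),
    }

# Because every source is normalized to the same shape, the same function
# handles a Tweet, a Digg, or a Delicious bookmark.
sample = ('{"actor": {"displayName": "jane"}, "verb": "post", '
          '"object": {"content": "hello"}, "postedTime": "2011-02-01T12:00:00Z"}')
print(parse_activity(sample))
```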

Hope this helps! If you’ve got additional questions to suggest for our list, don’t hesitate to drop us a note. We’d love to hear from you.

Worried About Twitter's Move to OAuth?

Have you scheduled the engineering work to move to OAuth for Twitter’s interfaces? If you’re a Gnip user, you don’t have to! Gnip users don’t need to know anything about OAuth’s implementation in order to keep their data flowing, and for all of them, that’s a huge relief. OAuth is non-trivial to set up and support, and it’s an API authentication/authorization mechanism that most data consumers shouldn’t have to worry about. That’s where Gnip steps in! One of our value-adds is that many API integration shifts, like this one, are hidden from our customers. You can merrily consume data without having to sink expensive resources into adapting to the constantly shifting sands of data provider APIs.

If you’re consuming data via Gnip, when Twitter makes the switch to affected APIs, all you’ll need to do is provide Gnip with your OAuth tokens (the new “username” and “password”; just more secure and controllable), and off you go! You don’t have to worry about query param ordering, hashing, signing, and associated handshakes.
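For a sense of why we’re happy to hide this from you, here’s a simplified sketch of the OAuth 1.0a signing steps (parameter sorting, percent-encoding, HMAC-SHA1). It shows the general mechanism only; a complete client also percent-encodes the secrets and adds nonce and timestamp handling.

```python
# Simplified sketch of OAuth 1.0a request signing. Not a complete client:
# a real implementation also percent-encodes the secrets, adds an
# oauth_nonce and oauth_timestamp, and builds the Authorization header.
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(method, url, params, consumer_secret, token_secret):
    # 1. Percent-encode all parameters and sort them by key.
    encoded = sorted(
        (urllib.parse.quote(k, safe=""), urllib.parse.quote(str(v), safe=""))
        for k, v in params.items()
    )
    param_string = "&".join("%s=%s" % kv for kv in encoded)
    # 2. Build the signature base string: METHOD & URL & params, each encoded.
    base_string = "&".join(
        urllib.parse.quote(part, safe="")
        for part in (method.upper(), url, param_string)
    )
    # 3. HMAC-SHA1 keyed with both secrets, then base64-encode the digest.
    key = "%s&%s" % (consumer_secret, token_secret)
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```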

Real-Time Event Notification via Gnip

pubsubhubbub and rssCloud are helping shed light on the technical solution to real-time event propagation: HTTP POST (aka webhooks). As a friendly reminder, if you’re building pubsubhubbub and/or rssCloud into your app as a publisher/author, you should consider pushing to Gnip as well. While Gnip, pubsubhubbub, and rssCloud all provide sound technical solutions to a huge problem, Gnip’s widespread adoption (thousands of existing subscribers) can get your events in front of a consumer base that Gnip has spent over a year cultivating. With very little integration work on your part (heck, we have a half-dozen convenience libs already built for you to use; pick your language), you can get your data out to a wide audience of existing Gnip subscribers.
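If you’re on the receiving end, the beauty of the webhook model is how little it demands: you just expose an HTTP endpoint and handle whatever gets POSTed to it. A bare-bones sketch using only the Python standard library (the port and payload handling are placeholders):

```python
# Bare-bones webhook (HTTP POST) receiver; port and payload handling
# are placeholders for your own application logic.
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)        # the pushed event body
        print("event received:", payload[:200])  # hand off to your app here
        self.send_response(200)                  # acknowledge promptly
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), WebhookHandler).serve_forever()
```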

Gnip License Changes this Friday, Aug 28th

As we posted last month, there are some changes coming to the way we license use of the Gnip platform. See: Gnip Licensing Changes Coming in August.

These updates will be put in place this Friday, August 28th. The new licensing will impact our existing users as follows:

  1. The Gnip Community Edition license will be disabled, as it is no longer being offered. Accounts created before August 1st will be set to inactive and will no longer be able to access the Gnip API or developer website. If your company is in the process of evaluating Gnip for a commercial project and needs more time to complete the evaluation, please contact us at info@gnip.com and we can extend your account with a longer trial.
  2. Gnip Standard Edition user accounts using the Commercial, Non-profit, and Startup partner license options will continue to be available, as they are not impacted by Friday’s change. If you are a Standard Edition user and we accidentally disable your account on Friday, please contact us at info@gnip.com and we will reactivate it.
  3. New users who created an account after August 1st will receive an email notification on the day their 30-day trial expires, informing them that they need to contact Gnip to obtain the appropriate license for their commercial, non-profit, or partner use case.

We appreciate all the companies and developers who have built solutions using Gnip, and we look forward to continuing to deliver real-time data to power those solutions. By making these adjustments to our licensing, we will be able to focus on innovating the Gnip platform and supporting the many companies and partners we are fortunate to work with every day.