Working Directly With the Twitter Data Ecosystem

One of the reasons Twitter acquired Gnip was that Twitter believes the best way to support the distribution of Twitter data is to have direct data relationships with its data customers – the companies building analytic solutions using Twitter’s data and platform. Direct relationships help Twitter develop a deeper understanding of customer needs, get direct feedback for the product roadmap, and work more closely with data customers to enable the best possible solutions for the brands that rely on Twitter data to make better decisions. At Twitter’s Analyst Day last November, Twitter’s VP of Data Strategy noted that when Twitter acquired Gnip, Gnip had the clear majority of Twitter’s data reseller business – the rest was held by the other two data resellers, DataSift and NTT Data. The acquisition of Gnip was the first step toward developing more direct relationships with data customers.

The next step in working directly with data customers is to transition everyone receiving raw data for commercial use from other data resellers to a direct relationship with Twitter. Twitter immediately started this transition process after acquiring Gnip last May, and we expect to finish the transition by the middle of August this year.

After that transition is completed, companies using raw Twitter data for commercial use – to build products, to analyze internally, and to serve other commercial purposes – will need to have a direct relationship with Twitter. For current Twitter partners and customers, it’s business as usual – they will continue to consume the same data they currently do from Twitter’s APIs. If you are still working on transitioning, that process will simply require you to begin consuming data via a relationship with Twitter instead of through a reseller.

If you’re one of the companies still working on the transition from a data reseller to Twitter’s Commercial APIs through the Gnip product suite or to the Public API, we want to make sure you have the resources you need to successfully complete your transition over the next four months. Here are several channels you can turn to for help.

  1. Technical docs: You can find detailed technical documentation for all of our commercial products here.
  2. Technical webinars: We’ll be hosting technical webinars – covering the topics below – to help you transition to Twitter’s suite of APIs.
    1. Overview of Twitter’s commercial data platform
    2. Using Twitter’s real-time data products
    3. Using Twitter’s historical data products
    4. Tips and tricks to filter for the data you need
    5. Using Twitter’s Public APIs
  3. Office hours: We’ll hold weekly office hours with our product and support teams to answer specific customer questions.
  4. Transition team: We have a dedicated transition team available to answer your questions at

We’ll announce the dates and times for the webinars and office hours on this blog, so keep an eye on this space and follow @Gnip for the latest updates.

Twitter and IBM Partner to Transform Decision Making

I’m thrilled to announce that Twitter and IBM are partnering to transform how businesses and institutions understand their customers, markets and trends – and inform every business decision. For details, see our post on the Twitter blog and IBM’s press release.

Tweeting in the Rain, Part 4: Tweets during the 2013 Colorado flood

In August 2013, we posted two “Tweeting in the Rain” (Part 1 & Part 2) articles that explored important roles social data could play in flood early-warning systems. These two posts focused on determining whether there was a Twitter “signal” that correlated to local rain measurements. We looked at ten rain events from 2009-2012 in six different regions of the country, including San Diego, Las Vegas, Louisville and Boulder. That analysis demonstrated that even early in its history, the Twitter network had become an important broadcast channel during rain and flood events.

Around noon on Wednesday, September 11, 2013, we posted Part 3, which discussed the opportunities and challenges social networks provide to agencies responsible for early warning systems. As that day unfolded, the rainfall steadily intensified, and it became increasingly clear that this weather event had the potential to become serious. By midnight, the Boulder County region was already in the midst of a flood event driven by a historic amount of rain. When the rain had tapered off 24 hours later, rain gauges in the Boulder area had recorded 12-17 inches. This happened in an area that expects around 20 inches per year on average.

On the evening of September 11, we stayed up late watching the flood and its aftermath unfold on Twitter, 140 characters at a time. As written about here, we witnessed Twitter being used in a variety of ways. Two key opportunities that Twitter provided during the event were:

1. The ability for the public to share photos and videos in real-time.

2. A medium for local emergency and weather agencies to broadcast critical information.

As we approached the one-year anniversary of the flood, we wanted to revisit the “Tweeting in the Rain” blog research and take a similar look at the 2013 flood with respect to the Twitter network. For this round, we wanted to investigate the following questions:

  • How would the Twitter signal compare to these historic rain measurements?
  • How would the Twitter signal compare to river levels?
  • As the event unfolded, did the Twitter audience of our public safety agencies grow? How did official flood updates get shared across the network?

With these questions in mind, we began the process of collecting Tweets about the flood, obtained local rain and water level data, and started building a relational database to host the data for analysis. (Stay tuned over at for a series of articles on building the database schema in support of this research.)

A flood of Tweets

Below are some selected Tweets that illustrate how the 2013 Colorado Flood unfolded on Twitter. A year later, these messages help remind us of the drama and crisis severity that occurred throughout the region.

Earlier in the day, weather followers likely saw the early signs of above-average amounts of moisture in the area:

By that night, all local public safety agencies ramped up to manage a regional natural disaster:

At 10:02 p.m. MT, the Boulder County Office of Emergency Management (@BoulderOEM) posted the following Tweet:

As we approached midnight, this flood event was getting really scary:

A unique role that Twitter and its users played throughout the flood event was the real-time feed of photos and videos from across the region:

By Friday, September 13, the historic amounts of rainfall had affected a wide area of Colorado. In foothill communities like Jamestown and Lyons, the immediate danger was torrential flash floods that scoured through the town centers.

Farther downstream, the primary problem was steadily rising waters that pooled in the area for days. Contributing to this were several earthen dams that failed, adding their reservoir contents to the already overloaded creeks and rivers.

Compiling ‘flood’ Tweets

As part of the previous round of analysis, we looked at a 2011 summer thunderstorm that dumped almost two inches of rain on the Boulder area in less than an hour. This intense rainfall was especially concerning because it was centered on a forest fire burn area up Fourmile Creek. Flash flood warnings were issued and sirens along Boulder Creek in central Boulder were activated to warn citizens of possible danger.

For that analysis, we collected geo-referenced Tweets containing keywords related to rain and storms (see here for more information on how these filters were designed). During the 48 hours around that event, there were 1,620 Tweets posted from 770 accounts. Here is how that event’s rain correlated with those Tweets.

For this round of analysis, we added a few more types of filters:

  • Hashtags: As the 2013 Colorado flood unfolded, hashtags associated with the event came to life. The most common ones included #ColoradoFlood, #BoulderFlood, and #LongmontFlood, as well as references to our local creeks and rivers with #BoulderCreek, #LefthandCreek and #StVrainRiver.
  • Profile Geo: Our Profile Geo enrichment had been introduced since the last round of analysis. Instead of needing to parse profile locations ourselves, we were able to let Gnip’s enrichment do the parsing and build simple rules that matched Tweets coming from Colorado-based accounts.
  • Local agencies and media: Since this was such a significant regional event, we collected Tweets from local public agencies and local media accounts.
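The three filter types above can be sketched as PowerTrack-style rule definitions. Below is a minimal Python sketch of how such a rule payload might be assembled; the exact operator names (e.g. `profile_region`) and the payload shape are illustrative assumptions, not an exact API contract – consult the product documentation for the real rule syntax.

```python
import json

# Hedged sketch: filter rules combining event hashtags with a
# Profile Geo style operator. Operator names and payload shape
# are assumptions for illustration only.
flood_rules = [
    {"value": "#ColoradoFlood OR #BoulderFlood OR #LongmontFlood",
     "tag": "flood-hashtags"},
    {"value": "#BoulderCreek OR #LefthandCreek OR #StVrainRiver",
     "tag": "local-waterways"},
    {"value": "(flood OR rain) profile_region:Colorado",
     "tag": "colorado-profile-geo"},
]

# Rules are typically uploaded as a JSON document to a rules endpoint.
payload = json.dumps({"rules": flood_rules})
print(payload)
```

Tagging each rule makes it easy to attribute matched Tweets back to the filter that caught them when analyzing the results.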

We applied these filters to six months of data – from August 10, 2013 to February 10, 2014 – beginning with a period that started before the flood to establish the ‘baseline’ level of postings.

Between September 1-7, 2013, there were fewer than 8,800 Tweets from 4,900 accounts matching our filters. During the first week of the flood, September 10-16, we found over 237,000 Tweets from nearly 63,000 Twitter accounts. (And in the following five months of recovery, there were nearly another 300,000 Tweets from 45,000 more accounts.)

Comparing Twitter signals with weather data

As before, we wanted to compare the Twitter signal with a local rain gauge. We again turned to OneRain for local rain and stage data recorded during the event. (OneRain maintains critical early-warning equipment in the Boulder and Denver metropolitan areas, including the foothills in that region.) This time we also wanted to compare the Twitter signal to local river levels. Figure 1 represents hourly rainfall (at the Boulder Justice Center) and maximum Boulder Creek water levels (at Broadway St.) along with the hourly number of ‘flood’ Tweets.
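A time series like the one in Figure 1 starts with bucketing Tweets into hourly counts so they can be lined up against hourly rain-gauge and river-level readings. A minimal standard-library sketch (the sample timestamps below are invented for illustration):

```python
from collections import Counter
from datetime import datetime

def hourly_tweet_counts(timestamps):
    """Bucket ISO-8601 Tweet timestamps into hourly counts, ready to
    line up against hourly rainfall and river-level observations."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        # Truncate to the top of the hour to form the bucket key.
        counts[dt.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(counts)

# Tiny illustrative sample (not real flood data):
sample = [
    "2013-09-11T22:02:00",
    "2013-09-11T22:45:00",
    "2013-09-11T23:10:00",
]
print(hourly_tweet_counts(sample))
```

With both series keyed on the same hourly buckets, joining Tweet counts to gauge readings becomes a simple dictionary merge.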

Figure 1 – Hourly rainfall, Boulder Creek levels and Tweets during the 2013 Colorado flood, September 10-17. Over 237,000 Tweets matched the flood filters during this period; the same filters matched fewer than 8,800 during the September 1-8 “baseline” period.

Twitter users finding information when it is most needed

You can see from the information above that our local public agencies played a critical role during the 2013 Colorado flood. Between September 10-17, the Boulder County Office of Emergency Management (@BoulderOEM) and the Boulder National Weather Service office (@NWSBoulder) posted a combined 431 Tweets. These Tweets included updates on current weather and flash flood conditions, information for those needing shelter and evacuation and details on the state of our regional infrastructure. These Tweets were also shared (Retweeted) over 8,600 times by over 4,300 accounts. The total amount of followers of the Twitter accounts that shared these Tweets was more than 9.5 million.

Twitter offers users the ability to actively update the accounts they want to follow. Knowing this, we assumed that the number of followers of these two local agencies would grow during the flood. To examine that type of Twitter signal, we compared hourly data for new followers with rain accumulation at the Boulder Justice Center. The results of that comparison are shown in Figure 2. These two agencies gained over 5,600 new followers during September 10-16, more than doubling their follower counts.

Figure 2 – Comparing new followers of @BoulderOEM and @NWSBoulder with rain accumulation. Rain was measured at Boulder Justice Center in central Boulder.

One interesting finding in Figure 2 is that there seems to be a threshold of accumulated rainfall at which Twitter users turn their attention to local agencies broadcasting about the flood. In this case it was around midnight on September 11, after five inches of rain and the start of local flooding. As the event worsened and it became more and more difficult to move around the region, more Twitter users tuned directly into the broadcasts from their local Office of Emergency Management and National Weather Service Twitter accounts.

Even as the region shifted its attention to flood recovery, the information being shared on Twitter was vital to the community. Just as the Twitter network was used in a variety of ways during the flood, it provided a critical broadcast channel as communities grappled with widespread damage. The major themes of Tweets posted immediately after the flood included:

  • Information about the evacuated communities of Jamestown, Lyons and Longmont.
  • Details on shelters and other support mechanisms for displaced residents.
  • Organization of volunteers for cleanup activities.
  • Promotion of charitable organization funds.
  • Regional infrastructure conditions and updates. This article discusses how Tweets helped identify road and bridge damages in closed-off areas.

Based on all of this data, it’s very clear that the Twitter network played an important role during and after the 2013 Colorado flood. The combination of real-time eye-witness accounts and updates from our public agencies made Twitter a go-to source for critical emergency information.

In recognition of this important role, Twitter has introduced Twitter Alerts. This service provides the ability for Twitter users to sign up for mobile push notifications from their local public safety agencies. For any public agency with a mission of providing early-warning alerts, this service can help the public find the information they need during emergencies and natural disasters.

Continue reading

Historical PowerTrack Requests, Now Faster Than Ever

The Twitter Data Product Team is excited to share an update with you around recent enhancements to our Historical PowerTrack offering. In an effort to improve our customer experience for historical data requests, we’ve made substantial technology investments to reduce processing times as well as to support future adoption and usage patterns.

Historical data jobs have always been processed as fast as our infrastructure allowed, and now they are significantly faster than ever before. You may be asking yourself, “So just what does this mean for my business?” Well, here are some data points that should help to put these improvements in perspective:

  • A 1-year historical data job that previously took in the neighborhood of 144 hours to complete was recently processed in just 5 hours.
  • A 2-year historical data job that previously took nearly 288 hours to complete was recently processed in just 8 hours.

We no longer recommend breaking historical jobs into smaller pieces to process the data faster. Your historical jobs will now actually process faster if they remain intact as a single request, which affords one more level of improved efficiency and should make the job management process easier for you and your team.

One final area of improvement benefiting your business is our more accurate estimates around job processing times. While these predictions will certainly remain just that, “estimates”, the deviation from the eventual processing times will be greatly reduced in most cases.

If you have any further questions around these improvements, feel free to reach out to our Product Team at

The Gnip Usage API: A New Tool for Monitoring Data Consumption

At Gnip we know that reliable, sustainable, and complete data delivery products are core to enabling our customers to surface insights and value from social data. Today we’re excited to announce a new API that will make it easier for our customers to monitor and manage the volume of Gnip data they consume – the Usage API.

The Usage API is a free service that allows Gnip customers to send requests for account consumption statistics across our various data sources. This API allows for even more granular visibility into usage and allows for new automated monitoring and alerting apps. Customers now have a programmatic way to understand usage trends in addition to the monthly and daily usage reports already available in the Gnip Console.

Customer usage data is shown in aggregate and broken down by data source and product type to provide a granular view of consumption levels. Usage statistics are now updated at numerous intervals throughout the day and, where applicable, monthly usage projections provide insight into expected end-of-month totals. The Usage API also includes consumption thresholds for each account to help customers keep track of maximum anticipated consumption levels.
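To make the idea concrete, here is a hedged Python sketch of requesting and summarizing usage data. The host, path, query parameters, and response field names below are illustrative assumptions only; the real endpoint and schema are in the support documentation referenced below.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint builder -- host and path are placeholders,
# not the real Usage API address.
def usage_url(account, bucket="day"):
    query = urlencode({"bucket": bucket})
    return f"https://gnip-api.example.com/accounts/{account}/usage.json?{query}"

def summarize_usage(response_body):
    """Total consumed activities across products in a usage response
    shaped like the aggregate/per-product breakdown described above."""
    usage = json.loads(response_body)
    return sum(product["activities"] for product in usage["products"])

# Example response (field names and values are made up):
sample = json.dumps({"products": [
    {"name": "powertrack", "activities": 120000},
    {"name": "search", "activities": 4500},
]})
print(usage_url("my-account"))
print(summarize_usage(sample))  # 124500
```

A monitoring app could poll such an endpoint on a schedule and alert when the total approaches the account's consumption threshold.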

The Usage API is available for use today. To learn more about the Usage API or to find instructions for getting started, please reference our support documentation.

The Power of Command Centers

The ability to integrate enterprise data alongside social data and visualize the output in one place is powerful, and brands are leveraging it through the use of command centers. With this tool, not only can brands combine internal data with social data, but multiple business units can see the data all at once, ensuring efficient workflow. Command centers also help take social data out of traditional silos and make it more dynamic.

A command center is just one of the tools that Gnip customer and new Plugged In partner MutualMind offers customers using social data. The MutualMind platform also helps customers listen, gauge sentiment, track competitors, identify influencers, and engage with audiences, and it offers full-service white-label and OEM capabilities. We asked MutualMind to share an example of how customers leverage their offerings.


American Airlines uses the MutualMind Command Center to give its social media team a “30,000-ft view” of what’s being said by travelers, employees and many other stakeholders. Teams across the company are able to quickly see the impact of Twitter and other real-time data streams, identify relevant trends for American Airlines, and view the competitive landscape. The Command Center enables the American Airlines team to collaborate and coordinate their response, particularly during crises and brand initiatives.

We love highlighting the ways our customers make life easier for their clients and we’re excited to add MutualMind to the partner program!

Smoke vs. Smoke: DiscoverText helps public health researchers

These days manually sorting data isn’t an option. The ability to easily and accurately classify and search Twitter data can save valuable time, whether for academic research or brand marketing analysis. That’s why we’re excited to add Texifter as a Plugged In partner. Texifter’s SaaS and cloud-based text analytics tools help companies and researchers sort and analyze large amounts of unstructured content, from customer surveys to social media data.

Research groups such as the Health Media Collaboratory in Chicago used DiscoverText to help them identify and analyze the role of social media data in public health — specifically social media reactions to anti-smoking campaigns. Not surprisingly, the word “smoke” appears in millions of Tweets in many different contexts (smoky fire, smoke pot, smoke screen, etc.). In this case, the research team was specifically looking for Tweets related to cigarette smoking and tobacco usage. Using the DiscoverText tool, the team could surface only the Tweets relevant to their research. The collaborative, cloud-based nature of DiscoverText facilitated joint research and the easy incorporation of large amounts of different types of data.

We believe that social data has limitless application — and we’re always keen to share the products that prove the point.

Texifter Plugged In to Gnip from Stuart Shulman on Vimeo.

Leveraging the Search API

Brands these days are savvy about comprehensively tracking keywords, competitors, hashtags, and so on. But there will always be unanticipated events or news stories that pop up. The keywords associated with these events are rarely ever tracked in advance. So what’s a brand to do?

Our newest Plugged In partner, Simply Measured (@simplymeasured), was one of our first customers to leverage instant access to historical Twitter data using Gnip’s Search API. When those surprise events affect their customers, Simply Measured can quickly (within hours) retrieve customers’ Twitter data from the last 30 days. The Search API lets them create complex rules so the data they deliver to customers is zeroed in on the right Tweets. The ability to quickly access this data lets customers develop PR strategies and responses in a timely fashion.
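Retrieving "the last 30 days" of data for a surprise event boils down to computing a lookback window and attaching it to a search request. A minimal Python sketch; the field names (`query`, `fromDate`, `toDate`), the timestamp format, and the example query syntax are illustrative assumptions rather than the exact Search API contract.

```python
from datetime import datetime, timedelta

def thirty_day_request(query, now=None):
    """Build a hypothetical search request body covering the 30 days
    leading up to `now` (defaults to the current UTC time)."""
    now = now or datetime.utcnow()
    start = now - timedelta(days=30)
    fmt = "%Y%m%d%H%M"  # assumed minute-granularity timestamp format
    return {
        "query": query,
        "fromDate": start.strftime(fmt),
        "toDate": now.strftime(fmt),
    }

# Example: a brand reacting to an unanticipated news story.
req = thirty_day_request(
    "@brandname OR #surpriseevent",
    now=datetime(2014, 9, 15, 12, 0),
)
print(req)
```

Because the window is computed at request time, an analyst can react to a breaking story within hours without having tracked its keywords in advance.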

Learn more about how Simply Measured has incorporated the historical search tool, and how it helped one of their customers.


Hacking to Improve Disaster Response with Qlik, Medair and Gnip

At Gnip, we’re always excited to hear about groups and individuals who are using social data in unique ways to improve our world. We were recently fortunate enough to support this use of social data for humanitarian good first-hand. Along with Plugged In to Gnip partner, Qlik, and international relief organization, Medair, we hosted a hackathon focused on global disaster response.

The hackathon took place during Qlik’s annual partner conference in Orlando and studied social content from last year’s Typhoon Haiyan. Historical Twitter data from Gnip was paired with financial information from Medair to give participants the opportunity to create new analytic tools on Qlik’s QlikView.Next BI platform. The Twitter data set specifically included Tweets from users in the Philippines for the two week period around Typhoon Haiyan in November of 2013. The unique combination of data and platform allowed the hackathon developers to dissect and visualize a massive social data set with the goal of uncovering new insights that could be applied in future natural disasters.

For example, one team used Gnip’s Profile Geo Enrichment to map Tweets from highly-specific areas according to keywords such as “water”, “food” or “shelter”. Identifying trends in which geographic areas have greater needs for certain types of aid could provide a model for improving disaster response times and efficiencies. Another team analyzed spikes in the use of certain hashtags as a way to uncover actionable data being shared about the residual impacts of the typhoon. The developers’ efforts were all brought to life through the QlikView.Next visualization platform, making the resulting insight discovery process very intuitive and easy to comprehend. The results were pretty amazing, and here’s a look at the winning app!


“With Gnip’s support, we were extremely honored to be able to work with Medair for this year’s Qlik Hackathon and help them use data to further the impact of the great work they are doing worldwide,” said Peter McQuade, vice president of Corporate Social Responsibility at Qlik. “It provides Medair with an application that supports their fundraising efforts and ultimately helps change our world by maximizing the impact of their work with some of the world’s most vulnerable people.”

We would like to express our sincere thanks to both Medair and Qlik for inviting us to participate in such a meaningful cause. The hackathon produced new social data applications that Medair and first responder teams may be able to use in future disaster response efforts to better help those immediately affected. As for Gnip, we can’t wait to see how social data will be applied toward other humanitarian causes moving forward!

Streaming Data Just Got Easier: Announcing Gnip’s New Connector for Amazon Kinesis

I’m happy to announce a new solution we’ve built to make it simple to get massive amounts of social data into the AWS cloud environment. I’m here in London for the AWS Summit where Stephen E. Schmidt, Vice President of Amazon Web Services, just announced that Gnip’s new Kinesis Connector is available as a free AMI starting today in the AWS Marketplace. This new application takes care of ingesting streaming social data from Gnip into Amazon Kinesis. Spinning up a new instance of the Gnip Kinesis Connector takes about five minutes, and once you’re done, you can focus on writing your own applications that make use of social data instead of spending time writing code to consume it.




Amazon Kinesis is AWS’s managed service for processing streaming data. It has its own client libraries that enable developers to build streaming data processing applications and get data into AWS services like Amazon DynamoDB, Amazon S3 and Amazon Redshift for use in analytics and business intelligence applications. You can read an in-depth description of Amazon Kinesis and its benefits on the AWS blog.

We were excited when Amazon Kinesis launched last November because it helps solve key challenges that we know our customers face. At Gnip, we understand the challenges of streaming massive amounts of data much better than most. Some of the biggest hurdles – especially for high-volume streams – include maintaining a consistent connection, recovering data after a dropped connection, and keeping up with reading from a stream during large spikes of inbound data. The combination of Gnip’s Kinesis Connector and Amazon Kinesis provides a “best practice” solution for social data integration with Gnip’s streaming APIs that helps address all of these hurdles.

Gnip’s Kinesis Connector and the high-availability Amazon AWS environment provide a seamless “out-of-the-box” solution to maintain full fidelity data without worrying about HTTP streaming connections. If and when connections do drop (it’s impossible to maintain an HTTP streaming connection forever), Gnip’s Kinesis Connector automatically reconnects as quickly as possible and uses Gnip’s Backfill feature to ingest data you would have otherwise missed. And due to the durable nature of data in Amazon Kinesis, you can pick right back up where you left off reading from Amazon Kinesis if your consumer application needs to restart.
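The "pick up where you left off" behavior described above is the classic checkpointing pattern: a consumer persists the sequence number of the last record it processed and resumes after it on restart. A simplified Python sketch of the idea; the record and checkpoint shapes are stand-ins for the real Amazon Kinesis client types, not the actual client library API.

```python
def resume_after(records, checkpoint):
    """Return the records that arrive after the checkpointed sequence
    number; a checkpoint of None means start from the beginning."""
    if checkpoint is None:
        return list(records)
    return [r for r in records if r["sequence"] > checkpoint]

# A durable stream retains records, so a restarted consumer can
# re-read only what it has not yet processed.
stream = [
    {"sequence": 1, "body": "tweet-a"},
    {"sequence": 2, "body": "tweet-b"},
    {"sequence": 3, "body": "tweet-c"},
]

print(resume_after(stream, None))  # first run: all three records
print(resume_after(stream, 2))     # restart at checkpoint 2: only tweet-c
```

In a real consumer, the checkpoint would be persisted to durable storage (the Kinesis client libraries handle this) so a crash between reads never causes data loss or reprocessing of the whole stream.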

In addition to these features, one of the biggest benefits of Amazon Kinesis is its low cost. To give you a sense of what that low cost looks like, a Twitter Decahose stream delivers about 50MM messages in a day. Between Amazon Kinesis shard costs and HTTP PUT costs, it would cost about $2.12 per day to put all this data into Amazon Kinesis (plus Amazon EC2 costs for the instance).
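As a back-of-the-envelope check of that ~$2.12/day figure, the sketch below uses 2014-era Amazon Kinesis list prices and a two-shard configuration as assumptions; current pricing and required shard counts will differ.

```python
# Assumed 2014-era list prices (check current AWS pricing):
shard_hour_price = 0.015        # $ per shard-hour
put_price_per_million = 0.028   # $ per million PUT operations

messages_per_day = 50_000_000   # ~Decahose daily volume from the post
shards = 2                      # assumed: headroom for ~580 records/sec

shard_cost = shards * 24 * shard_hour_price                        # $0.72
put_cost = (messages_per_day / 1_000_000) * put_price_per_million  # $1.40
total = shard_cost + put_cost
print(round(total, 2))  # 2.12
```

EC2 costs for the connector instance are on top of this, as noted above.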

Gnip’s Kinesis Connector is ready to use starting today for any Twitter PowerTrack or Decahose stream. We’re excited about the many new, different applications this will make possible for our customers. We hope you’ll take it for a test drive and share feedback with us about how it helps you and your business do more with social data.
