I’m thrilled to announce that Twitter and IBM are partnering to transform how businesses and institutions understand their customers, markets and trends – and inform every business decision. For details, see our post on the Twitter blog and IBM’s press release.
In August 2013, we posted two “Tweeting in the Rain” (Part 1 & Part 2) articles that explored important roles social data could play in flood early-warning systems. These two posts focused on determining whether there was a Twitter “signal” that correlated to local rain measurements. We looked at ten rain events from 2009-2012 in six different regions of the country, including San Diego, Las Vegas, Louisville and Boulder. That analysis demonstrated that even early in its history, the Twitter network had become an important broadcast channel during rain and flood events.
Around noon on Wednesday, September 11, 2013, we posted Part 3, which discussed the opportunities and challenges social networks provide to agencies responsible for early warning systems. As the day unfolded, the rainfall steadily intensified, and it became increasingly clear that this weather event had the potential to become serious. By midnight, the Boulder County region was already in the midst of a flood driven by a historic amount of rain. When the rain tapered off 24 hours later, rain gauges in the Boulder area had recorded 12-17 inches – in an area that expects around 20 inches per year on average.
On the evening of September 11, we stayed up late watching the flood and its aftermath unfold on Twitter, 140 characters at a time. As written about here, we witnessed Twitter being used in a variety of ways. Two key opportunities that Twitter provided during the event were:
1. The ability for the public to share photos and videos in real-time.
2. A medium for local emergency and weather agencies to broadcast critical information.
As we approached the one-year anniversary of the flood, we wanted to revisit the “Tweeting in the Rain” blog research and take a similar look at the 2013 flood with respect to the Twitter network. For this round, we wanted to investigate the following questions:
- How would the Twitter signal compare to these historic rain measurements?
- How would the Twitter signal compare to river levels?
- As the event unfolded, did the Twitter audience of our public safety agencies grow? How did official flood updates get shared across the network?
With these questions in mind, we began the process of collecting Tweets about the flood, obtained local rain and water level data, and started building a relational database to host the data for analysis. (Stay tuned over at dev.twitter.com for a series of articles on building the database schema in support of this research.)
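The schema itself is the subject of that upcoming series, but a minimal sketch gives the flavor of the relational layout. This is purely illustrative – the table and column names below are our own, not the actual schema used for this research – here using Python's built-in sqlite3:

```python
import sqlite3

# Illustrative schema only: one table for accounts, one for Tweets,
# with an index on the timestamp to support hourly aggregations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    handle     TEXT NOT NULL,
    followers  INTEGER
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(account_id),
    posted_at  TEXT NOT NULL,    -- ISO 8601 timestamp
    body       TEXT NOT NULL
);
CREATE INDEX idx_tweets_posted_at ON tweets(posted_at);
""")

# A sample row from the flood period.
conn.execute("INSERT INTO accounts VALUES (1, 'BoulderOEM', 15000)")
conn.execute("INSERT INTO tweets VALUES (1, 1, '2013-09-12T05:02:00Z', "
             "'Sirens along #Boulder Creek have been activated.')")

# Hourly counts of 'flood' Tweets -- the kind of query behind the
# hourly Tweet signal charted later in this post.
rows = conn.execute("""
    SELECT substr(posted_at, 1, 13) AS hour, COUNT(*) AS n
    FROM tweets GROUP BY hour
""").fetchall()
print(rows)
```

Grouping on the truncated timestamp is a simple way to bucket Tweets by hour without a separate date dimension.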
A flood of Tweets
Below are some selected Tweets that illustrate how the 2013 Colorado Flood unfolded on Twitter. A year later, these messages help remind us of the drama and the severity of the crisis that occurred throughout the region.
Earlier in the day, weather followers likely saw the early signs of above-average amounts of moisture in the area:
Very Heavy rain moving NNW through Platteville & Johnstown areas in Weld county, Expect 1″ in 20 minutes & flooded streets/intersecs. #COwx
— NWS Boulder (@NWSBoulder) September 11, 2013
By that night, all local public safety agencies ramped up to manage a regional natural disaster:
At 10:02 p.m. MT, the Boulder County Office of Emergency Management (@BoulderEOM) posted the following Tweet:
Sirens along #Boulder Creek have been activated. Lots of debris in the creek. Do not cross flood waters on foot or in a vehicle.
— Boulder OEM (@BoulderOEM) September 12, 2013
As we approached midnight, this flood event was getting really scary:
Situation in Fourmile Canyon worsening. CLIMB TO HIGH GROUND IMMEDIATELY. — Boulder OEM (@BoulderOEM) September 12, 2013
A unique role that Twitter and its users played throughout the flood event was the real-time feed of photos and videos from across the region:
— K. McDonald (@BigPictureAg) September 12, 2013
By Friday, September 13, the historic amounts of rainfall had affected a wide area of Colorado. In foothill communities like Jamestown and Lyons, the immediate danger was torrential flash floods that scoured through the town centers.
295 people being airlifted out of Jamestown, first group now on buses en route to evacuation center. #boulderflood
— Boulder OEM (@BoulderOEM) September 13, 2013
Further downstream, the primary problem was steadily rising water that pooled in the area for days. Contributing to this were several earthen dams that failed, adding their reservoir contents to the already overloaded creeks and rivers.
Only road into/out of Longmont is at I-25 & Highway 66 to Main Street. All north-south travel in Longmont is cut off by the #stvrainflood
— Boulder OEM (@BoulderOEM) September 13, 2013
Compiling ‘flood’ Tweets
As part of the previous round of analysis, we looked at a 2011 summer thunderstorm that dumped almost two inches of rain on the Boulder area in less than an hour. This intense rainfall was especially concerning because it was centered on a forest fire burn area up Fourmile Creek. Flash flood warnings were issued and sirens along Boulder Creek in central Boulder were activated to warn citizens of possible danger.
For that analysis, we collected geo-referenced Tweets containing keywords related to rain and storms (see here for more information on how these filters were designed). During the 48 hours around that event, there were 1,620 Tweets posted from 770 accounts. Here is how that event’s rain correlated with those Tweets.
For this round of analysis, we added a few more types of filters:
- Hashtags: As the 2013 Colorado flood unfolded, hashtags associated with the event came to life. The most common included #ColoradoFlood, #BoulderFlood and #LongmontFlood, as well as references to our local creeks and rivers with #BoulderCreek, #LefthandCreek and #StVrainRiver.
- Profile Geo: Our Profile Geo enrichment was introduced after the last round of analysis. Instead of needing to parse profile locations ourselves, we were able to let Gnip’s enrichment do the parsing and build simple rules that matched Tweets coming from Colorado-based accounts.
- Local agencies and media: Since this was such a significant regional event, we collected Tweets for local public agencies and local media accounts.
We applied these filters to six months of data – from August 10, 2013 to February 10, 2014 – beginning with a period that started before the flood to establish the ‘baseline’ level of postings.
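The three filter types above can be sketched in a few lines of code. This is our own illustration of the matching logic, not the actual Gnip PowerTrack implementation; the hashtags and agency handles come from this post, while the `profile_region` field stands in for the Profile Geo enrichment:

```python
# Hashtags taken from this post; matching logic is illustrative only.
FLOOD_TAGS = {"#coloradoflood", "#boulderflood", "#longmontflood",
              "#bouldercreek", "#lefthandcreek", "#stvrainriver"}
AGENCY_ACCOUNTS = {"boulderoem", "nwsboulder"}

def matches_flood_filters(tweet: dict) -> bool:
    """Return True if a Tweet matches any of the three filter types."""
    words = {w.lower() for w in tweet.get("body", "").split()}
    if words & FLOOD_TAGS:                                   # hashtag filter
        return True
    if tweet.get("handle", "").lower() in AGENCY_ACCOUNTS:   # agency filter
        return True
    # Stand-in for the Profile Geo enrichment: a parsed region on the Tweet.
    return tweet.get("profile_region") == "Colorado"

print(matches_flood_filters({"body": "Evacuate now #BoulderFlood",
                             "handle": "someone"}))  # True
```

In practice these rules ran server-side against the full Firehose, so only matching Tweets were delivered for analysis.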
Between September 1-7, 2013, fewer than 8,800 Tweets from 4,900 accounts matched our filters. During the first week of the flood, September 10-16, we found over 237,000 Tweets from nearly 63,000 Twitter accounts. (And in the following five months of recovery, there were nearly another 300,000 Tweets from 45,000 more accounts.)
Comparing Twitter signals with weather data
As before, we wanted to compare the Twitter signal with a local rain gauge. We again turned to OneRain for local rain and stage data recorded during the event. (OneRain maintains critical early-warning equipment in the Boulder and Denver metropolitan areas, including the foothills in that region). This time we also wanted to compare the Twitter signal to local river levels. Figure 1 represents hourly rainfall (at the Boulder Justice Center) and maximum Boulder Creek water levels (at Broadway St.) along with hourly number of ‘flood’ Tweets.
Figure 1 – Hourly rainfall, Boulder Creek levels and Tweets during the 2013 Colorado Flood, September 10-17. More than 237,000 Tweets matched the flood filters during this period; the same filters matched fewer than 8,800 during the September 1-8 “baseline” period.
Twitter users finding information when it is most needed
You can see from the information above that our local public agencies played a critical role during the 2013 Colorado flood. Between September 10-17, the Boulder County Office of Emergency Management (@BoulderOEM) and the Boulder National Weather Service office (@NWSBoulder) posted a combined 431 Tweets. These Tweets included updates on current weather and flash flood conditions, information for those needing shelter and evacuation, and details on the state of our regional infrastructure. These Tweets were also shared (Retweeted) over 8,600 times by over 4,300 accounts, and the accounts that shared them had a combined total of more than 9.5 million followers.
Twitter users actively choose which accounts they want to follow. Knowing this, we assumed that the number of followers of these two local agencies would grow during the flood. To examine that type of Twitter signal, we compared hourly counts of new followers with rain accumulation at the Boulder Justice Center. The results of that comparison are shown in Figure 2. These two agencies gained over 5,600 new followers during September 10-16, more than doubling their combined total.
One interesting finding in Figure 2 is that there appears to be a threshold of accumulated rainfall at which Twitter users turn their attention to local agencies broadcasting about the flood. In this case it was around midnight on September 11, after five inches of rain and the start of local flooding. As the event worsened and moving around the region became more and more difficult, more Twitter users tuned directly into the broadcasts from their local Office of Emergency Management and National Weather Service Twitter accounts.
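Finding that kind of threshold is a simple cumulative-sum exercise. The sketch below is illustrative only – the hourly rainfall numbers are made up; only the five-inch threshold comes from our observation in Figure 2:

```python
from itertools import accumulate

# Illustrative hourly rainfall in inches; not the actual gauge data.
hourly_rain = [0.1, 0.3, 0.8, 1.2, 1.5, 1.4, 0.9]

def first_hour_above(hourly, threshold):
    """Return the index of the first hour whose running rainfall
    total reaches the threshold, or None if it never does."""
    for hour, total in enumerate(accumulate(hourly)):
        if total >= threshold:
            return hour
    return None

print(first_hour_above(hourly_rain, 5.0))
```

Lining that crossing hour up against the new-follower series is what reveals the attention shift described above.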
Even as the region shifted its attention to flood recovery, the information being shared on Twitter was vital to the community. Just as the Twitter network was used in a variety of ways during the flood, it provided a critical broadcast channel as communities grappled with widespread damage. The major themes of Tweets posted immediately after the flood included:
- Information about the evacuated communities of Jamestown, Lyons and Longmont.
- Details on shelters and other support mechanisms for displaced residents.
- Organization of volunteers for cleanup activities.
- Promotion of charitable organization funds.
- Regional infrastructure conditions and updates. This article discusses how Tweets helped identify road and bridge damages in closed-off areas.
Based on all of this data, it’s very clear that the Twitter network played an important role during and after the 2013 Colorado flood. The combination of real-time eye-witness accounts and updates from our public agencies made Twitter a go-to source for critical emergency information.
In recognition of this important role, Twitter has introduced Twitter Alerts. This service provides the ability for Twitter users to sign up for mobile push notifications from their local public safety agencies. For any public agency with a mission of providing early-warning alerts, this service can help the public find the information they need during emergencies and natural disasters.
Gnip is one of the world’s largest and most trusted providers of social data. We partnered with Twitter four years ago to make it easier for organizations to realize the benefits of analyzing data across every public Tweet. The results have exceeded our wildest expectations. We have delivered more than 2.3 trillion Tweets to customers in 42 countries who use those Tweets to provide insights to a multitude of industries including business intelligence, marketing, finance, professional services, and public relations.
Today I’m pleased to announce that Twitter has agreed to acquire Gnip! Combining forces with Twitter allows us to go much faster and much deeper. We’ll be able to support a broader set of use cases across a diverse set of users including brands, universities, agencies, and developers big and small. Joining Twitter also provides us access to resources and infrastructure to scale to the next level and offer new products and solutions.
This acquisition signals clear recognition that investments in social data are healthier than ever. Our customers can continue to build and innovate on one of the world’s largest and most trusted providers of social data and the foundation for innovation is now even stronger. We will continue to serve you with the best data products available and will be introducing new offerings with Twitter to better meet your needs and help you continue to deliver truly innovative solutions.
Finally, a huge thank you to the team at Gnip who have poured their hearts and souls into this business over the last 6 years. My thanks to them for all the work they’ve done to get us to this point.
We are excited for this next step and look forward to sharing more with you in the coming months. Stay tuned!
After a couple of exciting years in social finance and some major events, we’re back with an update to our previous paper, “Social Media in Markets: The New Frontier”. We’re excited to provide this broad update on a rapidly evolving and increasingly important segment of financial services.
Social media analytics for finance has lagged brand analytics by three to four years, despite enormous potential for profit through investing based on social insights. Our whitepaper explains why that gap has existed and what has changed in the social media ecosystem that is causing it to close. Twitter conversation around tagged equities has grown by more than 500% since 2011; the whitepaper explores what that means for investors.
We examine the finance-specific tools that have emerged and outline a framework for unlocking the value in social data for tools yet to be created. Then we provide an overview of changes in academic research, social content, and social analytics providers for finance that will help financial firms figure out how to capitalize on opportunities to generate alpha.
Like a child’s first steps or your first experiment with Pop Rocks candy, the first-ever Tweet went down in the Internet history books eight years ago today. On March 21, 2006, Jack Dorsey, co-founder of Twitter, published this:
just setting up my twttr
— Jack Dorsey (@jack) March 21, 2006
Twttr (the service’s original name) launched to the public on July 15, 2006, where it was recognized for “good execution on a simple but viral idea.” Eight years later, that seems to have held true.
It has become the digital watering hole, the newsroom, the customer service do’s and don’ts, a place to store your witty jargon that would just be weird to say openly at your desk. And then there is that overly happy person you thought couldn’t actually exist, standing in front of you in line, and you just favorited their selfie #blessed. Well, this is awkward.
Just eight months after its release, the company made a sweeping entrance at SXSW 2007, sparking the platform’s usage to balloon from 20,000 to 60,000 Tweets per day. Thus began the era of our public everyday lives being archived in 140-character tidbits. The manual “RT” turned into the click of a button, and favorites became the digital head nod. I see you.
In April 2009, Twitter launched the Trending Topics sidebar, identifying popular current world events and modish hashtags. Verified accounts became available that summer; athletes, actors, and icons alike began to display the “verified account” badge on their Twitter pages. This increasingly became a necessity in recognizing the real Miley Cyrus vs. Justin Bieber. If differences do exist.
The Twitter Firehose launched in March 2010. When Gnip was given access, a new door opened into the social data industry, and come November, filtered access to social data was born. Twitter turned to Gnip to be its first partner serving the commercial market. By offering complete access to the full firehose of publicly available Tweets under enterprise terms, this partnership enabled companies to build more advanced analytics solutions with the knowledge that they would have ongoing access to the underlying data. This was a key inflection point in the growth of the social data ecosystem. By April, Gnip played a key role in delivering past and future Twitter data to the Library of Congress for historic preservation in the archives.
On July 31, 2010, Twitter hit its 20 billionth Tweet milestone – or as we like to call it, a twilestone. It is the platform of hashtags and Retweets, celebrities and nobodies, at-replies, political rants, entertainment 411 and “pics or it didn’t happen.” By June 1, 2011, Twitter allowed just that as it broke into the photo-sharing space, allowing users to upload photos straight to their personal handle.
One of the most highly requested features was the ability to get historical Tweets. In March 2012, Gnip delivered just that, making every public Tweet available – all the way back to Mr. Dorsey’s very first on March 21, 2006.
Fast forward eight years, and Twitter is reporting over 500 million Tweets per day – more than 25,000 times the number of Tweets per day in just eight years! With over 2 billion accounts, over a quarter of the world’s population, Twitter ranks high among the top websites visited every day. Here’s to the times when we write our Twitter handle on our conference name tags instead of our birth names, and prefer to be tweeted at than texted. Voicemails? Ain’t nobody got time for that.
Twitter launched a special surprise for its 8th birthday. Want to check out your first Tweet?
“There’s a #FirstTweet for everything.” Happy Anniversary!
As a company that’s constantly innovating and driving forward, it’s sometimes easy to forget everything that’s led us to where we are today. When Gnip was founded 6 years ago, social data was in its infancy. Twitter produced only 300,000 Tweets per day; social data APIs were either non-existent or unreliable; and nobody had any idea what a selfie was.
Today social data analytics drives decisions in every industry you can imagine, from consumer brands to finance to the public sector to industrial goods. From then to now, there have been dozens of milestones that have helped create the social data industry and we thought it would be fun to highlight and detail all of them in one place.
The story begins humbly in Boulder, Colorado, with the concept of changing the way data was gathered from the public APIs of social networks. Normally, one would ‘ping’ an API and ask for data; Gnip wanted to reverse that structure (hence our name). In those early days, we focused on simplifying access to existing public APIs, but our customers constantly asked us how they could get more and better access to social data. In November of 2010, we were finally able to better meet their needs when we partnered with Twitter to provide access to the full Firehose of public Tweets, the first partnership of its kind.
This is when Gnip started to build the tools that have shaped the social data industry. While getting a Firehose of Tweets was great for the industry, the reality was that our customers didn’t need 100% of all Tweets; they needed 100% of relevant Tweets. We created PowerTrack to solve that problem, enabling sophisticated filtering on the full Firehose of Tweets. We also built valuable enrichments, reliability products, and historical data access to create the most robust Twitter data access available.
While Twitter data was where the industry started, our customers wanted data from other social networks as well. We soon created partnerships with Klout, StockTwits, WordPress, Disqus, Tumblr, Foursquare and others to be the first to bring their data to the market. Our work didn’t end there though. We have been continually adding in new sources, new enrichments, and new products. We also launched the first conference dedicated to social data as well as the first industry organization for social data. Things have come a long way in 6 years and we can’t wait to see the developments in the next 6 years.
Check out our interactive timeline for the full list of milestones and details.
If you’re one of the 30,000 headed to SXSW, we’ve got our social data and data science panel picks that you should attend between BBQ and breakfast tacos. And if you’re interested in hanging with Gnip, we’ve listed the places where we’ll have a presence!
Also, we’ll be helping put on the Big Boulder: Boots & Bourbon party at SXSW for folks in the social data industry. Send an email to email@example.com for an invite.
What Social Media Analytics Can’t Tell You
Friday, March 7 at 3:30 PM to 4:30 PM: Sheraton Austin, EFGH
Great panel with Vision Critical, Crowd Companies and more. “Whether you’re looking for fresh insight on what makes social media users tick, or trying to expand your own monitoring and analytics program, this session will give you a first look at the latest data and research methods.”
Book Signing – John Foreman, Chief Data Scientist at MailChimp
Friday, March 7 at 3:50 to 4:10 PM: Austin Convention Center, Ballroom D Foyer
During an interview with Gnip, John said that the data challenge he’d most like to solve is the Taco Bell menu. You should definitely get his book and get it signed.
Truth Will Set You Free but Data Will Piss You Off
Saturday, March 8 from 3:30 to 4:30 PM: Sheraton Austin Creekside
All-star speakers from DataKind, Periscopic and more talking about “the issues and ethics around data visualization–a subject of recent debate in the data visualization community–and suggest how we can use data in tandem with social responsibility.”
Keeping Score in Social: It’s More than Likes
Saturday, March 8 from 5:15 to 5:30 PM: Austin Convention Center, Ballroom F
Jim Rudden, the CMO of Spredfast, will talk about “what it takes to move beyond measuring likes to measuring real social impact.”
Mentor Session: Emi Hofmeister
Sunday, March 9 at 11 AM to 12 PM: Hilton Garden Inn, 10th Floor Atrium
Meet with Emi Hofmeister, the senior product marketing manager at Adobe Social. All sessions appear to be booked but keep an eye out for cancellations. Sign up here: http://mentor.sxsw.com/mentors/316
The Science of Predicting Earned Media
Sunday, March 9 at 12:30 to 1:30 PM: Sheraton Austin, EFGH
“In this panel session, renowned video advertising expert Brian Shin, Founder and CEO at Visible Measures, Seraj Bharwani, Chief Analytics Officer at Visible Measures, along with Kate Sirkin, Executive Vice President, Global Research at Starcom MediaVest Group, will go through the models built to quantify the impact of earned media, so that brands can not only plan for it, but optimize and repeat it.”
GNIP EVENT: Beyond Dots on a Map: Visualizing 3 Billion Tweets
Sunday, March 9 at 1:00-1:15 PM: Austin Convention Center, Ballroom E
Gnip’s product manager, Ian Cairns, will be speaking about the massive Twitter visualization Mapbox and Gnip created and what 3 billion geotagged Tweets can tell us.
Mentor Session: Jenn Deering Davis
Sunday, March 9 at 5 to 6 PM: Hilton Garden Inn, 10th Floor Atrium
Sign up for a mentoring session with Jenn Deering Davis, the co-founder of Union Metrics, here: http://mentor.sxsw.com/mentors/329
Algorithms, Journalism & Democracy
Sunday, March 9 from 5 to 6 PM: Austin Convention Center, Room 12AB
Read our interview with Gilad Lotan of betaworks on his SXSW session and data science. Gilad will be joined by Kelly McBride of the Poynter Institute to discuss the ways algorithms are biased that we might not think about. “Understanding how algorithms control and manipulate your world is key to becoming truly literate in today’s world.”
Scientist to Storyteller: How to Narrate Data
Monday, March 10 at 12:30 – 1:30 PM: Four Seasons Ballroom
See our interview with Eric Swayne about this SXSW session and data narration. On the session, “We will understand what a data-driven insight truly IS, and how we can help organizations not only understand it, but act on it.”
#Occupygezi Movement: A Turkish Twitter Revolution
Monday, March 10 at 12:30 – 1:30 PM: Austin Convention Center, Room 5ABC
See our interview with Yalçin Pembeciogli about how the Occupygezi movement was affected by the use of Twitter. “We hope to show you the social and political side of the movements and explain how social media enabled this movement to be organic and leaderless with many cases and stories.”
GNIP EVENT: Dive Into Social Media Analytics
Monday, March 10 at 3:30 – 4:30 PM: Hilton Austin Downtown, Salon B
Gnip’s VP of Product, Rob Johnson, will be speaking alongside IBM about “how startups can push the boundaries of what is possible by capturing and analyzing data and using the insights gained to transform the business while blowing away the competition.”
Measure This; Change the World
Tuesday, March 11 at 11 AM to 12 PM: Sheraton Austin, EFGH
A panel with folks from Intel, Cornell, Knowable Research, etc. looking at what we can learn from social scientists and how they measure vs how marketers measure.
Make Love with Your Data
Tuesday, March 11 at 3:30 to 4:30 PM: Sheraton Austin, Capitol ABCD
This session is from the founder of OkCupid, Christian Rudder. I interviewed Christian previously and am a big fan. “We’ll interweave the story of our company with the story of our users, and by the end you will leave with a better understanding of not just OkCupid and data, but of human nature.”
The decision of who to get social data from is not necessarily an easy one. The reality is that each business has unique social data needs, yet there is no blueprint for how to determine your needs. While your social data provider should be able to guide you in the right direction, picking a social data provider in the first place is just as tough. Here are some questions that you can use when determining your needs and evaluating social data providers:
1. Can you provide me with all of the data that I need?
This is one of the most important considerations. It is especially important to think about this question on two dimensions. The first dimension is, does your social data provider have access to all the sources that you need? The second dimension is, does your social data provider have access to complete data from those sources?
In terms of access to social data, wanting Twitter data is a common place to start, however complete analysis comes from having data from any source that is relevant to what you need to analyze. Consider things like physical location, demographic of audience, and types of interactions desired and you’ll quickly realize that sources like Tumblr, Foursquare, WordPress, Disqus and others are critically important to creating a full view of the conversation. Make sure your social data provider can give you all the data you need.
When it comes to a social data provider offering complete data from a source, it is important to note that it is entirely up to the source whether it offers up all of its public data and through whom it offers complete data. Some sources provide complete access; others do not. Complete access simply means that a provider receives a stream from the source containing all of the public data available – also known as firehose access. Without complete access to a source, a provider cannot claim to give you all of the data you need from that source. When a source doesn’t allow complete access, verify that your provider optimizes its requests to get as much data as the source allows.
2. Can you offer me the level of reliability I need?
Social analysis is only as accurate as the social data analyzed. What kind of reliability can your social data provider offer? If you disconnect from the stream, can they still provide you the data that was missed? Can they do that automatically for you? What if you’re disconnected for an extended period of time? Those are all important safety-net considerations, but there is a form of reliability that is even better: redundancy. Make sure your data provider offers the ability to consume a second, replicated stream alongside your production stream. A redundant stream can prevent missed data before it ever happens. Finally, check whether your social data provider can tell you if you’ve missed any data. If you think you may have missed something important, your social data provider should be able to tell you whether you did or whether you’re getting all the data you should.
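The value of a redundant stream comes down to de-duplication: consume both connections and keep each Tweet once, keyed by its ID. The sketch below is our own illustration of that idea, with made-up stream contents:

```python
def merge_streams(primary, redundant):
    """Yield each Tweet exactly once, whichever stream delivers it first.
    Duplicates across the two connections are dropped by Tweet ID."""
    seen = set()
    for stream in (primary, redundant):
        for tweet in stream:
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                yield tweet

# Simulate the primary connection dropping before id 3 arrives;
# the redundant stream fills the gap.
primary   = [{"id": 1}, {"id": 2}]
redundant = [{"id": 1}, {"id": 2}, {"id": 3}]
merged = list(merge_streams(primary, redundant))
print([t["id"] for t in merged])
```

In production the two streams would be consumed concurrently, but the dedup-by-ID logic is the same.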
3. Do you provide ways for me to get only the data I need?
Ingesting and storing a firehose of data is too complicated and expensive for most companies to handle. Your social data provider should allow you to filter the firehose based on what’s important to your business. Things you may want to filter by include keywords, phrases, from and to operators, contains operators, language, location, and type, although other things may be important for your analysis. Make sure your social data provider allows you to filter to get exactly what you need.
4. Can I update my filters quickly and easily, without losing data?
Beyond providing ways to filter the social data coming from the sources, the ability to update those filters quickly and easily is important. When you consider how quickly social conversation moves, having to manually update multiple streams can cause you to miss a lot of important conversation. Does your data provider allow you to update, manage, and organize all of your filters dynamically through a single connection, as opposed to a connection for each filter set? Is this available through an API so it happens instantaneously? Are disconnections required to update the rule set? If so, you could miss data as the system disconnects and then reconnects.
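Conceptually, dynamic filter management means a rule set that can change while the stream keeps flowing. The sketch below is our own illustration of that idea – a toy in-memory rule set, not any specific provider’s API:

```python
class RuleSet:
    """Toy rule set: rules can be added or removed at any time and
    take effect immediately, with no disconnect or reconnect."""

    def __init__(self):
        self.rules = {}                       # tag -> keyword

    def add(self, tag, keyword):
        self.rules[tag] = keyword.lower()     # effective on the next Tweet

    def remove(self, tag):
        self.rules.pop(tag, None)

    def matching_tags(self, text):
        """Return the tags of all rules that match this Tweet text."""
        text = text.lower()
        return [tag for tag, kw in self.rules.items() if kw in text]

rules = RuleSet()
rules.add("flood", "#boulderflood")
rules.add("rain", "rainfall")
print(rules.matching_tags("Historic rainfall tonight #BoulderFlood"))
```

Tagging each rule is what lets you tell, downstream, which filter matched each delivered Tweet.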
5. Do you offer historical data? And if so, how is it delivered?
Realtime data is the cornerstone of the social analytics industry, but historical data can allow you to analyze so much more, and in new ways. Check with your social data provider to see what historical social data they can make available to you. Historical data can be delivered immediately or made available as a batch job; depending on your needs, complexity, and budget, you may only need one form of delivery or you may end up using both. If you need historical data, make sure your data provider can get you the historical data you need.
6. What kind of metadata enrichments do you offer?
While the data from the source is primarily what you’re after, there’s additional data that can help you do better analyses. See what additional data your provider can include and determine if it is relevant for you. Is this data redundant to something your business excels at? Does this additional data provide additional value that you wouldn’t otherwise have? Enrichments in your data stream such as location or influence data can mean the difference between a generic analysis and great insights.
7. Who else relies on your data?
Sometimes the greatest tell of a company is who trusts them. This is especially true with social data where many businesses, big and small, rely on the data as the foundation of their business. Look at who your provider can offer as reference customers and if those companies have similar needs as you.
8. What are you doing to make sure I will continue to get the social data I need?
Social data can’t be here today and gone tomorrow. Consistent, long-term data access means compliance with terms of service, long-term contracts, and economics where everyone succeeds. Make sure your data provider is working directly with the sources on things like sustainability and policies; Susan Etlinger of Altimeter has a great post on why getting data from the source matters. Make sure your data provider is giving you data that’s compliant with the rules of the source providing it. Is your social data provider involved in industry advocacy and improving data quality? Building analytics isn’t easy or cheap; make sure you’re working with a data provider that’s investing in your longevity and success.
9. What is your pricing based on?
Figuring out how to price social data is not an easy thing, and different data providers tackle that problem differently. At the end of the day, your goal should be to understand the factors that go into determining your price and to find a package that meets your needs.
This list is not meant to be exhaustive; there are many other things you should consider when choosing a social data provider. Make sure to document and ask the questions that are important to you. Hopefully this list helps you get started.
In a blog post last week, we outlined how Microsoft Research was able to predict whether someone was depressed based on their activity on Twitter, which is groundbreaking research. And this week we looked at how social data can be used for tracking food poisoning outbreaks. We’ve also seen several amazing examples of practical applications of social data as a critical signal and life-saving data source during disaster situations. We still think this is just the beginning of how social data can be used in this sector.
Gnip’s first whitepaper on Social Data in the Public Sector helped outline what social data is, how it can be used, and the current implications of using social data. Due to the success of this whitepaper, we wanted to create a follow-up ebook.
This ebook highlights the use cases of social data in government and how organizations can determine the right social data for their needs. We outline cases of how social data is used in epidemiology, natural disaster relief, political campaigns, city planning, law enforcement, and government surveys. You can download the ebook here and send any questions you have to firstname.lastname@example.org.
Sometimes in the world of social data, words alone can’t convey the possibilities. The old adage that a picture is worth a thousand words holds true, so we wanted to show you what our new Profile Geo enrichment does.
First, here is what Profile Geo is:
Gnip’s Profile Geo enrichment significantly increases the amount of usable geodata for Twitter. It normalizes unstructured location data from Twitter users’ bio locations and assigns latitude/longitude coordinates to those normalized places. For example, everyone who mentions “NYC,” “New York City,” “Manhattan,” and even some odd instances like “NYC Baby✌” all get normalized to “New York City, New York, United States” so they’re easy to map.
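To give a feel for what this normalization involves, here is a minimal sketch in Python. The alias table, coordinates, and helper name are hypothetical illustrations; Gnip’s actual enrichment is far more sophisticated than a lookup table.

```python
import re

# Hypothetical alias table: messy bio-location strings mapped to a
# canonical place name with latitude/longitude (illustrative values).
ALIASES = {
    "nyc": ("New York City, New York, United States", 40.7128, -74.0060),
    "new york city": ("New York City, New York, United States", 40.7128, -74.0060),
    "manhattan": ("New York City, New York, United States", 40.7128, -74.0060),
}

def normalize_location(raw):
    """Strip emoji and punctuation noise, then look up a canonical place."""
    cleaned = re.sub(r"[^a-z ]", "", raw.lower()).strip()
    for alias, place in ALIASES.items():
        # Match the whole string or a leading token ("nyc baby" -> "nyc").
        if cleaned == alias or cleaned.startswith(alias + " "):
            return place
    return None  # couldn't resolve this location string

place = normalize_location("NYC Baby✌")
print(place[0])  # → New York City, New York, United States
```

The real enrichment has to handle millions of distinct bio strings, misspellings, and ambiguous names, but the core idea is the same: turn free-form text into a structured, mappable place.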
We think this is really powerful stuff. These maps were created from two sets of Tweets collected over three Sundays, searching for Tweets containing the term “football.” The Standard Geo map comprises Tweets whose users specifically geotagged them with a latitude and longitude (natively in the Twitter payload). The Profile Geo map comprises the additional Tweets that Gnip was able to enrich with a latitude and longitude based on the user’s profile location.
As you can see, the amount of location data available through Profile Geo is significantly higher than through Standard Geo. To be specific, we ran our “football” search against the Decahose, a random sampling of 10% of the full Twitter firehose. Standard Geo returned just under 3,000 Tweets, while the Profile Geo search returned more than 40,000 Tweets! (Multiply those by 10 to approximate firehose volumes.) With this additional geodata, the possibilities are vast: the NFL can better understand the demographics of its audience, football clubs in the UK can see how far their reach extends, and TV networks can use this data to tailor their media, among countless other uses.
If you were to remove the search for “football” and use the entire firehose of Twitter data, you’d find that you can receive roughly 15 times the amount of geo-relevant data by using Gnip’s Profile Geo enrichment instead of just the geodata in the standard stream. Anyone using geodata in their social data analyses should find great value in this dramatic increase in geo-relevant data.
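The back-of-the-envelope arithmetic behind those numbers can be laid out explicitly, using the counts quoted above and the fact that the Decahose is a ~10% random sample of the firehose:

```python
# Counts from the "football" search against the Decahose (from the text).
standard_geo = 3_000    # Tweets with native lat/long geotags
profile_geo = 40_000    # Tweets geo-enriched from profile locations

# The Decahose is a ~10% sample, so multiply by 10 for firehose estimates.
firehose_standard = standard_geo * 10   # ~30,000
firehose_profile = profile_geo * 10     # ~400,000

ratio = profile_geo / standard_geo
print(f"Profile Geo yields ~{ratio:.0f}x more geo-tagged Tweets")
# → Profile Geo yields ~13x more geo-tagged Tweets
```

The ~13x figure here reflects this particular “football” sample; the roughly 15x figure cited above is for the entire firehose with no search filter applied.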
If images are better than words, then interactive maps are better than images. Here are the maps so you can play around and see the difference yourself. Zooming in will depict just how much more data is available with Profile Geo in clear detail: