9 Questions To Ask Your Social Data Provider

The decision of who to get social data from is not necessarily an easy one. The reality is that each business has unique social data needs, yet there is no blueprint for how to determine your needs. While your social data provider should be able to guide you in the right direction, picking a social data provider in the first place is just as tough. Here are some questions that you can use when determining your needs and evaluating social data providers:

1. Can you provide me with all of the data that I need?
This is one of the most important considerations. It is especially important to think about this question on two dimensions. The first dimension is, does your social data provider have access to all the sources that you need? The second dimension is, does your social data provider have access to complete data from those sources?

Dimension 1:
In terms of access to social data, wanting Twitter data is a common place to start. However, complete analysis comes from having data from every source that is relevant to what you need to analyze. Consider things like physical location, audience demographics, and the types of interactions you want to capture, and you’ll quickly realize that sources like Tumblr, Foursquare, WordPress, Disqus and others are critically important to creating a full view of the conversation. Make sure your social data provider can give you all the data you need.


Dimension 2:
When it comes to a social data provider offering complete data from a source, it is important to note that it is entirely up to the source whether it offers up all of its public data, and through whom. Some data sources provide complete access, while others do not. Without complete access to a source, a provider cannot claim to be able to give you all of the data you need from that source. Complete access simply means that a provider receives a stream of data from the source that contains all of the public data available. This is also known as firehose access. Sometimes a source doesn’t allow complete access, in which case you should verify that your provider can optimize the requests it sends to get as much data as the source will allow.


2. Can you offer me the level of reliability I need?
Social analysis is only as accurate as the social data analyzed. What kind of reliability can your social data provider offer? If you disconnect from the stream, can they still provide you the data that was missed? Can they do that automatically for you? What if you’re disconnected for an extended period of time? Those are all really important safety net considerations, but there is a form of reliability that is even better: redundancy. You should make sure your data provider offers the ability to consume a second, replicated stream along with your production stream. A redundant stream can keep you from missing data in the first place. Finally, check to see if your social data provider can tell you whether you’ve missed any data. Sometimes you suspect you may have missed something important; your social data provider should be able to tell you whether you did or whether you’re getting all the data you should.
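To make the redundancy idea concrete, here is a minimal sketch of how a consumer might merge a production stream with a replicated one. It assumes each activity carries a unique `id` field and uses plain lists to stand in for live connections. The `Deduplicator` and `merge_streams` names are hypothetical and do not come from any particular provider's API.

```python
from typing import Dict, Iterable, List, Set


class Deduplicator:
    """Remembers activity ids so the same activity is only accepted once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def accept(self, activity: Dict[str, str]) -> bool:
        """Return True the first time an id is seen, False for repeats."""
        activity_id = activity["id"]  # assumption: every activity has an "id"
        if activity_id in self._seen:
            return False
        self._seen.add(activity_id)
        return True


def merge_streams(*streams: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine several copies of a stream, keeping each activity once."""
    dedup = Deduplicator()
    merged: List[Dict[str, str]] = []
    for stream in streams:
        for activity in stream:
            if dedup.accept(activity):
                merged.append(activity)
    return merged


# The production connection dropped before activity "3" arrived;
# the replicated stream still delivered it.
production = [{"id": "1", "body": "a"}, {"id": "2", "body": "b"}]
replica = [
    {"id": "1", "body": "a"},
    {"id": "2", "body": "b"},
    {"id": "3", "body": "c"},
]

merged = merge_streams(production, replica)
merged_ids = [activity["id"] for activity in merged]
```

In a real deployment the two streams would be read concurrently and the set of seen ids would need to be bounded, for example by keeping only a recent time window. The sketch only shows why a second stream lets you fill a gap without asking the provider to replay anything.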

3. Do you provide ways for me to get only the data I need?
Ingesting and storing a firehose of data is too complicated and expensive for most companies to handle. Your social data provider should allow you to filter the firehose down to only the data you need, based on what’s important to your business. Things you may want to filter by are keywords, phrases, from and to operators, contains operators, language, location, and type, although there are other things that may be important for your analysis. Make sure your social data provider allows you to filter to get exactly what you need.
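As a rough illustration of what filtering looks like from the consumer's side, the sketch below applies a keyword and language filter to a handful of activities. The `Filter` class and the `body` and `lang` field names are assumptions made for this example. Real providers apply filters server-side and define their own rule syntax.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Filter:
    """A hypothetical filter: any of the keywords, in an optional language."""

    keywords: Set[str] = field(default_factory=set)  # lowercase keywords
    language: Optional[str] = None

    def matches(self, activity: Dict[str, str]) -> bool:
        # Reject activities in the wrong language first.
        if self.language and activity.get("lang") != self.language:
            return False
        # With no keywords configured, the language check alone decides.
        if not self.keywords:
            return True
        # Compare whole words so "football!" still matches "football".
        words = set(re.findall(r"\w+", activity.get("body", "").lower()))
        return bool(words & self.keywords)


activities: List[Dict[str, str]] = [
    {"body": "Watching football tonight!", "lang": "en"},
    {"body": "Me encanta el football", "lang": "es"},
    {"body": "Great coffee this morning", "lang": "en"},
]

football_en = Filter(keywords={"football"}, language="en")
matched = [a for a in activities if football_en.matches(a)]
```

Only the first activity survives: the second is in the wrong language and the third never mentions the keyword.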

4. Can I update my filters quickly and easily, without losing data?
Beyond providing ways to filter the social data coming from the sources, the ability to update those filters quickly and easily is important. When you consider how quickly social conversation occurs, having to manually update multiple streams can cause you to miss a lot of important conversation. Does your data provider allow you to update, manage, and organize all of your filters dynamically through a single connection, as opposed to a connection for each filter set? Is this available through an API so that changes take effect immediately? And are disconnections required to update the rule set? If so, you could miss data while the system disconnects and then reconnects.

5. Do you offer historical data? And if so, how is it delivered?
Realtime data is the cornerstone of the social analytics industry, but historical data lets you analyze much more, and in new ways. Check with your social data provider to see what historical social data they can make available to you. Historical data can be delivered immediately or can be made available as a batch job. Depending on your need, complexity, and budget, you may only need one form of delivery or you may end up using both. Consider whether you need historical data and, if you do, make sure your data provider can get you the historical data you need.

6. What kind of metadata enrichments do you offer?
While the data from the source is primarily what you’re after, there’s additional data that can help you do better analyses. See what additional data your provider can include and determine if it is relevant for you. Is this data redundant to something your business excels at? Does this additional data provide additional value that you wouldn’t otherwise have? Enrichments in your data stream such as location or influence data can mean the difference between a generic analysis and great insights.

7. Who else relies on your data?
Sometimes the greatest tell of a company is who trusts them. This is especially true with social data, where many businesses, big and small, rely on the data as the foundation of their business. Look at who your provider can offer as reference customers and whether those companies have needs similar to yours.

8. What are you doing to make sure I will continue to get the social data I need?
Social data can’t be here today and gone tomorrow. Consistent, long-term data access means compliance with terms of service, long-term contracts, and economics where everyone succeeds. Make sure your data provider is working directly with the sources on things like sustainability and policies. Susan Etlinger of Altimeter has a great post on why getting data from the source matters. Make sure your data provider is giving you data that’s compliant with the rules of the source providing the data. Is your social data provider involved in industry advocacy and improving data quality? Building analytics isn’t easy or cheap, so make sure you’re working with a data provider that’s investing in your longevity and success.

9. What is your pricing based on?
Figuring out how to price social data is not an easy thing, and different data providers tackle that problem differently. At the end of the day, your goal should be to make sure you understand the factors that go into determining your price and finding a package that meets your needs.

This list is not meant to be exhaustive; there are many other things you should consider when choosing a social data provider. Make sure to document and ask the questions that are important to you. Hopefully this list helps you get started.

Profile Geo: When You Need More Geodata In Your Twitter Data

Sometimes in the world of social data it is hard to grasp the amazing possibilities when we use words to describe things. The old adage that a picture is worth a thousand words is true, so we wanted to show you what our new Profile Geo enrichment does.

First, here is what Profile Geo is:
Gnip’s Profile Geo enrichment significantly increases the amount of usable geodata for Twitter. It normalizes unstructured location data from Twitter users’ bio locations and matches latitude/longitude coordinates to those normalized places. For example, everyone who mentions “NYC,” “New York City,” “Manhattan,” and even some odd instances like “NYC Baby✌” gets normalized to “New York City, New York, United States” so they’re easy to map.
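To give a feel for what this kind of normalization involves, here is a toy sketch that maps free-form profile locations to a canonical place with coordinates. It is not Gnip's implementation. The alias table, the `Place` type, the `normalize_location` function, and the approximate coordinates are all assumptions made for illustration, and a real enrichment would rely on a full gazetteer and far more robust matching.

```python
import re
from typing import Dict, NamedTuple, Optional


class Place(NamedTuple):
    name: str
    latitude: float
    longitude: float


# Illustrative only: approximate coordinates for New York City.
NEW_YORK = Place("New York City, New York, United States", 40.7128, -74.0060)

# Hypothetical alias table keyed by lowercase, punctuation-free text.
ALIASES: Dict[str, Place] = {
    "nyc": NEW_YORK,
    "new york city": NEW_YORK,
    "manhattan": NEW_YORK,
}


def normalize_location(raw: str) -> Optional[Place]:
    """Map a free-form profile location to a known place, if any."""
    # Lowercase and turn emoji, punctuation, and extra spaces into single spaces.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    padded = f" {cleaned} "
    # Try longer aliases first so "new york city" wins over shorter ones.
    for alias in sorted(ALIASES, key=len, reverse=True):
        if f" {alias} " in padded:  # whole-word match only
            return ALIASES[alias]
    return None


samples = ["NYC", "New York City", "Manhattan", "NYC Baby✌", "somewhere over the rainbow"]
results = {raw: normalize_location(raw) for raw in samples}
```

The first four samples resolve to the same place and coordinates, while the last one returns `None` because nothing in the table matches it.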

Now, here is what Profile Geo does in practice for users interested in Twitter geodata:
Football Geo

We think this is really powerful stuff. These maps were created using 2 sets of Tweets taken over 3 Sundays where we were looking for Tweets containing the term “football.” The map for Standard Geo is composed of Tweets where users specifically geotagged their Tweet with their latitude and longitude (natively in the Twitter payload). The map for Profile Geo is composed of the additional Tweets that Gnip was able to enrich and assign to a latitude and longitude.

As you can see, the amount of location data available through Profile Geo is significantly higher than through Standard Geo. To be specific, we did our “football” search using the Decahose, a random sampling of 10% of the full Twitter firehose. Standard Geo returned just under 3,000 Tweets, while the Profile Geo search returned more than 40,000 Tweets! (Multiply those by 10 to get approximations of firehose volumes.) This additional geodata opens up a lot of possibilities. The NFL can better understand the demographics of its audience, football clubs in the UK can see how far their reach extends, and TV networks can use this data to tailor media, among many other uses.
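The arithmetic behind those figures can be worked through in a few lines. The sketch below uses the rounded counts quoted above (3,000 and 40,000), so the results are approximations rather than exact measurements. The variable and function names are just for illustration.

```python
# Rounded counts from the "football" search on the Decahose (a 10% sample).
standard_geo_sample = 3_000   # "just under 3,000" Tweets
profile_geo_sample = 40_000   # "more than 40,000" Tweets

# A 10% sample means multiplying by 10 to approximate firehose volume.
FIREHOSE_MULTIPLIER = 10


def estimate_firehose(sample_count: int) -> int:
    """Scale a Decahose count up to an approximate firehose count."""
    return sample_count * FIREHOSE_MULTIPLIER


standard_geo_firehose = estimate_firehose(standard_geo_sample)  # 30_000
profile_geo_firehose = estimate_firehose(profile_geo_sample)    # 400_000

# How many times more geodata Profile Geo yielded for this search.
# The sampling rate cancels out, so the ratio is the same either way.
ratio = profile_geo_sample / standard_geo_sample
```

For this particular search the ratio works out to a little over 13 times as many Tweets with usable location data.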

If you were to remove the search for “football” and use the entire firehose of Twitter data, you’d find that you can receive roughly 15 times the amount of geo-relevant data by using Gnip’s Profile Geo enrichment instead of just the geodata in the standard stream. Anyone using geodata in their social data analyses should find great value in this dramatic increase in geo-relevant data.

If images are better than words, then interactive maps are better than images. Here are the maps so you can play around and see the difference yourself. Zooming in will depict just how much more data is available with Profile Geo in clear detail:

Chumming for Insights: A Social Take on Sharknado

For a brief moment the term Sharknado took the social universe by storm. If you haven’t heard about the Syfy channel original TV movie, let us inform you. The biggest actor attached to the film is Tara Reid. With an estimated budget of $1 million, the marketing push behind this made-for-TV movie must have been incredibly small. Yet at one point during the movie’s first airing the term “Sharknado” hit 5,000 Tweets per minute and became a trending topic. According to Nielsen, approximately 12.3% of all Tweets related to TV were about Sharknado on the day it aired, twice as many Tweets as the next most Tweeted TV event, the return of Derek Jeter in the Yankees vs Kansas City game.

Companies spend millions of dollars promoting hashtags in commercials, and yet this movie, with a budget smaller than what many companies spend on a single commercial, was able to become an instant sensation. In the end, it is results that matter, and in this case the results are viewers. Sharknado was able to achieve an impressive 1.37 million viewers. To compare, NBC during primetime on the same day maxed out at 1.15 million viewers. So how does a small cable channel like Syfy get more viewers than the big boys like NBC?

While Twitter was the dominant focus of the conversation for Sharknado, we thought we would look at how that conversation translated on to other social channels. Was Sharknado spreading like wildfire on Tumblr the way it was on Twitter? Were people blogging about it and discussing it on WordPress and Disqus?

Let’s take a look at Sharknado Social Media:

The white line in the graph marks when Sharknado first aired.

These graphs show that, outside of Twitter, conversation about Sharknado acted mostly as expected, except on Tumblr. WordPress and Disqus saw their peak of activity after the movie aired. People were likely using the long-form nature of WordPress to blog reviews, and then using Disqus comments to further the discussion, which is typical for these sources of data.

But the really interesting graph is the Tumblr graph:

There are a couple of interesting things to note about how Sharknado conversation happened on Tumblr:

  • The initial spike on July 7, which is due to a teaser animated GIF that got picked up and reblogged 3,000 times an hour.
  • The spike in activity on July 10, which marks the release of the official trailer for Sharknado on YouTube and its spread on Tumblr. Tumblr users picked this up and shared it at an impressive 5,500 posts per hour at its peak.
  • The consistent stream of posts related to Sharknado since it aired. While all other networks, including Twitter, have seen a significant drop-off, Tumblr is sharing more Sharknado-related content after the initial airing than before it.

What this means is that social conversation online doesn’t just happen where you intend for it to, and it doesn’t just happen where you are looking. Analyzing the conversation across social networks gives you a full picture of the social conversation and greater visibility into the results of your marketing push. Rumor has it Sharknado has a sequel in the works. Our bet is that you’ll find the first glimpses of its virality on Tumblr, and you’ll see it last there until the first glimpses of Sharknado 3.