Social Data vs Social Media

One area where I see a lot of confusion is the difference between social media and social data. I come from a social media background and use social media in marketing, so I understand where the confusion comes from.

The easiest way to think about it in plain English:

  • Social Media: User-generated content through which one user communicates and expresses themselves, and that content is delivered to other users. Examples are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus. Social media is delivered through a polished user experience focused on sharing and content discovery. Social media also offers both public and private experiences, including the ability to share messages privately.

  • Social Data: Social media expressed in a computer-readable format (e.g. JSON), along with metadata about the content that provides not only content, but context. Metadata often includes information about location, engagement and links shared. Unlike social media, social data covers only publicly shared experiences.

Boiled down: social media is readable by humans and made for human interaction, while social data is social media made readable by computers.

Let’s look at a Tweet first as social media and then as social data to show exactly what I’m talking about.

From this Tweet from Gnip, we can see that it uses the #BigBoulder hashtag and a bitly link to our Storify page, that it has 73 retweets and 3 favorites, and the time and date it was posted.

[Image: the Tweet as displayed on Twitter]

Now let’s take a look at what the architecture of a Tweet looks like when received from an API.

{
   "body": "RT @gnip: Thrilled to welcome all #BigBoulder attendees! Watch the social
story unfold on our Storify page. http://t.co/ZzqUMfJz",
   "retweetCount": 71,
   "generator": {
      "link": "http://twitter.com",
      "displayName": "web"
   },
   "gnip": {
      "klout_score": 53,
      "matching_rules": [
         {
            "tag": "old krusty tweet",
            "value": "thrilled to welcome all attendees"
         }
      ],
      "language": {
         "value": "en"
      },
      "urls": [
         {
            "url": "http://t.co/ZzqUMfJz",
            "expanded_url": "http://storify.com/Gnip/big-boulder"
         }
      ]
   },
   "object": {
      "body": "Thrilled to welcome all #BigBoulder attendees! Watch the social
story unfold on our Storify page. http://t.co/ZzqUMfJz",
       "generator": {
         "link": "http://www.tweetdeck.com",
         "displayName": "TweetDeck"
      },
      "object": {
         "postedTime": "2012-06-20T18:07:13.000Z",
         "summary": "Thrilled to welcome all #BigBoulder attendees! Watch the social
story unfold on our Storify page. http://t.co/ZzqUMfJz",
      "link": "http://twitter.com/gnip/statuses/215506104082366465",
         "id": "object:search.twitter.com,2005:215506104082366465",
         "objectType": "note"
      },
      "actor": {
         "preferredUsername": "gnip",
         "displayName": "Gnip, Inc.",
         "links": [
            {
               "href": "http://gnip.com",
               "rel": "me"
            }
         ],
         "twitterTimeZone": "Mountain Time (US & Canada)",
         "image": "http://a0.twimg.com/profile_images/1347133706/
Gnip_logo-73x73_normal.png",
         "verified": true,
         "location": {
            "displayName": "Boulder, CO",
            "objectType": "place"
         },
         "statusesCount": 971,
         "summary": "Gnip is the leading provider of social media data for enterprise
applications, facilitating access to dozens of social media sources through a single
API",
         "languages": [
            "en"
         ],
         "utcOffset": "-25200",
         "link": "http://www.twitter.com/gnip",
         "followersCount": 3335,
         "favoritesCount": 108,
         "friendsCount": 384,
         "listedCount": 212,
         "postedTime": "2008-10-24T23:22:09.000Z",
         "id": "id:twitter.com:16958875",
         "objectType": "person"
      },
      "twitter_entities": {
         "user_mentions": [],
         "hashtags": [
            {
               "indices": [
                  24,
                  35
               ],
               "text": "BigBoulder"
            }
         ],
         "urls": [
            {
               "indices": [
                  98,
                  118
               ],
               "url": "http://t.co/ZzqUMfJz",
               "expanded_url": "http://bit.ly/MumrVJ",
               "display_url": "bit.ly/MumrVJ"
            }
         ]
      },
      "verb": "post",
      "link": "http://twitter.com/gnip/statuses/215506104082366465",
      "provider": {
         "link": "http://www.twitter.com",
         "displayName": "Twitter",
         "objectType": "service"
      },
      "postedTime": "2012-06-20T18:07:13.000Z",
      "id": "tag:search.twitter.com,2005:215506104082366465",
      "objectType": "activity"
   },
   "actor": {
      "preferredUsername": "daveheal",
      "displayName": "Dave Heal",
      "links": [
         {
            "href": "http://daveheal.com",
            "rel": "me"
         }
      ],
      "twitterTimeZone": "Mountain Time (US & Canada)",
      "image": "http://a0.twimg.com/profile_images/1755125722/photo_2_normal.JPG",
      "verified": false,
      "location": {
         "displayName": "Boulder, CO",
         "objectType": "place"
      },
      "statusesCount": 5657,
      "summary": "Boulder resident. Rochester NY native. Michigan Law graduate.
Copyright enthusiast. Liker of sports. DFW fanboy. CrossFitter. Work @Gnip. ",
      "languages": [
         "en"
      ],
      "utcOffset": "-25200",
      "link": "http://www.twitter.com/daveheal",
      "followersCount": 671,
      "favoritesCount": 28,
      "friendsCount": 292,
      "listedCount": 26,
      "postedTime": "2009-03-02T01:18:39.000Z",
      "id": "id:twitter.com:22432819",
      "objectType": "person"
   },
   "twitter_entities": {
      "user_mentions": [
         {
            "indices": [
               3,
               8
            ],
            "id": 16958875,
            "screen_name": "gnip",
            "id_str": "16958875",
            "name": "Gnip, Inc."
         }
      ],
      "hashtags": [
         {
            "indices": [
               34,
               45
            ],
            "text": "BigBoulder"
         }
      ],
      "urls": [
         {
            "indices": [
               108,
               128
            ],
            "url": "http://t.co/ZzqUMfJz",
            "expanded_url": "http://bit.ly/MumrVJ",
            "display_url": "bit.ly/MumrVJ"
         }
      ]
   },
   "verb": "share",
   "link": "http://twitter.com/daveheal/statuses/215509188481253376",
   "provider": {
      "link": "http://www.twitter.com",
      "displayName": "Twitter",
      "objectType": "service"
   },
   "postedTime": "2012-06-20T18:19:29.000Z",
   "id": "tag:search.twitter.com,2005:215509188481253376",
   "objectType": "activity"
}

This is social data. Same content, very different format, very different context and very different end user.

So what exactly goes into the social data of a Tweet? To start, here is some of the metadata that you’re seeing, with a short parsing sketch after the list.

  • Language identification — The language of this Tweet is detected as English. Language identification is important for social media monitoring so companies can correctly monitor for the content they want.

  • URL expansion — This resolves a shortened URL to the final URL that a consumer would see in their browser window. In this case, http://storify.com/Gnip/big-boulder is the link we shared using bitly.

  • Content — Gnip shows the full content of the Tweeted message, as well as metadata about the Tweet, like hashtags and URLs used, users who were mentioned, and when it was posted.

  • User — Gnip provides the display name, username, user’s stated location and additional bio information of the Tweeter. This is the information that users decide to share when signing up for an account.

  • Klout scores — An additional piece of metadata Gnip can provide is Klout score, so if one of our clients only wanted to see tweets with a Klout score of 30 or higher, they could do that.
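
To make that concrete, here is a minimal Python sketch of pulling those enrichments out of the Activity Streams payload shown above. It assumes the JSON has been saved to a file named activity.json; the filename is just for illustration.

import json

# Load the Activity Streams payload shown above (saved as activity.json).
with open("activity.json") as f:
    activity = json.load(f)

print(activity["gnip"]["language"]["value"])         # "en"
print(activity["gnip"]["klout_score"])               # 53
print(activity["gnip"]["urls"][0]["expanded_url"])   # the unwound Storify link
print(activity["actor"]["preferredUsername"])        # "daveheal"
print(activity["actor"]["location"]["displayName"])  # "Boulder, CO"
for hashtag in activity["twitter_entities"]["hashtags"]:
    print(hashtag["text"])                           # "BigBoulder"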

Beyond Twitter data, Gnip offers social data from Tumblr, Disqus, Automattic (WordPress) and other publishers that all have their own unique metadata and enrichments. In addition to enrichments, Gnip offers format normalization. This means that whether you’re looking at a WordPress blog post or a Tweet, the data is normalized no matter what the platform: date and location, for example, are formatted and located in the same place within the JSON payload, making it easy to consume and parse data from multiple sources.
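
This is what normalization buys you in practice: one accessor works across publishers. A minimal sketch, assuming the Activity Streams layout shown in the payload above (the helper name is ours, not part of any Gnip library):

# Works for a Tweet, a WordPress post, or any other normalized activity,
# because date and location always live in the same fields.
def posted_time_and_location(activity):
    location = activity.get("actor", {}).get("location", {})
    return activity.get("postedTime"), location.get("displayName")

# With the payload above: ("2012-06-20T18:19:29.000Z", "Boulder, CO")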

Finally, a big difference is in how people use social data versus social media. Social data is what powers social media monitoring and analytics companies; it’s combined with other data sets in business intelligence; it’s used by hedge funds as part of their algorithms when evaluating financial trades; and it can even provide a top-level view during a natural disaster.

Access to Public APIs from Instagram, bitly, Reddit, Stack Overflow, Panoramio and Plurk

Our customers care about every public conversation that happens online. Every month we deliver more than 100 billion social data activities to our clients. While much of our social data comes from our premium publishers (Twitter, Tumblr, WordPress, Disqus and StockTwits), we also make a wide range of social data from public APIs readily available through our Enterprise Data Collector product. A significant part of what Gnip does is make social data easier to digest by optimizing the polling of these APIs and by enriching and normalizing the data. Thanks to that normalization, social data digested from the public API of Instagram appears in the same format as social data from Twitter.

To that end, we’re announcing the addition of the public APIs for Instagram, bitly, Reddit, Stack Overflow, Panoramio and Plurk to the Gnip Enterprise Data Collector. While some of those might make perfect sense to you, others might make you turn your head and say, “huh.” Below we have more background on each publisher and why they’re important to the social data ecosystem.

Instagram on Enterprise Data Collector

This photo-sharing app, recently acquired by Facebook, continues to be one of the fastest growing social networks out there, with 90 million monthly active users. Every day 40 million photos are uploaded, and every second users like 8,500 photos and make 1,000 comments about them. Our customers have traditionally been very interested in geotagged social data, and between 15 and 25 percent of Instagram users geotag their photographs.

Instagram has become a popular marketing tool for brands; Anthropologie, Intel, Virgin America, Taco Bell and American Express, to name a few, all have Instagram accounts. Furthermore, we’ve started to see Instagram become a popular tool around current events and citizen reporting. During Hurricane Sandy, many people used Instagram to document what was happening around them and to show destruction in real time. During the recent inauguration, CNN asked users to tag their inaugural Instagram photos with #CNN, and saw users submitting an average of 25 photos every few seconds.

Customers using the Enterprise Data Collector will be able to retrieve popular posts and conduct tag searches and geosearches; a polling sketch follows the list below.

Potential Uses for the Instagram API:

  • Tracking photos around natural disasters
  • Geo use cases for a given location
  • Brand monitoring
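
For a feel of what the Enterprise Data Collector automates, here is a rough sketch of polling Instagram’s public tag endpoint directly. The client_id value is a placeholder, and the endpoint and parameters reflect the Instagram API of this era, which may have changed since.

import requests

# Poll Instagram's public v1 API for recent media tagged #sandy.
# client_id is a placeholder; register an app with Instagram to get one.
resp = requests.get(
    "https://api.instagram.com/v1/tags/sandy/media/recent",
    params={"client_id": "YOUR_CLIENT_ID"},
)
for media in resp.json().get("data", []):
    caption = (media.get("caption") or {}).get("text", "")
    print(media.get("link"), caption)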

bitly on Enterprise Data Collector

bitly is the easiest and most fun way to save, share and discover links from around the web. While commonly thought of as a link shortener for Twitter, bitly is used across the web and provides great information about which social sites are driving traffic. People use bitly to share 80 million new links a day.

Gnip customers will be able to search by keyword across the destination page title and URL, as well as some of the page content and header tags.

Potential Uses for the bitly API:

  • Monitoring for brand mentions
  • Understanding trending content

Reddit on Enterprise Data Collector 

Reddit is a social news site with user-generated content covering nearly every topic imaginable. One of the fastest growing sites in the world, Reddit has 50 million active users contributing links, stories, pictures and topics of discussion.

Customers will be able to search by keyword and by hot topics (see the sketch after the list below). Brands are often unaware of stories percolating about them on the popular site. One recent example: a Redditor posted an Applebee’s receipt on which a pastor refused to tip the waitress, citing how much she already gives to God, and the post ultimately became a national news story.

Potential Uses for the Reddit API:

  • Monitoring for brand mentions
  • Crisis communications warning
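
As a rough sketch of the keyword use case, here is what polling Reddit’s public search endpoint looks like; the query is illustrative, and Reddit asks API consumers to identify themselves with a descriptive User-Agent.

import requests

# Search Reddit's public JSON API for new posts mentioning a brand.
resp = requests.get(
    "http://www.reddit.com/search.json",
    params={"q": "applebees", "sort": "new"},
    headers={"User-Agent": "example-social-data-collector/0.1"},
)
for child in resp.json()["data"]["children"]:
    post = child["data"]
    print(post["subreddit"], post["title"])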

Stack Overflow on Enterprise Data Collector

Stack Overflow is a community edited Q&A site about computer programming, making it easy for programmers to find answers to questions they have about code. The site has more than 1.5 million registered users and 4 million questions.

Customers will have access to the entire firehose of Stack Overflow answers and be able to search tags, reputation and comments by keyword; a query sketch follows the list below. Programmers tag their questions, making it easy to find the content you’re looking for. Currently, the six most popular tags are C#, Java, PHP, JavaScript, jQuery, and Android.

Potential Uses for the Stack Overflow API:

  • Monitoring questions and discussion about software and technical brands
  • Monitoring bugs and outages
  • Often requested in conjunction with review sites
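
Here is a minimal sketch of a tag query against the public Stack Exchange API; the tag is illustrative.

import requests

# Fetch the newest Stack Overflow questions carrying a given tag.
# The API returns gzip-compressed JSON, which requests decodes transparently.
resp = requests.get(
    "https://api.stackexchange.com/2.1/questions",
    params={"tagged": "java", "sort": "creation", "order": "desc",
            "site": "stackoverflow"},
)
for question in resp.json()["items"]:
    print(question["creation_date"], question["title"])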

Panoramio on Enterprise Data Collector

Panoramio is a photo-sharing website with geotagged content that is layered upon Google Earth and Google Maps. Panoramio gives viewers an enhanced view of Google Earth because they can see other photos taken in the area.

Customers will be able to use a bounding box to view photos within a certain location; see the sketch after the list below. We have consistently found that our customers are eager for more social data with geotagged content.

Potential Uses for the Panoramio API:

  • Monitor social activity within a certain geographic area
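
A rough sketch of a bounding-box query against Panoramio’s public data API of this era (the coordinates roughly frame Boulder, CO and are purely illustrative):

import requests

# Request up to 20 public photos inside a lat/lng bounding box.
resp = requests.get(
    "http://www.panoramio.com/map/get_panoramas.php",
    params={"set": "public", "from": 0, "to": 20,
            "minx": -105.30, "miny": 39.95,   # southwest corner (lng, lat)
            "maxx": -105.17, "maxy": 40.09,   # northeast corner (lng, lat)
            "size": "medium"},
)
for photo in resp.json()["photos"]:
    print(photo["photo_title"], photo["photo_file_url"])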

Plurk on Enterprise Data Collector

Plurk is a microblogging site that allows users to communicate in posts of up to 210 characters, with emoticons. Plurk has more than 1 million active users who post 3 million “Plurks” each day. Plurk is one of the more popular social networks in Taiwan and also has a strong presence in Hong Kong, Singapore, the Philippines and India. Gnip customers will be able to search for keywords within posts.

Potential Uses for the Plurk API:

  • Monitoring for brand mentions, with a particular focus on certain Asian countries
  • Understanding trending content

If you’re interested in learning more about these additional sources on Enterprise Data Collector, please contact info@gnip.com for more information.

Commercial Evolution of Social Networks

Over the past four years Gnip has seen many social services come and go. Not surprisingly, a pattern has emerged in how they evolve, and in the degree to which our customers need their public data. There are generally three distinct phases a social service goes through, and how the service does in each phase impacts how it ultimately participates in the broader public social data ecosystem, which can complete a full commercial cycle: one combining consumer use (often buying intent, or expression) with commercial engagement (identifying need in a time of natural disaster, or ad buying).

Phase 1: Consumer Engagement
A social service must engage us, the end-users/consumers. Whether via a homegrown social graph or by leveraging someone else’s (e.g. Facebook Connect), a social service needs users in order to become useful. From there, those users need to participate in self-expression (from posting a comment, to retweeting a tweet) and generate activity on the service. There are a variety of ways to compel us users to engage with a social service, but the social service itself is solely responsible for the first experience. The vision of the service’s founders yields a web-app or mobile interface that allows us to take action, leveraging the expressions laid out by the app itself (e.g. sharing a photo). If users like the expressions, discovery methods, and sense of “connectedness,” you’ve got a relevant social service on your hands.

Phase 2: APIs; Outsourcing Engagement
At some point a successful social service realizes the potential for outsourcing the expression metaphors that make the service successful and useful, and they construct an API that allows others to RESTfully engage with the service. In some instances the API is read-only, in others write-only; sometimes both. What is key is that nine times out of ten, the API is meant to drive core service engagement via other user-facing applications. A classic example of this would be the zillions of non-Twitter Inc clients that “Tweet” on our behalf every day. One look at the endless number of Tweet “sources” that flow through the Firehose and you’ll realize this engagement potential.

The exceptional API is one that has broader social data engagement ecosystem consumption in its DNA. Typical social services consider themselves the center of the universe, assuming that not only will they capture all consumer engagement, but that they will be the root of all broader ecosystem engagement as well. However, success with consumer engagement does not guarantee commercial engagement; not by a long shot.

These days, some services execute phases 1 and 2 simultaneously.

Phase 3: Activity Transparency; Commercial Engagement
Allowing other applications and developers to inject activities into the core service is obviously valuable; however, it is only part of the picture. Social services with broad social and commercial impact have achieved it by addressing commercial needs for complete, raw activity availability. For example, in order for someone to deploy resources effectively in a disaster relief scenario, they need to make their own determination as to what victims need, where they are located, and the general conditions surrounding the event. A social service that limits access to the activities taking place on it, by definition, yields an incomplete picture to downstream commercial consumers of the content. The result is a fragmented and hobbled experience for commercial engagement.

Another key component of commercial engagement is realizing that the ecosystem of data analytics and insights is well established, complex, and interwoven. Massive investments have been made in the market over the years, and brands want to leverage that fact. It is illogical for a social service to address the endless needs of the enterprise by building its own tools. Attempts to supplant this market come at the potential expense of losing focus on building a great consumer experience.

The most impactful, useful, and valuable social services that Gnip customers leverage for their needs (ad buying, campaign running, stock trading, disaster relief) are those that acknowledge that they are not an island in the ecosystem. They complete the cycle by providing unfettered access to one of their most significant assets. In return, the relevance of the social service itself is maximized because commerce can engage with it.

A good example of how impactful this transparency can be is Twitter. Consider how Twitter is used across new, as well as traditional, media. They’ve completed the cycle with a strong Phase 3 offering.

Not all three phases are required for success as a social service, but all three are indeed required for success in the broader public commercial social data ecosystem.

Google+ Now Available from Gnip

Gnip is excited to announce the addition of Google+ to its repertoire of social data sources. Built on top of the Google+ Search API, Gnip’s stream allows its customers to consume realtime social media data from Google’s fast-growing social networking service. Using Gnip’s stream, customers can poll Google+ for public posts and comments matching the terms and phrases relevant to their business and client needs.

Google+ is an emerging player in the social networking space that pairs well with the Twitter, Facebook, and other microblog content currently offered by Gnip. If you are looking for volume: Google+ became the third largest social networking platform within a week of its public launch, and some project that it will emerge as the world’s second largest social network within the next twelve months. Looking to consume content from social network influencers? Google+ is where they are (even former Facebook President Sean Parker says so).

By working with Gnip for your stream of Google+ data (alongside an abundance of other social data sources), you’ll have access to a normalized data format, unwound URLs, and data deduplication. Existing Gnip customers can seamlessly add Google+ to their Gnip Data Collectors (all you need is a Google API key). New to Gnip? Let us help you design the right solution for your social data needs: contact sales@gnip.com.
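
For the curious, here is a minimal sketch of the kind of call the underlying Google+ Search API supports: a public activities search authenticated with a simple API key (the key value is a placeholder).

import requests

# Search public Google+ activities for a phrase, newest first.
resp = requests.get(
    "https://www.googleapis.com/plus/v1/activities",
    params={"query": "social data", "orderBy": "recent",
            "key": "YOUR_GOOGLE_API_KEY"},
)
for item in resp.json().get("items", []):
    print(item["published"], item["title"])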

Dreamforce Hackathon Winner: Enterprise Mood Monitor

As we wrote in our last post, Gnip co-sponsored the 2011 Dreamforce Hackathon, where teams of developers from all over the world competed for the top three overall cash prizes as well as prizes in multiple categories.  Our very own Rob Johnson (@robjohnson), VP of Product and Strategy, helped judge the entries, selecting the Enterprise Mood Monitor as winner of the Gnip category.

The Enterprise Mood Monitor pulls in data from a variety of social media sources, including the Gnip API, to provide realtime and historical information about the emotional health of a company’s employees. It shows both individual and overall company emotional climate over time, and can send SMS messages to a manager when the mood level drops below a threshold. In addition, HR departments can use this data to get insights into employee morale and satisfaction over time, eliminating the need to conduct standard employee satisfaction surveys. This mood analysis data can also be correlated with business metrics such as sales and support KPIs to identify drivers of business performance.
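
The team built their app in Apex on Force.com, so the following is not their code; it is just a minimal Python sketch of the threshold-alert idea described above, using the Twilio client library of the era. The mood scores, phone numbers, and credentials are all placeholders.

from twilio.rest import TwilioRestClient

MOOD_THRESHOLD = 0.4  # placeholder: assumes mood scores normalized to 0..1

def alert_if_morale_low(mood_scores, client, manager_number, from_number):
    # Average the team's latest sentiment scores; text the manager on a dip.
    average = sum(mood_scores) / len(mood_scores)
    if average < MOOD_THRESHOLD:
        client.sms.messages.create(
            to=manager_number,
            from_=from_number,
            body="Team mood dropped to %.2f -- time to check in." % average,
        )

client = TwilioRestClient("ACCOUNT_SID", "AUTH_TOKEN")
alert_if_morale_low([0.2, 0.5, 0.3], client, "+13035550100", "+13035550101")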

Pretty cool stuff.

The three developers (Shamil Arsunukayev, Ivan Melnikov and Gaziz Tazhenov) from Comity Designs behind this idea set out to create a cloud app for the social enterprise built on one of Salesforce’s platforms. They spent two days brainstorming the possibilities before diving into two days of rigorous coding. The result was the Enterprise Mood Monitor, built on the Force.com platform using Apex, Visualforce, and the following technologies: Facebook API (Graph API), Twitter API, Twitter Sentiment API, LinkedIn API, Gnip API, Twilio, Chatter, Google Visualization API. The team entered their Enterprise Mood Monitor into the Twilio and Gnip categories. We would like to congratulate the guys on their “double-dip” win as they took third place overall and won the Gnip category prize!

Have a fun and creative way you’ve used data from Gnip? Drop us an email or give us a call at 888.777.7405 and you could be featured in our next blog post.

We're off to Dreamforce!

There’s always a lot going on here at Gnip, but this week is especially packed, with the team looking to make a big splash at Salesforce.com’s annual Dreamforce event. Salesforce is obviously a huge player in the software space, and the theme of this year’s Dreamforce is “Welcome to the Social Enterprise,” which fits really nicely with what we do.

At the conference, we’ll be speaking at two sessions and sponsoring the Hack-a-thon. In the first presentation, Drinking from the Firehose: How Social Data is Changing Business Practices, Jud (@jvaleski) and Chris (@chrismoodycom) will discuss the ways that social data is being used to drive innovation across a variety of industries from Financial Services and Emergency Response to Local Business and Consumer Electronics. They’ll also give a glimpse into the technical challenges involved in handling the ever-increasing volume of data that’s flowing out of Twitter every day. If you’re at Dreamforce, this session is on Tuesday (8/30) from 11am to noon in the DevZone Theater on the 2nd floor of Moscone West.

In the second presentation, Your Guide to Understanding the Twitter API, Rob (@robjohnson) will talk through the best ways to get access to the Twitter data that you’re looking for, examining the pros and cons of the various methods. You can check out Rob’s session on Tuesday (8/30) from 3:00 to 3:30 in the Lightning Forum in the DevZone on the 2nd floor of Moscone West.

And finally, we’re sponsoring the Hack-a-thon where teams of developers will create cloud apps for the social enterprise using Twitter feeds from Gnip and at least one of the Salesforce platforms (Force.com, Heroku, Database.com). The winning team stands to take home at least $10,000 in prize money. We’re really excited to see the creative solutions that the teams develop! All submissions are due no later than 6am on Thursday (9/1), so sign up now and get going!

Want to meet up in person at Dreamforce? Give any of us a shout @jvaleski, @chrismoodycom, @robjohnson, @funkefred.

Customer Spotlight – Klout

Providing Klout Scores, a measurement of a user’s overall online influence, for every individual in the ever-growing base of Twitter users was the task at hand for Matthew Thomson, VP of Platform at Klout. With massive amounts of data flowing in by the second, Thomson and Klout’s scientists and engineers needed a fast and reliable solution for processing, filtering, and eliminating data from the Twitter Firehose that was unnecessary for calculating and assigning Twitter users’ Klout Scores.

“Not only has Gnip helped us triple our API volume in less than one month but they provided us with a trusted social media data delivery platform necessary for efficiently scaling our offerings and keeping up with the ever-increasing volume of Twitter users.”

- Matthew Thomson
VP of Platform, Klout

By selecting Gnip as their trusted premium Twitter data delivery partner, Klout tripled their API volume and, in less than one month, increased the share of Twitter users they could provide influence scores for by 50 percent.

For the full details, read the success story here.

Customer Spotlight – MutualMind

Like many startups seeking to enter and capitalize on the rising social media marketplace, MutualMind knew that timing is everything: getting their enterprise social media management product to market quickly was crucial to the success of their business. MutualMind provides an enterprise social media intelligence and management system that monitors, analyzes, and promotes brands on social networks and helps increase social media ROI. The platform enables customers to listen to discussion on the social web, gauge sentiment, track competitors, identify and engage with influencers, and use the resulting insights to improve their overall brand strategy.

“Through their social media API, Gnip helped us push our product to market six months ahead of schedule, enabling us to capitalize on the social media intelligence space. This allowed MutualMind to focus on the core value it adds by providing advanced analytics, seamless engagement, and enterprise-grade social management capabilities.”

- Babar Bhatti
CEO, MutualMind

By selecting Gnip as their data delivery partner, MutualMind was able to get their product to market six months ahead of schedule. Today, MutualMind processes tens of millions of data activities per month using multiple sources from Gnip including premium Twitter data, YouTube, Flickr, and more.
For the full details, read the success story here.

Guide to the Twitter API – Part 3 of 3: An Overview of Twitter’s Streaming API

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like to use the Streaming API because the streaming nature of the data delivery means that the data is delivered closer to realtime than it is from the Search API (which I wrote about last week). But the Streaming API wasn’t designed to deliver full coverage results and so has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API.

The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer” and Twitter currently has it set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” a randomly sampled 10% stream. Twitter used to provide some increased access levels to businesses, but announced last November that they’re not granting increased access to any new companies and are gradually transitioning their current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g. keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.
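
For concreteness, here is a minimal sketch of consuming a filtered stream, assuming the version 1 endpoint of this era; the credentials are placeholders, and the exact authentication scheme Twitter requires has varied over time.

import requests

# Open a long-lived HTTP connection and track two keywords.
resp = requests.post(
    "https://stream.twitter.com/1/statuses/filter.json",
    data={"track": "coca cola,pepsi"},
    auth=("USERNAME", "PASSWORD"),
    stream=True,
)
for line in resp.iter_lines():
    if line:  # the stream interleaves blank keep-alive lines
        print(line)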

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API like they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public APIs can be risky.

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: those APIs don’t prohibit displaying the Tweets you receive to the general public, as premium Twitter feeds from Gnip and other resyndication partners do.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended to support, and does not fully support, the repeated constant searches that would be required to deliver 100% coverage.

Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full-coverage streams for monitoring and analytics use cases. To get data from the Search API, your system repeatedly asks Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results (for example, the latest 100 Tweets). If more than 100 matching Tweets have been created since the last time you sent the request, some of the matching Tweets will be lost.

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit of time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests to Twitter for that query alone to keep up with the stream. And this is only the beginning of the problem, as no monitoring or analytics vendor is interested in just one term; many have hundreds or thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. Usually there might be fewer than 100 matching Tweets per second, but if there’s a spike (say the term becomes a trending topic after a Super Bowl commercial), there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will miss some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.
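
Here is a minimal sketch of the polling loop described above, against the version 1 search endpoint of this era. The since_id parameter anchors each request to the newest Tweet already seen, and rpp (results per page) was capped at 100, which is exactly the coverage ceiling in the Coca Cola example.

import time
import requests

since_id = 0
while True:
    resp = requests.get(
        "http://search.twitter.com/search.json",
        params={"q": '"Coca Cola"', "rpp": 100, "since_id": since_id},
    )
    results = resp.json()["results"]
    for tweet in results:
        print(tweet["created_at"], tweet["text"])
    if results:
        since_id = max(tweet["id"] for tweet in results)
    time.sleep(1)  # in practice, pacing is dictated by Twitter's rate limits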

Because of these issues with monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned: if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

One great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Although the coverage is limited, some people opt to use the Search API to collect a sampling of Tweets that match their search terms precisely because it supports Boolean operators and geo parameters. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full-coverage Twitter Firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…