Profile Geo: When You Need More Geodata In Your Twitter Data

Sometimes in the world of social data, words alone make it hard to grasp what's possible. The old adage that a picture is worth a thousand words is true, so we wanted to show you what our new Profile Geo enrichment does.

First, here is what Profile Geo is:
Gnip’s Profile Geo enrichment significantly increases the amount of usable geodata for Twitter. It normalizes the unstructured location data in Twitter users’ bio locations and matches latitude/longitude coordinates to those normalized places. For example, everyone who mentions “NYC,” “New York City,” “Manhattan,” and even some odd instances like “NYC Baby✌” all get normalized to “New York City, New York, United States” so they’re easy to map.
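To make the idea concrete, here's a minimal sketch (in Python, with a made-up alias table; Gnip's actual enrichment is far more sophisticated than a lookup dictionary) of what this kind of normalization looks like:

```python
# Illustrative sketch only -- not Gnip's actual enrichment logic.
# The idea: map messy, free-form profile locations to one canonical
# place record with coordinates.

CANONICAL_PLACES = {
    "new york city": {
        "display": "New York City, New York, United States",
        "lat": 40.7128,
        "lon": -74.0060,
    },
}

# Hypothetical alias table; a real geocoder handles far more variation.
ALIASES = {
    "nyc": "new york city",
    "new york city": "new york city",
    "manhattan": "new york city",
    "nyc baby✌": "new york city",
}

def normalize_profile_location(raw):
    """Return a canonical place dict for a raw bio location, or None."""
    key = ALIASES.get(raw.strip().lower())
    return CANONICAL_PLACES.get(key) if key else None

print(normalize_profile_location("NYC Baby✌"))
# -> {'display': 'New York City, New York, United States', ...}
```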

Now, here is what Profile Geo does in practice for users interested in Twitter geodata:
[Maps: "Football Geo" comparing Standard Geo and Profile Geo Tweet coverage]

We think this is really powerful stuff. These maps were created from two sets of Tweets collected over three Sundays, searching for Tweets containing the term “football.” The Standard Geo map is made up of Tweets whose users specifically geotagged the Tweet with a latitude and longitude (native in the Twitter payload). The Profile Geo map is made up of additional Tweets that Gnip was able to enrich and assign a latitude and longitude.

As you can see, the amount of location data available through Profile Geo is significantly higher than through Standard Geo. To be specific, we ran our “football” search against the Decahose, a random sampling of 10% of the full Twitter firehose. Standard Geo returned just under 3,000 Tweets, while the Profile Geo search returned more than 40,000 Tweets! (Multiply those by 10 for approximate firehose volumes.) With this additional geodata the possibilities are wide open: the NFL can better understand the demographics of its fans, football clubs in the UK can see how far their reach extends, and TV networks can use this data to tailor media, among countless other uses.
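For the arithmetic-minded, here's that back-of-envelope scaling, using the approximate counts above:

```python
# Rough firehose estimates from Decahose counts (a 10% random sample),
# using the approximate numbers from the "football" example above.
DECAHOSE_SAMPLE_RATE = 0.10

standard_geo_tweets = 3_000   # geotagged Tweets in the Decahose sample
profile_geo_tweets = 40_000   # Profile Geo enriched Tweets in the sample

for label, count in [("Standard Geo", standard_geo_tweets),
                     ("Profile Geo", profile_geo_tweets)]:
    estimate = count / DECAHOSE_SAMPLE_RATE
    print(f"{label}: ~{estimate:,.0f} Tweets at full firehose volume")
# Standard Geo: ~30,000; Profile Geo: ~400,000
```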

If you were to remove the “football” search and use the entire firehose of Twitter data, you’d find that you can receive roughly 15 times as much geo-relevant data by using Gnip’s Profile Geo enrichment instead of just the geodata in the standard stream. Anyone using geodata in their social data analyses should find great value in this dramatic increase in geo-relevant data.

If images are better than words, then interactive maps are better than images. Here are the maps so you can play around and see the difference yourself. Zooming in will show just how much more data is available with Profile Geo:

Data Story: Phil Harris of Geofeedia

Data Stories is Gnip’s ongoing series telling the stories of the people and companies that are doing groundbreaking work in social data. This week we’re interviewing Phil Harris, CEO of Geofeedia, a company that allows you to search and monitor social media by location. Geofeedia is a recent Gnip customer, and I love what they’re doing. The inherent value of Geofeedia was made clear to me when we received a media request looking for all social media that was geotagged close to the finish line of the Boston Marathon. Content + location creates powerful stories and Geofeedia is making it easier to find the right ones. 

1. What social data sources do you wish had geotagged data?
Our business is built on the fundamental premise of aggregating open social data; or, I should say, every open source. That said, there are currently major social data sources that provide public location data based on a location identifier rather than a geotag. We will accommodate location IDs to integrate these data sources, but I strongly believe that over time the benefits of more precise geolocation tagging on social media content will encourage these services to move toward geotagging. When they do, we’re exceptionally well positioned to translate that evolution into benefit for our clients.

2. If you’re a user, what do you think is the advantage of sharing your geodata?
We’ve barely scratched the surface of how geodata will deliver value to consumers. I believe the rapidly growing penetration of smartphones and adoption of geo-centric applications such as navigation will create a rich ecosystem of geo-data driven benefits. I am speaking with major consumer brands who believe that they will be able to create and maintain consumer relationships via location based social media in ways that will deliver significant value back to the individual user.

3. What can you find with Geofeedia that you can’t find on other platforms?
I know from analyzing our data with active customers that a significant amount of user generated content is missed by traditional keyword or hashtag centric monitoring tools. We complement these platforms to ensure relevant location based content is delivered to our customers in real-time.

4. Only a small portion of social media is geotagged. Do you think this will change in the future?
I do. We’re seeing an increase every quarter, but as brands start rolling out compelling reasons for consumers to geotag their content, I believe geotagged social media will become the default.

5. How do you think Geofeedia will be used for good?
The leading businesses I’m speaking with consider Geofeedia as a tool to improve their overall customer experience. Understanding an individual social media conversation at a moment in time at a given location drastically improves the ways brands can serve their customers. Also, numerous public safety agencies are using Geofeedia to improve their ability to respond to natural disasters and other scenarios where real-time, location based social media awareness delivers great value.

6. How will real-time geo monitoring affect a brand’s ability to connect with their customers?
Like I said, the major brands with whom I’m speaking are evaluating how to improve their overall customer experience across all touch points – sales, customer service, loyalty – through real-time location based monitoring, analysis and engagement. I do believe that real-time, location based social media engagement will drastically improve a brand’s ability to have a meaningful, new type of relationship with their customers and become a de facto element of their communication mix.

Guide to the Twitter API – Part 2 of 3: An Overview of Twitter’s Search API

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended to support, and does not fully support, the repeated, constant searches that would be required to deliver 100% coverage. Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now.

Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full-coverage streams for monitoring and analytics use cases. To get data from the Search API, your system repeatedly asks Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results (for example, the latest 100 Tweets). If more than 100 matching Tweets have been created since the last time you sent the request, some of the matching Tweets will be lost.
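Here's a minimal polling sketch to make the mechanics concrete. It assumes Twitter's v1.1 GET search/tweets endpoint and a placeholder bearer token; the point is the coverage gap called out in the comments, not production-ready code:

```python
import time
import requests

# Minimal polling sketch against Twitter's v1.1 search endpoint.
# BEARER_TOKEN is a placeholder; real use requires OAuth credentials.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
BEARER_TOKEN = "..."

def poll_search(query, since_id=None, count=100):
    """One polling request: newest Tweets matching `query` since `since_id`."""
    params = {"q": query, "count": count, "result_type": "recent"}
    if since_id:
        params["since_id"] = since_id
    resp = requests.get(
        SEARCH_URL,
        params=params,
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()["statuses"]

since_id = None
while True:
    tweets = poll_search("football", since_id)
    if tweets:
        since_id = tweets[0]["id"]  # results arrive newest-first
    # If more than `count` matching Tweets were created since the last
    # poll, the oldest of them never appear in any response: that is
    # the coverage gap described above.
    time.sleep(5)  # stay under the rate limit
```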

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit time is constrained by Twitter’s rate limits. Some queries are so popular (hello, “Justin Bieber”) that it can be impossible to make enough requests for that query alone to keep up with the stream. And this is only the beginning of the problem, as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might usually be fewer than 100 matching Tweets per second, but if there’s a spike (say the term becomes a trending topic after a Super Bowl commercial), there will likely be more than 100 per second. If, because of Twitter’s rate limits, you’re only allowed to send one request per second, you will miss some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.
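Here's the back-of-envelope math for that scenario (the one-request-per-second budget is illustrative, not Twitter's actual published limit):

```python
# Back-of-envelope version of the scenario above.
requests_per_second = 1
tracked_terms = 1_000
results_per_request = 100

seconds_between_polls = tracked_terms / requests_per_second   # 1,000 s
break_even_rate = results_per_request / seconds_between_polls  # 0.1 Tweets/s

print(f"Each term is polled every {seconds_between_polls:,.0f} seconds; "
      f"any term averaging more than {break_even_rate} Tweets/sec "
      f"overflows the {results_per_request}-result page and loses Tweets.")
```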

Because of these issues, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data for monitoring use cases. Also, be forewarned: if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although its coverage is incomplete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Some people opt to use the Search API to collect a sampling of Tweets that match their search terms precisely because of those Boolean operators and geo parameters. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).
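A few sample queries give a feel for those operators (exact operator support varies by API version; treat these as illustrative):

```python
# Example queries showing the kinds of operators the Search API accepts.
boolean_query = '"coca cola" OR #cocacola -RT'   # phrase, OR, negation
from_query = 'football from:NFL'                 # Tweets from one account

# Geo filtering uses a separate request parameter rather than the
# query string: latitude, longitude, and a radius.
geo_params = {
    "q": "football",
    "geocode": "40.7128,-74.0060,25mi",  # within 25 miles of New York City
}
```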

So, to recap: the Twitter Search API offers great operator support, but you should know that you’ll generally only see a portion of the total Tweets that match your keywords, and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)
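To illustrate the deduplication step (this is a sketch of the general technique, not Gnip's actual collector logic): repeated polls can return overlapping result pages, so a collector typically keeps a set of Tweet IDs it has already delivered.

```python
# Minimal deduplication sketch: repeated Search API polls can return
# overlapping pages, so track the Tweet IDs already delivered.
seen_ids = set()

def deliver_new(tweets):
    """Yield only Tweets we have not delivered before."""
    for tweet in tweets:
        if tweet["id"] not in seen_ids:
            seen_ids.add(tweet["id"])
            yield tweet
```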

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full-coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…