Get More Twitter Geodata From Gnip With Our New Profile Geo Enrichment

Twitter Map - Giants Fans in the US Tweeting from the Stadium

When it comes to analyzing social data, “where” matters. After the topics of conversations, perhaps the strongest connection between social conversations online and the offline world is location. Location is an implicit part of what we do, who we know, what we need, etc. For years now at Gnip, the most requested feature for our existing data products has been “more geodata” to help our customers understand the offline locations that are relevant to online conversations. Today we’re pleased to announce a major step toward meeting that demand: the public beta launch of our new Profile Geo enrichment.

The Profile Geo enrichment is simple. Location data is provided publicly by millions of users in their profiles on social networks, but it’s rarely delivered in a normalized format with consistent latitude/longitude coordinates that are necessary for software to ingest the data and make use of it. The Profile Geo enrichment from Gnip normalizes this data to common geographies (for instance, “NYC,” “Manhattan,” etc. all map to “New York City, NY, US”) and provides latitude/longitude coordinates for those places so it’s easy to plot social data on a map.
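To make the idea concrete, here is a minimal sketch of that kind of normalization. It is not Gnip's implementation: the alias table, the `Place` type, and the function name are all hypothetical, and a real system would rely on a full gazetteer rather than a handful of hard-coded strings.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    """A normalized geography with a representative point."""
    name: str
    lat: float
    lon: float


# Hypothetical alias table for illustration only.
NEW_YORK = Place("New York City, NY, US", 40.7128, -74.0060)
ALIASES = {
    "nyc": NEW_YORK,
    "manhattan": NEW_YORK,
    "new york city": NEW_YORK,
}


def normalize_profile_location(raw: str) -> Optional[Place]:
    """Map a free-text profile location to a known place, or None."""
    # Lowercase, drop commas, and collapse whitespace before the lookup.
    key = " ".join(raw.lower().replace(",", " ").split())
    return ALIASES.get(key)
```

Anything the table does not recognize comes back as `None`, which mirrors the fact that not every profile string can be resolved to a real place.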

Our customers are hungry to analyze Twitter through a geographic lens. For a brand, it can be great to know that people are talking about its products online, but few things make those conversations more actionable than knowing where they are taking place. Do we need to change our marketing campaign in a region? Focus on improving customer service? For government or civil society organizations responding to crises, location is the key to identifying need in an actionable way and then deploying resources effectively. It may be obvious we need clean water and blankets, but where is the most important place to send them?

For this new enrichment, we started with Twitter because it offers the biggest initial gain for our customers. While fewer than 2% of Tweets in the Twitter Firehose are “geotagged” with latitude/longitude coordinates, more than half of all Tweets contain a profile location value from a user. And while just 1% of users generate approximately two-thirds of all geotagged Tweets (according to this helpful paper from our friend Kalev Leetaru and his colleagues), profile location data is much more evenly distributed. In that way, looking at profile location data “democratizes” the data that appear when mapping Twitter content – our customers can now hear from the whole world of Twitter users and not just this 1%.

This new premium enrichment from Gnip provides several key benefits for social data analysis. First, it increases the amount of usable Twitter geodata available for analysis by more than 15x. Second, it adds a new kind of Twitter geodata, different from what may be natively available from social sources. It’s important to think about the three different types of location that exist in social media to understand this benefit.

  • Activity Location: Where the activity (Tweet, Check-in, etc.) directly came from, via GPS signal on a user’s device or association with a known venue location. This is the kind of location that provides latitude/longitude natively in Twitter’s or Foursquare’s firehoses.

  • Profile Location: The place the user provides as their location in their profile. They may or may not be there when posting to a social network.

  • Mentioned Locations: Places the user talks about in a post or check-in. These places may not have anything to do with where the person lives or where the person is when posting, e.g. “I can’t wait for Gnip to open its new office in the Maldives.” (The Maldives in this case might as well be a fictitious place considering the likelihood that will happen.)

Profile location data can be used to unlock demographic data and other information that is not otherwise available with activity location. For instance, US Census Bureau statistics are aggregated at the locality level and can provide basic stats like household income. Profile location is also a strong indicator of activity location when one isn’t provided.
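One way to picture the three location types, and the fallback from activity location to profile location, is the sketch below. The `PostLocations` record and `best_guess_location` function are hypothetical names chosen for illustration; they do not reflect the actual Gnip payload format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


@dataclass
class PostLocations:
    """The three kinds of location that can be attached to one post."""
    activity: Optional[Coordinates] = None   # where the post was sent from
    profile: Optional[Coordinates] = None    # where the user says they are based
    mentioned: List[str] = field(default_factory=list)  # places named in the text


def best_guess_location(post: PostLocations) -> Optional[Tuple[str, Coordinates]]:
    """Prefer the activity location; fall back to the profile location."""
    if post.activity is not None:
        return ("activity", post.activity)
    if post.profile is not None:
        return ("profile", post.profile)
    # Mentioned places say nothing reliable about where the user is.
    return None
```

Returning the source label alongside the coordinates keeps it clear downstream whether a point is a measured position or only an indicator taken from the profile.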

To get a sense of the impact of the Profile Geo enrichment in practice, we worked with the team at MapBox again to create a map of Tweets about the San Francisco Giants over the past few weeks (PS: check out the other maps we made together if you haven’t seen them). During that time period, over two thousand Tweets occurred at AT&T Park that were geotagged with the activity location. With the addition of the Profile Geo enrichment for the same Tweets, it’s now possible to quickly create a map that shows the relationship between activity location (all in the Park), and profile location – where those people came from to watch the game. Next time the Giants franchise wants to think about tourist attendance numbers, they’ll have a new way to do so. Check it out.

SF Giants Tweets from the stadium (center point of the orange lines) link to the profile locations of those users around the globe, showing how far they traveled. Click on the “USA” toggle to see the whole world. Hover over states/countries to see total counts.
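For readers who want to reproduce the “how far they traveled” part of a map like this, a rough sketch is to compute the great-circle distance between the activity location and each profile location. The code below uses the standard haversine formula; the stadium coordinates are approximate, and the function names are our own for illustration, not part of any Gnip or MapBox API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate coordinates of the ballpark, used here as the activity location.
STADIUM = (37.7786, -122.3893)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_distances(profile_points, origin=STADIUM):
    """Distance from the activity location to each profile location."""
    return [haversine_km(origin[0], origin[1], lat, lon)
            for lat, lon in profile_points]
```

Each distance corresponds to one of the orange lines on the map, measured from the stadium to the place named in the user's profile.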

The Profile Geo enrichment is now available to all Gnip customers as an option on their Twitter data products in this beta release. We’re looking forward to seeing how this enrichment changes what can be done with location and social data.

If you’re interested in learning more, please visit or hit us up at

Data Story: Phil Harris of Geofeedia

Data Stories is Gnip’s ongoing series telling the stories of the people and companies that are doing groundbreaking work in social data. This week we’re interviewing Phil Harris, CEO of Geofeedia, a company that allows you to search and monitor social media by location. Geofeedia is a recent Gnip customer, and I love what they’re doing. The inherent value of Geofeedia was made clear to me when we received a media request looking for all social media that was geotagged close to the finish line of the Boston Marathon. Content + location creates powerful stories and Geofeedia is making it easier to find the right ones. 

1. What social data sources do you wish had geotagged data?
Our business is built on the fundamental premise of open source social data aggregation. Or, I should say, every source. That said, there are currently major social data sources that provide public location data based on a location identifier rather than a geotag. We will accommodate location IDs to integrate these data sources, but I strongly believe that over time, the benefits of more precise geo-location tagging on social media content will encourage these services to move towards geotagging. When they do, we’re exceptionally well positioned to translate that evolution into benefit for our clients.

2. If you’re a user, what do you think is the advantage of sharing your geodata?
We’ve barely scratched the surface of how geodata will deliver value to consumers. I believe the rapidly growing penetration of smartphones and adoption of geo-centric applications such as navigation will create a rich ecosystem of geo-data driven benefits. I am speaking with major consumer brands who believe that they will be able to create and maintain consumer relationships via location based social media in ways that will deliver significant value back to the individual user.

3. What can you find with Geofeedia that you can’t find on other platforms?
I know from analyzing our data with active customers that a significant amount of user generated content is missed by traditional keyword or hashtag centric monitoring tools. We complement these platforms to ensure relevant location based content is delivered to our customers in real-time.

4. Only a small portion of social media is geotagged. Do you think this will change in the future?
I do. We’re seeing an increase every quarter, but as brands start rolling out compelling reasons for consumers to geotag their content, I believe geotagged social media will become the default.

5. How do you think Geofeedia will be used for good?
The leading businesses I’m speaking with consider Geofeedia as a tool to improve their overall customer experience. Understanding an individual social media conversation at a moment in time at a given location drastically improves the ways brands can serve their customers. Also, numerous public safety agencies are using Geofeedia to improve their ability to respond to natural disasters and other scenarios where real-time, location based social media awareness delivers great value.

6. How will real-time geo monitoring affect a brand’s ability to connect with their customers?
Like I said, the major brands with whom I’m speaking are evaluating how to improve their overall customer experience across all touch points – sales, customer service, loyalty – through real-time location based monitoring, analysis and engagement. I do believe that real-time, location based social media engagement will drastically improve a brand’s ability to have a meaningful, new type of relationship with their customers and become a de facto element of their communication mix.

Geosocial Data: Patterns of Everyday Life

My love for checking in, and thus for geolocation, began after SXSW 2009, when I racked up points and worked hard to become the leader of Boulder, ultimately losing to Eric Wu. Since then, my views on geolocation have evolved, and I have become especially enamored with the way geosocial data allows us to leave trails of the lives we and others are living. At its best, geolocation + social connects us to the friends we are close to by letting us know who is near. Collectively, social data can identify common interests and patterns of behavior we couldn’t see in the past.

Since 2009, Foursquare has evolved into a service with 20 million users and two billion check-ins, with a facelift launching tomorrow; Twitter has opened up a geolocation API; Facebook Places launched and continues to evolve; Highlight launched; and Gowalla was acquired by Facebook. All of these advancements have happened in a couple of short years. Geotagging allows this new crop of social networks to add your geographic location via metadata, so you can now add location to tweets, photos, videos, etc.

Patterns of My Life

Every time I check in and share my location, I start leaving a trail of my day-to-day life. This trail, at its most basic, serves as a virtual diary of where I went and with whom. Timehop emails me each day to tell me what I did a year ago, while services such as Rewind.Me allow me to search my patterns and how I stack up against others.

Tripmeter lets me see my virtual trail and how I travel throughout the day based on Foursquare and Facebook checkins, similar to what Route does. Where Do You Go even lets you heatmap where you most often visit (hint: I hate South Boulder).

Foursquare Heat Map

Checkins Are a Moving Census

But collectively, the patterns woven by geosocial data are incredibly telling and act as a living census. Intriguingly, researchers from Carnegie Mellon have created what they call “Livehoods,” which are neighborhoods defined not only by geographic proximity, but also by social geotagged data. Essentially, the similarities are based on where people check in. While the data only includes those using geolocation, it shows that people who check into a local restaurant and a similar bar create cultural neighborhoods. This data is more than just an intellectual curiosity. Companies can analyze customer patterns to focus marketing efforts, identify companies to partner with and determine new brick-and-mortar locations.

Example of Livehood Data

I particularly love the idea of an app using Foursquare data called “When Should I Visit?” that tells you when is a good time to visit London tourist attractions based on Foursquare checkins. Other use cases for this type of social data could tell people when to visit high-traffic destinations such as the DMV. I love knowing when not to be somewhere as much as knowing what locations and parties are trending.
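The core of an app like that can be sketched in a few lines: count check-ins per hour of the day and surface the hours with the fewest. This is only an illustration under our own assumptions (the function name, the `open_hours` parameter, and the idea of feeding in plain timestamps are all hypothetical), not how “When Should I Visit?” actually works.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def quietest_hours(checkin_times: Iterable[datetime],
                   open_hours: Iterable[int] = range(9, 18),
                   top_n: int = 3) -> List[int]:
    """Return the opening hours with the fewest check-ins, quietest first."""
    counts = Counter(t.hour for t in checkin_times)
    # Hours with no check-ins count as zero; ties go to the earlier hour.
    ranked = sorted(open_hours, key=lambda hour: (counts[hour], hour))
    return ranked[:top_n]
```

Restricting the ranking to opening hours matters, because otherwise the quietest time to visit would always be the middle of the night.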

HealthMap uses geosocial data and news reports to help track epidemics as they pop up. The mapping system was created by a team of researchers, epidemiologists and software developers from Children’s Hospital Boston to monitor epidemics in real time as they break out. Rumi Chunara worked on this project and also helped use geosocial data to track how cholera spread in Haiti. (Rumi will be speaking at Gnip’s social data conference, Big Boulder, about social data in public service.) Geosocial data has many uses in health epidemics and natural disasters.

Companies are starting to create passive geolocation checkins such as EpicMix from Vail Resorts, which enables skiers to check in automatically using the RFID tags in their lift tickets. The system tells users how much they skied, where they skied, their vertical ascents and where their friends are on the mountain. During the last Coachella, 30,000 concertgoers used RFID bands from Intellitix to check in and update their Facebook status on various portals spaced throughout the concert grounds. Near field communication is another way social data provides amazing patterns.

Geosocial data allows us insight into the patterns of everyday people, and the applications for this are endless.