Get More Twitter Geodata From Gnip With Our New Profile Geo Enrichment

Twitter Map - Giants Fans in the US Tweeting from the Stadium

When it comes to analyzing social data, “where” matters. After the topics of conversations, perhaps the strongest connection between social conversations online and the offline world is location. Location is an implicit part of what we do, who we know, what we need, etc. For years now at Gnip, the most requested feature for our existing data products has been “more geodata” to help our customers understand the offline locations that are relevant to online conversations. Today we’re pleased to announce a major step toward meeting that demand: the public beta launch of our new Profile Geo enrichment.

The Profile Geo enrichment is simple. Location data is provided publicly by millions of users in their profiles on social networks, but it’s rarely delivered in a normalized format with consistent latitude/longitude coordinates that are necessary for software to ingest the data and make use of it. The Profile Geo enrichment from Gnip normalizes this data to common geographies (for instance, “NYC,” “Manhattan,” etc. all map to “New York City, NY, US”) and provides latitude/longitude coordinates for those places so it’s easy to plot social data on a map.
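To make the idea concrete, here is a minimal sketch of that kind of normalization. It is not Gnip's implementation: the `Place` type, the `ALIASES` table, the `normalize_profile_location` function, and the approximate coordinates are all assumptions made for illustration. A production geocoder would presumably draw on a full gazetteer and fuzzy matching rather than a hand-written dictionary.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    name: str
    latitude: float
    longitude: float


# Hypothetical canonical place; the coordinates are approximate.
NEW_YORK = Place("New York City, NY, US", 40.7128, -74.0060)

# Hypothetical alias table mapping free-text profile values to one place.
ALIASES = {
    "nyc": NEW_YORK,
    "manhattan": NEW_YORK,
    "new york city": NEW_YORK,
}


def normalize_profile_location(raw: str) -> Optional[Place]:
    """Return the canonical place for a free-text profile location, if known."""
    # Lowercase, drop commas, and collapse whitespace before the lookup.
    key = " ".join(raw.lower().replace(",", " ").split())
    return ALIASES.get(key)
```

With this table, both "NYC" and " Manhattan " resolve to the same `Place`, while an unrecognized string returns `None` instead of a guessed location.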

Our customers are hungry to analyze Twitter through a geographic lens. For a brand, it can be great to know that people are talking about its products online, but few things make those conversations more actionable than knowing where they are taking place. Do we need to change our marketing campaign in a region? Focus on improving customer service? For government or civil society organizations responding to crises, location is the key to identifying need in an actionable way and then deploying resources effectively. It may be obvious we need clean water and blankets, but where is the most important place to send them?

For this new enrichment, we started with Twitter because it offers the biggest initial gain for our customers. While fewer than 2% of Tweets in the Twitter Firehose are “geotagged” with latitude/longitude coordinates, more than half of all Tweets contain a profile location value from a user. And while just 1% of users generate approximately two-thirds of all geotagged Tweets (according to this helpful paper from our friend Kalev Leetaru and his colleagues), profile location data is much more evenly distributed. In that way, looking at profile location data “democratizes” the data that appear when mapping Twitter content: our customers can now hear from the whole world of Twitter users and not just this 1%.

This new premium enrichment from Gnip provides several key benefits for social data analysis. First, it increases the amount of usable Twitter geodata available for analysis by more than 15x. Second, it adds a kind of Twitter geodata that is different from what may be natively available from social sources. To understand this benefit, it’s important to think about the three different types of location that exist in social media.

  • Activity Location: Where the activity (Tweet, Check-in, etc.) directly came from, via GPS signal on a user’s device or association with a known venue location. This is the kind of location that provides latitude/longitude natively in Twitter’s or Foursquare’s firehoses.

  • Profile Location: The place the user provides as their location in their profile. They may or may not be there when posting to a social network.

  • Mentioned Locations: Places the user talks about in a post or check-in. These places may not have anything to do with where the person lives or where the person is when posting, e.g. “I can’t wait for Gnip to open its new office in the Maldives.” (The Maldives in this case might as well be a fictitious place considering the likelihood that will happen.)

Profile location data can be used to unlock demographic data and other information that is not otherwise available from activity location. For instance, US Census Bureau statistics are aggregated at the locality level and can provide basic stats like household income. Profile location is also a strong indicator of activity location when one isn’t provided.

To get a sense of the impact of the Profile Geo enrichment in practice, we worked with the team at MapBox again to create a map of Tweets about the San Francisco Giants over the past few weeks (PS: check out the other maps we made together if you haven’t seen them). During that time period, over two thousand Tweets occurred at AT&T Park that were geotagged with the activity location. With the addition of the Profile Geo enrichment for the same Tweets, it’s now possible to quickly create a map that shows the relationship between activity location (all in the Park), and profile location – where those people came from to watch the game. Next time the Giants franchise wants to think about tourist attendance numbers, they’ll have a new way to do so. Check it out.

SF Giants Tweets from the stadium (center point of the orange lines) link to the profile locations of those users around the globe, showing how far they traveled. Click on the “USA” toggle to see the whole world. Hover over states/countries to see total counts.
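As a rough illustration of the "how far they traveled" measurement, the great-circle distance between an activity location and a profile location can be estimated with the haversine formula. This is a sketch only: the function name and the approximate coordinates for the stadium and for "New York City, NY, US" are assumptions, and nothing here describes how the MapBox map was actually built.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, used only for illustration.
STADIUM = (37.7786, -122.3893)  # activity location: AT&T Park
PROFILE = (40.7128, -74.0060)   # profile location: "New York City, NY, US"

distance = haversine_km(*STADIUM, *PROFILE)
```

For this pair of points the result comes out at a little over 4,100 km, which is the length of one line on a map like this.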

The Profile Geo enrichment is now available to all Gnip customers as an option on their Twitter data products in this beta release. We’re looking forward to seeing how this enrichment changes what can be done with location and social data.

If you’re interested in learning more, please visit gnip.com/enrichments or hit us up at info@gnip.com.

Data Stories: Interview with Data Scientist Blake Shaw of Foursquare

At Gnip, we believe the value of social data is unlimited. Data Stories is how we bring this belief to life by showcasing how social data is used. This week we’re interviewing data scientist Blake Shaw of Foursquare about not only how data science is shaping Foursquare and its recommendations, but also how Foursquare can be a “microscope for cities.” You can follow Blake on Twitter at @metablake and check out Foursquare’s blog for more data science.

Data Scientist Blake Shaw of Foursquare

1. Your team has found a correlation between warm days and ice cream consumption in NYC. At some point, do you envision Foursquare being able to trigger offers based on different correlations your data science has found?

Yes!  In fact, we currently trigger recommendations (which often contain deals and offers) based on a ton of different contextual signals that the team here has identified as useful.  These signals include where you are, the places you like to go, the time of the day, the preferences of your friends, and what is popular around you. Mapping all of these signals to good recommendations requires finding correlations in massive amounts of data.  Some of these correlations are simple like when it’s the morning people like to get coffee, and some correlations are more complex like when it’s cold out in New York, people are more likely to go to ramen and noodle shops.

2. One of my favorite things about the Explore feature is that when you check into a city, Foursquare shows you locations where both locals and out-of-towners like to go. How do data science and product work together to make recommendations such as these?

Tourist recommendations are definitely one of my favorite features of Explore as well. In general, there is a healthy mix of product-driven and data-driven development at Foursquare. We will often work together to brainstorm not only what would be best to build from a product perspective but also what data we should be investigating further. Tourist recommendations came from the data; we realized that it would be easy to identify places that had a statistically high proportion of tourists and surface them to Explore users who find themselves in unfamiliar areas. The results are fantastic: it’s like having millions of people creating a travel guide, just by walking around a city and checking in.
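One way to read "a statistically high proportion of tourists" is as a comparison between each venue's share of visitor check-ins and a citywide baseline. The sketch below uses a simple z-score for a proportion. It is an assumption-laden illustration, not Foursquare's method: the venue names, counts, baseline share, minimum sample size, and threshold are all made up.

```python
import math


def tourist_z_score(tourist_checkins, total_checkins, baseline_share):
    """z-score of a venue's tourist share against a citywide baseline share."""
    share = tourist_checkins / total_checkins
    std_err = math.sqrt(baseline_share * (1 - baseline_share) / total_checkins)
    return (share - baseline_share) / std_err


# Made-up counts: (check-ins by visitors, all check-ins) per venue.
venues = {
    "observation deck": (700, 1000),
    "corner deli": (90, 1000),
    "tiny gallery": (3, 4),
}
BASELINE = 0.10     # assumed citywide share of check-ins from visitors
MIN_CHECKINS = 30   # assumed minimum sample size before trusting the share
THRESHOLD = 3.0     # assumed z-score cutoff

tourist_spots = [
    name
    for name, (tourists, total) in venues.items()
    if total >= MIN_CHECKINS
    and tourist_z_score(tourists, total, BASELINE) > THRESHOLD
]
```

The minimum-sample check keeps a venue with three visitor check-ins out of four from being flagged on the strength of a tiny sample.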

3. Foursquare got its start in NYC. What are interesting observations you’ve seen on how people use Foursquare in smaller cities such as Boulder and Denver?

I feel like Foursquare is more of a necessity in big cities like New York, where new places are opening all the time and it’s hard to keep track of them all. That said, we see strong usage in places like Boulder and Denver as well. As expected, users in smaller cities such as these are more interested in old favorites than in exploring new places.

4. What signals does Foursquare use to recommend places to people?

I can’t reveal all of the signals we use to rank places, but we believe that place recommendation should be highly personalized, so we heavily weight signals about your tastes and the tastes of your friends.  We also think that from all of this data about where people are going we can discern which are the best places.  Imagine being able to ask everyone who has been to a restaurant if they would go back. We believe that by measuring signals about places such as loyalty, expertise, and sentiment we can tease out the best places. This is the idea behind our recently launched Foursquare ratings.  People are voting with their feet in the real world, not simply leaving a star or a like on a website.

5. Do you see a correlation between Foursquare sharing check-ins and badges on other social sites and increased usage of Foursquare? For example, if someone chooses to share a check-in on Twitter or Facebook, does that increase the likelihood of other people checking in?

Yes we do. Roughly a quarter of all check-ins are shared to wider audiences on Twitter and Facebook.  These in turn help spread awareness and adoption of Foursquare.

6. Foursquare recently showed a visualization of how check-ins in NYC were affected by Hurricane Sandy. How else do you see check-in data being useful other than for powering your recommendation engine?

Visualization of Foursquare Checkins Before and After Hurricane Sandy

One of my favorite aspects of working at Foursquare is getting to study this data from a larger sociological perspective. We are capturing this amazing signal about what millions of people are doing in the real world at every moment of the day in cities all around the globe. We have seen that when we aggregate check-in patterns across many individuals, we can measure features of cities at a higher resolution than was ever possible before.  I think this data can act almost like a “microscope for cities.”  If you look at how the storm affected NYC, you can see how this incredibly powerful force disrupted the natural rhythm of the city. It’s striking how predictable these patterns are, and how precisely we can identify unusual events. For example, in this plot we see how check-ins at grocery stores went up more than 200% in the days before the storm.  I see this real-time pulse or “EKG” of a city being a valuable resource in the future for understanding cities, giving us a larger view of the collective movement patterns of millions of people.


SGI Launches Global Twitter Heartbeat, Powered by Gnip

File this under cool news.

SGI’s Big Brain Computer has created a Global Twitter Heartbeat: the supercomputer analyzes the Twitter stream for sentiment and geolocation to show how the world is feeling, based on emotions communicated via Twitter. Not only is this a cool undertaking by the folks at SGI, but we’re proud to announce that it is powered by Gnip’s decahose Twitter stream.

To make this happen, SGI partnered with Kalev H. Leetaru of the University of Illinois and Dr. Shaowen Wang of the CyberInfrastructure and Geospatial Information (CIGI) Laboratory at the University of Illinois at Urbana-Champaign.

This isn’t just some simple stream. The SGI supercomputer analyzes every Tweet to assign location (not just from GPS-tagged Tweets, but by processing the text of the Tweet itself) and tone values, then visualizes the conversation in a heat map that puts Tweet location, Tweet density and tone into a unified geospatial perspective. The entire process, from ingestion to data analysis to producing the heat map, runs at a speed that allows visualization of a map frame per second.
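The aggregation step behind a heat map like this can be sketched as binning Tweets into grid cells and tracking a count (density) and an average tone per cell. This is a toy illustration under stated assumptions, not SGI's pipeline: the `heat_map_cells` function, the one-degree cell size, and the numeric tone scores are all hypothetical.

```python
from collections import defaultdict


def heat_map_cells(tweets, cell_degrees=1.0):
    """Group (lat, lon, tone) records into grid cells.

    Returns {(row, col): (tweet_count, mean_tone)}.
    """
    sums = defaultdict(lambda: [0, 0.0])  # cell -> [count, tone total]
    for lat, lon, tone in tweets:
        # Shift to non-negative ranges so cell indices start at zero.
        cell = (int((lat + 90) // cell_degrees),
                int((lon + 180) // cell_degrees))
        sums[cell][0] += 1
        sums[cell][1] += tone
    return {cell: (count, total / count)
            for cell, (count, total) in sums.items()}


# Made-up records: (latitude, longitude, tone score).
sample = [
    (40.7, -74.1, 0.5),
    (40.9, -74.2, -0.1),
    (37.8, -122.4, 0.8),
]
cells = heat_map_cells(sample)
```

Here the first two records fall into the same cell, so that cell carries a density of 2 and the mean of the two tone scores, while the third record gets a cell of its own.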

To see it live, check out SGI’s Facebook page.

You can also see videos of the Twitter Heartbeat for the Presidential Elections and Hurricane Sandy.