Get More Twitter Geodata From Gnip With Our New Profile Geo Enrichment

Twitter Map - Giants Fans in the US Tweeting from the Stadium

When it comes to analyzing social data, “where” matters. After the topics of conversations, perhaps the strongest connection between social conversations online and the offline world is location. Location is an implicit part of what we do, who we know, what we need, etc. For years now at Gnip, the most requested feature for our existing data products has been “more geodata” to help our customers understand the offline locations that are relevant to online conversations. Today we’re pleased to announce a major step toward meeting that demand: the public beta launch of our new Profile Geo enrichment.

The Profile Geo enrichment is simple. Location data is provided publicly by millions of users in their profiles on social networks, but it’s rarely delivered in a normalized format with consistent latitude/longitude coordinates that are necessary for software to ingest the data and make use of it. The Profile Geo enrichment from Gnip normalizes this data to common geographies (for instance, “NYC,” “Manhattan,” etc. all map to “New York City, NY, US”) and provides latitude/longitude coordinates for those places so it’s easy to plot social data on a map.
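To make the idea of normalization concrete, here is a minimal sketch of mapping free-text profile locations to a common geography with coordinates. It is only an illustration under stated assumptions: the `GAZETTEER` table, its entries and approximate coordinates, and the `normalize_profile_location` function are hypothetical and are not Gnip's actual implementation or output format.

```python
# Hypothetical lookup table: cleaned alias -> (canonical place, latitude, longitude).
# Entries and coordinates are illustrative, not Gnip's data.
GAZETTEER = {
    "nyc": ("New York City, NY, US", 40.7128, -74.0060),
    "manhattan": ("New York City, NY, US", 40.7128, -74.0060),
    "new york city": ("New York City, NY, US", 40.7128, -74.0060),
    "boulder co": ("Boulder, CO, US", 40.0150, -105.2705),
}


def normalize_profile_location(raw):
    """Return (canonical_name, lat, lon) for a free-text location, or None."""
    if not raw:
        return None
    # Lowercase, drop commas and periods, and collapse repeated whitespace.
    cleaned = raw.lower().replace(",", " ").replace(".", " ")
    key = " ".join(cleaned.split())
    return GAZETTEER.get(key)


for text in ["NYC", "  Manhattan ", "Boulder, CO", "somewhere over the rainbow"]:
    print(repr(text), "->", normalize_profile_location(text))
```

A real gazetteer would need far more entries and some way of handling ambiguous names; the point here is only that many spellings resolve to one place with one pair of coordinates.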

Our customers are hungry to analyze Twitter through a geographic lens. For a brand, it can be great to know that people are talking about its products online, but few things make those conversations more actionable than knowing where they are taking place. Do we need to change our marketing campaign in a region? Focus on improving customer service? For government or civil society organizations responding to crises, location is the key to identifying need in an actionable way and then deploying resources effectively. It may be obvious we need clean water and blankets, but where is the most important place to send them?

For this new enrichment, we started with Twitter because it offers the biggest initial gain for our customers. While less than 2% of Tweets in the Twitter Firehose are “geotagged” with latitude/longitude coordinates, more than half of all Tweets contain a profile location value from a user. And while just 1% of users generate approximately two-thirds of all geotagged Tweets (according to this helpful paper from our friend Kalev Leetaru and his colleagues), profile location data is much more evenly distributed. In that way, looking at profile location data “democratizes” the data that appear when mapping Twitter content – our customers can now hear from the whole world of Twitter users and not just this 1%.

This new premium enrichment from Gnip provides several key benefits for social data analysis. First, it increases the amount of usable Twitter geodata available for analysis by more than 15x. Second, it adds a kind of Twitter geodata that differs from what may be natively available from social sources. It’s important to think about the three different types of location that exist in social media to understand this benefit.

  • Activity Location: Where the activity (Tweet, Check-in, etc.) directly came from, via GPS signal on a user’s device or association with a known venue location. This is the kind of location that provides latitude/longitude natively in Twitter’s or Foursquare’s firehoses.

  • Profile Location: The place the user provides as their location in their profile. They may or may not be there when posting to a social network.

  • Mentioned Locations: Places the user talks about in a post or check-in. These places may not have anything to do with where the person lives or where the person is when posting, e.g. “I can’t wait for Gnip to open its new office in the Maldives.” (The Maldives in this case might as well be a fictitious place considering the likelihood that will happen.)

Profile location data can be used to unlock demographic data and other information that is not otherwise available with activity location. For instance, US Census Bureau statistics are aggregated at the locality level and can provide basic stats like household income. Profile location is also a strong indicator of activity location when one isn’t provided.

To get a sense of the impact of the Profile Geo enrichment in practice, we worked with the team at MapBox again to create a map of Tweets about the San Francisco Giants over the past few weeks (PS: check out the other maps we made together if you haven’t seen them). During that time period, over two thousand Tweets occurred at AT&T Park that were geotagged with the activity location. With the addition of the Profile Geo enrichment for the same Tweets, it’s now possible to quickly create a map that shows the relationship between activity location (all in the Park), and profile location – where those people came from to watch the game. Next time the Giants franchise wants to think about tourist attendance numbers, they’ll have a new way to do so. Check it out.

SF Giants Tweets from the stadium (center point of the orange lines) link to the profile locations of those users around the globe, showing how far they traveled. Click on the “USA” toggle to see the whole world. Hover over states/countries to see total counts.
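For readers who want to reproduce the "how far they traveled" idea behind the map, a rough sketch follows. It computes the great-circle (haversine) distance between the stadium and each Tweet's profile location. The stadium coordinates are approximate, and the record layout (`profile_lat`, `profile_lon`) is a hypothetical stand-in, not the actual field names delivered by the enrichment.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Approximate coordinates of the ballpark; an assumption for illustration.
STADIUM = (37.7786, -122.3893)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_distances(tweets):
    """Distance from each tweet's profile location to the stadium, in km."""
    return [
        haversine_km(STADIUM[0], STADIUM[1], t["profile_lat"], t["profile_lon"])
        for t in tweets
    ]


# Hypothetical records: one fan from New York City, one local.
sample = [
    {"profile_lat": 40.7128, "profile_lon": -74.0060},
    {"profile_lat": 37.7786, "profile_lon": -122.3893},
]
print([round(d) for d in travel_distances(sample)])
```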

The Profile Geo enrichment is now available to all Gnip customers as an option on their Twitter data products in this beta release. We’re looking forward to seeing how this enrichment changes what can be done with location and social data.

If you’re interested in learning more, please visit gnip.com/enrichments or hit us up at info@gnip.com.

Data Story: Oliver O'Brien on Open Data Maps

I stumbled across the most amazing set of open data maps for bike sharing cities and tracked down the creator in London to interview him for a Data Story. Oliver O’Brien is the creator of the maps, which track available bikes and open spaces at bike sharing stations in more than 100 cities across the world. We interviewed him about his work with open maps and his research trying to understand how people move about the city.

Ollie O'Brien Open Maps

1. What was the genesis for creating the maps?
It started from seeing the launch of London’s system in August 2010. It was at a time when I was working with Transport for London data on a project called MapTube. Transport for London had recently created a Developer portal for their datasets. When the London bikeshare launched, their map was not great (and still isn’t) – it was just a mass of white icons – so I took advantage of the data being provided on the Developer portal to create my own version, reusing some web code from an earlier map that showed General Election voting results in a fairer and clearer way. Once London’s was created, it proved to be a hit with people, as it could be used to see areas where bikes (or free spaces) might be in short supply. I was easily able to extend the map to Montreal and Minneapolis (the latter thanks to an enthusiastic local there) and then realised there was a whole world of bikesharing systems out there waiting to be mapped.

The maps act primarily as a “front-end” to the bikesharing data that I collect, for current and potential future research into the geomorphology of cities and their changing demographics and travel patterns, based on how the population uses bikesharing systems. However, I have continued to update the map as it has remained popular, adding cities whenever I discover their bikeshare datasets. After three years, I am now up to exactly 100 “live” cities, where the data is fresh to within a few minutes, plus around 50 where the data is no longer available.

2. Where did you get the information to build the maps?
Mainly from APIs provided by each city authority or bikesharing operating company, or, where this is not available (which is often the case for smaller systems), from their Google Map or other online mapping page that normally has the information in the HTML.
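As an editorial aside, the kind of processing such a station feed invites can be sketched in a few lines. This is only an illustration: the JSON layout, the field names (`name`, `bikes`, `free_docks`), and the `low_threshold` cutoff are assumptions made up for the example, and do not describe any particular city's API or Oliver's own code.

```python
import json

# Hypothetical station feed; real feeds differ from operator to operator.
SAMPLE_FEED = """
[
  {"name": "Station A", "bikes": 1, "free_docks": 14},
  {"name": "Station B", "bikes": 9, "free_docks": 9},
  {"name": "Station C", "bikes": 20, "free_docks": 0}
]
"""


def summarize_stations(feed_text, low_threshold=2):
    """Flag stations that are short of bikes or short of free spaces."""
    summary = []
    for station in json.loads(feed_text):
        bikes, docks = station["bikes"], station["free_docks"]
        if bikes < low_threshold:
            status = "low bikes"
        elif docks < low_threshold:
            status = "low spaces"
        else:
            status = "ok"
        summary.append({"name": station["name"], "bikes": bikes,
                        "free_docks": docks, "status": status})
    return summary


for row in summarize_stations(SAMPLE_FEED):
    print(row["name"], row["status"])
```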

3. What is your background?
I’m an academic researcher and software developer at UCL’s Centre for Advanced Spatial Analysis. The lab specialises in urban modelling, and my current main project, EUNOIA, is aiming to build a travel mobility model, using social media as well as transport datasets, for the major European cities of London, Barcelona and Zurich. Bikesharing systems will form a key part of the overall travel model. Prior to CASA I worked as a financial GUI technologist at one of the big City banks – before then, at university, I studied Physics.

4. What are you looking to build next?
I am looking to continue to add cities to the global map, particularly from large bikesharing systems that are appearing – I am looking forward to the San Francisco Bay Area’s system launching in August – and I’m working on creating London’s EUNOIA model, taking in the transport data and augmenting it with other geospatial information, including data from Twitter. I am also looking at more effective ways to visualise data and statistics that are emerging from the recent (2011) Census that we had in the UK – the results of which are being gradually made available.

5. What open-source maps do you think should be created next?
I am hopeful that an integrated map of all social media and sensor datasets will soon become easily available and widely used, partly to increase people’s awareness of the data that now surrounds them and partly to inform decision makers and other stakeholders in creating a better, more inclusive city landscape – the so-called “smart city”.

I would add that you may be interested in some of the other maps that we have created at UCL CASA, such as the Twitter Languages maps for London and New York:
http://twitter.mappinglondon.co.uk/ and http://ny.spatial.ly/ …and also http://life.mappinglondon.co.uk/ - these maps were all created mainly by my colleagues, with me just helping with the web work.

Boulder Bike Sharing

 Bike sharing map in Boulder, CO

Thanks to Oliver for the interview! If you’re interested in more geo + social, check out our recent posts on Social Data Mashups Following Natural Disasters and Mapping Travel, Languages & Mobile OS Usage with Twitter Data.

Data Story: Phil Harris of Geofeedia

Data Stories is Gnip’s ongoing series telling the stories of the people and companies that are doing groundbreaking work in social data. This week we’re interviewing Phil Harris, CEO of Geofeedia, a company that allows you to search and monitor social media by location. Geofeedia is a recent Gnip customer, and I love what they’re doing. The inherent value of Geofeedia was made clear to me when we received a media request looking for all social media that was geotagged close to the finish line of the Boston Marathon. Content + location creates powerful stories and Geofeedia is making it easier to find the right ones. 

1. What social data sources do you wish had geotagged data?
Our business is built on the fundamental premise of open source social data aggregation.  Or, I should say, every source. That said, there are currently major social data sources that provide public location data based on location identifier versus geotag. We will accommodate location id to integrate these data sources, but I strongly believe that over time, the benefits of more precise geo-location tagging on social media content will encourage these services to move towards geotagging. When they do, we’re exceptionally well positioned to translate that evolution into benefit for our clients.

2. If you’re a user, what do you think is the advantage of sharing your geodata?
We’ve barely scratched the surface of how geodata will deliver value to consumers. I believe the rapidly growing penetration of smartphones and adoption of geo-centric applications such as navigation will create a rich ecosystem of geo-data driven benefits. I am speaking with major consumer brands who believe that they will be able to create and maintain consumer relationships via location based social media in ways that will deliver significant value back to the individual user.

3. What can you find with Geofeedia that you can’t find on other platforms?
I know from analyzing our data with active customers that a significant amount of user generated content is missed by traditional keyword or hashtag centric monitoring tools. We complement these platforms to ensure relevant location based content is delivered to our customers in real-time.

4. Only a small portion of social media is geotagged, do you think this will change in the future?
I do. We’re seeing an increase every quarter, but as brands start rolling out compelling reasons for consumers to geotag their content, I believe geotagged social media will become the default.

5. How do you think Geofeedia will be used for good?
The leading businesses I’m speaking with consider Geofeedia as a tool to improve their overall customer experience. Understanding an individual social media conversation at a moment in time at a given location drastically improves the ways brands can serve their customers. Also, numerous public safety agencies are using Geofeedia to improve their ability to respond to natural disasters and other scenarios where real-time, location based social media awareness delivers great value.

6. How will real-time geo monitoring affect a brand’s ability to connect with their customers?
Like I said, the major brands with whom I’m speaking are evaluating how to improve their overall customer experience across all touch points – sales, customer service, loyalty – through real-time location based monitoring, analysis and engagement. I do believe that real-time, location based social media engagement will drastically improve a brand’s ability to have a meaningful, new type of relationship with their customers and become a de facto element of their communication mix.

SGI Launches Global Twitter Heartbeat, Powered by Gnip

File this under cool news.

SGI’s Big Brain Computer has created a Global Twitter Heartbeat, allowing the supercomputer to analyze the Twitter stream for sentiment and geolocation to create a Twitter heartbeat telling us how the world is feeling based on emotions communicated via Twitter. Not only is this a cool undertaking by the folks at SGI, but we’re proud to announce that it is powered by Gnip’s decahose Twitter stream.

To make this happen, SGI partnered with Kalev H. Leetaru of the University of Illinois and Dr. Shaowen Wang of the CyberInfrastructure and Geospatial Information (CIGI) Laboratory at the University of Illinois at Urbana-Champaign.

This isn’t just some simple stream. The SGI supercomputer analyzes every Tweet to assign location (not just for GPS-tagged Tweets, but by processing the text of the Tweet itself) and tone values, then visualizes the conversation in a heat map that puts Tweet location, Tweet density and tone into a unified geospatial perspective. The entire process from ingestion to data analysis to producing the heat map runs at a speed that allows visualization of a map frame per second.
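The aggregation step behind a heat map like this can be illustrated with a small sketch: bucket each located Tweet into a grid cell, then keep a count (density) and an average tone per cell. This is an assumption-laden toy, not SGI's pipeline; the `(lat, lon, tone)` tuple layout, the `bin_tweets` name and the one-degree cell size are all hypothetical.

```python
import math
from collections import defaultdict


def bin_tweets(tweets, cell_deg=1.0):
    """Group (lat, lon, tone) tuples into grid cells.

    Returns {(row, col): (tweet_count, mean_tone)} where row and col are the
    floor of latitude and longitude divided by the cell size in degrees.
    """
    cells = defaultdict(lambda: [0, 0.0])
    for lat, lon, tone in tweets:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key][0] += 1      # density
        cells[key][1] += tone   # running tone total
    return {key: (count, total / count) for key, (count, total) in cells.items()}


# Hypothetical located Tweets with tone scores in [-1, 1].
sample = [
    (40.7, -74.1, 0.5),
    (40.2, -74.9, -0.5),
    (51.5, -0.1, 1.0),
]
for cell, (count, tone) in sorted(bin_tweets(sample).items()):
    print(cell, count, tone)
```

Each cell's count would drive the color intensity of the map and its mean tone the hue; rendering is left out of the sketch.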

To see it live, check out SGI’s Facebook page.

You can also see videos of the Twitter Heartbeat for the Presidential Elections and Hurricane Sandy.

Geosocial Data: Patterns of Everyday Life

My love for checking in, and thus geolocation, began after SXSW of 2009 while I racked up points and worked hard to become the leader of Boulder, ultimately losing to Eric Wu. Since then, my views on geolocation have evolved, and I have become especially enamored with the way geosocial data allows us to leave trails of the lives we and others are living. At its best, geolocation + social connects us to friends by letting us know who is near, and collectively, social data can identify common interests and patterns of behavior we couldn’t see in the past.

Since 2009, Foursquare has evolved into a service with 50 million users and two billion check-ins, with a facelift launching tomorrow; Twitter has opened up a geolocation API; Facebook Places launched and continues to evolve; Highlight launched; and Gowalla was acquired by Facebook. All of these advancements have happened in a few short years. Geotagging allows this new crop of social networks to add your geographic location via metadata, and now you can add location to tweets, photos, videos, etc.

Patterns of My Life

Every time I check in and share my location, I start leaving a trail of my day-to-day life. This trail, at its most basic, serves as a virtual diary of where I went and with whom. Timehop emails me each day to tell me what I did a year ago, while services such as Rewind.Me allow me to search my patterns and how I stack up against others.

Tripmeter lets me see my virtual trail and how I travel throughout the day based on Foursquare and Facebook checkins, similar to what Route does. Where Do You Go even lets you heatmap where you most often visit (hint: I hate South Boulder).

Foursquare Heat Map

Checkins Are a Moving Census

But collectively, the patterns woven by geosocial data are incredibly telling and act as a living census. Intriguingly, researchers from Carnegie Mellon have created what they call “Livehoods,” which are neighborhoods defined not only by geographic proximity but also by social geotagged data. Essentially, the similarities are based on where people check in. While the data only includes those using geolocation, it shows that people who check into a local restaurant and a similar bar create cultural neighborhoods. This data is more than just an intellectual curiosity. Companies can analyze customer patterns to focus marketing efforts, identify companies to partner with and determine new brick-and-mortar locations.

Example of Livehood Data
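The notion that venues are "similar" when the same people check in at them can be sketched very simply. The example below scores each pair of venues by the overlap of their visitors (Jaccard similarity). It is a deliberately simplified stand-in chosen for illustration, not the method the Carnegie Mellon researchers used, and the `(user, venue)` pair layout is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def venue_similarity(checkins):
    """Jaccard similarity of visitor sets for every pair of venues.

    `checkins` is an iterable of (user, venue) pairs. Returns a dict keyed by
    alphabetically ordered venue pairs.
    """
    visitors = defaultdict(set)
    for user, venue in checkins:
        visitors[venue].add(user)
    scores = {}
    for a, b in combinations(sorted(visitors), 2):
        shared = visitors[a] & visitors[b]
        everyone = visitors[a] | visitors[b]
        scores[(a, b)] = len(shared) / len(everyone)
    return scores


# Hypothetical check-ins: the cafe and the bar share all their visitors.
sample = [
    ("ann", "cafe"), ("bob", "cafe"),
    ("ann", "bar"), ("bob", "bar"),
    ("cy", "gym"),
]
for pair, score in venue_similarity(sample).items():
    print(pair, score)
```

Venues with high pairwise scores could then be grouped into clusters, optionally weighting by geographic distance so that clusters stay local.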

I particularly love the idea of an app using Foursquare data called “When Should I Visit?” that tells you when is a good time to visit London tourist attractions based on Foursquare checkins. Other use cases for this type of social data could tell people when to visit high-traffic destinations such as the DMV. I love knowing when not to be somewhere as much as knowing what locations and parties are trending.

HealthMap uses geosocial data and news reports to help track epidemics as they pop up. The mapping system was created by a team of researchers, epidemiologists and software developers from Children’s Hospital Boston to monitor real-time epidemics as they break out. Rumi Chunara worked on this project and also helped use geosocial data to track how cholera spread in Haiti. (Rumi will be speaking at Gnip’s social data conference, Big Boulder, about social data in public service.) Geosocial data has unlimited uses in the cases of health epidemics and natural disasters.

Companies are starting to create passive geolocation checkins such as EpicMix from Vail Resorts, which enables skiers to automatically check in using the RFID tags on their ski lifts. The system tells users how much they skied, where they skied, their vertical ascents and where their friends are on the mountain. During the last Coachella, 30,000 concertgoers used RFID bands from Intellitix to check in and update their Facebook status on various portals spaced throughout concert grounds. Near field communication is another way social data provides amazing patterns.

Geosocial data allows us insight into the patterns of everyday people, and the applications for this are endless.