Data Stories: Interview with Data Scientist Blake Shaw of Foursquare

At Gnip, we believe the value of social data is unlimited. Data Stories is how we bring this belief to life by showcasing how social data is used. This week we’re interviewing data scientist Blake Shaw of Foursquare about not only how data science is shaping Foursquare and its recommendations, but also how Foursquare can be a “microscope for cities.” You can follow Blake on Twitter at @metablake and check out Foursquare’s blog for more data science.

Data Scientist Blake Shaw of Foursquare

1. Your team has found a correlation between warm days and ice cream consumption in NYC. At some point, do you envision Foursquare being able to trigger offers based on different correlations your data science has found?

Yes! In fact, we currently trigger recommendations (which often contain deals and offers) based on a ton of different contextual signals that the team here has identified as useful. These signals include where you are, the places you like to go, the time of the day, the preferences of your friends, and what is popular around you. Mapping all of these signals to good recommendations requires finding correlations in massive amounts of data. Some of these correlations are simple, like people wanting coffee in the morning, and some are more complex, like people in New York being more likely to go to ramen and noodle shops when it’s cold out.

2. One of my favorite things about the Explore feature is that when you check into a city, Foursquare shows you locations where both locals and out-of-towners like to go. How do data science and product work together to make recommendations such as these?

Tourist recommendations is definitely one of my favorite features of Explore as well. In general, there is a healthy mix of product-driven and data-driven development at Foursquare. We will often work together to brainstorm not only what would be best to build from a product perspective but also what data we should be investigating further. Tourist recommendations came from the data; we realized that it would be easy to identify places that had a statistically high proportion of tourists and surface them to Explore users who find themselves in unfamiliar areas.  The results are fantastic — it’s like having millions of people creating a travel guide, just by walking around a city and checking in.
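To make the idea of a "statistically high proportion of tourists" concrete, here is a minimal sketch. It is not Foursquare's actual method, which the interview does not describe. It assumes each check-in is already labeled as coming from a tourist or a local, and it flags venues whose tourist share sits well above the city-wide share using a simple one-sided z-test. The function name, the thresholds, and the input shape are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def tourist_heavy_venues(checkins, z_threshold=2.0, min_checkins=20):
    """Return venues whose tourist share is significantly above the city-wide share.

    checkins: iterable of (venue, is_tourist) pairs (hypothetical input shape).
    """
    totals = defaultdict(int)
    tourists = defaultdict(int)
    for venue, is_tourist in checkins:
        totals[venue] += 1
        tourists[venue] += int(is_tourist)

    n_all = sum(totals.values())
    if n_all == 0:
        return []
    baseline = sum(tourists.values()) / n_all  # city-wide tourist share

    flagged = []
    for venue, n in totals.items():
        if n < min_checkins:
            continue  # too few check-ins to say anything reliable
        share = tourists[venue] / n
        std_err = sqrt(baseline * (1 - baseline) / n)
        if std_err > 0 and (share - baseline) / std_err > z_threshold:
            flagged.append(venue)
    return sorted(flagged)
```

The minimum check-in count keeps a venue with three visits, all from tourists, from outranking a venue with thousands of visits.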

3. Foursquare got its start in NYC. What are interesting observations you’ve seen on how people use Foursquare in smaller cities such as Boulder and Denver?

I feel like Foursquare is more of a necessity in big cities like New York, where new places are opening all the time and it’s hard to keep track of them all. That said, we see strong usage in places like Boulder and Denver as well. As expected, users in smaller cities such as these are more interested in old favorites than in exploring new places.

4. What signals does Foursquare use to recommend places to people?

I can’t reveal all of the signals we use to rank places, but we believe that place recommendation should be highly personalized, so we heavily weight signals about your tastes and the tastes of your friends.  We also think that from all of this data about where people are going we can discern which are the best places.  Imagine being able to ask everyone who has been to a restaurant if they would go back. We believe that by measuring signals about places such as loyalty, expertise, and sentiment we can tease out the best places. This is the idea behind our recently launched Foursquare ratings.  People are voting with their feet in the real world, not simply leaving a star or a like on a website.

5. Do you see a correlation between sharing Foursquare check-ins and badges on other social sites and increased usage of Foursquare? For example, if someone chooses to share a check-in on Twitter or Facebook, does that increase the likelihood of other people checking in?

Yes, we do. Roughly a quarter of all check-ins are shared to wider audiences on Twitter and Facebook. These in turn help spread awareness and adoption of Foursquare.

6. Foursquare recently showed a visualization of how check-ins in NYC were affected by Hurricane Sandy. How else do you see check-in data being useful other than for powering your recommendation engine?

Visualization of Foursquare Checkins Before and After Hurricane Sandy

One of my favorite aspects of working at Foursquare is getting to study this data from a larger sociological perspective. We are capturing this amazing signal about what millions of people are doing in the real world at every moment of the day in cities all around the globe. We have seen that when we aggregate check-in patterns across many individuals, we can measure features of cities at a higher resolution than was ever possible before.  I think this data can act almost like a “microscope for cities.”  If you look at how the storm affected NYC, you can see how this incredibly powerful force disrupted the natural rhythm of the city. It’s striking how predictable these patterns are, and how precisely we can identify unusual events. For example, in this plot we see how check-ins at grocery stores went up more than 200% in the days before the storm.  I see this real-time pulse or “EKG” of a city being a valuable resource in the future for understanding cities, giving us a larger view of the collective movement patterns of millions of people.
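As a rough illustration of how a predictable rhythm makes unusual events stand out, the sketch below compares each day's check-in count with the average for the same weekday and reports the percent change. It is only one simple way to do this, not the analysis behind the plot. The function names, the data layout, and the 100% threshold are assumptions.

```python
from statistics import mean


def percent_change(observed, baseline):
    """Percent change of an observed count relative to a baseline count."""
    return 100.0 * (observed - baseline) / baseline


def unusual_days(history, recent, threshold_pct=100.0):
    """Flag days whose check-in count deviates strongly from the weekday norm.

    history: dict mapping weekday name -> list of past daily counts.
    recent:  list of (label, weekday, count) tuples to evaluate.
    """
    flagged = []
    for label, weekday, count in recent:
        baseline = mean(history[weekday])  # typical count for this weekday
        change = percent_change(count, baseline)
        if abs(change) >= threshold_pct:
            flagged.append((label, round(change, 1)))
    return flagged
```

On this definition, a rise of more than 200% means more than three times the usual volume for that weekday.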


Geosocial Data: Patterns of Everyday Life

My love for checking in, and thus for geolocation, began after SXSW 2009, when I racked up points and worked hard to become the leader of Boulder, ultimately losing to Eric Wu. Since then, my views on geolocation have evolved, and I have become especially enamored with the way geosocial data allows us to leave trails of the lives we and others are living. At its best, geolocation + social connects us to friends we are close to by letting us know who is near. Collectively, social data can identify common interests and patterns of behavior we couldn’t see in the past.

Since 2009, Foursquare has evolved into a service with 20 million users and two billion check-ins, with a facelift launching tomorrow; Twitter has opened up a geolocation API; Facebook Places launched and continues to evolve; Highlight launched; and Gowalla was acquired by Facebook. All of these advancements have happened in a few short years. Geotagging allows this new crop of social networks to record your geographic location as metadata, so you can now add location to tweets, photos, videos, etc.

Patterns of My Life

Every time I check in and share my location, I start leaving a trail of my day-to-day life. This trail, at its most basic, serves as a virtual diary of where I went and with whom. Timehop emails me each day to tell me what I did a year ago, while services such as Rewind.Me allow me to search my patterns and how I stack up against others.

Tripmeter lets me see my virtual trail and how I travel throughout the day based on Foursquare and Facebook check-ins, similar to what Route does. Where Do You Go even lets you heatmap where you most often visit (hint: I hate South Boulder).

Foursquare Heat Map

Checkins Are a Moving Census

But collectively, the patterns woven by geosocial data are incredibly telling and act as a living census. Intriguingly, researchers from Carnegie Mellon have created what they call “Livehoods,” which are neighborhoods defined not only by geographic proximity, but also by social geotagged data. Essentially, the similarities are based on where people check in. While the data only includes those using geolocation, it shows that people who check into a local restaurant and a similar bar create cultural neighborhoods. This data is more than just an intellectual curiosity. Companies can analyze customer patterns to focus marketing efforts, identify companies to partner with and determine new brick-and-mortar locations.
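The notion that venues are similar when the same people check into them can be sketched in a few lines. This is a deliberately simplified illustration, not the Livehoods algorithm. It only scores how much two venues' visitor lists overlap, using cosine similarity, and ignores geographic distance and the clustering step entirely. All names and the input shape are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def venue_vectors(checkins):
    """Build, for each venue, a count of check-ins per user.

    checkins: iterable of (user, venue) pairs (hypothetical input shape).
    """
    vectors = defaultdict(Counter)
    for user, venue in checkins:
        vectors[venue][user] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two user-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[user] for user, count in a.items() if user in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Venues with high pairwise scores are candidates for the same cultural neighborhood. A fuller treatment would also weigh how close they are on the map.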

Example of Livehood Data

I particularly love the idea of an app using Foursquare data called “When Should I Visit?” that tells you when is a good time to visit London tourist attractions based on Foursquare checkins. Other use cases for this type of social data could tell people when to visit high-traffic destinations such as the DMV. I love knowing when not to be somewhere as much as knowing what locations and parties are trending.
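A very small sketch of the underlying idea: count historical check-ins by hour of day and suggest the least busy hours. This is an assumption about how such an app might work, not a description of “When Should I Visit?” itself. The function name, the default opening hours, and the input format are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def quietest_hours(timestamps: Iterable[datetime],
                   open_hours: Iterable[int] = range(9, 18),
                   top_n: int = 3) -> List[int]:
    """Return the top_n opening hours with the fewest historical check-ins."""
    counts = Counter(ts.hour for ts in timestamps)
    # Sort by check-in count, breaking ties in favor of the earlier hour.
    return sorted(open_hours, key=lambda hour: (counts[hour], hour))[:top_n]
```

The same tally would work for the DMV or any other high-traffic destination, provided enough people check in there to make the counts meaningful.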

HealthMap uses geosocial data and news reports to help track epidemics as they pop up. The mapping system was created by a team of researchers, epidemiologists and software developers from Children’s Hospital Boston to monitor epidemics in real time as they break out. Rumi Chunara worked on this project and also helped use geosocial data to track how cholera spread in Haiti. (Rumi will be speaking at Gnip’s social data conference, Big Boulder, about social data in public service.) Geosocial data has unlimited uses in the cases of health epidemics and natural disasters.

Companies are starting to create passive geolocation check-ins, such as EpicMix from Vail Resorts, which enables skiers to check in automatically using the RFID tags in their lift tickets. The system tells users how much they skied, where they skied, their vertical feet and where their friends are on the mountain. During the last Coachella, 30,000 concertgoers used RFID bands from Intellitix to check in and update their Facebook status at various portals spaced throughout the concert grounds. Near field communication is another technology that can generate this kind of passive check-in data.

Geosocial data allows us insight into the patterns of everyday people, and the applications for this are endless.