Putting the Data in Data Discovery – Qliktech & Gnip Partner Up

Gnip is excited to announce that Qliktech is the newest member of our Plugged In partner program. While we partner with many different types of companies – ranging from innovative social analytics products to well-known big data services and software providers – Qliktech is a unique and exciting addition to our program.

Qliktech makes data discovery software that combines key data analysis capabilities with intuitive decision-making features, including (to name a few):

  • The ability to consolidate data from multiple sources
  • An easy search function across all datasets and visualizations
  • State-of-the-art graphics for visualization and data discovery
  • Support for social decision-making through secure, real-time collaboration
  • Mobile device data capture and analysis

Our partnership means that joint Qliktech and Gnip clients can easily marry social data with internal datasets, creating nuanced visualizations that surface performance indicators and real-time changes that can impact their decisions.

To put the powerful capabilities of this new partnership to good use, Gnip will be co-sponsoring a partner hackathon on April 6th at Qonnections – the Qliktech Partner Summit.

Along with HP Vertica and Qliktech, we’ll enable partners to hack on behalf of Medair, a Swiss-based humanitarian organization that provides support for health, nutrition, water and sanitation, hygiene, and shelter initiatives in countries experiencing natural disasters or emergencies.

A series of recent academic papers has highlighted the role social media can play in obtaining real-time information following sudden natural disasters. This hackathon will follow in those footsteps, using Twitter data from Typhoon Haiyan, which made landfall in the Philippines on November 8th, 2013. Using Gnip’s Profile Geo enhancement, we’ll provide data from the Philippines during that period, allowing other Qliktech partners to experiment with how Medair could leverage this data, within Qliktech, in future situations that require real-time analysis and response.
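For readers curious about the mechanics, here is a minimal sketch of that kind of country-level filtering, assuming the Activity Streams payload shape in which the Profile Geo enrichment appears under a `gnip.profileLocations` key; the file name is a hypothetical placeholder:

```python
import json

def from_philippines(activity):
    """True if the Profile Geo enrichment places the author in the
    Philippines. Field paths assume Gnip's Activity Streams format."""
    locations = activity.get("gnip", {}).get("profileLocations", [])
    return any(
        loc.get("address", {}).get("countryCode") == "PH"
        for loc in locations
    )

ph_tweets = []
with open("haiyan_activities.json") as source:  # newline-delimited JSON
    for line in source:
        if line.strip():
            activity = json.loads(line)
            if from_philippines(activity):
                ph_tweets.append(activity)

print(f"{len(ph_tweets)} Tweets from users located in the Philippines")
```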

It will be a great time, but more importantly, it will harness the power of the Gnip and Qliktech relationship to accomplish something everyone can be proud of. And that’s a pretty good start to a new partnership!

Mapping Travel, Languages & Mobile OS Usage with Twitter Data

Some of the most compelling use cases we’ve seen for analyzing Twitter data involve geolocation. From NGOs looking at geotagged Tweets to help deploy resources after disasters, to brands paying attention to where their fans (or disgruntled customers) are to help drive engagement and marketing strategies, location adds key value to Tweet content.

We’ve been fascinated by these use cases and have wondered what else could be done with this data. A couple of months ago our Data Science team set out to explore these questions and, at the same time, to create resources that would help others study and make use of geotagged Tweets. We brought in the team at MapBox – including data artist Eric Fischer – to help us dig into the data and visualize what we found in fast, fully navigable maps of geotagged Tweets that let us and our readers really explore this data in depth.

The interactive maps we created together build on other recent analyses and visualizations of Twitter data, including this great post about details of the data and these static maps from Twitter’s Visual Insights team. The results are stunning, and we hope they make the data more practical and accessible as you evaluate what else you could be doing with geolocation in Twitter.

Locals and Tourists (Round 2)

Where do people tweet relative to where they live?

In 2010, Eric Fischer made a static map he called “Locals and Tourists” that showed geolocation for both Tweets and Flickr photos side by side, with the data color-coded to show when a post was by a “local” (a post at or near the user’s stated home location) or a “tourist” (a post far from the user’s home location). Twitter has matured significantly since then, and we wanted to see what we could learn from looking at just the Twitter data today, with the ability to browse at any local level around the world. We gathered a sample of Twitter data with unique geotagged Tweet locations from the past ~18 months to generate this new interactive map.
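The post doesn’t spell out the distance cutoff that separates a “local” from a “tourist,” but the classification itself is straightforward. Here is a minimal sketch, assuming each user has a geocoded home location and using a hypothetical 100 km radius:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0
LOCAL_RADIUS_KM = 100.0  # hypothetical cutoff; the post doesn't state one

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def classify(tweet_latlon, home_latlon):
    """Label a geotagged Tweet 'local' if it falls within the cutoff
    distance of the user's stated home location, else 'tourist'."""
    distance = haversine_km(*tweet_latlon, *home_latlon)
    return "local" if distance <= LOCAL_RADIUS_KM else "tourist"

# A Tweet from midtown Manhattan by a user whose home is Philadelphia
# (~130 km away) gets labeled "tourist":
print(classify((40.754, -73.984), (39.952, -75.165)))
```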

As the dynamic maps took shape, the new version of “Locals and Tourists” impressed us in a couple of ways. The first was simply how much resolution Twitter data provides. For instance, not only are primary and secondary roads clearly visible, but you can also distinguish roads taken by tourists from roads used for local commutes, as in this screenshot, where I-95 snakes past Wilmington, DE and Philadelphia, PA in red across the bottom third of the image:

Twitter Visualization

You can also clearly see the outlines of buildings like airports, sports stadiums, and major shopping malls that are frequented by tourists. Dig into your local area and see for yourself.

This map could be a resource for city planners, the travel industry, or for creative marketers thinking about how to localize their mobile advertising for different audiences.

Device Usage Patterns

This map shows off usage patterns for the various mobile operating systems used to tweet around the world. Since geotagged Tweets require a Twitter client that includes GPS support, most geotagged Tweets come from handheld devices – and we can look at exactly which client was used via the “generator” metadata field provided by Twitter. Among other things, this visualization suggests correlations between mobile OS and income level in the US, and highlights just how prolific BlackBerry use is in Southeast Asia, Indonesia and the Middle East.
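As a rough illustration, a tally like the one behind this map could be computed from the generator field as follows; the client-name substrings and the `activities` list are hypothetical stand-ins, assuming the Activity Streams payload shape:

```python
from collections import Counter

def mobile_os(activity):
    """Map the 'generator' metadata of an Activity Streams tweet to a
    mobile OS. The client-name substrings are illustrative, not exhaustive."""
    client = activity.get("generator", {}).get("displayName", "")
    if any(token in client for token in ("iPhone", "iPad", "iOS")):
        return "iOS"
    if "Android" in client:
        return "Android"
    if "BlackBerry" in client:
        return "BlackBerry"
    if "Windows Phone" in client:
        return "Windows Phone"
    return "other"

# `activities` stands in for any list of decoded tweet payloads.
activities = [{"generator": {"displayName": "Twitter for Android"}}]
print(Counter(mobile_os(a) for a in activities))  # Counter({'Android': 1})
```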

Languages of the World

Using the same data sample, this final visualization plots where people tweeted in various languages, using metadata from the Gnip Language Detection Enrichment and the Chromium Compact Language Detector as a fallback.
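A minimal sketch of that fallback logic, assuming the enrichment’s Activity Streams field path (`gnip.language.value`) and using pycld2, one Python binding for the Chromium Compact Language Detector:

```python
import pycld2  # pip install pycld2 – a binding for the Chromium CLD

def tweet_language(activity):
    """Return a language code for a tweet, preferring the Gnip Language
    Detection enrichment and falling back to running CLD locally.
    Field paths assume Gnip's Activity Streams payload shape."""
    enriched = activity.get("gnip", {}).get("language", {}).get("value")
    if enriched:
        return enriched
    is_reliable, _, details = pycld2.detect(activity.get("body", ""))
    if is_reliable:
        return details[0][1]  # ISO 639 code of the top-ranked guess
    return "und"  # undetermined

# Falls back to CLD because no Gnip enrichment is present; likely "es":
print(tweet_language({"body": "¿Dónde está la biblioteca?"}))
```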

For starters, this map makes clear that English is still the dominant language on Twitter around the world — toggling to the English-only view reveals nearly as much resolution in the global map as when all languages are enabled:

English Language Twitter Visualization of the US

English only

Twitter Language Visualization

All languages

What might come as more of a surprise, though, is just how many other languages are in frequent use, and particularly how much overlap there is in the United States:

Twitter Visualization for Languages

Non-English Tweets across the US; Spanish in green

A Note on the Data

These maps are created with a data set that was significantly culled to remove locations that would create visual noise. From the original data set, the following were removed (a rough sketch of this culling logic follows the list):

  • Multiple geotagged Tweets in the exact same location (we made no attempt to communicate density in these visualizations)
  • Geotagged Tweets from the same user in very close proximity to other Tweets from the same user
  • Geotagged Tweets from known or detectable bots
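
Here is what that culling might look like in code. This is a sketch under stated assumptions, not the actual pipeline: the proximity threshold, the known-bot list, and the `(user, lat, lon)` input shape are all hypothetical placeholders.

```python
from math import radians, sin, cos, asin, sqrt

KNOWN_BOTS = {"weatherbot", "trafficbot"}  # hypothetical screen names
MIN_USER_SPACING_KM = 1.0                  # hypothetical proximity cutoff

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def cull(points):
    """points: iterable of (user, lat, lon) tuples, one per geotagged Tweet."""
    seen_locations = set()
    kept_by_user = {}
    kept = []
    for user, lat, lon in points:
        if user in KNOWN_BOTS:
            continue  # drop known or detectable bots
        if (lat, lon) in seen_locations:
            continue  # drop repeats of the exact same location (no density)
        if any(haversine_km(lat, lon, plat, plon) < MIN_USER_SPACING_KM
               for plat, plon in kept_by_user.get(user, [])):
            continue  # drop Tweets too close to the same user's kept Tweets
        seen_locations.add((lat, lon))
        kept_by_user.setdefault(user, []).append((lat, lon))
        kept.append((user, lat, lon))
    return kept
```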

Together these maps point to something powerful: by looking at geolocation data from Twitter in the aggregate, we can gain understanding that drives marketing, product development, and crisis response, or informs research and policy decisions. In the coming weeks, we’ll dig deeper here on the blog into other important aspects of geolocation in social data that we hope will together build a picture of the opportunity in understanding social data geospatially.

Find something compelling here or in any of the other maps? Tell us with a Tweet: @gnip.

Data Stories: Dino Citraro of Periscopic on Data Visualization

The Periscopic team has a long-standing reputation for excellent work in data visualization, so we asked one of the founders, Dino Citraro, to participate in a Data Story about data visualization. You can follow Dino on Twitter at @dinocitraro and check out their work at Periscopic.com.

Dino Citraro of Periscopic

1) Periscopic’s tagline is “Do good with data”. What are some of the projects at Periscopic that embody that tagline?

We formed Periscopic with the hope that we could do good with data. To us, that means helping people who share the ideals of progressive social change, sustainability, human rights, equality, environmentalism, and transparency, to name a few. Most of our work enables insights and discussions in those areas. Some recent and/or notable projects are:

“VoteEasy”

VoteEasy.org is a voter education tool designed to let the general public quickly and easily see how closely political candidates align with their views on key issues. It’s like Match.com for political candidates. It utilizes thousands of hours of research and a vast collection of data assembled by the nonpartisan group Project Vote Smart. It is the most up-to-date resource for candidates’ political information, including voting records, interest group ratings, campaign finances, and personal biographies.

http://www.periscopic.com/#/work/voteeasy

“The State of the Polar Bear”

The State of the Polar Bear is the authoritative source for the health and status of the world’s polar bears. This multipart data visualization was developed through an international partnership with the Polar Bear Specialist Group, a scientific collaboration of the five polar bear nations: Canada, Denmark, Norway, the USA, and Russia. It covers data related to pollution levels, tribal hunting, and population dynamics of the bears.

http://www.periscopic.com/#/work/pbsg

“Who’s Talking About Breast Cancer”

Developed for GE’s Healthymagination data visualization forum, this tool takes a real-time look at the discussions happening on Twitter around the topic of breast cancer. Tweets from all over the world are aggregated in a single location, allowing visitors to quickly understand the current topics, trends, and stories.

http://www.periscopic.com/#/work/ge-breast-cancer

2) With infographics now being an over-hyped tool for marketing, what challenges does that create for a company actually trying to tell stories with data?

Done well, infographics can be very effective storytelling devices. Unfortunately, many of them seem either to lack an engaging metaphor or to fail to let the data be the story. Since most of our work is interactive, we have an advantage over traditional infographics because we can reveal information in a user-directed way. The challenge we face is how to gradually introduce these stories in a way that is engaging for visitors, and not overwhelming.

3) What are the greatest opportunities right now for data visualization?

The greatest opportunities for data visualization probably relate to public data and personal data. Public data, because it has the greatest potential for good and efficiency. Personal data, because it is the thing most people seem to find interesting. The Quantified Self movement has exploded, and along with it the desire to understand our social media behaviors and the rise of the Quantified Social Self.

4) How do you separate the wheat from the chaff when it comes to good data? 

There is no such thing as “good data”; there is only good context. You can create a compelling data visualization out of any data source, as long as you use the right context. For instance, one of our pieces uses the gaps in the data – the lack of data – as part of the story. Our client wanted to highlight the fact that they needed to increase their data collection efforts, and wanted public support for doing so. You could have a massive data set that is impeccably organized, but without the right context, it can go unnoticed.

5) How does good visualization help create data literacy?

To us, the issue is literacy in general. Like good design, data visualizations should be transparent and unnoticed. The epiphanies one gets from interacting with data are the things that should be retained, not the fact that an interface was unique, or the interactivity was sophisticated.

Having said that, the very process of interacting with data through a visualization tool brings an understanding of what is possible, and with that, the desire increases for more, and better experiences.


Data Stories: Interview with Data Scientist Blake Shaw of Foursquare

At Gnip, we believe the value of social data is unlimited. Data Stories is how we bring this belief to life by showcasing how social data is used. This week we’re interviewing data scientist Blake Shaw of Foursquare about how data science is not only shaping Foursquare and its recommendations, but how Foursquare can be a “microscope for cities.” You can follow Blake on Twitter at @metablake and check out Foursquare’s blog for more data science. 

Data Scientist Blake Shaw of Foursquare

1. Your team has found a correlation between warm days and ice cream consumption in NYC. At some point, do you envision Foursquare being able to trigger offers based on other correlations your data science team has found?

Yes! In fact, we currently trigger recommendations (which often contain deals and offers) based on a ton of different contextual signals that the team here has identified as useful. These signals include where you are, the places you like to go, the time of day, the preferences of your friends, and what is popular around you. Mapping all of these signals to good recommendations requires finding correlations in massive amounts of data. Some of these correlations are simple, like people wanting coffee in the morning; some are more complex, like people in New York being more likely to go to ramen and noodle shops when it’s cold out.

2. One of my favorite parts of the Explore feature is that when you check in, Foursquare lets you know about locations in a city where both locals and out-of-towners like to go. How do data science and product work together to make recommendations such as these?

Tourist recommendations are definitely one of my favorite features of Explore as well. In general, there is a healthy mix of product-driven and data-driven development at Foursquare. We often work together to brainstorm not only what would be best to build from a product perspective but also what data we should investigate further. Tourist recommendations came from the data: we realized it would be easy to identify places with a statistically high proportion of tourists and surface them to Explore users who find themselves in unfamiliar areas. The results are fantastic: it’s like having millions of people creating a travel guide, just by walking around a city and checking in.

3. Foursquare got its start in NYC. What are interesting observations you’ve seen on how people use Foursquare in smaller cities such as Boulder and Denver?

I feel like Foursquare is more of a necessity in big cities like New York, where new places are opening all the time and it’s hard to keep track of them all.  That said, we see strong usage in places like Boulder and Denver as well. As expected, users in smaller cities such as these are more interested in old favorites rather than exploring new places.

4. What signals does Foursquare use to recommend places to people?

I can’t reveal all of the signals we use to rank places, but we believe that place recommendation should be highly personalized, so we heavily weight signals about your tastes and the tastes of your friends.  We also think that from all of this data about where people are going we can discern which are the best places.  Imagine being able to ask everyone who has been to a restaurant if they would go back. We believe that by measuring signals about places such as loyalty, expertise, and sentiment we can tease out the best places. This is the idea behind our recently launched Foursquare ratings.  People are voting with their feet in the real world, not simply leaving a star or a like on a website.

5. Do you see a correlation between Foursquare sharing check-ins and badges on other social sites and increased usage of Foursquare? For example, if someone chooses to share a check-in on Twitter or Facebook, does that increase the likelihood of other people checking in?

Yes we do. Roughly a quarter of all check-ins are shared to wider audiences on Twitter and Facebook.  These in turn help spread awareness and adoption of Foursquare.

6. Foursquare recently showed a visualization of how check-ins in NYC were affected by Hurricane Sandy. How else do you see check-in data being useful, other than for powering your recommendation engine?

Visualization of Foursquare Checkins Before and After Hurricane Sandy

One of my favorite aspects of working at Foursquare is getting to study this data from a larger sociological perspective. We are capturing this amazing signal about what millions of people are doing in the real world at every moment of the day in cities all around the globe. We have seen that when we aggregate check-in patterns across many individuals, we can measure features of cities at a higher resolution than was ever possible before.  I think this data can act almost like a “microscope for cities.”  If you look at how the storm affected NYC, you can see how this incredibly powerful force disrupted the natural rhythm of the city. It’s striking how predictable these patterns are, and how precisely we can identify unusual events. For example, in this plot we see how check-ins at grocery stores went up more than 200% in the days before the storm.  I see this real-time pulse or “EKG” of a city being a valuable resource in the future for understanding cities, giving us a larger view of the collective movement patterns of millions of people.


Four Themes From the Visualized Conference

The first Visualized conference was held in midtown Manhattan last week. Even with Sandy and a nor’easter, the conference went off with only a few minor hiccups. The idea behind Visualized is a TED-like objective of exploring the intersection of big data, storytelling, and design. It worked.

Throwing designers and techies together is one of my favorite formats because of what the two groups have in common and how they differ. On one hand, artists are increasingly skilled with technical tools; on the other, these people often come at problems from very different perspectives.

The advantages of mixing these people at Visualized go beyond simple idea sharing. Each person specializes, leading to amazing expertise, skill, and focused perspective, but also leaving something out. It is not that everyone can learn to do everything, but rather that by sharing projects, methods, and tools, we can learn what to ask and whom to seek out for collaboration. This mix is the most reliable way to produce projects that use story, design, and data to evoke emotion, engage, and inform.

We were treated to amazing technical talent and creativity, evident in, for example, Cedric Kiefer’s generative dancer reproduction “unnamed soundsculpture.” To create the basic model, his team started with a song, a dancer, and 3D surface images knitted together from three Microsoft Kinect cameras. They re-generated the movie of the dance by simulating the individual particles captured in the imaging, then enhancing these to generate more particles under the influence of “gravity” and “wind” driven by the music.

unnamed soundsculpture from Daniel Franke on Vimeo.

Cedric and his team radically expand ideas of numeric visualization by capturing and building on organic physical data in complex and subtle ways, generating a whole new, engrossing experience from familiar elements.

Four themes surfaced repeatedly in the ideas and presentations of the speakers:

Teams

Most of the projects were produced by teams made up of people with a handful of diverse skills and affinities. I heard descriptions of teams such as: “we have a designer (color, composition, and proportion sense; works in Illustrator, Photoshop, pen and paper…), a data scientist (data munging, machine learning, statistical analysis…), a data visualization artist (JavaScript, D3 skills, web API mashup skills…), someone who is driven by narrative and storytelling (journalist, marketing project lead…), a database guy, etc.”

Assembling and honing these teams of technical and artistic creatives is probably a rare skill in itself and the result is a powerful engine of exploration, creativity and communication.

Hilary Mason from Bit.ly summed up the second-order talent shortage clearly: “Every company I know is looking to hire a data scientist; every data scientist I know is looking to hire a data artist.” As broad as data scientists’ skills are, many are recognizing the value of talented designers with the appropriate programming skills for crafting a clear, engaging message.

The New York Times teams (two different teams presented), the WNYC team of two, Bit.ly, and many others showed the power of teams creating together and bringing diverse talents to projects.

There were a couple of notable individual efforts. My favorite was Santiago Ortiz’s beautiful, complex and functional visualization and navigation of his personal Knowledgebase. His design elegantly uses the 7-set Venn diagram, and his deep insights into searching by category and time come together perfectly.

Journalism

Sniffing out the story is fundamental to projects that use story, design, and data to evoke emotion, engage, and inform. Journalists can smell drama and conflict and ask lots of questions. They have a sense of where to dig deeper. They are able to stick to the thread of the story and have a valuable work ethic around finding the details and tying up loose ends.

A large part of the success of Shan Carter and his team in creating the New York Times’ “Paths to the White House” visualization came from their ability to return over and over to the basic idea of making a relevant, accurate, and understandable visualization of the likely outcomes in each of the battleground states. This visualization went through 257 iterations checked into their GitHub repository, with a few evident cycles of creative expansion followed by refocusing on the story.

Data Mashups

API-mashup skills found their best examples in the news teams. WNYC’s accomplishments in creating data/visualization mashups to communicate evacuation zones, subway outages, flood-zone information updated in real time during the storms, and other embeddable web widgets were amazing. While their designs didn’t have the polish of some of the “slower” work presented, they produced great, accurate, and timely results in days and sometimes hours.

Design

“A visualization should clarify in ways that words cannot.” (Sven Ehrmann)

This summed up what I found awe-inspiring and satisfying in the design work. Since I primarily work with data visualization, I often rely on the graph-reading skills of my audience rather than optimized design. This may be necessary for many business applications, but when the message is important and the investment you can reasonably expect from your audience is uneven, taking shortcuts on design means missing opportunities to engage and inform. Great designers are masters at creating memory because they are able to reliably create “emotion linked to experience” (Ciel Hunter).

Jake Porway summed up data, team, story and design in the observations section of his presentation:

  • Data without analysis isn’t doing anything
  • Interdisciplinary teams are required
  • Visualization is a process (see the example from Shan Carter at NY Times above)
  • Tools make amazing outcomes possible with limited resources
  • There is a lot of potential to do a great deal of good when we learn to evoke emotion with story, design, and data to engage and inform.