Putting the Data in Data Discovery – Qliktech & Gnip Partner Up

Gnip is excited to announce that Qliktech is the newest member of our Plugged In partner program. While we partner with many different types of companies – ranging from innovative social analytics products to well-known big data services and software providers – Qliktech is a unique and exciting addition to our program.
Qliktech makes discovery software that combines key features of data analysis with intuitive decision-making features, including (to name a few):

  • The ability to consolidate data from multiple sources
  • An easy search function across all datasets and visualizations
  • State-of-the-art graphics for visualization and data discovery
  • Support for social decision-making through secure, real-time collaboration
  • Mobile device data capture and analysis

Our partnership means that joint Qliktech and Gnip clients can easily marry social data with internal datasets to create nuanced visualizations that surface performance indicators and real-time changes that can impact the decisions those clients are making.

To put the powerful capabilities of this new partnership to good use, Gnip will be co-sponsoring a partner hackathon on April 6th at Qonnections – the Qliktech Partner Summit.

Along with HP Vertica and Qliktech, we’ll enable partners to hack on behalf of Medair, a Swiss-based humanitarian organization that provides support for health, nutrition, water and sanitation, hygiene, and shelter initiatives to countries experiencing natural disasters or emergencies.

A series of recent academic papers has highlighted the role that social media plays in obtaining real-time information following sudden natural disasters. This hackathon will follow in those footsteps, using Twitter data from Typhoon Haiyan, which made landfall in the Philippines on Nov 8th, 2013. Using Gnip’s Profile Geo enhancement, we’ll provide data from the Philippines during that period, allowing other Qliktech partners to experiment with how Medair could leverage this data, within Qliktech, in future situations that require real-time analysis and response.

It will be a great time, but more importantly, it will harness the power of the Gnip and Qliktech relationship to accomplish something everyone can be proud of. And that’s a pretty good start to a new partnership!

From A to B: Visualizing Language for the Entire History of Twitter

It all started with a simple question: “How could we show the growth and change in languages on Twitter?”

Easy, right?

Well, several months later, here we are, finally ready to show off our final product. You can see a static image of the final viz below and check out the full story and interactive version in The Evolution of Languages on Twitter.

Looking back on the process that led us here, I realized that we’d been through a huge range of ideas and wanted to share that experience with others.

Where Did We Get the Data?

As a data scientist, I walk into Gnip’s vast data playground excited to analyze, visualize and tell stories. For this project, I had access to the full archive of public Tweets that’s part of Gnip’s product offering – that’s every Tweet since the beginning of Twitter in March of 2006.

The next question is: “With this data set, what’s the best way to analyze language?” We had two options here – use Gnip’s language detection or use the language field that’s in every Twitter user’s account settings. Gnip’s language detection enrichment looks at the text of every Tweet and classifies the Tweet as one of 24 different languages. It’s a great enrichment, but for historical data it’s only available back to March 2012.

Since we wanted to tell the story back to the beginning of Twitter, we decided to use the language field that’s in every Twitter user’s account settings.

[Image: screenshot of the language field in Twitter account settings]

This field has been part of the Twitter account setup since the beginning, giving us the coverage we need to tell our story.
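As a rough illustration of that choice, here is a minimal sketch of tallying Tweets by year and by the account-level language setting. The record layout is an assumption made for the example: the `created_at` and `user_lang` keys and the timestamp format are hypothetical simplifications, not the actual payload we worked with, and the sample records are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, simplified records. Real Tweet payloads are much richer,
# and the key names and timestamp format here are assumptions.
SAMPLE_TWEETS = [
    {"created_at": "2006-03-21T20:50:14", "user_lang": "en"},
    {"created_at": "2009-07-04T12:00:00", "user_lang": "ja"},
    {"created_at": "2009-11-30T08:15:00", "user_lang": "en"},
    {"created_at": "2009-12-01T23:59:59", "user_lang": "ja"},
]


def count_by_year_and_language(tweets):
    """Return {year: Counter({language: tweet_count})}."""
    counts = {}
    for tweet in tweets:
        # Parse the timestamp and keep only the year.
        year = datetime.strptime(tweet["created_at"], "%Y-%m-%dT%H:%M:%S").year
        # Use the language from the user's account settings, not the Tweet text.
        counts.setdefault(year, Counter())[tweet["user_lang"]] += 1
    return counts


counts = count_by_year_and_language(SAMPLE_TWEETS)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Grouping by year is only one option; the same loop works for months or weeks by changing what is extracted from the timestamp.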

The First Cut

Having defined how we would determine language, we created our first visualization.

[Image: streamgraph of Tweet volume by language]

Interesting, but it doesn’t really tell the story we’re looking for. This visualization tells the story of the growth of Twitter – it grew a lot. The challenge is that this growth obscures the presence of anything other than English, Japanese and Spanish. The sharp rise in volume also makes every language’s volume prior to 2010 impossible to see.

So we experimented with rank, language subsets, and other visualization techniques that could tell a broader story. At times, we dabbled in fugly.

Round Two

Moving through insights and iterations, we started to see each Twitter language become its own story. We chose relative rank as an important element, and the streams grew into individual banners waving from year-end marker poles like flags in the wind.

[Image: bump chart of language rank at one-year intervals]

With this version, we felt like we were getting somewhere…

The Final Version

To get to the final version, we reintroduced the line width as a meaningful element to indicate the percent of Tweet volume, pared down the number of languages to focus the story, and used D3 to spiff up the presentation layer. The end result is a simple visualization that tells the story of how language has grown and changed on Twitter. 
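The two quantities behind that design, relative rank and percent of Tweet volume, can be derived from per-year counts along these lines. This is a sketch under assumptions rather than the code we actually ran: the counts are invented for illustration, the function name is hypothetical, and ties are broken alphabetically only to keep the output deterministic. The presentation itself was built with D3; this covers just the data preparation step.

```python
# Invented counts for illustration only; not real Twitter volumes.
ILLUSTRATIVE_COUNTS = {
    2010: {"en": 60, "ja": 30, "es": 10},
    2013: {"en": 50, "ja": 20, "es": 20, "ar": 10},
}


def share_and_rank(counts_by_year):
    """Return {year: [{"lang", "rank", "share"}, ...]} ordered by rank."""
    table = {}
    for year, lang_counts in counts_by_year.items():
        total = sum(lang_counts.values())
        if total == 0:
            # No Tweets in this period, so there is nothing to rank.
            table[year] = []
            continue
        # Highest count first; alphabetical order breaks ties (an assumption).
        ordered = sorted(lang_counts.items(), key=lambda item: (-item[1], item[0]))
        table[year] = [
            {"lang": lang, "rank": rank, "share": count / total}
            for rank, (lang, count) in enumerate(ordered, start=1)
        ]
    return table


table = share_and_rank(ILLUSTRATIVE_COUNTS)
for year in sorted(table):
    for row in table[year]:
        print(year, row["rank"], row["lang"], f"{row['share']:.0%}")
```

In a chart like ours, `rank` would drive the vertical position of each language at a year marker and `share` would drive the line width, which keeps small languages visible even when total volume grows sharply.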

What became clear to me in this process is that visualization is a hugely iterative process and there’s not a single thing that leads to a successful end result. It’s a combination of the questions you ask, how you structure the data, the choices you make in what to show and what not to show and finally the tools you use to display the result.

Let me know what you think…