Author: Elaine Ellis, Marketing

Elaine Ellis is a Marketing Manager at Gnip where she runs social media and public relations. Previously, she was a marketing manager at Trada. Elaine received her BA in Marketing from Notre Dame (Go Irish!) and finds it incredibly awkward that her boss went to USC.

Data Stories: Harper Reed, Former CTO of Obama for America

Data Stories is Gnip’s project to tell the stories behind how people use data and why it matters. This week we interviewed Harper Reed, the former CTO of Obama for America, about the technology behind the scenes of elections, civic data and more. You can follow Harper on Twitter at @Harper.

Harper Reed Interview

Harper Reed, Photo by Joi Ito

1. In the next four years there will be massive changes in technology. What do you think will change in how campaigns use data in the 2016 election?

The big innovation this cycle was the analytics and how we found answers from the data we had. This follows the arc of the big data movement. When I first got involved in 2007/8 the conversations were all about collection and storage of data. Recently we have seen a shift from people not worrying about that because it is largely solved. Now, the big data space seems to have, thankfully, shifted to concentrate on gaining insight and getting answers from the data.

I think that this arc will continue. 2016 will be more about the answers that we will get from the data. Aggressively using modeling and data analytics to help make sure that there are no missteps.

2. Obama 2012 worked hard to remove the silos between tech and digital. What are some of the lessons you’ve learned about sharing data between departments?

This is obviously a work in progress for every organization. We made sure that there was a close physical proximity. That helps a lot.

Personally, the best lesson I learned was taught to me by John Maeda. He came in for a whirlwind visit and said “Manage by your Outbox. Not by your Inbox.” He then left. It was amazing and exactly the right amount of info. Later, he told me that this knowledge came from Larry Bacow.

Anyway, the idea is that you can work through political struggles, silos, etc by making sure that you are communicating out. Don’t expect or judge by the incoming communication.

This, for me, was the number one way to break down silos.

3. Political campaigns previously weren’t known for being full of tech savvy people. How do you reconcile the needs of campaign strategists and translate it to your team? Essentially, what did you learn about product development for campaigns?

Campaigns are organized like emergency response. Very top down. Lots of volunteers. Lots of downward delegation. Lots of managing and communicating up.

This type of organization does not mix well with standard software methodologies. It is hard to have a product manager when the stakeholders are unwilling to cede responsibility for the project to the PM.

Part of this is a lack of trust. Technology does not have a good history with campaigns (hopefully we helped instill more trust). Part of this is that in a top down type of environment, you need to negotiate more.

We found success by iterating on our process as quickly as we iterated on our software. We also would not have been able to do this without the great product team we had (Carol Davidsen, Mari Huertas, David Osborne, Jason Kunesh). They took the iteration seriously and made sure that the product development was successful no matter what.

4. What role did data scientists play in the reelection campaign?

We had an amazing set of data scientists – led by Rayid Ghani. They played the role of every scientist – thinking of crazy awesome things to test, testing experiments, making some of the coolest and more important of our discoveries and driving me crazy.

Working with them was awesome.

5. As one of your hacking projects, you took data from Chicago Transit Authority’s bus tracker and made it public. What other information would you like cities to make public?

I am going to pull the data hippie card here and say: ALL THE DATA!

I try and lead a very transparent and free life. There is not a lot of data that I think should not be public. Obviously personal data is a bit different (I would really like to have all my financial data, etc be available). But if it was available for everyone – it wouldn’t have such a stigma around it.

More realistically – the more civic data that is available, the better and more informed civic decisions we can make.

6. What’s next for you?

I have a small team of amazing people who I am working with. We are focusing on business tools. It should be fun.

Gnip’s Highlight Reel for 2012

Happy Holidays From Gnip

As Gnip reflects upon 2012, it’s exciting to see what we’ve accomplished with the help of our many amazing partners and customers. It was an important year for us and the social data ecosystem at large. This year we have been in awe of how people are using social data in numerous fields and applications, and we know that what we have seen so far is just the tip of the iceberg. While we’re proud of what 2012 meant for Gnip, we’re even more thrilled to think about what is ahead for both Gnip and social data in 2013.

Here were our company’s highlights in 2012!

January
Gnip launches WordPress.com, WordPress.org and Intense Debate Firehoses. Partners with Automattic to make the full firehoses of social data from WordPress and Intense Debate available for the first time ever.

February
Gnip offers enhanced filtering for PowerTrack for Twitter.

Gnip launches 30-Day Replay for Twitter – Gnip launches the first historical Twitter product making the past 30 days of Twitter social data available for the first time ever.

Gnip makes the Disqus firehose available for the first time, making rich comment data available from the largest commenting platform.

March
Gnip creates new Twitter filtering operators to receive only Retweets or to receive only a sampling of Tweets matching a rule set.

April
Gnip launches the full firehose from Tumblr, one of the fastest growing social networks in the world.

May
Gnip offers the ability to filter Twitter by bios making it even easier for companies to find the data from a targeted audience.

June
Big data? Gnip’s got it. We start delivering 100 billion social data activities each month to our clients.

Gnip hosts Big Boulder, the world’s first conference on social data. Two days of 16 sessions with 32 speakers coming together to talk about the social data ecosystem.

In one of the most vibrant tech communities in the US, Gnip is named a Colorado Company to Watch.

July
Gnip launches Data Stories, our interviews with the people changing the world through social data.

We moved into a new office, doubling our office space to accommodate our growing staff.

August
Gnip has seen incredible growth from the financial markets incorporating social data into their prediction models. Gnip’s Seth McGuire visits SquawkBox to talk social data and the stock market.

Twitter announces its official Twitter Certified Products and names Gnip a Certified Data Reseller.

Continuing to make it easier to ensure you only get the Twitter data you want to receive, Gnip adds new Twitter filtering options, including new geo operators and user operators.

Gnip states what we believe – “We believe social data has unlimited value and near limitless application. We begin almost every conversation with potential customers and partners with this simple statement because it best exemplifies why we exist as a company. More importantly, this refrain helps keep us focused on our ultimate goal: to be the source of record for all public social conversation.”

September
Gnip launches Historical PowerTrack for Twitter making the full Twitter archive available for the first time in history. We believe this is the largest record of human behavior in history, and we think the possibilities are endless.

October
After more than four years in the business, and now delivering 100 billion social data activities a month, we have some tricks up our sleeves. We created Gnip’s Engineering blog to share some of the lessons we’re learning.

Gnip makes monitoring YouTube comments even easier by launching a YouTube Comments API.

Gnip customer Union Metrics becomes the first recognized analytics provider by Tumblr.

November
Gnip announces a partnership with Hottolink, Japan’s largest social media monitoring firm.

Tumblr is the 10th largest site in the world, and we continue to see demand for Tumblr social data. To accommodate this, Gnip creates PowerTrack for Tumblr, making it even easier for customers to home in on the content they want.

Gnip is named a best place to work by the Denver Business Journal (and we’re hiring).

December
Gnip launches Plugged In To Gnip, our partner program to recognize our partners providing comprehensive, reliable and sustainable social data. You can read what our partners are saying about their partnerships with Gnip.

Data Stories: Dmitrii Vlasov on Kaggle Contests

At Gnip, we’re big fans of what the team at Kaggle is doing and have a fun time keeping tabs on their contests. One contest that I loved was held by WordPress and GigaOm to see what posts were most likely to generate likes, and we interviewed Dmitrii Vlasov who came in second in the Splunk Innovation Prospect and sixth overall. For me, it was interesting to speak to an up and coming data scientist who isn’t well known yet. Follow him at @yablokoff.

Dmitrii Vlasov of the GigaOm WordPress contest

1. You were recognized for your work in the first Kaggle contest you ever entered. What attracted you to Kaggle, and specifically the WordPress competition?

I came to Kaggle accidentally as it always happens. I read some blog post about the Million Song Dataset Challenge provided by Last.fm and bunch of other organizations. The task was to predict which songs will be liked by users based on their existing listening history. This immediately made me feel excited because I’m an active Last.fm user and was reflecting about what connections between people can be established based on their music preferences. But the contest was coming to end and so I switched to WordPress GigaOm contest and got 6th place there. Well, it is always interesting to predict something you already use.

2. What is your background in data science?

Now I’m a senior CS student in Togliatty, Russia. Can’t say that I have a special background in Data Science – I had more than a year-long course of probability theory and math statistics in university, some self-learned skills about semantic analysis and have big love to Python as a tool for implementing ideas. Also, I’ve entered the Machine Learning course on Coursera.

3. You found that blog posts with 30 to 50 pictures were more likely to be popular. You also found that longer blog posts (80,000-90,000 characters) attract more likes. This struck my marketing team as really high and was contrary to your hypothesis that longer content might be less viral. Why do you think this is?

Well, my numbers show relative correlation between amount of photos, characters and videos and the amount of likes received. Big relative “folks love” on several prominent amount of photos means that there were not so many posts with such amount of photos but most of them were qualitative. Quick empirical analysis shows that these are special type of posts – “big photo posts”. They usually are photo report, photo collection or scrapbook. For such types of posts 10-15 photos are not enough but at the same time 10-15 photos seem too overloaded for normal post. The same can be said about big amount of text in post. Of course, the most “likeable” posts contain 1,000-3,000 characters, but posts with 80-90 thousands are winners in “heavyweight category”. These are big researches, novels, political contemplation. Analyse is quite simple but it shows that if you want to create media-rich or text-rich content it should be really media-text-rich. Or you may fall in a hollow of not suitableness.

4. What else would you like to predict with social data if you got the chance?

Now I work on romantic and friend relationships that could be established based on people’s music preferences (it’s a privately held startup in alpha). This is a really interesting and deep area! Also, I’d like to work with some political data e.g. to predict reaction on one or another politician’s statement based on a user’s Twitter feed. Or to extract all “real” thesis of politician based on all of his public speeches.

Bad Data, the Right Data and Le Data

Gnip believes social data can change the world and our leadership team has been writing about data in O’Reilly, speaking at the Sentiment Symposium and at LeWeb. We wanted to share what they were talking about.

Bad Data by O'Reilly

Gnip CEO Jud Valeski wrote a chapter in the recently released O’Reilly handbook “Bad Data” by Ethan McCallum. Jud wrote the chapter called “Social Data: Erasable Ink?” about how the evolving social media landscape is challenging expectations about how people interact with social data and who owns it. Gnip is committed to providing terms-of-service compliant social data and this chapter talks about the expectations around social data and how the various players are managing them.

Our COO Chris Moody speaking at the Sentiment Symposium on “Building Sentiment Analysis on the Right Social Data”

Jud being interviewed by Robert Scoble at LeWeb

What Our Partners Are Saying About Plugged In To Gnip

Yesterday we launched Plugged In To Gnip, our partner program to recognize our partners providing comprehensive, reliable and sustainable social data. Our COO Chris Moody wrote about what Plugged In To Gnip means for us yesterday, but we wanted to share what our partners were saying about the program and what it means to them.

FirstRain is Proud to be Plugged In To GNIP!

And critical to the success of FirstTweets was, first, to have comprehensive, authorized and reliable access to the Twitter firehose—and for this FirstRain relies on Gnip. Gnip is the world’s largest and most trusted provider of social media data, and in addition to being a fantastic team of innovative social data ninjas, they’ve been terrific partners in our quest to transform the way corporate enterprises drive revenue through Web and Social customer intelligence and analytics beyond social monitoring.

Netbase and Gnip: Maintaining Streams of Quality Content

When I describe my job, I borrow from my childhood near Pittsburgh and explain that if NetBase were a steel mill, I’d be the guy making sure we have trains and barges bringing in high-quality iron ore and the other raw materials we need to make high-quality steel. Gnip, whose Plugged In program we are proud to have joined from its launch, operates one of the biggest, fastest “trains” delivering raw data to us. Out of the roughly 50 million social media postings we analyze and index each day, the majority come from Twitter and we rely on Gnip’s ability to operate a TGV-scale delivery platform. The TGV, if you don’t know that term, is France’s Tres Grande Vitesse, the fastest railway in the world. We also depend on Gnip for Disqus blog comments and much more. But they’re not just filling up buckets for us – Gnip adds value.

Although the TGV has a switching system, its scale is microscopic compared with the “switching” we need from our content suppliers. Although in many cases we tell our supplier “Give us everything you have,” we are able to rely on some of them, including Gnip, to filter the content to ensure that our customers receive 100 percent of the relevant tweets, posts and articles and as little spam and junk as possible. That makes them more than a supplier of raw material; it makes them a partner. We also rely on them to put in place the technical and business processes to ensure that we are in compliance with licensing and other requirements created by the owners and distributors of the data they supply – and to help our customers do the same. Just keeping up with those processes and requirements is a challenge as the social media ecosystem evolves and matures.

Gnip Touts Big Data Partnerships

“Social data is a new and incredibly exciting dataset that our customers are beginning to leverage within IBM InfoSphere BigInsights and other IBM products,” said Bruce Weed, IBM big data program director, in a statement. “Via the Plugged In To Gnip business partner program, we make it incredibly easy for them to access that social data.”

Union Metrics is Proud to be Plugged In To Gnip

What else does this partnership mean? It means a number of things, but what’s most important to you – our valued customers – is that all our social analytics products are built on the highest quality, most comprehensive and reliable social data. We are committed to bringing you the data that you need to be successful with social media and our partnership with Gnip helps make that possible; full coverage, high-quality data is at the heart of all our analytics solutions.

Infochimps Plugged In To Gnip

Getting a handle on the immense volume of data produced by the social networks provided by Gnip often requires a sophisticated data infrastructure for the processing and control of feeds.  As a partner in providing solutions to customers needing to extract insight from this treasure trove of data, Infochimps can help by setting up customers with a best in class data platform for refining and working with Gnip’s feeds.

Joining Forces: BrandWatch Collaboration With Gnip 

As a Gnip partner, the social data we access through our Twitter Firehose integration is certified as the most comprehensive, reliable and sustainable source of social data. So in other words, as a Brandwatch client, this has always assured you that the insights you receive are based on the most complete access available for this data – period.

Clarabridge Partners with Gnip to Bring Real-Time Monitoring of Social Media Data to Customer Experience

“We recognize that consumers are increasingly looking to social media as a critical means of communication, and businesses need to stay on top of consumer feedback in real time,” said Sid Banerjee, CEO, Clarabridge. “As a member of Gnip’s Plugged In program, our customers will be able to collect, analyze, operationalize and measure social media data in real time, as well as engage directly back with the customer. We are excited to partner with Gnip and enable Clarabridge customers to forge real and direct relationships with their consumers and elevate the customer experience they provide.”

Greenplum Plugs In To Gnip

With data from social media platforms accounting for an ever-increasing amount of the Big Data deluge, collecting that data becomes an ever-moving target. As social networks proliferate, so do the respective services’ APIs and access policies. Social media aggregator Gnip aims to simplify the process, allowing businesses to focus on reaping social data insight, rather than tracking which services are hot, which are not, and which have changed their API policies.

Coinciding with its OpenChorus initiative, Greenplum announced a partnership with Gnip recently to make the company’s APIs accessible through the Greenplum platform. As Gnip details in its new Plugged In To Gnip campaign, users of Greenplum Chorus, UAP, Database, and HD can now easily access data from the Twitter, Tumblr, WordPress, StockTwits and Disqus firehoses.

uberVU is now Plugged In to Gnip

As an inaugural member of Gnip’s Plugged In program, this certifies that we are accessing social data, like Twitter, through the most comprehensive, reliable and sustainable source of social data. Now, our customers can look to this certification as an assurance that the insights they receive are based on the most complete access available for this data – period. Yes, ladies and gentlemen, we have the firehose.

UberVu Drinking from the Gnip Firehose

Data Stories: Dino Citraro of Periscopic on Data Visualization

The Periscopic team has a long-standing reputation for their excellent work in data visualizations, so we asked one of the founders, Dino Citraro, to participate in a Data Story about data visualizations. You can follow Dino on Twitter at @dinocitraro and check out their work at Periscopic.com.

Dino Citraro of Periscopic

1) Periscopic’s tagline is “Do good with data”. What are some of the Periscopic projects that embody that tagline?

We formed Periscopic with the hope that we could do good with data. To us that means helping people that share the ideals of progressive social change, sustainability, human rights, equality, environmentalism, and transparency to name a few. Most of our work enables insights and discussions in those areas. Some recent and/or notable projects are:

“VoteEasy”

VoteEasy.org is a voter education tool that was designed to allow the general public to quickly and easily see how closely political candidates align with their views on key issues. It’s like Match.com for political candidates. It utilizes thousands of hours of research and a vast collection of data assembled by the nonpartisan group, Project Vote Smart. It is the most up-to-date resource for candidate political information, including voting records, interest groups ratings, campaign finances, and personal biography.

http://www.periscopic.com/#/work/voteeasy

“The State of the Polar Bear”

The State of the Polar Bear is the authoritative source for the health and status of the world’s polar bears. This multipart data visualization was developed through an international partnership with the Polar Bear Specialist Group, a scientific collaboration of the five polar bear nations: Canada, Denmark, Norway, the USA, and Russia. It covers data related to pollution levels, tribal hunting, and population dynamics of the bears.

http://www.periscopic.com/#/work/pbsg

“Who’s Talking About Breast Cancer”

Developed for GE’s Healthymagination data visualization forum, this tool takes a realtime look at the discussions happening on Twitter around the topic of breast cancer. Tweets from all over the world are aggregated in a single location, allowing visitors to quickly understand the current topics, trends, and stories.

http://www.periscopic.com/#/work/ge-breast-cancer

2) With infographics now being an over-hyped tool for marketing, what challenges does that create for a company actually trying to tell stories with data?

If they are done well, infographics can be a very effective story-telling device. Unfortunately, many of them seem to either lack an engaging metaphor, or don’t do a good job of letting the data be the story.  Since most of our work is interactive, we have an advantage over traditional infographics because we can reveal information in a user-directed way. The challenges we face are how to slowly introduce these stories in a way that is engaging for visitors, and not overwhelming.

3) What are the greatest opportunities right now for data visualization?

The greatest opportunities for data visualization probably relate to public data and personal data. Public data, because it has the greatest potential for good and efficiency. Personal data, because it is the thing that most people seem to find interesting. The Quantified Self movement has exploded, and along with it the desire to understand our social media behaviors, and the rise of the Quantified Social Self.

4) How do you separate the wheat from the chaff when it comes to good data? 

There is no such thing as “good data”, there is only good context. You can create a compelling data visualization out of any data source, as long as you use the right context.  For instance, one of our pieces uses the gaps in the data – the lack of data – as part of the story. Our client wanted to highlight the fact that they needed to increase the data collection efforts, and wanted public support for this effort. You could have a massive data set that is impeccably organized, but without the right context, it can go unnoticed.

5) How does good visualization help create data literacy?

To us, the issue is literacy in general. Like good design, data visualizations should be transparent and unnoticed. The epiphanies one gets from interacting with data are the things that should be retained, not the fact that an interface was unique, or the interactivity was sophisticated.

Having said that, the very process of interacting with data through a visualization tool brings an understanding of what is possible, and with that, the desire increases for more, and better experiences.

Data Stories: Interview with Data Scientist Blake Shaw of Foursquare

At Gnip, we believe the value of social data is unlimited. Data Stories is how we bring this belief to life by showcasing how social data is used. This week we’re interviewing data scientist Blake Shaw of Foursquare about how data science is not only shaping Foursquare and its recommendations, but how Foursquare can be a “microscope for cities.” You can follow Blake on Twitter at @metablake and check out Foursquare’s blog for more data science. 

Data Scientist Blake Shaw of Foursquare

1. Your team has found a correlation between warm days and ice cream consumption in NYC. At some point, do you envision Foursquare being able to trigger offers based on different correlations your data science has found?

Yes!  In fact, we currently trigger recommendations (which often contain deals and offers) based on a ton of different contextual signals that the team here has identified as useful.  These signals include where you are, the places you like to go, the time of the day, the preferences of your friends, and what is popular around you. Mapping all of these signals to good recommendations requires finding correlations in massive amounts of data.  Some of these correlations are simple like when it’s the morning people like to get coffee, and some correlations are more complex like when it’s cold out in New York, people are more likely to go to ramen and noodle shops.

2. One of my favorite parts of the Explore feature is that when you check into a city, Foursquare lets you know the locations where both locals and out-of-towners like to go. How do data science and product work together to make recommendations such as these?

Tourist recommendations is definitely one of my favorite features of Explore as well. In general, there is a healthy mix of product-driven and data-driven development at Foursquare. We will often work together to brainstorm not only what would be best to build from a product perspective but also what data we should be investigating further. Tourist recommendations came from the data; we realized that it would be easy to identify places that had a statistically high proportion of tourists and surface them to Explore users who find themselves in unfamiliar areas.  The results are fantastic — it’s like having millions of people creating a travel guide, just by walking around a city and checking in.
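
As a rough illustration of the idea in this answer (not Foursquare's actual method, which the interview does not describe), the sketch below flags venues where the share of check-ins from out-of-town users is well above the citywide share. The function name, the `min_checkins` and `lift` parameters, the "home city" lookup and the sample data are all hypothetical, and the simple ratio threshold is a stand-in for a proper statistical test.

```python
from collections import Counter

def tourist_heavy_venues(checkins, home_city, min_checkins=3, lift=1.5):
    """Return venues whose share of out-of-town check-ins is unusually high.

    checkins  -- iterable of (user_id, venue_id, venue_city) tuples
    home_city -- dict mapping user_id to that user's home city (assumed known)
    """
    totals = Counter()          # all check-ins per venue
    tourist_counts = Counter()  # check-ins by users whose home city differs
    for user_id, venue_id, venue_city in checkins:
        totals[venue_id] += 1
        home = home_city.get(user_id)
        # Users with an unknown home city are not counted as tourists.
        if home is not None and home != venue_city:
            tourist_counts[venue_id] += 1

    all_checkins = sum(totals.values())
    if all_checkins == 0:
        return []
    # Baseline: share of tourist check-ins across every venue.
    baseline = sum(tourist_counts.values()) / all_checkins
    if baseline == 0:
        return []

    # Keep venues with enough data and a tourist share well above baseline.
    return sorted(
        venue for venue, n in totals.items()
        if n >= min_checkins and tourist_counts[venue] / n >= lift * baseline
    )

# Made-up sample data for illustration only.
sample_homes = {"a": "NYC", "b": "NYC", "c": "Denver", "d": "Boulder"}
sample_checkins = [
    ("a", "deli", "NYC"), ("b", "deli", "NYC"),
    ("a", "deli", "NYC"), ("c", "deli", "NYC"),
    ("c", "statue", "NYC"), ("d", "statue", "NYC"),
    ("c", "statue", "NYC"), ("a", "statue", "NYC"),
]
print(tourist_heavy_venues(sample_checkins, sample_homes))
```

A production system would presumably need a more careful definition of "tourist" and a significance test that accounts for sample size; the minimum check-in count here is only a crude guard against noisy venues.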

3. Foursquare got its start in NYC. What are interesting observations you’ve seen on how people use Foursquare in smaller cities such as Boulder and Denver?

I feel like Foursquare is more of a necessity in big cities like New York, where new places are opening all the time and it’s hard to keep track of them all.  That said, we see strong usage in places like Boulder and Denver as well. As expected, users in smaller cities such as these are more interested in old favorites rather than exploring new places.

4. What signals does Foursquare use to recommend places to people?

I can’t reveal all of the signals we use to rank places, but we believe that place recommendation should be highly personalized, so we heavily weight signals about your tastes and the tastes of your friends.  We also think that from all of this data about where people are going we can discern which are the best places.  Imagine being able to ask everyone who has been to a restaurant if they would go back. We believe that by measuring signals about places such as loyalty, expertise, and sentiment we can tease out the best places. This is the idea behind our recently launched Foursquare ratings.  People are voting with their feet in the real world, not simply leaving a star or a like on a website.

5. Do you see a correlation between Foursquare sharing check-ins and badges on other social sites and increased usage of Foursquare? For example, if someone chooses to share a checkin on Twitter or Facebook, does that increase the likelihood of other people checking in?

Yes we do. Roughly a quarter of all check-ins are shared to wider audiences on Twitter and Facebook.  These in turn help spread awareness and adoption of Foursquare.

6. Foursquare recently showed a visualization of how check-ins in NYC were affected by hurricane Sandy. How else do you see check-in data being useful other than for powering your recommendation engine?

Visualization of Foursquare Checkins Before and After Hurricane Sandy

One of my favorite aspects of working at Foursquare is getting to study this data from a larger sociological perspective. We are capturing this amazing signal about what millions of people are doing in the real world at every moment of the day in cities all around the globe. We have seen that when we aggregate check-in patterns across many individuals, we can measure features of cities at a higher resolution than was ever possible before.  I think this data can act almost like a “microscope for cities.”  If you look at how the storm affected NYC, you can see how this incredibly powerful force disrupted the natural rhythm of the city. It’s striking how predictable these patterns are, and how precisely we can identify unusual events. For example, in this plot we see how check-ins at grocery stores went up more than 200% in the days before the storm.  I see this real-time pulse or “EKG” of a city being a valuable resource in the future for understanding cities, giving us a larger view of the collective movement patterns of millions of people.
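
To make the "went up more than 200%" figure concrete, here is a minimal sketch of how a percent change against a typical-day baseline could be computed. The function names, the threshold and the check-in counts are invented for illustration; the interview does not say how Foursquare defines its baseline or what counts as unusual.

```python
from statistics import mean

def percent_change(baseline_counts, observed):
    """Percent change of an observed count relative to the baseline average."""
    baseline = mean(baseline_counts)
    if baseline == 0:
        raise ValueError("baseline average must be non-zero")
    return 100.0 * (observed - baseline) / baseline

def is_unusual(baseline_counts, observed, threshold_pct=100.0):
    """Flag a count that deviates from baseline by more than threshold_pct."""
    return abs(percent_change(baseline_counts, observed)) > threshold_pct

# Hypothetical grocery-store check-in counts on comparable earlier days.
typical_days = [100, 110, 90]
day_before_storm = 320
print(percent_change(typical_days, day_before_storm))
print(is_unusual(typical_days, day_before_storm))
```

Note that an increase of more than 200% means the observed count is more than three times the baseline, not twice it.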

SGI Launches Global Twitter Heartbeat, Powered by Gnip

File this under cool news.

SGI’s Big Brain Computer has created a Global Twitter Heartbeat, allowing the supercomputer to analyze the Twitter stream for sentiment and geolocation to create a Twitter heartbeat telling us how the world is feeling based on emotions communicated via Twitter. Not only is this a cool undertaking by the folks at SGI, but we’re proud to announce that it is powered by Gnip’s decahose Twitter stream.

To make this happen, SGI partnered with Kalev H. Leetaru of the University of Illinois and Dr. Shaowen Wang of the CyberInfrastructure and Geospatial Information (CIGI) Laboratory at the University of Illinois at Urbana-Champaign.

This isn’t just some simple stream.  The SGI supercomputer analyzes every Tweet to assign location (not just GPS-tagged tweets, but processing the text of the Tweet itself) and tone values, then visualizes the conversation in a heat map that puts Tweet location, Tweet density and tone into a unified geospatial perspective. The entire process from ingestion to data analysis to producing the heat map runs at a speed that allows visualization of a map frame per second.
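
The aggregation step behind a heat map like this can be sketched in a few lines. The example below is an assumption-laden illustration, not SGI's pipeline: it takes Tweets that already have a latitude, longitude and tone score (the hard parts, geolocating from text and scoring tone, are skipped), bins them into a latitude/longitude grid, and reports density and average tone per cell. The function name, the `cell_degrees` parameter and the sample values are hypothetical.

```python
import math
from collections import defaultdict

def heatmap_cells(tweets, cell_degrees=1.0):
    """Bin (lat, lon, tone) tuples into grid cells.

    Returns a dict mapping (lat_index, lon_index) to (tweet_count, mean_tone).
    """
    sums = defaultdict(lambda: [0, 0.0])  # cell -> [count, tone total]
    for lat, lon, tone in tweets:
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        sums[cell][0] += 1
        sums[cell][1] += tone
    return {cell: (n, total / n) for cell, (n, total) in sums.items()}

# Made-up Tweets: two near New York, one near London.
sample_tweets = [
    (40.7, -74.0, 0.5),
    (40.2, -73.5, -0.5),
    (51.5, -0.1, 1.0),
]
for cell, (count, tone) in sorted(heatmap_cells(sample_tweets).items()):
    print(cell, count, tone)
```

Each frame of the map would then color a cell by its mean tone and scale its intensity by its count.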

To see it live, check out SGI’s Facebook page.

You can also see videos of the Twitter Heartbeat for the Presidential Elections and Hurricane Sandy.

Data Stories: Interview with Simon Rogers of the Guardian Data Blog

If you’re into data visualizations at all, then you’re going to be familiar with Simon Rogers of the Guardian data blog. They tell incredible stories using data and they’re a leader in the industry for data journalism. I was elated when Simon agreed to be interviewed for a Data Story to talk about his work in data journalism. 

Simon Rogers of Guardian Data Blog

1. How does your department find its data and choose which sources to use?
It really varies. Sometimes it’s breaking news, such as Hurricane Sandy. That led us to do this map showing every verified event as we felt the raw information was too difficult for most people to find. Sometimes there’s a dataset that’s been released that we feel really needs questioning and investigation. Other times it’s down to a hunch that it might be interesting. After the Denver shootings we did this post looking at gun ownership and homicide rates around the world; so it can be something as serious as that – or as weird as a list of Doctor Who villains (personal obsessions can come into the process…)

2. What do you find first? The stories you want to tell or data that can tell stories?
It’s all about the stories, so I would say it goes that way round. It’s good to start examining a datasource with an idea in mind of what you’re looking for. Otherwise the whole thing just gets too unmanageable.

3. You majored in journalism. How did you end up pursuing data journalism, and what skills did you need to learn along the way? 
After 9/11 (which was my second day on the newsdesk) I was told to work with the graphics team to help tell those stories. And I found myself coming back to that role in between editing the science section. During that process, I started getting better at working with spreadsheets and just collecting data, often just to make my job easier. You don’t want to keep having to search for carbon emissions data anew each time you’re doing a story on climate change. In around 2006 or 2007, Adrian Holovaty came and gave a talk at the Guardian to staff and I thought ‘that sounds like a job – and it’s not a long way away from what I’m doing already’. So, when we launched the Datablog in early 2009, it was just a matter of surfacing data we already had. In the meantime, I’ve learnt a load of tools, but mainly work with Excel, Google Refine, Fusion Tables and free viz tools like Tableau and Datawrapper.

4. The Guardian has incredible visualizations. As a data journalist, how do you work with your graphic artists to tell a story?
We can’t each do everything. I can make a map, but it will be miles better if a designer does it and the Guardian has some great graphic designers and a brilliant graphic team. But what I can do is get the right information they need and get it in the right format, and really help with telling the story.

5. The Guardian is trying to be a repository for all open government data. What data do you wish was more readily available?
Basic spending data. It should be easy to find but just getting a total amount that each UK government department spends is always a nightmare involving PDFs. It’s not good having ultra-granular spending figures if we can’t get the totals.

6. You recently told a story about how people are using homophobic language on Twitter (The No Homophobes guide to language on Twitter). What other stories would you like to tell around social data?
That was actually showcasing amazing work on the web – which we do a lot now. I’m fascinated by the way that people use social media and how they use it in conjunction with other media – i.e. people tweeting while they’re watching TV, for instance. The way that people use Twitter in a crisis is fascinating – and how we share images too.

Social Data and The Election

If you’re excitedly awaiting the results of the election and want to keep an eye on what people are saying about Election 2012 on social media, we have a list of resources below:

  • I Voted Map – A realtime map by the good people at Foursquare allowing people to check in and say “I voted.” The map compiles checkins from voters.
  • The Twitter Political Index – Twitter’s official coverage measuring sentiment toward each Presidential candidate, along with trending topics related to the election.
  • Tumbling the Election – Coverage from Tumblr’s editorial team tracking top election related hashtags.
  • Facebook Stories on the Election – Watch, in real time, Americans who said on Facebook that they voted.
  • Tumblr Election 2012 – Union Metrics visualization of trending election-related tags on Tumblr. Shows how many posts per second are about the election.
  • Infinigon Group – Tracking realtime political sentiment on social media.
  • 2012 Election Mood Meter – Netbase’s election mood meter tracks sentiment on social media for the Presidential and Vice Presidential candidates and breaks it down further by gender.
  • Electoral Map based on Tweets – The Guardian has created a map to show who would win the election based on Tweets.
  • Election Day on Twitter – Al Jazeera and Flowics show buzz volume on Twitter about each of the Presidential candidates as well as a live stream of Tweets about each candidate.
  • Rock The Vote Real-Time Politics – Splunk and MTV have created a visualization of hashtags on Twitter about each Presidential candidate.
  • The Crowdwire – Bluefin Labs has created what they’re calling a “Social Exit Poll” looking at how people are talking about how they voted on social media.
  • Yahoo! Election Control Room – Attensity has teamed up with Yahoo! to show how America is feeling about the election and sharing select Tweets.
  • US Electoral Compass 2012 – Brandwatch allows you to select a state and date range to show what political issues each state is talking about.
  • NBC Politics – Using Crimson Hexagon to power their social media analysis.
  • Twitter Sentiment Analysis – USC is tracking sentiment around each candidate.

Who else are we missing? Also, as a bonus, you can see our interview with Gabriel Banos of Zauber Labs on predicting the election with social data, and Union Metrics’ comparison of the candidates using Tumblr data.

Social Data Around Voter Values