Data Story: Mohammad Shahangian on Pinterest Data Science

At Gnip, we believe the value of social data is unlimited. Data Stories is how we bring this belief to life by showcasing how social data is used. This week we’re interviewing data scientist Mohammad Shahangian of Pinterest about how the data science team works at Pinterest, surprising uses of Pinterest and data science as a career path. You can follow him on Pinterest at pinterest.com/mshahang

Data Scientist at Pinterest

1. What do you see as your role as the data scientist for Pinterest?

The company’s focus is on helping millions of people discover things they love and get inspiration to go do those things in their life. For me, that means analyzing the rich data that is created by the millions of people interacting with billions of pins from across the web each day. I evaluate this data and provide insights that make data actionable. My team also prototypes and validates ideas, performs deep analysis and builds tools that allow us to answer our most frequent questions in seconds. We work with every team to answer Pinterest’s biggest questions and ensure that each decision positively impacts Pinners over the long term.

For example, we take a business question like “How should our web, tablet and phone experiences differ?” and present the results as insights like, “Many users use the mobile apps in the morning and again at night, but prefer the website during the day” and “Users prefer to use mobile apps to casually discover new content, whereas they use the web to curate and organize content.” We then work with the design and product teams to build features around these insights and measure their impact.

2. What are some of your favorite ways that people use Pinterest that people wouldn’t expect?

What makes Pinterest unique is that it’s a tool and the users really define its use cases. For me, Pinterest was really helpful when I was planning my wedding, and it made perfect sense to use it as a collaborative office shopping list. I would have never thought to use it as a tool for:

A collection of Stop signs from around the world
Daily Grommet gets their community to collaborate on a board to see things they want to sell
Vintage Driving - a collaborative board where users pin their favorite vintage cars
GE Badass machines featuring GE tech
Madewell’s Rainbow board
Michelle Obama’s MyPlate Recipes encourages healthy eating
Stunning virtual collections of minerals and shipwrecks
The “365 Days of Pinterest” challenge. She made a Pinterest project every day for a year!
Sammy Sosa awesomeness
Sony shows off their technology with food pictures shot with a Sony Camera
Pantone announces the color of the year
The National Pork Board

3. What category do you see as the most viral on Pinterest?

DIY and recipe pins generally go viral year round. Around the holidays, holiday-themed content across all categories tends to get the most traction.

4. How has data science added value to Pinterest?

We have this internal value we refer to as “knit.” It means that we have an open, curious culture where everyone in different disciplines—from engineering and design to marketing to community—works together. Data science is at the core of that. The search, recommendations and spam teams apply data science to improve the quality of content we put in front of Pinners. This is only a subset of how we apply data though; most of the decisions we make at Pinterest are actually backed by data.

Data is a universal language that teams across the company use to collaborate and make decisions. Each team has a set of performance metrics, and we hold a weekly meeting to understand the impact that each area is having on company-wide metrics. As data scientists, we do more than just analyze data: we create rich data sources that we make available to other teams so they can do their own analysis. More than half of Pinterest employees run MapReduce jobs via Hive. Our metrics dashboards are accessible to everyone and our core metrics are emailed daily to the entire team. We also share our data studies and insights with the whole team.

We also use data just for fun. During our weekly happy hour, we share a Data Fun Fact with the team. We present the fact in the form of a multiple choice question and have the team vote on the answer. For example, we asked, “How many days before Valentine’s Day does the query ‘Valentine’s day ideas’ increase the most: 1, 3, 5 or 7 days?” (Hint for the curious reader: two*three/two).
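Editor's illustration: a question like this one comes down to finding the largest day-over-day jump in query volume. The Python sketch below is not Pinterest's code. The function name, the data layout and every count in it are invented for illustration, and it assumes "increase the most" means the largest absolute jump over the previous day.

```python
def days_before_with_largest_increase(daily_counts):
    """Return the days-before-the-holiday value with the largest
    absolute jump in query count over the previous day.

    daily_counts maps "days before the holiday" -> query count
    (a hypothetical layout, chosen only for this sketch).
    """
    best_day, best_jump = None, float("-inf")
    # Walk forward in time: from the most days before down to the fewest.
    for days_before in sorted(daily_counts, reverse=True):
        previous = daily_counts.get(days_before + 1)
        if previous is None:
            continue  # no earlier day to compare against
        jump = daily_counts[days_before] - previous
        if jump > best_jump:
            best_day, best_jump = days_before, jump
    return best_day


# Invented counts, for illustration only (not Pinterest data).
counts = {8: 100, 7: 120, 6: 130, 5: 150, 4: 160, 3: 260, 2: 300, 1: 320}
print(days_before_with_largest_increase(counts))  # 3
```

With these made-up numbers the biggest jump (100 queries) lands three days out. A relative (percentage) increase would be an equally reasonable reading of the question and could give a different answer on real data.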

5. What do you think someone should know before becoming a data scientist at a major web company like Pinterest?

I would say go for it! If you are hungry to extract value from real world data, you’re really going to enjoy it. I know that for a lot of really talented people in academia the only thing standing between them and the opportunity to solve a really interesting problem is the lack of rich data. My experience at Pinterest has been the exact opposite. Our team can’t grow fast enough to tap into a world of valuable insights that are sitting dormant within billions of records somewhere in the cloud.


In The Future, The Data Scientist Will be Replaced by Tools


Some of you are celebrating. Some of you are muttering about how you could never be replaced by a machine.

What is the case for? What is the case against? How should we think about the investments in infrastructure, talent, education and tools that we hope will provide the competitive insights from “big data” everyone seems to be buzzing about?

First, you might ask: why try to replace the data scientist with tools? At least one reason is in the news: the looming talent gap.

Wired UK reports:

“Demand is already outstripping supply. A recent global survey from EMC found that 65 percent of data science professionals believe demand for data science talent will outpace supply over the next five years, while a report from last year by McKinsey identified the need in the US alone for at least 190,000 deep analytical data scientists in the coming years.”

Maybe we should turn to tools to replace some or all of what the data scientist does. Can you replace a data scientist with tools?  An emerging group of startups would like you to think this is already possible. For example, Metamarkets headlines their product page with “Data science as a service.” They go on to explain:

 Analyzing and understanding these data streams can increase revenue and improve user engagement, but only if you have the highly skilled data scientists necessary to turn data into useful information.

Metamarkets’ mission is to democratize data science by delivering powerful analytics that are easy and intuitive for everyone.

SriSatish Ambati of the early startup 0xdata (pronounced hex-data) goes a step further with the idea that “the scale of the underlying data and the complexity of running advanced analysis are details that need to be hidden.” (GigaOm article)

On the other side of the coin, Cathy O’Neil at Mathbabe set out the case in her blog a few weeks ago that not only can you not replace the data scientist with tools, you shouldn’t even allow the non-data-scientist near the data scientist’s tools:

 As I see it, there are three problems with the democratization of algorithms:

 1. As described already, it lets people who can load data and press a button describe themselves as data scientists.

 2. It tempts companies to never hire anyone who actually knows how these things work, because they don’t see the point. This is a mistake, and could have dire consequences, both for the company and for the world, depending on how widely their crappy models get used.

 3. Businesses might think they have awesome data scientists when they don’t. [...] posers can be fantastically successful exactly because non-data scientists who hire data scientists in business, i.e. business people, don’t know how to test for real understanding.

If this topic interests you, we’ve submitted a panel for SXSW this spring in Austin to discuss issues surrounding data science and tools. We will talk about what tools are available today, how they make us more effective, and some of the pitfalls of tool use. And we will look into the future of tools to see where, and if, data scientists can be replaced by tools. We would love your vote!

Panelists:

  • John Myles White (@johnmyleswhite) – Coauthor of Machine Learning for Hackers and Ph.D. student in the Princeton Psychology Department, where he studies human decision-making.
  • Yael Garten (@yaelgarten) – Senior Data Scientist at LinkedIn.
  • James Dixon (@jamespentaho) – CTO at Pentaho, open source tools for business intelligence.

Update: One of our panelists, John Myles White, has provided some thoughtful analysis of companies that rely on automating or assisting data science tasks. See his blog post at http://www.johnmyleswhite.com/notebook/2012/08/28/will-data-scientists-be-replaced-by-tools

Data Stories: Interview with Hilary Mason of bitly

Data Stories is Gnip’s opportunity to tell the cool stories about the data scientists, data journalists and other people who are working in data. This week we’re interviewing Hilary Mason, the chief data scientist of bitly. She is currently helping organize DataGotham, a celebration of New York’s data community happening Sept. 13-14. You can follow her on Twitter at @hmason and read her blog at HilaryMason.com

Hilary Mason of bitly

1) How did you get started in your role as a data scientist?
I’m a computer scientist and have always had a keen interest in both algorithms and databases. It became clear to me in the last decade that the most interesting algorithms were those that worked on real data. When I found that there were opportunities to design math and infrastructure to build new types of applications, I couldn’t resist!

2) bitly users share 80 million links a day. What are some of the coolest insights and trends you’ve been able to see from these shared links?

We see all kinds of fascinating things in the data. For example, people who read about physics also read about fashion (http://bit.ly/vSa6AO) and people who use Kindles use them very differently than any other kind of device (http://bit.ly/wbRe6o). We’re always posting these things on our blog. For example, on July 4th we posted the most popular recipe by state for the holiday. Did you know that people in Florida enjoy Alligator Ribs (http://bit.ly/NwUEUL)?

3) bitly just updated its site making it even easier to share and curate links. As the chief data scientist, what excites you most about the new capabilities?

It’s wonderful to see bitly evolve from a utility into a truly social platform. We’re excited for bitly to become the central place for you to store, share, and analyze the things that you care about on the internet. We can then use the aggregate data that we collect to enhance that experience for you.

4) What are some of your favorite projects you’ve worked on while at bitly?

Our goal at bitly is to understand the internet’s attention, and to build systems that make that useful. It’s too hard just to pick one bit of it! I’m proud of some of the work that’s made it out into the world, like our post about the half life of links on various social networks (http://bit.ly/puUbzs) and our collaboration with Forbes on the interactive map of media influence (http://onforb.es/GFzphG). I’m also incredibly excited about a few product-oriented experiments that are going to be public shortly … stay tuned.

5) What tools are in your arsenal as a data scientist?

I’m a firm believer in finding the smartest people you can, and letting them use whatever works best. Personally, I’m a huge fan of the old skool unix utilities, and do more with grep and awk than I should probably admit.

Python is my current programming language of choice, though I’m not averse to C when necessary. A few people on my team have started to fall in love with Go, so that’s on my list to check out.

We use the best datastore for each challenge, and make heavy use of memcached, Redis, HDFS, and even text files.

In the non-tech world, I keep a moleskine notebook around and have fallen in love with the Hi-Tec-C .4mm pens from JetPens.

6) As the chief scientist, where do you think your team adds additional business value? How does data science help bitly make decisions it wouldn’t make otherwise?

My team plays a few roles within the company. We handle the business analytics, which can range from answering very simple questions like, “How many new URLs did we see yesterday?” to complex questions like, “How do we value a URL being clicked from platform X vs platform Y over time?”
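Editor's illustration: the simpler of those two questions can be sketched in a few lines of Python. This is not bitly's code or schema. It assumes a hypothetical log of (date, URL) records, counts a URL as "new" on the first day it appears in that log, and uses made-up sample data.

```python
from datetime import date


def count_new_urls(events, day):
    """Count URLs whose first appearance in `events` falls on `day`.

    `events` is an iterable of (date, url) pairs, a hypothetical
    log layout chosen only for this sketch.
    """
    first_seen = {}
    for seen_on, url in events:
        # Keep the earliest date on which each URL appears.
        if url not in first_seen or seen_on < first_seen[url]:
            first_seen[url] = seen_on
    return sum(1 for d in first_seen.values() if d == day)


# Made-up sample records, for illustration only.
events = [
    (date(2012, 8, 1), "http://example.com/a"),
    (date(2012, 8, 2), "http://example.com/a"),  # repeat, not new
    (date(2012, 8, 2), "http://example.com/b"),
    (date(2012, 8, 2), "http://example.com/c"),
]
print(count_new_urls(events, date(2012, 8, 2)))  # 2
```

At the scale mentioned earlier in the interview (80 million links a day), an in-memory dictionary like this would likely give way to a datastore or a batch job, but the logic of "first day seen" stays the same.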

We do research, pushing the boundaries of what we know to be possible with our data and systems. A few examples of these types of questions are, “Can we build a model of attention to any phrase people are actively clicking on?”, or “Can we predict opening weekend box office takes for movies that people are reading about via bitly links?”

Finally, we build products. Generally these are APIs, like the API that accepts a URL and returns the geographic distribution of attention to the URL, but sometimes they’re human-facing products. More on that shortly.

In summary, my team is responsible for pushing the boundaries of where bitly can go. It’s fun.

Thanks to Hilary for taking the time to talk to us about her work with bitly! Let us know in the comments if you have a suggestion for another Data Stories interview.
