In the Future, the Data Scientist Will Be Replaced by Tools


Some of you are celebrating. Some of you are muttering about how you could never be replaced by a machine.

What is the case for? What is the case against? How should we think about the investments in infrastructure, talent, education and tools that we hope will provide the competitive insights from “big data” everyone seems to be buzzing about?

First, you might ask: why try to replace the data scientist with tools? At least one reason is in the news: the looming talent gap.

Wired UK reports:

“Demand is already outstripping supply. A recent global survey from EMC found that 65 percent of data science professionals believe demand for data science talent will outpace supply over the next five years, while a report from last year by McKinsey identified the need in the US alone for at least 190,000 deep analytical data scientists in the coming years.”

Maybe we should turn to tools to replace some or all of what the data scientist does. Can you replace a data scientist with tools? An emerging group of startups would like you to think this is already possible. For example, Metamarkets headlines its product page with “Data science as a service.” The company goes on to explain:

 Analyzing and understanding these data streams can increase revenue and improve user engagement, but only if you have the highly skilled data scientists necessary to turn data into useful information.

Metamarkets’ mission is to democratize data science by delivering powerful analytics that are easy and intuitive for everyone.

SriSatish Ambati of the early-stage startup 0xdata (pronounced hex-data) goes a step further with the idea that “the scale of the underlying data and the complexity of running advanced analysis are details that need to be hidden.” (GigaOm article)

On the other side of the coin, Cathy O’Neil at Mathbabe made the case on her blog a few weeks ago that not only can you not replace the data scientist with tools, you shouldn’t even allow the non-data-scientist near the data scientist’s tools:

 As I see it, there are three problems with the democratization of algorithms:

 1. As described already, it lets people who can load data and press a button describe themselves as data scientists.

 2. It tempts companies to never hire anyone who actually knows how these things work, because they don’t see the point. This is a mistake, and could have dire consequences, both for the company and for the world, depending on how widely their crappy models get used.

 3. Businesses might think they have awesome data scientists when they don’t. [...] posers can be fantastically successful exactly because non-data scientists who hire data scientists in business, i.e. business people, don’t know how to test for real understanding.

If this topic interests you, we’ve submitted a panel for SXSW this spring in Austin to discuss the issues surrounding data science and tools. We will talk about what tools are available today, how they make us more effective, and some of the pitfalls of tool use. And we will look into the future of tools to see whether, and where, data scientists can be replaced by them. We would love a vote!

Panelists:

  • John Myles White (@johnmyleswhite) – Coauthor of Machine Learning for Hackers and Ph.D. student in the Princeton Psychology Department, where he studies human decision-making.
  • Yael Garten (@yaelgarten) – Senior Data Scientist at LinkedIn.
  • James Dixon (@jamespentaho) – CTO at Pentaho, open source tools for business intelligence.

Update: One of our panelists, John Myles White, has written a thoughtful analysis of companies that aim to automate or assist data science tasks. See his blog post at http://www.johnmyleswhite.com/notebook/2012/08/28/will-data-scientists-be-replaced-by-tools