Data science is a new profession, so there isn’t yet a clear educational or career path for data scientists. One of the questions we ask most often in our Data Stories series is what career path people took to become data scientists. On Gnip’s own data science team, three of our members have PhDs in physics and one has a master’s in mathematics. So I am definitely interested in how universities are creating their own data science programs. To that end, I interviewed Annalee Saxenian, the dean of UC Berkeley’s School of Information, about the school’s master’s program for data scientists.
This is part of our Data Stories series leading up to SXSW. Dean Saxenian is speaking on “The Future Belongs to Data Scientists.” Gnip is hosting a SXSW event for those involved in social data; email firstname.lastname@example.org for an invite.
1. Why create a master’s program specifically for data scientists?
There is huge demand for people who can work with data, at large as well as small scales, using the new tools and technologies that are becoming available for data storage, analytics, and visualization. While data science has been pioneered by technology companies like Google, LinkedIn, and Facebook, we believe that every organization today (large and small, for-profit and non-profit, in every industry) has new sources of data that it can use to inform decision making and to develop new products and services. This new data, which comes from click streams, online sales receipts, sensor networks, mobile devices, and social media, is not only available at very large scales, but is also largely unstructured or semi-structured. This makes the analysis of the data fundamentally different from the analysis of the smaller and more structured data sets of the past.
2. Data scientist is a new career path. What are the advantages of receiving formal training through a program such as Berkeley’s School of Information versus real-world experience?
Most organizations don’t have the resources or the commitment to systematically expose employees to the range of new tools, technologies, and skills required of a data scientist. Even leading technology companies provide only very limited on-the-job training in the relevant skills to their employees. They are looking for employees who already have expertise in areas like statistics and data analytics.
One of the advantages of a Master’s degree program like the Master of Information and Data Science (MIDS) at Berkeley is that our faculty has built a complete curriculum from the ground up, both designing the individual courses we think are essential to practicing data scientists and building the dependencies between the courses so that the whole is greater than the sum of the individual parts. The curriculum covers the full life cycle of data science. We offer courses devoted to research design, data storage and retrieval, statistical analysis, machine learning, data visualization and communication, data privacy and ethics, field experiments, and scaling and parallelism. In addition, we require that students gain experience working in teams. In short, formal education like the MIDS program offers comprehensive exposure to the field of data science.
3. Why did Berkeley decide to make the I School an online program?
The School of Information faculty decided to offer the program online for several reasons. For one, we are growing our existing programs and are outgrowing our facilities on the Berkeley campus. Offering an online program relieves us of the need to compete for scarce space on campus. We also believe that, as a School of Information, we should be experimenting with new educational technologies, and that since most of our graduates will be working in teams and online settings, we should play a leadership role in this space. Last, but not least, by offering the degree online we are able to reach a much wider range of students than we can with our face-to-face programs. We are providing access to a Berkeley-quality degree to people who can’t move to Berkeley for family or work reasons and to those who need to continue working while they seek further education.
4. What characteristics do you think make for the most successful data scientists?
Data scientists do need a set of technical and analytical skills and mastery of certain tools and technologies, but just as important are the soft skills. The most successful data scientists can think creatively about trends in data, collaborate well on teams, and communicate the findings from data to non-specialists. So they need to be clear thinkers, good collaborators and communicators, and they need to be able to think creatively about what they see in the data.
5. What do you think are the upsides and downsides for companies of dealing with data that previously wasn’t accessible?
The upsides for companies: the new data can be used to enhance business decision making as well as to develop new products and services. Companies are using previously inaccessible data to learn more about customer behavior and about market trends. They are designing regular online experiments that generate data from which they can learn in real time about trade-offs in design and other business decisions.
The downsides: most companies still don’t have people with the relevant skills to learn from the new data, and they will need to reorganize in order to take full advantage of it. The silos in established companies mean that data is managed by a different group than those who are able to analyze it or who are developing products, and they in turn are not well connected to senior decision makers. Taking full advantage of the new data will require much closer interaction between these different parts of the organization.