Data Stories is Gnip’s opportunity to tell the cool stories about the data scientists, data journalists and other people who are working in data. This week we’re interviewing Hilary Mason, the chief data scientist of bitly. She is currently helping organize DataGotham, a celebration of New York’s data community happening Sept. 13-14. You can follow her on Twitter at @hmason and read her blog at HilaryMason.com.
1) How did you get started in your role as a data scientist?
I’m a computer scientist and have always had a keen interest in both algorithms and databases. It became clear to me in the last decade that the most interesting algorithms were those that worked on real data. When I found that there were opportunities to design math and infrastructure to build new types of applications, I couldn’t resist!
2) bitly users share 80 million links a day. What are some of the coolest insights and trends you’ve been able to see from these shared links?
We see all kinds of fascinating things in the data. For example, people who read about physics also read about fashion (http://bit.ly/vSa6AO) and people who use Kindles use them very differently than any other kind of device (http://bit.ly/wbRe6o). We’re always posting these things on our blog. For example, on July 4th we posted the most popular recipe by state for the holiday. Did you know that people in Florida enjoy Alligator Ribs (http://bit.ly/NwUEUL)?
3) bitly just updated its site making it even easier to share and curate links. As the chief data scientist, what excites you most about the new capabilities?
It’s wonderful to see bitly evolve from a utility into a truly social platform. We’re excited for bitly to become the central place for you to store, share, and analyze the things that you care about on the internet. We can then use the aggregate data that we collect to enhance that experience for you.
4) What are some of your favorite projects you’ve worked on while at bitly?
Our goal at bitly is to understand the internet’s attention, and to build systems that make that useful. It’s too hard just to pick one bit of it! I’m proud of some of the work that’s made it out into the world, like our post about the half life of links on various social networks (http://bit.ly/puUbzs) and our collaboration with Forbes on the interactive map of media influence (http://onforb.es/GFzphG). I’m also incredibly excited about a few product-oriented experiments that are going to be public shortly … stay tuned.
5) What tools are in your arsenal as a data scientist?
I’m a firm believer in finding the smartest people you can, and letting them use whatever works best. Personally, I’m a huge fan of the old skool Unix utilities, and do more with grep and awk than I should probably admit.
Python is my current programming language of choice, though I’m not averse to C when necessary. A few people on my team have started to fall in love with Go, so that’s on my list to check out.
We use the best datastore for each challenge, and make heavy use of memcached, Redis, HDFS, and even text files.
In the non-tech world, I keep a Moleskine notebook around and have fallen in love with the Hi-Tec-C .4mm pens from JetPens.
6) As the chief scientist, where do you think your team adds additional business value? How does data science help bitly make decisions it wouldn’t make otherwise?
My team plays a few roles within the company. We handle the business analytics, which can range from answering very simple questions like, “How many new URLs did we see yesterday?” to complex questions like, “How do we value a URL being clicked from platform X vs platform Y over time?”
We do research, pushing the boundaries of what we know to be possible with our data and systems. A few examples of these types of questions are, “Can we build a model of attention to any phrase people are actively clicking on?”, or “Can we predict opening weekend box office takes for movies that people are reading about via bitly links?”
Finally, we build products. Generally these are APIs, like the API that accepts a URL and returns the geographic distribution of attention to the URL, but sometimes they’re human-facing products. More on that shortly.
In summary, my team is responsible for pushing the boundaries of where bitly can go. It’s fun.
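To make the simplest of the analytics questions above concrete (“How many new URLs did we see yesterday?”), here is a minimal Python sketch. It is not bitly’s code: the event format (a timestamp paired with a URL), the function name new_urls_on, and the sample data are all assumptions made for illustration. A URL counts as “new” on a given day if its earliest sighting falls on that day.

```python
from datetime import date, datetime

def new_urls_on(events, day):
    """Count URLs whose earliest sighting falls on `day`.

    `events` is an iterable of (timestamp, url) pairs; this format is
    hypothetical, chosen only to keep the example self-contained.
    """
    first_seen = {}
    for ts, url in events:
        # Keep the earliest timestamp seen for each URL.
        if url not in first_seen or ts < first_seen[url]:
            first_seen[url] = ts
    return sum(1 for ts in first_seen.values() if ts.date() == day)

# Made-up sample data: /a first appears on Aug 1, /b on Aug 2, /c on Aug 3.
sample_events = [
    (datetime(2012, 8, 1, 9, 0), "http://example.com/a"),
    (datetime(2012, 8, 2, 10, 0), "http://example.com/a"),
    (datetime(2012, 8, 2, 11, 0), "http://example.com/b"),
    (datetime(2012, 8, 2, 12, 0), "http://example.com/b"),
    (datetime(2012, 8, 3, 8, 0), "http://example.com/c"),
]

count = new_urls_on(sample_events, date(2012, 8, 2))
```

At 80 million links a day the first-seen table would not fit comfortably in a single in-memory dictionary, so a real system would presumably lean on a datastore for that lookup; the sketch only shows the shape of the question.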
Thanks to Hilary for taking the time to talk to us about her work with bitly! Let us know in the comments if you have a suggestion for another Data Stories interview.
Data Stories Series:
- Liv Buli of Next Big Sound, the world’s first music data journalist
- Hilary Mason, Chief Data Scientist of bitly
- Rumi Chunara of HealthMap, using social data to identify epidemics
- Sherry Emery of UIC, studying social data and smoking cessation
- Lada Adamic of Michigan on information networks
- Annicka Campbell of SapientNitro on the Digital Love Project