Data Story: Eric Colson of Stitch Fix

Data Stories is our blog series highlighting the cool and unusual ways people use data. I was intrigued by a presentation that Eric Colson gave at Strata about how Stitch Fix, a personal shopping site, relies heavily on data along with its personal shoppers. This was a fun interview for us because several of my female colleagues order Stitch Fix boxes filled with items Stitch Fix thinks they might like. It’s amazing to see how data impacts even fashion. As a side note, this is Gnip’s 25th Data Story, so be on the watch for a compilation of all of our amazing stories.

Eric Colson of Stitch Fix

1. Most people think of Stitch Fix as a personal shopping service, powered by professional stylists. But behind the scenes, you are also using data and algorithms. Can you explain how this all works?

We use both machine processing and expert human judgment in our styling algorithm. Each resource plays a vital role. Our inventory is both diverse and vast. This is necessary to ensure we have relevant merchandise for each customer’s specific preferences. However, it is so vast that it is simply not feasible for a human stylist to search through it all. So, we use machine-learning algorithms to filter and rank-order all the inventory in the context of each customer. The results of this process are presented to the human stylist through a graphical interface that allows her to further refine the selections. Because she can focus on only the most relevant merchandise, the stylist can apply her expert judgment. We’ve learned that, while machines are very fast at processing millions of data points, they still lack the prowess of the virtuoso. For example, machines often struggle with curating items around a unifying theme. In addition, machines are not capable of empathizing; they can’t detect when a customer has unarticulated preferences – say, a secret yearning to be pushed in a more edgy direction. In contrast, human stylists are great at these things. Yet, they are far more costly and slower in their processing. So, the two resources are very complementary! The machines narrow down the vast inventory to a highly relevant and qualified subset so that the more thoughtful and discerning human stylist can effectively apply her expert judgment.
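As a rough illustration of the filter-then-rank step described above, here is a minimal Python sketch. It is not Stitch Fix's actual algorithm: the `Item` and `Customer` fields, the `shortlist` function, and the "distance from preferred edginess" score are all hypothetical stand-ins for whatever attributes and machine-learned scores the real system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    sku: str
    size: str
    price: float
    edginess: float  # hypothetical attribute: 0.0 (classic) to 1.0 (very edgy)


@dataclass(frozen=True)
class Customer:
    size: str
    max_price: float
    preferred_edginess: float


def shortlist(inventory, customer, k=3):
    """Filter out items that cannot work, then rank the rest for this customer."""
    # Step 1 (filter): drop anything in the wrong size or over budget.
    candidates = [
        item for item in inventory
        if item.size == customer.size and item.price <= customer.max_price
    ]
    # Step 2 (rank): toy relevance score, closest to the preferred edginess first.
    ranked = sorted(
        candidates,
        key=lambda item: abs(item.edginess - customer.preferred_edginess),
    )
    # Only the top k reach the human stylist, who makes the final selection.
    return ranked[:k]


inventory = [
    Item("A1", "M", 48.0, 0.2),
    Item("B2", "M", 120.0, 0.7),  # over budget, filtered out
    Item("C3", "S", 60.0, 0.6),   # wrong size, filtered out
    Item("D4", "M", 75.0, 0.6),
    Item("E5", "M", 55.0, 0.9),
]
customer = Customer(size="M", max_price=100.0, preferred_edginess=0.7)

print([item.sku for item in shortlist(inventory, customer, k=2)])  # ['D4', 'E5']
```

The point of the sketch is the division of labor: the machine cheaply reduces a large inventory to a short, relevant list, and the final judgment stays with the stylist.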

2. What do you think would need to change if you ever began offering a similar service for men?

We would likely need entirely new algorithms and different sets of data. Men are less aware of how things should fit on them or what styles would look good on them (at least, I am!). Men also shop less frequently, but typically indulge in bigger hauls when they do. Also, the styles are less fluid for men and we tend to be more loyal to what is tried and true. In fact, a feature to “send me the same stuff I got last time” might do really well with men. In contrast, our female customers would be sorely disappointed if we ever sent them the same thing twice!

So, while the major technology pieces of our platform are general enough to scale into different categories, we’d still want to collect new data and develop different algorithms and features to accommodate men.

3. How did you use your background at Netflix to help Stitch Fix become such a data-driven company?

Data is in the DNA at Stitch Fix. Even before I joined (first as an advisor and later as an employee), they had already built a platform to capture extremely rich data. Powerful distinctions that describe the merchandise are captured and persisted into structured data attributes through both expert human judgment and transactional histories (e.g., How edgy is a piece of merchandise? How well does it do with moms?). This is a rare capability – one that even surpasses what Netflix had. And, the customer data at Stitch Fix is unprecedented! We are able to collect so much more information about preferences because our customers know it’s critical to our efforts to personalize for them. I only wish I had this type of data while at Netflix!

So, in some ways Stitch Fix already had an edge over Netflix with respect to data. That said, the Netflix ethos for democratizing innovation has permeated the Stitch Fix culture. Like Netflix, we try not to let our biases and opinions blind us as we try new ideas. Instead, we take our beliefs for how to improve the customer experience and reformulate them as hypotheses. We then run an A/B test and let the data speak for itself. We either reject or accept the hypothesis based on the observed outcome. The process takes emotion and ego away and allows us to make better decisions.
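To make the "reformulate as a hypothesis, run a test, let the data decide" loop concrete, here is a minimal sketch of one common way to evaluate an A/B test: a two-sided, two-proportion z-test. The interview does not say which statistical test Stitch Fix uses, so the choice of test, the function name, the sample counts, and the 0.05 threshold are all assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, p_value) for a two-sided test that the two conversion rates differ."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 200 of 1,000 control customers kept an item,
# versus 250 of 1,000 customers who saw the new experience.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)

ALPHA = 0.05  # assumed significance threshold
if p_value < ALPHA:
    print(f"z={z:.2f}, p={p_value:.4f}: the difference is unlikely to be chance")
else:
    print(f"z={z:.2f}, p={p_value:.4f}: not enough evidence of a difference")
```

Fixing the hypothesis and the threshold before looking at the results is what takes "emotion and ego" out of the decision.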

Also, like Netflix, we invest heavily in our data and algorithms. Both companies recognize the differentiating value in finding relevant things for their customers. In fact, given our business model, algorithms are even more crucial to Stitch Fix than they are to Netflix. Yet, it was Netflix that pioneered the framework for establishing the capability as a strategic differentiator.

4. How else is Stitch Fix driven by data?

Given our unique data, we are able to pioneer new techniques for most business processes. For example, take the process of sourcing and procuring our inventory. Since we have the capability of getting the right merchandise in front of the right customer, we can do more targeted purchasing. We don’t need to make sweeping generalizations about our customer base. Instead, we can allow each customer to be unique. This allows us to buy more diverse inventory in smaller lots since we know we will be able to send it only to the customers for whom it is relevant.

We also have the inherent ability to improve over time. With each shipment, we get valuable feedback. Our customers tell us what they liked and didn’t like. They give us feedback on the overall experience and on every item they receive. This allows us to better personalize to them for the next shipment and even allows us to apply the learnings to other customers.

5.  Your stylists will sometimes override machine-generated recommendations based on other information they have access to. For example, customers can put together a Pinterest board so that they can show the stylist things they like. Do you think machines will ever process this data?

No time soon! Processing unstructured data such as images and raw text is squarely in the purview of humans. Machines are notoriously challenged when it comes to extracting the meaning that is conveyed in this type of information. For example, when a customer pins a picture to a Pinterest board, often they are expressing their fondness for a general concept, or even an aspiration, as opposed to the desire for a specific item. While machine learning has made great strides in processing unstructured data, there is still a long way to go before it can be reliable.

Thanks to Eric for the interview! If you have suggestions for other Data Stories, please leave a comment! 


Data Story: Dan Lynn of FullContact

Data Stories is Gnip’s way to talk about the many amazing ways that data is used. Today on the blog we’re speaking with Dan Lynn, a cofounder and CTO of FullContact. FullContact is trying to solve the world’s contact information problem, which is no small feat. We thought the dilemmas this team faces in dealing with disparate and decaying data make for a great story. You can follow Dan on Twitter at @DanKLynn.

Dan Lynn of FullContact

1. What problem is FullContact trying to solve with data?

At FullContact, we’re solving the world’s contact information problem, which is that your contact information is a mess. In address books like Gmail, Outlook, Salesforce and customer lists, you’ve got missing details, duplicate entries, and the same person fractured across multiple cloud systems. We’re using data to help you clean all that up and keep those address books in sync, up to date, and duplicate-free.

2. What do you see as the advantages of combining social data with contact information? Do people make deeper connections if they have social data?

When I was growing up, I had 3 or fewer ways I could contact my friends: street address, phone (usually their parents’!) and, later, email. As the Internet took off, they added instant messenger accounts, eBay usernames, Twitter handles, Facebook accounts, LinkedIn profiles and dozens more. These are all valid means of contacting someone, but most people prefer some over others, and it’s great to have that choice.

While it’s awesome for me to find out who among my contacts has a Twitter account that I’m not yet following, social data is also very helpful for me (or a computer) to tell two similar-but-different contacts apart. Social profiles are starting to act more and more as a person’s public identifier, much like a Social Security number that you would actually *want* people to have. Filling out my contacts with social data makes it that much easier to merge duplicates, tell the difference between John Smith Jr. and John Smith Sr., and contact people in ways other than email, phone, or snail mail.
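The idea of a social profile acting as a public identifier can be sketched in a few lines of Python. This is only an illustration, not FullContact's matching logic: the record layout, the `merge_contacts` function, and the choice of a Twitter handle as the sole key are hypothetical, and the sample data is made up.

```python
def merge_contacts(contacts):
    """Merge records that share a Twitter handle; leave records without one untouched."""
    merged = {}     # lowercased handle -> combined record
    unmatched = []  # records with no handle cannot be safely merged here
    for contact in contacts:
        handle = contact.get("twitter")
        if not handle:
            unmatched.append(dict(contact))
            continue
        key = handle.lower()  # handles are compared case-insensitively
        if key not in merged:
            merged[key] = dict(contact)
        else:
            # Fill in only the fields the existing record is missing.
            for field, value in contact.items():
                if value and not merged[key].get(field):
                    merged[key][field] = value
    return list(merged.values()) + unmatched


contacts = [
    {"name": "John Smith", "email": "john@example.com", "twitter": "@jsmithjr"},
    {"name": "John Smith", "phone": "555-0100", "twitter": "@JSmithJr"},
    {"name": "John Smith", "email": "john.sr@example.com", "twitter": "@jsmithsr"},
]

for record in merge_contacts(contacts):
    print(record)
```

All three records share the name "John Smith", so the name alone cannot separate them. Keyed on the handle, the first two collapse into one record with both an email and a phone number, while the third stays separate.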

3. What do you wish you knew a year ago about how people archive and share contact information?

Honestly, a year ago, the problem was staring us straight in the face: people *don’t* really archive and share contact information. Sharing has been too error-prone for people to trust an automated system not to screw up their contacts. I’ve lost count of the number of times people share contact information by reading phone numbers from each other’s phones, yelling email addresses across the room, or emailing contact info back and forth with subject lines like “Bart Lorang’s phone number”. The problem is hard, and everyone has different expectations around the idea of sharing contact information. Many people want their contacts automatically kept up to date with changes in their co-workers’ address books. Others only want updates if the contact publicly changes his/her information. What should an automated system do if two of your colleagues share conflicting changes to one of your contacts? Ultimately we all just want the best way to get in touch with someone at a given time.

4. Contact information is considered decaying data. What are the challenges of working with decaying information?

The idea of decaying data is that the data you have *right* now is only a snapshot of the world at a given time. You could say that your data “decayed” if the real world has moved on and your database hasn’t caught up. This is a real problem with contact information. It changes constantly. People change jobs, change names, move, change phone carriers, and more. The challenge is keeping your address book up to date with all these changes. Many companies that work with contact information in bulk simply “punt” and apply a simple rule to their data, reducing their confidence in it by some percentage every year. I think that’s too heavy-handed and doesn’t work for the end user. At FullContact, we fundamentally believe that a person’s contact information is current until we find some other, newer piece of contact information that suggests otherwise. That means that we’re constantly searching the internet for up-to-date information about your contacts.
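The two approaches contrasted above can be sketched side by side. This is a simplified illustration, not FullContact's implementation: the function names, the 20% yearly decay rate, and the sample observations are all assumptions.

```python
from datetime import date


def decayed_confidence(initial, observed_on, today, yearly_decay=0.2):
    """Blanket rule: confidence shrinks by a fixed percentage for every year of age."""
    years = (today - observed_on).days / 365.25
    return initial * (1 - yearly_decay) ** years


def current_value(observations):
    """Alternative rule: a value stays current until a newer observation replaces it.

    `observations` is a list of (observed_on, value) pairs.
    """
    newest = max(observations, key=lambda obs: obs[0])
    return newest[1]


# Under the blanket rule, a two-year-old email loses confidence
# even if nothing about the person has changed.
print(decayed_confidence(1.0, date(2012, 1, 1), date(2014, 1, 1)))

# Under the alternative rule, the old address is only replaced
# once newer evidence actually turns up.
emails = [
    (date(2011, 5, 1), "dana@oldjob.example.com"),
    (date(2013, 3, 1), "dana@newjob.example.com"),
]
print(current_value(emails))
```

The second rule only works if new observations keep arriving, which is why it goes hand in hand with continually searching for up-to-date information.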

5. How do you think FullContact fits into the world of social media and how people are already obtaining contact information?

For the last couple of years, we’ve been seeing the social networks clamp down on their users’ contact information (often for good reason). We remember the spat between Google and Facebook over the ability to export your friends’ information. It’s easy to agree philosophically with elements of both arguments. To Facebook’s point, a person should be in control of her own contact information. To Google’s point, a person should be in control of her contacts, and has a reasonable expectation to get the same data back from a service that she put in. We think FullContact helps bridge this gap. We believe that you own your address book, but we also believe that you have a right to control what information about you is floating around out there on the Internet. We want you to have the most up-to-date picture of your contacts, but we want to give your contacts control over their own information.
