Data Story: Michele Trevisiol on Image Analysis

Social media content is increasingly shifting to a visual medium, and companies often have a hard time understanding and digesting visual content. So I loved happening upon Michele Trevisiol, PhD student at Universitat Pompeu Fabra and PhD intern at Yahoo Labs Barcelona, whose research focused on image analysis for videos and photos.

Michele Trevisiol

1. Your research has always revolved around analyzing videos and photos, and this is an area that the rest of the world is struggling to figure out. Where do you see research in this area heading?

From my own experience, I can see a huge amount of work ahead for research in the near future. Every day there are tons of new photos and videos uploaded online. Just think about Facebook, Flickr, YouTube, Vimeo and many other new services. Actually, in one of our projects we are working with Vine, a short-video app that allows you to record six-second looping videos. Twitter bought it in January, and Vine has already reached 40 million users.

To give some numbers, recent studies estimated that about 880 billion photos will be uploaded in 2014, without considering other multimedia data. This volume of information makes it very hard for the user to explore such a large multimedia space and pick the most interesting items. Therefore, we need smart and efficient algorithms to address this. Because the amount of data is growing every day, researchers need to keep improving their systems to analyze, understand and rank media items in the best possible way (often in a way personalized for the user).

Researchers have studied these topics from many different angles. Working with multimedia objects involves analyzing the content of the data (e.g., computer vision tasks such as object detection and pattern recognition), understanding the textual information (e.g., meta-data, descriptions, tags, comments), or studying how media is shared in the social network. This is a research space that still has many things left to explore.

2. You’ve previously researched how to determine geo data from video using a multi-modal approach based on how videos are tagged, user networks, and other meta-data. What are the advantages of understanding the geo location of videos?

You can see the problem from a different point of view. If you had complete knowledge about the multimedia items you have, such as the visual content, the meta-data, the location, or even how they were made (technical details), this data would be easily classified and discoverable for users. All the information about an item helps researchers to understand its properties and its importance for the users who are looking for it. However, very often this information is missing, incomplete or even wrong. In particular, the geo location is not provided for the vast majority of photos and videos online.

Only in recent years has there been an increase in cameras and phones with automatic geo-tagging (able to record the latitude and longitude where the photo was taken). As a result, only a few multimedia items have this information. Being able to discover the geo location of videos and images helps you to organize and classify them, and helps users to find items related to any specific location, improving search and retrieval. We presented a multi-modal approach that takes into account various factors related to the given video, such as the content, the textual information, the user’s profile, the user’s social network and her previously uploaded photos. With such information the accuracy of the geo location prediction improved dramatically.
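As a rough illustration of how location hints from several modalities might be combined, here is a minimal Python sketch. It is not the method from the study: the function name `fuse_location_votes`, the grid-cell voting scheme, the `cell_size` parameter and the example weights are all assumptions made for illustration. Each modality (tags, social network, visual content, and so on) is assumed to contribute one guess with a confidence weight.

```python
from collections import defaultdict


def fuse_location_votes(votes, cell_size=1.0):
    """Combine (lat, lon, weight) guesses from several modalities.

    Hypothetical scheme: bucket the guesses into grid cells of
    `cell_size` degrees, pick the cell with the largest total weight,
    and return the weighted mean position of the guesses in that cell.
    """
    if not votes:
        raise ValueError("at least one location guess is required")

    cells = defaultdict(list)
    for lat, lon, weight in votes:
        key = (int(lat // cell_size), int(lon // cell_size))
        cells[key].append((lat, lon, weight))

    # The winning cell is the one the modalities agree on most strongly.
    best = max(cells.values(), key=lambda vs: sum(w for _, _, w in vs))
    total = sum(w for _, _, w in best)
    lat = sum(la * w for la, _, w in best) / total
    lon = sum(lo * w for _, lo, w in best) / total
    return lat, lon


# Made-up guesses: tags and social network point to Barcelona,
# visual content points to Paris.
example_votes = [
    (41.39, 2.17, 0.5),  # from tags (assumed weight)
    (41.40, 2.16, 0.3),  # from the uploader's social network
    (48.85, 2.35, 0.4),  # from visual content
]
estimate = fuse_location_votes(example_votes)
```

In this toy example the two agreeing modalities outweigh the single dissenting one, so the estimate lands in Barcelona. A real system would learn the weights from data rather than fix them by hand.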

Recently, researchers have been devoting more effort to this topic, mainly due to the increasing interest in the mobile world. In the near future, activity in this area is bound to increase.

3. How do browsing patterns differ between people viewing photos and people reading text? What motivates people to click on different photos?

The browsing patterns are strongly biased by the structure of the website and, of course, by the type of website. The first factor is quite obvious: users follow the links that are most prominent on the page, so the way the website selects and shows related items is really important. The second factor is the type of website itself.

Consider, for example, Wikipedia.org. Many users land on the website from a search engine and read just one article. This means that the goal of the user is very focused, as she’s able to define the query, click on the right link, consume the information, and leave. But that’s not always the case, as there are also users who browse deep and look at many articles related to one topic (e.g., about TV series, episodes, actors, etc.). If you consider news websites instead, the behavior is different, as the user might visit only to spend some time, to take a break, or to get distracted by something interesting. A photo sharing website presents yet another behavior, often shaped by the social network structure. Many users interact mostly with the photos shared by friends or contacts, or they like to get involved in social groups, trying to get as much visibility and positive feedback as possible.

The main interest of any service provider is to keep the user engaged as long as possible on its pages. To do this, it shows the links of highest interest to the users to keep them clicking and browsing. That’s what the user wants as well: she wants to find something interesting for her needs. The rationale is similar for photo sharing sites, but the power of an image to catch the interest at first glance is an important difference. For example, in Flickr there are “photostreams” (sets of photos) shown to the user for each image she is viewing. These slideshows show image thumbnails in order to catch the interest of the user with the content of the recommended images. Recently, we conducted a study on these specific slideshows and found that users love to navigate from one slideshow to another instead of searching directly for images or browsing specific groups. We also tried recommending different slideshows instead of different photos, with positive and interesting results.

4. Much of your research has focused on Flickr. How does data science improve the user experience of Flickr?

Recently Flickr has improved a lot in this direction, for example with the new free accounts that include 1TB of storage and the recently refreshed interface. But the changes are just beginning.

In general, data scientists first need to study and understand how users are engaging with the website, how they are searching for and consuming information, how they are socializing, and especially what they would like to have and to find on the website. In order to improve the user experience, you need to know the users and then work on what (and how) the website can offer to improve their navigation.

5. Based on your research around photo recommendations, what characteristics make photos most appealing to viewers?

This is a complex question, as appeal is subjective and changes for each user. In particular the taste, or even better, the interest of the user changes over time. Some days you’re looking for something funnier, so maybe the aesthetics of the image are less important. Other days, instead, you are captivated by very cool images that can be professional or just incredible amateur shots.

In the majority of cases each user is quite unique in terms of taste, so you need to know what she appreciated before and how her taste changed over time in order to show her the photos that she is most likely to enjoy. On the other hand, there are cases that can catch the interest of any user in an objective way. For example, in photos related to real-world events the content is highly informative, while the quality and the aesthetics are often ignored.

In a research work that we presented last year, we compared different ranking approaches as a function of various factors. One of these was the so-called external impact. With this feature we could measure how much interest the image has outside Flickr, in other words, how much visibility the image has on the Web. If an image uploaded by a Flickr user to her page receives a large number of visits coming from outside (e.g., other social networks, search engines), it means this image has a high attractiveness that needs to be considered even if it does not show particular popularity inside the network. We found that this could also be a relevant factor to consider in the ranking, and we are still investigating this point.
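To make the idea of external impact more concrete, here is a small Python sketch of one way such a feature could be folded into a ranking score. It is only an illustration under stated assumptions, not the ranking from the study: the `Photo` fields, the definition of `external_impact` as the share of views arriving from outside the site, and the blending weight `alpha` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    photo_id: str
    internal_views: int  # views coming from inside the photo site
    external_views: int  # views referred from other sites (assumed available)


def external_impact(photo):
    """Share of a photo's views that come from outside the site (assumed definition)."""
    total = photo.internal_views + photo.external_views
    return photo.external_views / total if total else 0.0


def rank_photos(photos, alpha=0.5):
    """Sort photos by a blend of internal popularity and external impact.

    alpha = 0 ranks by internal views only; alpha = 1 by external impact only.
    """
    max_internal = max((p.internal_views for p in photos), default=0) or 1

    def score(p):
        internal = p.internal_views / max_internal
        return (1 - alpha) * internal + alpha * external_impact(p)

    return sorted(photos, key=score, reverse=True)


# Made-up numbers: photo "b" is modest inside the network
# but receives most of its views from outside.
photos = [Photo("a", 100, 0), Photo("b", 20, 180), Photo("c", 50, 10)]
ranking = [p.photo_id for p in rank_photos(photos, alpha=0.5)]
```

With these invented numbers, photo "b" moves ahead of the internally more popular "a" once external impact is given equal weight, which mirrors the observation that an image can be attractive without being popular inside the network.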

If you’re interested in more data stories, please check out our collection of 25 Data Stories for interviews with data scientists from Pinterest, Foursquare, and more!