Tuesday 5 October 2010

RecSys 2010: YouTube

The YouTube team had a poster at RecSys describing their recommender in some detail.  The design is intentionally simple, and is apparently implemented entirely as a series of MapReduce jobs.

They first compute fixed-length lists of related videos, based in principle on simple co-occurrence counts over a short time window: given a video i, the count for a candidate related video j could be the number of users who viewed both i and j within the last 24 hours.  The counts are normalised to take into account the relative popularity of different videos, and no doubt massaged in numerous other ways to remove noise and bias.  As the paper says, "this is a simplified description".
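The co-occurrence step can be sketched in a few lines of Python. This is a minimal in-memory illustration, not YouTube's MapReduce code: it assumes each input session is the set of videos one user watched inside the window, and it assumes the simplest normalisation, dividing the co-view count c_ij by the product of the two videos' individual view counts. The function name related_videos and the top_n parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_videos(sessions, top_n=3):
    """Build fixed-length related-video lists from per-user viewing sessions.

    sessions: iterable of collections of video ids, one per user, holding the
    videos that user watched inside the time window (e.g. the last 24 hours).
    """
    views = Counter()      # c_i: number of users who watched video i
    co_views = Counter()   # c_ij: number of users who watched both i and j
    for watched in sessions:
        watched = set(watched)  # count each user at most once per video
        views.update(watched)
        for i, j in combinations(sorted(watched), 2):
            co_views[(i, j)] += 1

    related = defaultdict(list)
    for (i, j), c_ij in co_views.items():
        # Assumed normalisation: divide by the product of the two view counts,
        # so a pair is not ranked highly just because both videos are popular.
        score = c_ij / (views[i] * views[j])
        related[i].append((score, j))
        related[j].append((score, i))

    # Keep the top_n candidates per video, breaking ties by id for determinism.
    return {
        i: [j for _, j in sorted(cands, key=lambda sc: (-sc[0], sc[1]))[:top_n]]
        for i, cands in related.items()
    }
```

With sessions [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"c", "d"}], the list for c comes out as ["d", "a", "b"]: d was co-viewed with c only once, but it is so rarely watched that the normalised score still beats the popular a. That is exactly the kind of noise the extra massaging (for example a minimum co-view count) would presumably need to suppress.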

At recommendation time they build a set of seed videos representing the query user, based on the user's views, favourites, playlists, etc.  They then assemble a candidate pool containing the related videos for all of the seeds.  If the pool is too small, they expand it by adding the related videos of all the videos already in the pool, while always keeping track of the original seed video of each candidate so that the recommendation can be explained to the user.  The candidates in the pool are reranked based on a linear combination of values expressing the popularity of a given candidate video, the importance of its seed to the user, the overall popularity of the candidate and its freshness.  Finally, the recommended videos are diversified, using simple constraints on the number of recommended videos that can be associated with any one seed, or that have been uploaded by any one user.  Diversification is particularly important because related videos are typically very tightly associated with their seed.
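The whole recommendation-time pipeline (pool, expansion, linear reranking, diversification) can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: expansion is done for a single extra hop, the two popularity terms are folded into one hypothetical popularity signal, and the signal dictionaries, weights, caps and all function names (build_pool, rank, diversify) are made up for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    video: str
    seed: str          # original seed video, kept so the recommendation can be explained
    score: float = 0.0


def build_pool(seeds, related, min_size):
    """Collect the related videos of every seed; if the pool is too small,
    expand it once by adding the related videos of the pool's members."""
    pool = {}
    for seed in seeds:
        for v in related.get(seed, []):
            if v not in seeds and v not in pool:
                pool[v] = Candidate(v, seed)
    if len(pool) < min_size:
        for cand in list(pool.values()):
            for v in related.get(cand.video, []):
                if v not in seeds and v not in pool:
                    # The new candidate inherits the seed of the video that led to it.
                    pool[v] = Candidate(v, cand.seed)
    return list(pool.values())


def rank(pool, popularity, seed_importance, freshness, weights=(1.0, 1.0, 1.0)):
    """Score each candidate with a linear combination of hypothetical signals."""
    w_pop, w_seed, w_fresh = weights
    for c in pool:
        c.score = (w_pop * popularity.get(c.video, 0.0)
                   + w_seed * seed_importance.get(c.seed, 0.0)
                   + w_fresh * freshness.get(c.video, 0.0))
    return sorted(pool, key=lambda c: (-c.score, c.video))


def diversify(ranked, uploader, k, max_per_seed=2, max_per_uploader=2):
    """Greedily take the best candidates, capping how many share a seed or an uploader."""
    per_seed, per_uploader, chosen = Counter(), Counter(), []
    for c in ranked:
        up = uploader.get(c.video)
        if per_seed[c.seed] >= max_per_seed or per_uploader[up] >= max_per_uploader:
            continue
        per_seed[c.seed] += 1
        per_uploader[up] += 1
        chosen.append(c)
        if len(chosen) == k:
            break
    return chosen
```

For example, with seeds ["s1", "s2"] and related lists {"s1": ["a", "b", "c"], "s2": ["d"], "a": ["e"]}, a minimum pool size of 5 forces one expansion step that adds e with seed s1. Even if every s1 candidate outscores d, a cap of two per seed lets d through into a top three of a, b, d. A greedy pass like this is one simple way to enforce the constraints; it trades a little ranking quality for a much less repetitive list.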

Precomputed recommendations are cached and served up a few at a time each time a user visits the site.  Each recommendation is easily associated with an explanation based on its seed video: "recommended because you favorited abc".  While this system isn't going to win any best paper prizes, it is certainly effective: 60% of all video clicks from the YouTube homepage are on recommendations.
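The serving side can be sketched as a per-user queue of precomputed results, popped a few at a time and paired with a seed-based explanation. This is a hypothetical in-memory stand-in for whatever cache YouTube actually uses; the class name RecommendationCache, the per_visit parameter and the (video, seed, action) tuple layout are all assumptions made for the example.

```python
from collections import deque


class RecommendationCache:
    """Per-user store of precomputed recommendations, handed out a few per visit."""

    def __init__(self, per_visit=4):
        self.per_visit = per_visit
        self._queues = {}

    def store(self, user, recs):
        # recs: (video, seed, action) tuples produced offline, best first,
        # e.g. ("v1", "abc", "favorited").
        self._queues[user] = deque(recs)

    def next_visit(self, user):
        """Pop the next few recommendations, each paired with an explanation."""
        queue = self._queues.get(user, deque())
        batch = []
        while queue and len(batch) < self.per_visit:
            video, seed, action = queue.popleft()
            batch.append((video, f"recommended because you {action} {seed}"))
        return batch
```

Because each candidate carries its seed all the way from the pool, generating the explanation at serving time is just string formatting; nothing has to be recomputed when the user comes back.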
