Tuesday 5 October 2010

RecSys 2010: music

How do teenagers learn about new music?  Audrey LaPlante has spent some time actually asking them, and presented some of their answers at the Workshop on Music Recommendation and Discovery.


Some of her findings were extremely interesting.  All of the teenagers she spoke to said that their tastes had changed substantially over time, and that the changes were due to changes in their social network.  Most had music geek friends whom they actively consulted for music recommendations, even though they were not influential people in other respects.  Although close contacts were most likely to be sources of new music, those chosen to play that role were "almost always those whose social network were more different from theirs, mostly those who were going to a different school".

I'll be interested to follow Audrey's research and see how we can learn to make online social networks an equally great place for young people to discover new music.

RecSys 2010: YouTube

The YouTube team had a poster at RecSys descibing their recommender in some detail.  The design is intentionally simple, and apparently entirely implemented as a series of MapReduce jobs.

They first compute fixed-length lists of related videos, based in principle on simple co-occurrence counts in a short period i.e. given a video i, the count for a candidate related video j could be the number of users who viewed both i and j within the last 24 hours.  The counts are normalised to take into account the relative popularity of different videos, and no doubt massaged in numerous other ways to remove noise and bias.  As the paper says, "this is a simplified description".

At recommendation time they build a set of seed videos representing the query user, based on the user's views, favourites, playlists, etc.  They then assemble a candidate pool containing the related videos for all of the seeds.  If the pool is too small, they expand it by adding the related videos of all the videos already in the pool, though always keeping track of the original seed video for messaging purposes.  The candidates in the pool are reranked based on a linear combination of values expressing the popularity of a given candidate video, the importance of its seed to the user, the overall popularity of the candidate and its freshness.  Finally the recommended videos are diversified, using simple constraints on the number of recommended videos that can be associated with any one seed, or have been uploaded by any one user.  Diversification is particularly important as related videos are typically very tightly associated with their seed.

Precomputed recommendations are cached and served up a few at a time to a user each time they visit the site.  Each recommendation is easily associated with an explanation based on its seed video: "recommended because you favorited abc".  While this system isn't going to win any best paper prizes it is certainly effective: 60% of all video clicks from the YouTube homepage are for recommendations.

RecSys 2010: social recommendations

Mohsen Jamali won the best paper award at RecSys for his presentation on SocialMF, a model-based recommender designed to improve ratings-based recommendations for users who have made few ratings but who have friends, or friends of friends, who have provided plenty.  Mohsen's previous solution to the same problem was TrustWalker.  TrustWalker predicts ratings from a set of item-item similarities and a social graph, where nodes are users and edges represent trust or friendship relationships.  The rating for item i for user u is predicted by taking a short random walk on the graph, stopping at some friend-of-a-friend v and returning v's rating for item j, where j is the most similar item to i which v has rated.  Closed form expressions for these predictions don't scale at all well, so to make a prediction TrustWalker actually executes the random walk a few times and returns the average rating.  On a test set of ratings by 50k users for 100k product reviews from the Epinions website, TrustWalker does indeed show a significant benefit in both coverage and prediction accuracy for cold start users over baseline methods that don't leverage the social graph.


SocialMF is a Bayesian model-based solution to the same problem: latent factors for all users and items are learned jointly from ratings and the social graph.  Ratings for cold start users can then be predicted from their learned factors.  When tested on the epinions dataset, and a new one of ratings by 1M users for 50k movies crawled from Flixster, SocialMF again does indeed improve the accuracy of predicted ratings for cold start users over latent factor models that don't take the social graph into account.

The model-based approach is elegant and perhaps even scalable: learning time is linear in the number of users, and the paper reports a runtime of 5.5 hours for 1M users.  But it lacks the powerful explanations of the simpler system: "recommended for you because your friend xyz likes it".