
Thursday, 18 April 2013

Contributing to GraphChi

GraphChi is a framework for processing large graphs efficiently on a single machine, developed by Aapo Kyrölä of CMU as a spin-off from the impressive GraphLab distributed graph processing project.  Both GraphLab and GraphChi come with a really handy Collaborative Filtering Toolbox, developed at CMU by Danny Bickson, which implements numerous recent algorithms.

GraphChi looks like a great project so I decided to try to contribute to it, and took the chance to implement an algorithm I'd been wanting to investigate for a while: Collaborative Less-is-More Filtering (CLiMF), developed by Yue Shi at TU Delft, which won a best paper award at RecSys last year.  CLiMF optimises a smoothed lower bound on the Mean Reciprocal Rank of correctly predicted items, i.e. it's designed to promote accuracy and diversity in recommendations at the same time.  Although it's really intended for binary preference data like follow or friend relations, it's easy to apply a threshold that binarises ratings automatically during learning, so CLiMF can also be used with ratings datasets.
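
To make that concrete, here is a rough Python sketch of the per-user objective as I read it from the CLiMF paper: a smoothed lower bound on the reciprocal rank, built out of logistic sigmoids of score differences.  The names, the illustrative constants and the way regularisation is handled per user are my own simplifications, not code from the GraphChi or reference implementations.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def climf_objective_for_user(U_i, V, relevant, lam=0.001):
        """Smoothed lower bound on the reciprocal rank for one user.

        U_i      -- latent factor vector for the user
        V        -- item factor matrix, one row per item
        relevant -- indices of the user's relevant items, e.g. obtained by
                    thresholding ratings: np.where(ratings >= 4)[0]
        lam      -- regularisation constant (an illustrative value)
        """
        f = V.dot(U_i)                  # predicted scores f_ij for every item
        obj = 0.0
        for j in relevant:
            obj += np.log(sigmoid(f[j]))
            # penalise relevant items k that currently outrank item j
            for k in relevant:
                obj += np.log(1.0 - sigmoid(f[k] - f[j]))
        # regularise only the vectors this user touches (a simplification)
        return obj - 0.5 * lam * (U_i.dot(U_i) + np.sum(V[relevant] ** 2))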

Danny made contributing to the toolbox really easy and CLiMF is now available in GraphChi, and documented alongside the other algorithms.  I also wrote a simple Python implementation which works fine for small datasets and which was useful for reference.

You can get the latest version of GraphChi and the collaborative filtering toolbox from here.

Thursday, 22 November 2012

Making recommendations - efficiently

Anyone who's even vaguely followed the lively development of recommender systems since the Netflix Prize offered researchers $1M to find a way of improving movie recommendations will have noticed that most of the models proposed follow the same outline:

  1. the input data is a large matrix with a row for each user and a column for each item
  2. entries in the matrix represent ratings or preference scores
  3. the model represents both users and items as vectors in a common low-dimensional space
  4. after training, unknown entries in the matrix can be predicted by taking the inner product of the corresponding user and item vectors

If you want to make recommendations for a user, step 4 lets you predict their rating for all possible items.  You then simply return the few items with the highest predicted ratings.
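
For completeness, here is what the naive version of step 4 looks like in Python with NumPy; the names are placeholders, but the shape of the computation is the point.

    import numpy as np

    def recommend_brute_force(user_vector, item_factors, n=10):
        """Score every item by inner product and return the indices of the top n.

        user_vector  -- latent factors for one user, shape (d,)
        item_factors -- latent factors for all items, shape (num_items, d)
        """
        scores = item_factors.dot(user_vector)     # one inner product per item
        top = np.argpartition(-scores, n)[:n]      # unsorted top-n candidates
        return top[np.argsort(-scores[top])]       # sort just those n by score

Even with the partial sort, the cost is one inner product per item in the catalogue for every query, which is exactly the problem described next.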

Did you notice that one small but significant detail here has been "left as an exercise for the reader"?  If you have any significant number of items in your catalogue then predicting and sorting ratings for all of them is going to take rather a long time - far too long for most practical applications.

An elegant paper by Parikshit Ram and Alexander Gray presented at KDD this year offers a simple solution to this problem.  Maximum Inner-Product Search using Tree Data-Structure shows how to construct and use a ball tree designed specifically to find the vectors maximising the inner product with one or more queries.  Ram and Gray start from the observation that this problem is similar to, but distinct from, nearest-neighbour search: the inner product isn't a genuine distance metric because it doesn't satisfy the triangle inequality.
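
The construction and search in the paper are more sophisticated than this, but the central pruning idea fits in a few lines of Python: for any query q, no point inside a ball with centre c and radius R can have inner product with q larger than q·c + R·‖q‖, so a whole subtree can be skipped whenever that bound falls below the best score found so far.  Everything below, including the crude splitting rule, is my own illustration rather than Ram and Gray's algorithm or the code in my repository.

    import numpy as np

    class Ball:
        """One node of a (very naive) ball tree over item vectors."""
        def __init__(self, points, indices, leaf_size=20):
            self.centre = points.mean(axis=0)
            self.radius = np.max(np.linalg.norm(points - self.centre, axis=1))
            if len(points) <= leaf_size:
                self.points, self.indices, self.children = points, indices, None
            else:
                # crude split on the coordinate with the largest spread
                dim = np.argmax(points.var(axis=0))
                order = np.argsort(points[:, dim])
                half = len(order) // 2
                self.children = [Ball(points[order[:half]], indices[order[:half]], leaf_size),
                                 Ball(points[order[half:]], indices[order[half:]], leaf_size)]

        def upper_bound(self, q):
            # no point inside this ball can beat q . centre + radius * ||q||
            return q.dot(self.centre) + self.radius * np.linalg.norm(q)

    def search(node, q, best=(-np.inf, -1)):
        """Depth-first branch and bound for the item maximising the inner product with q."""
        if node.upper_bound(q) <= best[0]:
            return best                            # prune this whole subtree
        if node.children is None:
            scores = node.points.dot(q)
            i = int(np.argmax(scores))
            if scores[i] > best[0]:
                best = (scores[i], node.indices[i])
            return best
        # visit the more promising child first to tighten the bound early
        for child in sorted(node.children, key=lambda c: -c.upper_bound(q)):
            best = search(child, q, best)
        return best

Calling search(Ball(item_factors, np.arange(len(item_factors))), user_vector) returns the best score and the index of the item achieving it.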

I've written a very simple Python implementation of their proposed MIP Tree which you can find on github.  It's not intended as production code but still shows speedups of several orders of magnitude when I've tried it on real latent factor datasets.  Feel free to have a play with it to generate some recommendations - quite efficiently!

Tuesday, 5 October 2010

RecSys 2010: YouTube

The YouTube team had a poster at RecSys describing their recommender in some detail.  The design is intentionally simple, and apparently entirely implemented as a series of MapReduce jobs.

They first compute fixed-length lists of related videos, based in principle on simple co-occurrence counts within a short time window, i.e. given a video i, the count for a candidate related video j could be the number of users who viewed both i and j within the last 24 hours.  The counts are normalised to take into account the relative popularity of different videos, and no doubt massaged in numerous other ways to remove noise and bias.  As the paper says, "this is a simplified description".
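
The paper doesn't spell the computation out, but a toy version of the idea might look like the following, where relatedness between two videos is their co-visitation count divided by the product of their individual view counts.  All of the names and the exact normalisation here are my guesses; the real system runs this as MapReduce jobs over enormous logs.

    from collections import Counter, defaultdict
    from itertools import combinations

    def related_videos(sessions, top_n=10):
        """Toy co-visitation relatedness.

        sessions -- iterable of per-user sets of videos watched within the
                    chosen time window (e.g. 24 hours)
        Returns, for each video, its top_n most related videos.
        """
        views = Counter()                    # how often each video was watched
        covisits = defaultdict(Counter)      # co-occurrence counts c_ij
        for watched in sessions:
            views.update(watched)
            for i, j in combinations(watched, 2):
                covisits[i][j] += 1
                covisits[j][i] += 1

        related = {}
        for i, counts in covisits.items():
            # normalise by the product of popularities so that globally
            # popular videos don't dominate every list
            scored = {j: c / (views[i] * views[j]) for j, c in counts.items()}
            related[i] = sorted(scored, key=scored.get, reverse=True)[:top_n]
        return related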

At recommendation time they build a set of seed videos representing the query user, based on the user's views, favourites, playlists, etc.  They then assemble a candidate pool containing the related videos for all of the seeds.  If the pool is too small, they expand it by adding the related videos of all the videos already in the pool, always keeping track of the original seed video for messaging purposes.  The candidates in the pool are reranked using a linear combination of signals: how strongly the candidate is associated with its seed, how important that seed is to the user, the candidate's overall popularity, and its freshness.  Finally the recommended videos are diversified, using simple constraints on the number of recommended videos that can be associated with any one seed, or have been uploaded by any one user.  Diversification is particularly important because related videos are typically very tightly associated with their seed.
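
As a rough illustration, with invented signal names and weights since the paper doesn't publish them, the pipeline from seeds to a diversified list might be sketched like this:

    def recommend(seeds, related, seed_importance, popularity, freshness,
                  weights=(1.0, 0.5, 0.5), pool_target=50, per_seed_limit=3, n=10):
        """Illustrative candidate generation, reranking and diversification.

        seeds           -- videos representing the user (views, favourites, ...)
        related         -- dict: video -> list of its related videos
        seed_importance -- dict: seed -> weight of that seed for this user
        popularity, freshness -- dicts of per-candidate signals
        The signal names and weights here are placeholders, not the real ones.
        """
        # 1. candidate pool: related videos of the seeds, remembering each
        #    candidate's originating seed for the explanation later
        pool = {}
        for s in seeds:
            for v in related.get(s, []):
                pool.setdefault(v, s)

        # 2. if the pool is too small, add the related videos of the pool itself,
        #    still attributing each new candidate to its original seed
        if len(pool) < pool_target:
            for v, s in list(pool.items()):
                for w in related.get(v, []):
                    pool.setdefault(w, s)

        # 3. rerank with a linear combination of signals
        w_seed, w_pop, w_fresh = weights
        def score(v):
            return (w_seed * seed_importance.get(pool[v], 0.0)
                    + w_pop * popularity.get(v, 0.0)
                    + w_fresh * freshness.get(v, 0.0))
        ranked = sorted((v for v in pool if v not in seeds), key=score, reverse=True)

        # 4. diversify: cap the number of recommendations drawn from any one seed
        picked, per_seed = [], {}
        for v in ranked:
            s = pool[v]
            if per_seed.get(s, 0) < per_seed_limit:
                picked.append((v, s))            # (video, seed) for the explanation
                per_seed[s] = per_seed.get(s, 0) + 1
            if len(picked) == n:
                break
        return picked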

Precomputed recommendations are cached and served up a few at a time each time a user visits the site.  Each recommendation is easily associated with an explanation based on its seed video: "recommended because you favorited abc".  While this system isn't going to win any best paper prizes, it is certainly effective: 60% of all video clicks from the YouTube homepage are for recommendations.

Thursday, 18 February 2010

Integrating social relationships into a music recommender

It's always nice to see people working with Last.fm's huge quantity of publicly available data, especially when they include several researchers from IBM's famous research labs. In their ACM Recommender Systems 2009 paper Augmenting collaborative recommender by fusing explicit social relationships, Q. Yuan, S. Zhao, L. Chen, Y. Liu, S. Ding, X. Zhang and W. Zheng report some experiments incorporating Last.fm's friend and group membership relationships into more conventional Collaborative Filtering systems.

Their first recommender is a user-based CF system, i.e. the score given to a potential recommendation of Muse for user Jim is a weighted average over all the scores (ratings, scrobbles, etc.) of users who already listen to Muse, where the weights are similarities computed between Jim and each of Muse's fans.
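
In other words, using made-up names and assuming a precomputed user-user similarity matrix, the prediction is just:

    import numpy as np

    def user_based_score(jim, item, scores, similarity):
        """Weighted-average user-based CF prediction, as described above.

        jim        -- index of the query user
        item       -- index of the candidate artist (e.g. Muse)
        scores     -- user x item matrix of scores (ratings, scrobbles, ...)
        similarity -- precomputed user x user similarity matrix
        """
        fans = np.nonzero(scores[:, item])[0]   # users who already listen to the artist
        fans = fans[fans != jim]
        weights = similarity[jim, fans]
        if weights.sum() == 0:
            return 0.0
        return weights.dot(scores[fans, item]) / weights.sum()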

Yuan et al. start with a baseline CF recommender in which the similarity between users is the cosine similarity between their vectors of artist scrobbles. For reasons that aren't obvious to me, the scrobble counts are first mapped to binary values, e.g. 1 if a user has scrobbled Muse three or more times, 0 otherwise. In their first experimental recommender, these similarities are augmented with a weighted combination of similarities based on friendship and group membership relationships. The relationship similarities are also cosine similarities between binary vectors, so the friendship similarity between two users expresses the extent to which they have friends in common, and likewise for group membership.
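
A minimal sketch of that combined similarity, assuming binary user-by-artist, user-by-user and user-by-group matrices and made-up mixing weights (the paper fits its own):

    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two binary (0/1) vectors."""
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return a.dot(b) / denom if denom else 0.0

    def combined_similarity(u, v, scrobbles, friends, groups,
                            alpha=0.6, beta=0.2, gamma=0.2):
        """Weighted combination of listening, friendship and group similarities.

        scrobbles, friends, groups -- binary user x {artist, user, group} matrices
        alpha, beta, gamma         -- mixing weights (the values here are made up)
        """
        return (alpha * cosine(scrobbles[u], scrobbles[v])
                + beta * cosine(friends[u], friends[v])
                + gamma * cosine(groups[u], groups[v]))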

The second experimental system is a graph-based recommender, built on a graph of user, artist and group nodes. The graph starts with the obvious edges between users and the artists they listen to, and the groups they belong to. Edges are then added between pairs of users whose relationship similarities exceed some threshold, so each user is connected to other users with similar friends or similar group memberships, as well as to their 'real' friends. As with the binary CF vectors in the first system, the edge weights are all set to 1. Finally, user-user similarity scores are extracted from the graph using a Laplacian matrix pseudoinverse algorithm: in practice the computation required is similar to inverting a matrix containing concatenated artist, friend and group vectors for each user.
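
For what it's worth, the textbook version of the Laplacian pseudoinverse step looks like this. It's a bare-bones sketch of the general technique rather than the paper's exact algorithm, and a dense pseudoinverse like this clearly wouldn't scale to a large graph.

    import numpy as np

    def laplacian_similarities(adjacency):
        """Node-node similarities from the pseudoinverse of the graph Laplacian.

        adjacency -- symmetric 0/1 matrix over all nodes (users, artists, groups)
        Returns a matrix whose entries can be read as similarities between
        nodes (they define the commute-time kernel on the graph).
        """
        degrees = np.diag(adjacency.sum(axis=1))
        laplacian = degrees - adjacency
        return np.linalg.pinv(laplacian)    # Moore-Penrose pseudoinverse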

After these nice simple suggestions for combining social relationships and CF, the experiments themselves are a bit of a disappointment, mainly because the dataset used contains only 1000 artists and users. The evaluation metrics are precision, recall and f-value over the top few recommendations, where a recommendation is considered correct if the user actually scrobbled the recommended artist at least three times. In the CF recommender, using weights that seem to be rather highly fitted to the small dataset, there is a 2-3% improvement in f-value over the top five recommendations. The graph recommender shows an 8% improvement, but they don't give baseline results for a graph without the social relationships, so it's not clear whether the gain comes from the extra social information or just from the different algorithm.

In fact using such a small dataset raises all sorts of problems in interpreting the results:
  • it's hard to tell if absolute values are good or bad
  • is it sensible to evaluate the top-5 recommendations from a pool of only 1000 artists?
  • are the improvements significant?
  • are the various parameters overfitted?
  • does the graph method scale?
It makes me suspect that IBM are just playing in the recommender space, while others are clearly serious about it.