Thursday, 18 April 2013

Contributing to GraphChi

GraphChi is a framework for processing large graphs efficiently on a single machine, developed by Aapo Kyrölä of CMU as a spin-off from the impressive GraphLab distributed graph processing project. Both GraphLab and GraphChi come with a really handy Collaborative Filtering Toolbox, developed at CMU by Danny Bickson, which implements numerous recent algorithms.

GraphChi looks like a great project, so I decided to try to contribute to it and took the chance to implement an algorithm that I'd been wanting to investigate for a while: Collaborative Less-is-More Filtering (CLiMF), developed by Yue Shi at TU Delft, which won a best paper award at RecSys last year. CLiMF optimises for the Mean Reciprocal Rank of correctly predicted items, i.e. it's designed to promote accuracy and diversity in recommendations at the same time. Although it's really intended for binary preference data like follow or friend relations, it's easy to implement a threshold on ratings that automatically binarises them during learning, so CLiMF can also be used with ratings datasets.
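To make the two ideas in the previous paragraph concrete, here is a minimal Python sketch of binarising ratings with a threshold and then computing Mean Reciprocal Rank over a set of ranked recommendations. It only evaluates the metric; it is not the CLiMF learning procedure, and it is not taken from the GraphChi toolbox. The function names, the `>=` threshold convention and the toy data are all assumptions made for illustration.

```python
def binarise(ratings, threshold):
    """Return the set of items whose rating meets the threshold.

    Assumption: a rating >= threshold counts as relevant; the toolbox
    may use a different convention.
    """
    return {item for item, rating in ratings.items() if rating >= threshold}


def reciprocal_rank(ranked_items, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is ranked."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, relevant_by_user):
    """Average the reciprocal rank over all users that have a ranking."""
    scores = [
        reciprocal_rank(ranked, relevant_by_user.get(user, set()))
        for user, ranked in rankings.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: explicit ratings on a 1-5 scale.
ratings = {
    "alice": {"a": 5, "b": 2, "c": 4},
    "bob": {"a": 1, "b": 3, "c": 5},
}
relevant_by_user = {user: binarise(r, threshold=4) for user, r in ratings.items()}

# Hypothetical recommender output: items ordered best-first for each user.
rankings = {
    "alice": ["b", "a", "c"],
    "bob": ["a", "b", "c"],
}

mrr = mean_reciprocal_rank(rankings, relevant_by_user)
print(mrr)
```

With this toy data the first relevant item is at rank 2 for alice and rank 3 for bob, so the score is the average of 1/2 and 1/3. Because only the first relevant item counts, the metric rewards getting at least one good recommendation near the top of each user's list.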

Danny made contributing to the toolbox really easy, and CLiMF is now available in GraphChi and documented alongside the other algorithms. I also wrote a simple Python implementation, which works fine for small datasets and was useful as a reference.

You can get the latest version of GraphChi and the collaborative filtering toolbox from here.