Monday, 16 August 2010

ISMIR 2010: code

The Best Paper award went to Ron Weiss and Juan Bello of NYU for their work on finding repeating patterns in musical audio with Probabilistic Latent Component Analysis (aka Convolutive Non-Negative Matrix Factorisation) applied to chroma features. This finds a best-fit decomposition of the features, treated as a linear mixture of some number of latent components, each convolved with corresponding activation weights that vary over the course of the track. Weiss and Bello apply priors to enforce sparsity in the activations, so that only a small number of components are active at any given time. Better still, this means that you can start with a large number of latent components and the activations for most of them drop to zero throughout the track as the learning algorithm proceeds, so the effective number of underlying components can be learned rather than specified in advance. Finally, you can estimate the high-level repeat structure of the track by picking a simple Viterbi path through the activations. I thought the paper was well worth the prize: an original approach, convincing results, clear presentation at the conference, and, best of all, published Python code at http://ronw.github.com/siplca-segmentation.
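To make the decomposition a bit more concrete, here is a rough sketch of plain convolutive NMF in NumPy, using the standard multiplicative updates for the KL divergence. It is not the authors' code and it leaves out the sparsity priors entirely; the function names (shift, reconstruct, conv_nmf), the array layout and the default parameter values are all my own assumptions for illustration. V stands for a chroma-like matrix with one column per frame.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero and log(0)

def shift(X, tau):
    """Shift columns of X right by tau frames (left if tau < 0), zero-filling."""
    out = np.zeros_like(X)
    if tau == 0:
        out[:] = X
    elif tau > 0:
        out[:, tau:] = X[:, :-tau]
    else:
        out[:, :tau] = X[:, -tau:]
    return out

def reconstruct(W, H):
    """Model: V_hat = sum over lags tau of W[tau] @ shift(H, tau).

    W has shape (L, F, K): L lags, F feature bins, K components.
    H has shape (K, T): one activation row per component.
    """
    return sum(W[tau] @ shift(H, tau) for tau in range(W.shape[0]))

def kl_divergence(V, V_hat):
    """Generalised KL divergence between V and its reconstruction."""
    return np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat)

def conv_nmf(V, K=4, L=8, n_iter=100, seed=0):
    """Fit the convolutive model with multiplicative KL updates (no priors)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((L, F, K)) + 1e-3
    H = rng.random((K, T)) + 1e-3
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update activations, pooling evidence from every lag.
        R = V / (reconstruct(W, H) + EPS)
        num = sum(W[tau].T @ shift(R, -tau) for tau in range(L))
        den = sum(W[tau].T @ shift(ones, -tau) for tau in range(L)) + EPS
        H *= num / den
        # Update each lag of the component patterns.
        R = V / (reconstruct(W, H) + EPS)
        for tau in range(L):
            Hs = shift(H, tau)
            W[tau] *= (R @ Hs.T) / (ones @ Hs.T + EPS)
    return W, H
```

Each component W[:, :, k] is a short chroma patch of L frames, and row k of H says when that patch starts. In the paper it is the sparsity priors that push most rows of H to zero; this sketch has no such mechanism, so K has to be chosen by hand here.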

Ron Weiss also had a hand in the appealing Gordon Music Collection Database Management System available from http://bitbucket.org/ronw/gordon. This looks like a great lightweight framework for managing experiments based on audio feature extraction, with a Python API, support for SQL-like queries, automatic feature caching, and a clean web interface which includes automatic best-effort visualisations of the features for each track. I'm really looking forward to trying it out. The name Gordon apparently refers to the character in Nick Hornby's novel High Fidelity.

Opinion is increasingly divided about the pros and cons of using Flash on your website. If you still love Flash, then the Audio Processing Library for Flash, available from http://code.google.com/p/alf, now lets you do audio feature extraction in real time, directly from your Flash movies. The developers even have a surprisingly funky example game, in which the graphics, and even some of the gameplay, are based directly on features extracted from the soundtrack as it plays: presumably this is one game which MIR researchers will always win!
