Wednesday, 11 August 2010

ISMIR 2010: music transcription

The Music Information Retrieval community is a pretty broad church, but the very much unsolved problem of transcribing a full score from audio input remains at its heart. Nowadays it is approached via various somewhat simpler subtasks, such as capturing just the harmony of the music as a series of chord labels. Most existing approaches to chord labelling use the chroma feature as the main input to their labelling algorithm. Chroma purports to show the amount of energy associated with each pitch class (note of the scale) in a given short time period, but it's well known to be highly flawed, not least because each note's upper harmonics spill into other pitch classes: if you resynthesize the pitches implied by the chroma, it usually sounds nothing like the original audio.
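To make the chroma idea concrete, here's a minimal numpy sketch of one way to compute it: fold STFT energy into twelve pitch-class bins per frame. This is an illustration rather than any particular published extractor (real ones add tuning estimation, log-frequency filterbanks and smoothing), and the comment marks where the flaw described above creeps in.

```python
import numpy as np

def chroma_from_audio(y, sr, frame_len=4096, hop=2048):
    """Fold STFT energy into 12 pitch-class bins per frame (0 = C ... 11 = B)."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    valid = freqs > 27.5                      # ignore bins below A0
    # Map each valid FFT bin to the nearest equal-tempered pitch class.
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12

    n_frames = 1 + (len(y) - frame_len) // hop
    chroma = np.zeros((12, n_frames))
    for t in range(n_frames):
        frame = y[t * hop : t * hop + frame_len] * window
        energy = np.abs(np.fft.rfft(frame))[valid] ** 2
        for pc in range(12):
            # A note's upper harmonics land in *other* pitch classes here,
            # which is one reason chroma misrepresents the true pitch content.
            chroma[pc, t] += energy[pitch_class == pc].sum()
    return chroma / max(chroma.max(), 1e-9)   # crude global normalisation
```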

A convincing presentation by Matthias Mauch, currently working in Masataka Goto's AIST lab in Japan, showed how you can improve on existing chord labelling performance by using the output of a multipitch transcription algorithm instead of chroma. Mauch's labeller is fairly sophisticated (a Dynamic Bayes Net with a number of inputs besides the pitch evidence), but his pitch transcription algorithm is a very simple best-fit effort: expect a flurry of papers seeing whether fancier state-of-the-art transcription methods can do even better than the current 80% accuracy on the MIREX test set of songs by The Beatles and others.
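For a flavour of what matching chords against pitch evidence can look like, here is a hypothetical sketch: binary major/minor triad templates scored against a 12-dimensional pitch-class salience vector, of the kind a multipitch front end might produce. To be clear, this is not Mauch's system, whose Dynamic Bayes Net combines this sort of evidence with several other inputs; the templates and scoring below are my own simplification of the general idea.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Binary major and minor triad templates, one per root.
CHORD_TEMPLATES = {}
for root in range(12):
    for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
        template = np.zeros(12)
        template[[(root + i) % 12 for i in intervals]] = 1.0
        CHORD_TEMPLATES[f"{NOTE_NAMES[root]}:{quality}"] = template

def best_chord(salience):
    """Pick the chord whose template correlates best with a 12-dim pitch-class
    salience vector (e.g. summed note activations from a multipitch
    transcription, rather than raw chroma)."""
    s = salience / (np.linalg.norm(salience) + 1e-9)
    return max(CHORD_TEMPLATES,
               key=lambda name: float(np.dot(CHORD_TEMPLATES[name], s)))

# A salience vector dominated by C, E and G should come out as "C:maj".
print(best_chord(np.array([1.0, 0, 0, 0, 0.9, 0, 0, 0.8, 0, 0.1, 0, 0])))
```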

Two posters dealt with the misleadingly named "octave error" problem in estimating the tempo (bpm) of music from audio input: state-of-the-art beat trackers are good at finding regular pulses, but they often have trouble telling the beat from the half-beat. I really liked Jason Hockman's approach to solving this. Instead of making any changes to his beat tracker, he simply trains a classifier to predict whether a given track is slow or fast, using tracks which Last.fm users have tagged as slow or fast as training examples. Despite using just a single aggregate feature vector per track, one which doesn't obviously contain any tempo information, his slow/fast classifier works really well, with an accuracy of over 95%. Presumably slow songs simply sound more like other slow songs than like fast ones. I'll be interested to see how much this can help in tempo estimation. I'd expect the classifier to do worse on songs that aren't obviously slow or fast enough to have been tagged as either, but I'd also guess that moderate-tempo songs are less affected by octave error in the first place.
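Here's a hypothetical sketch of how such a slow/fast classifier might be trained and then used to disambiguate a tracker's tempo estimate. The feature representation, the SVM, and the 100 bpm pivot are all my assumptions for illustration, not details from Hockman's poster.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_slow_fast(features, is_fast):
    """features: (n_tracks, n_dims) aggregate descriptors, one row per track;
    is_fast: 1 where Last.fm users tagged the track "fast", 0 for "slow"."""
    clf = SVC(kernel="rbf")
    acc = cross_val_score(clf, features, is_fast, cv=5).mean()
    print(f"cross-validated slow/fast accuracy: {acc:.3f}")
    return clf.fit(features, is_fast)

def fix_octave_error(bpm, fast, pivot=100.0):
    """Halve or double the beat tracker's bpm estimate until it falls on the
    side of the (assumed) pivot that the classifier predicts."""
    while fast and bpm < pivot:
        bpm *= 2.0
    while not fast and bpm > pivot:
        bpm /= 2.0
    return bpm
```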
