Tuesday, 1 November 2011

ISMIR 2011: data, data and more data

This year's ISMIR conference saw some powerful critiques of current evaluation datasets for MIR tasks, but also some great new releases of data that should help us start to do a better job.

First the criticisms. Julian Urbano pointed out a widening gap in sophistication between the annual MIREX algorithm bake-off and more established equivalents such as TREC for document and image search. For anyone not completely persuaded by his arguments, Fabien Gouyon and his team demonstrated convincingly that current music autotagging algorithms fail to generalise from one dataset to another, i.e. at the moment there is no hard evidence that they are really learning anything at all. The main cause is probably that the available reference datasets are simply too small.

And now the data:

Wow, that's a lot of new data. Time to get down to some algorithm development!


  1. There is also MusiCLEF:

  2. Thanks Nicola, it would be nice to list those too, but following the links for the datasets leads me to a registration page that's now closed! If you can tell me where the datasets are available, I'll add them to the list.