Amazon Mechanical Turk seems to be gaining some ground as a way of collecting experimental groundtruth. A workshop on Creating Speech and Language Data with Amazon Mechanical Turk recently challenged researchers to come up with useful datasets for an outlay of $100, resulting in over 20 new datasets. A review paper by Chris Callison-Burch and Mark Dredze summarises some of the lessons learned.
Now Jin Ha Lee of the University of Washington has compared crowdsourced music similarity judgements from Mechanical Turk with those provided by experts for the carefully controlled MIREX Audio Music Similarity task. It took only 12 hours to crowdsource judgements compared to two weeks to gather them from experts, and, provided that the tasks were carefully designed, the results were almost identical: only 6 out of 105 pairwise comparisons between algorithms would have been reversed by using the crowdsourced judgements.
another presentation Mike Mandel showed how social tags for short audio clips sourced from Mechanical Turk could be used for autotagging. He uses a Conditional restricted Boltzmann machine to model the cooccurrence of tags between different clips, with inputs representing the user who applied each tag, as well as the track, album and artist from which each clip was drawn. The learned weights are used to create a smoothed tag cloud from the tags provided for a clip by several users. When used as input to his autotagging classifiers, these smoothed tag clouds gave better results than simply aggregating tags across users, approaching the performance of classifiers trained on tags from the much more controlled MajorMiner game.