In an earlier blog post we wrote about a Web experiment in which we asked participants to compare and rate sounds tagged with “bird song” (or “birdsong”) on Freesound.org. We then compared the quality ratings we had obtained with the Freesound metadata for each sample (such as average rating and number of downloads). We found that 33% of the variance in quality ratings could be explained by the number of downloads per day of the sounds. An interesting finding: it hinted at a rough-and-ready method for quickly sorting sets of audio into good and poor audio quality.
So what’s new?
To explore the idea of an indirect predictor of quality further, we thought we would repeat the experiment across other categories of sound and see how well the findings generalised. We went back to Freesound and retrieved every sound tagged “Car Engine”, “Church Bells”, “Crowd Noise”, “River”, or “Thunderstorm”, then selected a subset of 20 sounds from each category to use in our experiment. This second phase of the Audiobattle experiment followed exactly the same format as the earlier Birdsong version: participants were presented with pairs of sounds and asked to rate which was better, and by how much, on a 7-point scale. Pairs were always drawn from within the same category (a Car Engine sound was never compared with a Thunderstorm sound, for example).
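To make the pairwise format concrete, here is a minimal sketch of one way such comparisons could be turned into a per-sound quality score. Note that the post does not specify the actual aggregation method used in Audiobattle; the sound names, ratings, and the mean-signed-preference approach below are all our own illustration.

```python
from collections import defaultdict

# Hypothetical pairwise ratings for illustration only: (sound_a, sound_b, rating),
# where the 7-point rating runs 1..7, 4 meaning "both equal" and 7 meaning
# "sound_a is much better than sound_b".
comparisons = [
    ("river_01", "river_02", 6),
    ("river_02", "river_03", 3),
    ("river_01", "river_03", 5),
]

totals = defaultdict(float)
counts = defaultdict(int)
for a, b, rating in comparisons:
    signed = rating - 4          # > 0 favours sound_a, < 0 favours sound_b
    totals[a] += signed
    counts[a] += 1
    totals[b] -= signed          # the losing side gets the mirrored preference
    counts[b] += 1

# Mean signed preference per sound: one simple stand-in for a quality score.
scores = {sound: totals[sound] / counts[sound] for sound in totals}
print(scores)
```

More principled models for pairwise data exist (Bradley-Terry, for instance), but even this simple averaging shows how each sound accumulates a score from the roughly 200 comparisons it appeared in.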
How many people took part?
Our final data set for this phase of the experiment consisted of nearly 9,500 comparisons. This meant that, on average, each sound in the experiment was involved in just under 200 comparisons.
So, can we predict quality from number of downloads?
Well… possibly not. The figure below presents scatterplots of downloads per day vs. quality ratings for each sound in each category.
Of the five categories we considered in this experiment, only Car Engine sounds showed a significant relationship between quality and downloads per day. In this instance, the effect was of a similar magnitude to that found earlier for Birdsong: around 27% of the variance in quality scores could be explained by the downloads per day data. In the other four categories, however, downloads per day had no predictive value for quality ratings. Furthermore, adding the average ratings on Freesound and/or the absolute number of downloads (uncorrected for time since upload) to the regression models did not significantly improve their predictive value.
As always, we’d love to hear your thoughts. How surprised should we be if real-world use of sound samples seems independent of the quality of the audio? How important is context to our understanding of what is good quality?
If you use/download audio samples, how do you select one sound from many possible alternatives? How important is the feedback of other users to you when choosing sounds – are you more likely to listen to/download a sample if hundreds of others already have too?