In two previous blog posts we discussed a mixed picture of findings for the relationship between audio quality and real-world usage/popularity of audio files on the website Freesound. In one of our Web experiments, Audiobattle, we found that the number of downloads for recordings of birdsong predicted independent ratings of quality reasonably well. In a follow-up experiment, however, we found that this effect did not generalise well to other categories of sound – there was almost no relationship between quality ratings and the number of plays or downloads for recordings of thunderstorms or church bells, for example.
For our next Web test, Qualitube, we reasoned that people might find it easier to compare samples if they were recordings of the same event.
To achieve this we trawled through dozens of Glastonbury Festival videos people had uploaded to YouTube. Our plan was to find pairs of clips of the same performance but filmed at different locations in the crowd.
Imagine two friends, Mick and Keith. Mick smuggled expensive recording equipment into Glastonbury, set it up diligently, and monitored it as he recorded Dolly Parton’s Sunday evening performance. Keith, meanwhile, made a recording of the same performance while lying in a field half a mile from the stage. Halfway through the performance he forgot he was recording and started a conversation. Afterwards, Mick and Keith both uploaded their recordings to YouTube for people to watch.
In our experiment we present participants with the same 10-second segment (e.g. an intro or a chorus) of both Mick and Keith’s clips and ask which was better. Once we have ratings for the clips we can explore how they are related to the metadata associated with each clip. This should help us understand whether the quality of the audio for each clip is related to its popularity, and give us some insight into how important audio quality really is to YouTube users. (In our analyses we used all the data publicly available on YouTube: number of plays, number of likes, number of dislikes, how long since the video had been uploaded, number of subscribers to that user, and so on.)
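One derived measure we lean on later is views per day, which normalises a clip's total view count by how long it has been online. As a minimal sketch (the function name and the example dates are our own, purely for illustration):

```python
from datetime import date

def views_per_day(view_count, upload_date, today):
    """Normalise a clip's total view count by its age in days."""
    # Treat anything uploaded today as one day old, to avoid dividing by zero.
    days_online = max((today - upload_date).days, 1)
    return view_count / days_online

# Hypothetical clip: 1,000 views, scraped a month after upload.
rate = views_per_day(1000, date(2014, 6, 1), date(2014, 7, 1))
```

This puts week-old and year-old clips on a comparable footing before we look at how popularity relates to quality.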
Our test set contained more than 300 video clips. More than 3700 comparison ratings were gathered in the Web experiment, and analysis was carried out on the means of all the ratings for each video clip.
The scatterplot below presents the mean quality rating for each YouTube clip versus the (log-transformed) number of Views per day for each clip. We can see, broadly, that as clips become more frequently viewed the ratings for audio quality also increase.
Those with an interest in statistics can see the outcome of a stepwise multiple regression analysis in the table below. Predictor variables included all the metadata we could obtain from YouTube for each clip.
For those less statistically inclined, the regression analyses essentially show that, of all the data we gathered from YouTube, the number of Views per Day (log-transformed, which gave a better linear fit) is the best predictor of ratings for audio quality. This predictor can account for around 36% of the variance in quality ratings. If we wished to be fussy we could include the (log-transformed) number of Likes for the video in our model and account for an additional 1% of the variance. Beyond these two predictors, all the other data from YouTube (number of subscribers, number of dislikes, etc.) did not help to predict the audio quality of the clips.
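To make the "36% of the variance" idea concrete, here is a sketch of the shape of that analysis using ordinary least squares and synthetic data (the numbers below are made up for illustration and are not our experimental results):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300  # roughly the size of our clip set

# Synthetic stand-ins: log views/day, and quality ratings correlated with it plus noise.
log_views_per_day = rng.normal(loc=2.0, scale=1.0, size=n)
quality_rating = 0.6 * log_views_per_day + rng.normal(scale=0.8, size=n)

# Ordinary least squares with an intercept term.
X = np.column_stack([np.ones(n), log_views_per_day])
beta, *_ = np.linalg.lstsq(X, quality_rating, rcond=None)

# R^2: the proportion of variance in ratings explained by the predictor.
residuals = quality_rating - X @ beta
r_squared = 1 - residuals.var() / quality_rating.var()
```

A stepwise procedure simply repeats this, adding the next predictor (likes, subscribers, and so on) only if it meaningfully increases the explained variance.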
So – to return to our friends Mick and Keith – if you were in a rush and only had time to watch one video of Dolly Parton at Glastonbury, we now have a measure to help you decide which…
Skip to the end…