I’m Alexandre Passant, and I’m co-founder of Music & Data Geeks. We’re a music-tech company based at Dogpatch Labs in Dublin, Ireland. In particular, we’re building seevl, a music meta-data API to help music services make sense of their data, provide recommendations to their users, and more. I’ve been working in data and Semantic Web technologies for about 10 years, first through a Ph.D., then as a Research Fellow at DERI, the world’s largest “Web 3.0” R&D lab, and now through the start-up.

My goal is to make the Web more open and interconnected, bridging the gap between raw data (webpages) and knowledge (meta-data and structured connections), and then making sense of this data through recommendations, analytics, etc. Combining this with my passion for listening to, playing and recording music is what led me to start MDG. I regularly build hacks and run small data experiments, and I recently decided to go through the top 500 songs as ranked by Rolling Stone magazine. My goal was to identify common patterns and differences between the songs, and to see if and how some of them compare. I first analysed the lyrics, finding that some themes, such as love, come up again and again across the songs. I then analysed their tempo and loudness in order to identify which songs were the most dynamic and which were the most monotonic. Surprisingly, a few chart hits, like Pretty Woman, fell into the latter category! I have a few other experiments in the pipeline, and I regularly blog about them on my website, while we release new products and hacks on MDG.


Here’s the second post of my data analysis series on the Rolling Stone top 500 greatest songs of all time. While the first one focused on lyrics, this one is all about the acoustic properties of the songs in the dataset, especially their volume and tempo.

To do so, I used the Echo Nest, which delivers a good understanding of each track at the section level (e.g. verse, chorus, etc.) but also at a finer “segment” level, providing loudness details for very short intervals (down to less than a second). This is not perfect, due to some issues discussed below, but it gives a few interesting insights.

Black leather, knee-hole pants, can’t play no highschool dance

As my goal was to identify notable tracks in the dataset, I looked not only at the absolute loudness and tempo of each track, but also at their standard deviation. If you’re not familiar with it, the standard deviation measures how spread out values are around their average: it helps identify which songs or artists stay close to their average tempo or loudness, and which ones are more dynamic.
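To make the idea concrete, here is a minimal NumPy sketch comparing two hypothetical tracks that share the same average loudness but differ in spread. The helper name `describe` and the loudness values are illustrative only. They are not taken from the dataset.

```python
import numpy as np


def describe(values):
    """Return (mean, standard deviation) of a sequence of loudness values in dB."""
    arr = np.asarray(values, dtype=float)
    # np.std uses the population formula (ddof=0) unless told otherwise
    return float(arr.mean()), float(arr.std())


# Two hypothetical tracks with the same -5 dB average loudness
steady = [-5.0, -5.0, -5.0, -5.0]    # always sits on its average
dynamic = [-2.0, -8.0, -2.0, -8.0]   # swings 3 dB either side of it

print(describe(steady))   # (-5.0, 0.0)
print(describe(dynamic))  # (-5.0, 3.0)
```

Both tracks would look identical if you only compared their averages. The standard deviation is what separates the flat one from the dynamic one.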

Before going through individual songs from the top 500, let’s take an example with the top-10 Spotify tracks of a few artists, starting with their loudness:

Artist | Average loudness (dB) | Standard deviation (dB)
Motörhead | -5.05 | 1.29
Ramones | -6.85 | 3.22
Radiohead | -11.83 | 3.20
Daft Punk | -11.23 | 4.82
Public Enemy | -5.34 | 2.30
Beastie Boys | -9.38 | 4.30
Bob Dylan | -10.67 | 2.88
Pink Floyd | -16.06 | 6.50


And the tempo:

Artist | Average tempo (BPM) | Standard deviation (BPM)
Motörhead | 130.58 | 33.35
Ramones | 175.34 | 7.69
Radiohead | 104.80 | 28.86
Daft Punk | 109.90 | 11.96
Public Enemy | 102.93 | 13.28
Beastie Boys | 108.37 | 16.85
Bob Dylan | 126.60 | 33.95
Pink Floyd | 118.23 | 25.08


You can see that some bands really deserve their reputation. For instance, while Pink Floyd has a high standard deviation in both volume and tempo (not surprising), Motörhead is not only the loudest band on the list (on average) but also the one with the smallest loudness deviation, meaning that most of their tracks stay close to that average loudness. In other words, they play everything loud. The Ramones, meanwhile, are just fast: with both the highest average tempo and the smallest tempo deviation, they play everything fast. And when they’re together on stage, the result is not surprising.
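These observations can be checked mechanically against the two tables above. The sketch below simply copies the published figures into Python dictionaries. For each metric, it picks the artist with the highest average and the artist with the smallest standard deviation. The helper names `highest_average` and `most_consistent` are hypothetical.

```python
# (average, standard deviation) per artist, copied from the two tables above
LOUDNESS_DB = {
    "Motörhead": (-5.05, 1.29),
    "Ramones": (-6.85, 3.22),
    "Radiohead": (-11.83, 3.20),
    "Daft Punk": (-11.23, 4.82),
    "Public Enemy": (-5.34, 2.30),
    "Beastie Boys": (-9.38, 4.30),
    "Bob Dylan": (-10.67, 2.88),
    "Pink Floyd": (-16.06, 6.50),
}
TEMPO_BPM = {
    "Motörhead": (130.58, 33.35),
    "Ramones": (175.34, 7.69),
    "Radiohead": (104.80, 28.86),
    "Daft Punk": (109.90, 11.96),
    "Public Enemy": (102.93, 13.28),
    "Beastie Boys": (108.37, 16.85),
    "Bob Dylan": (126.60, 33.95),
    "Pink Floyd": (118.23, 25.08),
}


def highest_average(stats):
    """Artist with the highest average (for loudness in dB: the loudest)."""
    return max(stats, key=lambda artist: stats[artist][0])


def most_consistent(stats):
    """Artist with the smallest standard deviation."""
    return min(stats, key=lambda artist: stats[artist][1])


print(highest_average(LOUDNESS_DB), most_consistent(LOUDNESS_DB))  # Motörhead Motörhead
print(highest_average(TEMPO_BPM), most_consistent(TEMPO_BPM))      # Ramones Ramones
```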

But you don’t really care for music, do you?

Coming back to the top 500, I ran the Echo Nest analysis on 474 tracks from the list. The remaining 26 are missing due to various errors at different stages of the pipeline.

On the one hand, I used raw results from the song API to get the average values. I had to consolidate the data by aggregating multiple API results. For a single song, the API returns multiple tracks (as expected), but there can be large inconsistencies between them. For instance, if you search for American Idiot, one track (ID=SOHDHEA1391229C0EF) is identified as having a tempo of 93, and the other one (SOCVQDB129F08211FC) a tempo of 186. Others show smaller variations (in volume, for instance, between a live and the original version). To keep things simple (and I admit this introduces a bias in the results), I averaged the first 3 results from the API.
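Here is a minimal sketch of that consolidation step. It assumes each API result has already been reduced to a plain dictionary with numeric `tempo` and `loudness` keys. That layout is an assumption, not the Echo Nest response format. `consolidate` is a hypothetical helper that averages the first three results, as described above.

```python
from statistics import mean


def consolidate(results, n=3, fields=("tempo", "loudness")):
    """Average the given fields over the first n API results for one song."""
    top = results[:n]  # only the first n results are kept
    if not top:
        raise ValueError("no results to consolidate")
    return {field: mean(r[field] for r in top) for field in fields}


# The two American Idiot tempos quoted above (loudness omitted here)
american_idiot = [{"tempo": 93.0}, {"tempo": 186.0}]
print(consolidate(american_idiot, fields=("tempo",)))  # {'tempo': 139.5}
```

Since 186 is exactly twice 93, the two results most likely disagree by a half-time/double-time factor. Their average, 139.5, matches neither, which is the kind of bias mentioned above.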

On the other hand, I relied on NumPy to compute the standard deviation from the first API result, after removing the fade-in and fade-out of each track. Here, I also skipped every segment or section where the API confidence was too low (< 0.4).
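That filtering step could look roughly like the NumPy sketch below. It assumes an analysis dictionary shaped like the Echo Nest (now Spotify) audio-analysis JSON, with `track.end_of_fade_in`, `track.start_of_fade_out`, and `segments`/`sections` lists whose items carry `start`, `duration` and `confidence`. The post does not say which loudness or tempo field was used. Picking `loudness_max` for segments (or `tempo` for sections) is an assumption, and the helper name `trimmed_std` is hypothetical.

```python
import numpy as np


def trimmed_std(analysis, kind, field, min_confidence=0.4):
    """Standard deviation of `field` over analysis[kind] ("segments" or
    "sections"), ignoring items that overlap the fade-in or fade-out and
    items whose confidence is below `min_confidence`.

    Field names follow the Echo Nest / Spotify audio-analysis layout;
    which loudness or tempo field to use is an assumption.
    """
    start = analysis["track"]["end_of_fade_in"]
    end = analysis["track"]["start_of_fade_out"]
    values = [
        item[field]
        for item in analysis[kind]
        if item["start"] >= start                    # after the fade-in
        and item["start"] + item["duration"] <= end  # before the fade-out
        and item["confidence"] >= min_confidence     # drop unreliable items
    ]
    return float(np.std(values)) if values else float("nan")


# Hypothetical analysis: only the two confident segments inside the window survive
example = {
    "track": {"end_of_fade_in": 1.0, "start_of_fade_out": 10.0},
    "segments": [
        {"start": 0.0, "duration": 0.5, "confidence": 0.9, "loudness_max": -40.0},
        {"start": 2.0, "duration": 0.5, "confidence": 0.9, "loudness_max": -6.0},
        {"start": 3.0, "duration": 0.5, "confidence": 0.1, "loudness_max": -30.0},
        {"start": 4.0, "duration": 0.5, "confidence": 0.8, "loudness_max": -10.0},
        {"start": 10.5, "duration": 0.5, "confidence": 0.9, "loudness_max": -50.0},
    ],
}
print(trimmed_std(example, "segments", "loudness_max"))  # 2.0
```

The same function would cover tempo by calling `trimmed_std(analysis, "sections", "tempo")`, again under the field-name assumption above.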

The average loudness for the dataset is -10.38 dB. Paul Lamere ran an analysis of 1,500 tracks a few years ago, which gave an average of -9.5 dB, so this dataset is not too far from a “random” sample (check the conclusion of this post to understand why the Echo Nest’s loudness values are negative).

Going through individual tracks, here are the loudest tracks from the list:

And the quietest ones:
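Producing top and bottom lists like these boils down to ranking tracks by one metric. Here is a minimal sketch, assuming each track is a dictionary holding its precomputed metrics. The keys, track names and the helper name `extremes` are hypothetical. The same helper works for any metric, such as a standard deviation.

```python
import heapq


def extremes(tracks, metric, n=5):
    """Return (n highest, n lowest) tracks ranked by track[metric]."""
    highest = heapq.nlargest(n, tracks, key=lambda t: t[metric])
    lowest = heapq.nsmallest(n, tracks, key=lambda t: t[metric])
    return highest, lowest


# Hypothetical tracks with precomputed average loudness (dB)
tracks = [
    {"title": "Track A", "loudness": -4.0},
    {"title": "Track B", "loudness": -12.0},
    {"title": "Track C", "loudness": -8.0},
]
loudest, quietest = extremes(tracks, "loudness", n=1)
print(loudest[0]["title"], quietest[0]["title"])  # Track A Track B
```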

You can clearly see the dB difference between a loud (CCR) and a quiet (Jeff Buckley) track on the following plots.

Loudness plot: Creedence Clearwater Revival, Who’ll Stop the Rain

Loudness plot: Jeff Buckley, Hallelujah (a low-level but very dynamic track)

Looking at the standard deviations, here are the most dynamic tracks, volume-wise:

This last one is a beautiful example of a soul song with a dynamic volume range, and here’s a live version below.

On the other side of the spectrum, here are the least dynamic tracks, i.e. the ones with the smallest standard deviation, volume-wise:

The Ramones strike again, but I’m not sure that Highway to Hell is actually that linear, even though its second part definitely is!

Loudness plot: AC/DC, Highway to Hell

Please could you stop the noise, I’m trying to get some rest

Moving away from loudness and focusing on tempo, here are the fastest tracks on the list by average BPM (a few of them look a bit odd here):

And the slowest ones, also including the Stones:

But once again, I believe it’s interesting to look at how dynamic the tracks can be. Here are the most dynamic ones, tempo-wise:

And the most static ones, i.e. the ones with the least tempo variation:

If you’ve ever looked at the Man vs Machine app, you might find it amusing that while the least dynamic track (or the most consistent, depending on how you look at it) uses samples (Run DMC), all the others involve drummers. Don’t forget to thank the best backing band ever for the perfect tempo on Marvin Gaye’s track (and I couldn’t resist sharing their own cover of the song).

I’m waiting for that final moment you say the words that I can’t say

Last but not least, I normalized and combined the tempo deviation and the loudness deviation to assign a [0:1] score to each track, in order to find the most and least dynamic tracks overall. Here’s the top 5 of the most dynamic ones:
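The post does not say exactly how the two deviations were normalized and combined. One plausible reading, sketched here as an assumption, is to min-max scale each deviation across all tracks to [0, 1] and take their unweighted mean. That keeps the combined score in [0, 1]. The helper names `min_max` and `dynamism_scores` are hypothetical.

```python
import numpy as np


def min_max(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    # All-equal input: no spread to rescale, so every track scores 0
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def dynamism_scores(tempo_std, loudness_std):
    """Combine per-track tempo and loudness deviations into a [0, 1] score.

    Assumption: both deviations are weighted equally.
    """
    return (min_max(tempo_std) + min_max(loudness_std)) / 2


# Hypothetical deviations for three tracks (BPM and dB respectively)
scores = dynamism_scores([0.0, 5.0, 10.0], [1.0, 3.0, 2.0])
most_dynamic_first = np.argsort(scores)[::-1]  # track indices, highest score first
print(scores, most_dynamic_first)
```

Min-max scaling puts BPM and dB on the same footing before averaging. A z-score normalization would be an equally reasonable alternative, but its result is not bounded to [0, 1].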

If you listen to My Generation, you can clearly hear its dynamics in both tempo and volume, with the different bursts of the song. The Radiohead track, on the other hand, is dynamic over the long run, with clearly distinct phases, as shown below for the volume part.

Loudness plot: Radiohead, Paranoid Android

Finally, here are the least dynamic ones. Several tracks on that list were chart hits, showing that even though a song can be pretty flat in both volume and tempo, it can still be a hit, or at least an earworm:


Alexandre Passant is the co-founder of Music and Data Geeks, a music-tech company based in Dublin. Music and Data Geeks’ chief product is Seevl.fm, a music meta-data API to help music services make sense of their data, provide recommendations to their users, and more. He has 10 years’ experience in data and Semantic Web technologies.


(Featured image credit: Didier DDD)
