Dataconomy

Sex and Drugs and Rock n’ Roll: Analysing the Lyrics of The Rolling Stone 500 Greatest Songs of All Time

by Alexandre Passant
August 8, 2014
in Articles

I’m Alexandre Passant, co-founder of Music & Data Geeks. We’re a music-tech company based in Dogpatch Labs in Dublin, Ireland. In particular, we’re building seevl, a music meta-data API to help music services make sense of their data, provide recommendations to their users, and more. I’ve been working in data and Semantic Web technologies for about 10 years, first through a Ph.D., then as a Research Fellow in DERI, the world’s largest “Web 3.0” R&D lab, and now through the start-up.

My goal is to make the Web more open and interconnected, bridging the gap between raw data (webpages) and knowledge (meta-data and structured connections), and then making sense of this data through recommendations, analytics, etc. Combining this with my passion for listening to, playing and recording music is what led me to start MDG. I regularly build hacks and run small data experiments, and I recently decided to go through the top-500 songs as ranked by Rolling Stone magazine. My goal was to identify common patterns and differences between songs, and to see if and how some of them compare. I first analysed the lyrics, finding that some themes, such as love, regularly come up across the songs, and then analysed tempo and loudness in order to identify which songs were the most dynamic or monotonic. Surprisingly, a few chart hits, like Pretty Woman, were in that category! I have a few other experiments in my pipeline, and I regularly blog about them on my website, while we release new products and hacks on MDG.


I was reading the Wikipedia entry for the Rolling Stone’s 500 Greatest Songs of All Time, and while it contains a lot of interesting statistics (shortest and longest songs, decades, covers, etc.), I’ve decided to do some “API-based data-science” and see what insights we can learn from this top-500.

#15: London Calling – One of the 5 songs from the Clash in the top 500

I’ll split this into multiple posts, in order to showcase how different APIs bring multiple perspectives to the data-set, such as acoustic features with the Echo Nest, mood recognition with Gracenote, or artist and genre data with seevl (disclosure: I’m the main person responsible for this one).

Here’s the first one, investigating lyrics from those top-500 songs. The first part is rather technical, so if you’re interested only in the insights, just skip it. And here’s an accompanying playlist, featuring the songs mentioned in this post – all from the top 500, except the opening one.

Come Together

Before going through the insights, here’s the process I used to gather the data:

  • Scrape names, artists and reviews of the 500 songs using Python’s urllib and BeautifulSoup, starting from the 500th one: “Shop Around”;
  • Get the lyrics of each song via the Lyrics’n’Music API (powered by LyricFind) with additional scraping, as unfortunately the API returns only a few sentences (as does the musiXmatch one, both for copyright reasons);
  • Run some NLP tasks on the corpus with nltk: tokenize the lyrics (i.e. split lyrics into words), apply stemming for normalisation (i.e. extract the word roots, e.g. “love”, “loved” and “loving” all map to “love”), and extract n-grams (i.e. sequences of words, here using n from 3 to 5) for some tasks described below.
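
The n-gram step in the last bullet can be sketched in plain Python. This is a minimal illustration and not the code used for this post: the regular-expression tokenizer is a simplified stand-in for the Punkt tokenizer, and the function names `tokenize` and `ngrams` are made up for the example.

```python
import re

def tokenize(text):
    # Simplified stand-in for a real tokenizer (an assumption, not Punkt):
    # lowercase the text and keep runs of letters, optionally followed by
    # an apostrophe and more letters.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

line = "I want to hold your hand"
tokens = tokenize(line)
for n in range(3, 6):  # n from 3 to 5, as in the post
    print(n, ngrams(tokens, n))
```

A six-word line yields four 3-grams, three 4-grams and two 5-grams; a line shorter than n yields none.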

Regarding that last step, I’ve used the PunktWordTokenizer, which gave better results than the default word_tokenize. As most of the lyrics are in English and the Punkt tokenizer is already trained for it, no additional work was required. Stemming was done with the Snowball algorithm – more about it below. Here’s a quick snippet of how it works.

[code language="python"]# PunktWordTokenizer ships with NLTK 2.x; it is not available in recent NLTK releases
from nltk.tokenize.punkt import PunktWordTokenizer
from nltk.stem.snowball import SnowballStemmer

elvis = """
Here we go again
Asking where I've been
You can't see these tears are real
I'm crying (Yes I'm crying)
"""

sb = SnowballStemmer('english')
pk = PunktWordTokenizer()

# Tokenize the lyrics, then stem each token
print([sb.stem(w) for w in pk.tokenize(elvis)])[/code]

Leading to:

[code language="python"]['here', 'we', 'go', 'again', 'ask', 'where', 'i', "'ve", 'been', \
'you', 'can', "'t", 'see', 'these', 'tear', 'are', 'real', 'i', \
"'m", 'cri', '(', 'ye', 'i', "'m", 'cri', ')'][/code]

As you can see, there are a few issues: “I’m” is split into “i” and “’m”, and “crying” is stemmed to “cri” rather than “cry”, as one could expect. Yet “cried”, “cry” and “cries” are all stemmed to this same root with Snowball, which is fine for grouping words together. However, no stemming algorithm is perfect: Snowball identifies different roots for “love” and “lover”, while the Lancaster algorithm maps both to “lov” but fails on the previous “cry” example.

[code language="python"]>>> from nltk.stem.snowball import SnowballStemmer
>>> from nltk.stem.lancaster import LancasterStemmer
>>>
>>> sb = SnowballStemmer('english')
>>> lc = LancasterStemmer()
>>>
>>> cry = ['cry', 'crying', 'cries', 'cried']
>>> [lc.stem(w) for w in cry]
['cry', 'cry', 'cri', 'cri']
>>> [sb.stem(w) for w in cry]
[u'cri', u'cri', u'cri', u'cri']
>>>
>>> love = ['love', 'loves', 'loving', 'loved', 'lover']
>>> [lc.stem(w) for w in love]
['lov', 'lov', 'lov', 'lov', 'lov']
>>> [sb.stem(w) for w in love]
[u'love', u'love', u'love', u'love', u'lover']
[/code]

That being said, on the full corpus, the top-10 stems were the same whichever algorithm was used (albeit with different counts and different spellings of the stems). Hence, I’ll report on the Snowball extraction in the remainder of this post.

Baby Love

So, it appears that the most popular word variation in the corpus is “love”. It’s mentioned 1057 times in 219 songs (43.8%), followed by:

  • “I’m”: 1000 times, 242 songs
  • “oh”: 847 times, 180 songs
  • “know”: 779 times, 271 songs
  • “baby”: 746 times, 163 songs
  • “got”: 702 times, 182 songs
  • “yeah”: 656 times, 155 songs

One could probably write lyrics with “Oh yeah baby I got you, yeah I’m in love with you, yeah!” and easily fit in here (well, look at that opening line). Sorting by song ranking also brings “like” into the top list, as it is included in 194 of those top-500 songs.
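
The two numbers reported for each word (total mentions and number of songs) come from two different counts. Here is a minimal sketch of how they could be computed, assuming each song has already been tokenized and stemmed; the `count_stems` helper and the toy corpus are invented for illustration. The 43.8% above is simply 219 songs out of 500.

```python
from collections import Counter

def count_stems(songs):
    """Return (total occurrences, number of songs) counters for each stem.

    `songs` maps a song title to its list of already-stemmed tokens.
    """
    occurrences = Counter()
    song_counts = Counter()
    for stems in songs.values():
        occurrences.update(stems)       # every mention counts
        song_counts.update(set(stems))  # each song counts once per stem
    return occurrences, song_counts

# Toy corpus, invented for illustration only.
songs = {
    "song a": ["love", "love", "babi", "yeah"],
    "song b": ["know", "love"],
    "song c": ["yeah", "yeah", "got"],
}
occurrences, song_counts = count_stems(songs)
for stem, total in occurrences.most_common(3):
    share = 100.0 * song_counts[stem] / len(songs)
    print(stem, total, song_counts[stem], "%.1f%%" % share)
```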

I Wanna Be Anarchy

Looking at the top-5 3-grams, we still get a sense of the general “you-and-me” feeling that occurs in those songs:

  • “I want to”: 38 songs
  • “I don’t know”: 35 songs
  • “I love you”: 26 songs
  • “You know I”: 22 songs
  • “You want to”: 21 songs

These are followed by other want / don’t want combinations. Once again, most of the want-list is love-related. While some want to hold her hand, know if she loved them or simply know what love is, others prefer to be your dog, while some just want to be free.

There was no real pattern in the 4-grams and 5-grams, besides the fact that Blondie, Jimi Hendrix and 7 others “don’t know why”, and that the B-52’s, Bob Dylan and Jay-Z have something to do on “the other side of the road”.
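
Finding which songs share a given n-gram, as in the “don’t know why” remark, amounts to building an index from n-grams to song titles. A minimal sketch, with a hypothetical `ngram_index` helper and placeholder token lists rather than the real corpus:

```python
from collections import defaultdict

def ngram_index(songs, n):
    """Map each n-gram to the set of song titles in which it occurs."""
    index = defaultdict(set)
    for title, tokens in songs.items():
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add(title)
    return index

# Placeholder token lists, invented for illustration only.
songs = {
    "song a": "i don't know why you go".split(),
    "song b": "well i don't know why".split(),
    "song c": "i want to be free".split(),
}
index = ngram_index(songs, 3)
# Keep only the n-grams that appear in more than one song.
shared = {gram: titles for gram, titles in index.items() if len(titles) > 1}
for gram, titles in sorted(shared.items()):
    print(" ".join(gram), "->", sorted(titles))
```

Counting songs rather than raw occurrences keeps a heavily repeated chorus from dominating the ranking.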

Hotel California

As this is a short-list compiled by a rock magazine, you could expect a few tracks falling under the sex, drugs and rock’n’roll stereotype. Well, not really. In the top-500, only 13 songs contain the word sex, 5 drug and 4 rock’n’roll, and none of them combines all three.

Looking deeper into the drug theme, and using a Freebase query to get a list of abused substances and their aliases, we find 7 occurrences of cocaine (three of them in the eponymous song) and 4 of heroin, while grass and pot appear a few times, even though it would require more analysis to see in which context they’re used. Of course, a simple token analysis like this one cannot capture the songs’ full messages, and we miss classics like the awesome Comfortably Numb or White Rabbit by Jefferson Airplane.

Querying Freebase to find drugs and their aliases
More details about drugs in this top-500 are in the reviews themselves, which often include background stories about the songs. Heroin is mentioned 11 times, acid 3, alcohol 3, and cocaine twice.
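
The alias-matching step can be sketched as a simple lookup. The alias table below is hand-written and hypothetical (the post built its list from a Freebase query, which is not reproduced here), and `count_substances` is a name made up for this example. It only handles single-word aliases.

```python
from collections import Counter

# Hypothetical alias table, hand-written for this example; the post built
# its list from a Freebase query instead.
ALIASES = {
    "cocaine": {"cocaine", "coke"},
    "heroin": {"heroin"},
    "cannabis": {"cannabis", "grass", "pot"},
}

def count_substances(tokens, aliases=ALIASES):
    """Count tokens matching any alias, grouped by canonical name."""
    lookup = {alias: name for name, names in aliases.items() for alias in names}
    return Counter(lookup[t] for t in tokens if t in lookup)

# A sentence about a lawn and a kitchen still produces two matches.
tokens = "the grass is green and the pot is on the stove".split()
print(count_substances(tokens))
```

The example sentence shows why context matters: a token match alone cannot tell a lawn from a drug reference.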

Good Vibrations

Last but not least, I’ve used AlchemyAPI for topic extraction and sentiment analysis. Nothing very relevant came up from the entity extraction phase, but here are the most negative songs from the list according to their sentiment analysis module.

  • “Ain’t It a Shame” by Fats Domino (-0.71)
  • “Why Do Fools Fall In Love” by Frankie Lymon and The Teenagers (-0.56)
  • “The Girl Can’t Help It” by Little Richard (-0.54)
  • “Monkey Gone to Heaven” by the Pixies (-0.54)
  • “I Can’t Make You Love Me” by Bonnie Raitt (-0.51)

And the most positive ones:

  • “Can’t Buy Me Love” by The Beatles (0.67)
  • “Everyday” by Buddy Holly and the Crickets (0.63)
  • “All Shook Up” by Elvis Presley (0.59)
  • “Love and Happiness” by Al Green (0.58)
  • “Miss You” by The Rolling Stones (0.58)

For both, it seems there’s a clear bias towards the words used in the song (e.g. “shame” or “love”), rather than extracting sentiment from the song’s actual meaning. It would be more interesting to use a data-set from SongMeanings or Songfacts to run a proper analysis; this might be for another post.
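
To illustrate the kind of word-level bias described above, here is a toy lexicon-based scorer. It is not how AlchemyAPI works internally (this post does not cover that); the lexicon, the scores and the `naive_sentiment` function are all invented for the example.

```python
# Tiny hand-made lexicon, invented for illustration only.
LEXICON = {"love": 1.0, "happiness": 1.0, "shame": -1.0, "fools": -1.0}

def naive_sentiment(tokens, lexicon=LEXICON):
    """Average the lexicon scores of the tokens found in the lexicon."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# A sad sentence whose only scored word is "positive".
sad = naive_sentiment("there is no love left here".split())
# A cheerful sentence whose only scored word is "negative".
cheerful = naive_sentiment("it would be a shame to stop dancing".split())
print(sad, cheerful)
```

A scorer like this rates the sad sentence as positive and the cheerful one as negative, because it only sees individual words and ignores negation and context.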

That’s it for today!


Alexandre Passant is the co-founder of Music and Data Geeks, a music-tech company based in Dublin. Music and Data Geeks’ chief product is Seevl.fm, a music meta-data API to help music services make sense of their data, provide recommendations to their users, and more. He has 10 years’ experience in data and Semantic Web technologies.

His personal blog can be found here.


 


(Featured image credit: Rolling Stone)

 
