Isn’t OP analyzing frequencies of individual chords, not chord progressions?
Analyzing individual chords involves counting the frequency of each chord (such as G, C, or D).
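For example, a minimal sketch of single-chord counting, assuming each song is already a plain Python list of chord names in song order. The function name and the sample songs are made up for illustration:

```python
from collections import Counter


def chord_frequencies(songs):
    """Count how often each chord name appears across all songs."""
    counts = Counter()
    for chords in songs:
        counts.update(chords)  # every occurrence adds 1 to that chord's count
    return counts


# Hypothetical songs, one list of chord names per song.
songs = [["G", "C", "D", "G"], ["C", "G", "Am", "F"]]
print(chord_frequencies(songs).most_common(3))
```

This only tells you how common each chord is, not what it tends to move to.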
Analyzing chord progressions would involve counting the frequency of chord pairs (such as D—A or C—G), chord triplets (such as D—A—Bm or C—G—Am), or longer sequences of chords. For an alternative look at the data, you could also normalize chord progressions across key signatures for your analysis (D—A or C—G would both normalize as I—V, D—A—Bm or C—G—Am would both normalize as I—V—vi).
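A rough sketch of both ideas: counting runs of n consecutive chords, and mapping chords to Roman numerals relative to a major key. It assumes chord symbols start with a root like "C", "F#" or "Bb", and that the key of each song is already known, which the raw data does not give you. All names here are hypothetical:

```python
from collections import Counter

# Pitch class of each root name (assumed chord-symbol format: root first).
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
# Numeral for each diatonic degree of a major key, keyed by semitones above the tonic.
ROMAN = {0: "I", 2: "II", 4: "III", 5: "IV", 7: "V", 9: "VI", 11: "VII"}


def split_chord(name):
    """Split a chord symbol into (root, remainder), e.g. "F#m7" -> ("F#", "m7")."""
    cut = 2 if len(name) > 1 and name[1] in "#b" else 1
    return name[:cut], name[cut:]


def to_roman(chord, tonic):
    """Express a chord relative to a known major-key tonic; lowercase means minor."""
    root, rest = split_chord(chord)
    if root not in NOTE_TO_PC:
        return "?"
    interval = (NOTE_TO_PC[root] - NOTE_TO_PC[tonic]) % 12
    numeral = ROMAN.get(interval, "?")  # "?" marks a non-diatonic root
    is_minor = rest.startswith("m") and not rest.startswith("maj")
    return numeral.lower() if is_minor else numeral


def ngram_counts(songs, n):
    """Count every run of n consecutive chords (n=2 pairs, n=3 triplets, ...)."""
    counts = Counter()
    for chords in songs:
        for i in range(len(chords) - n + 1):
            counts[tuple(chords[i:i + n])] += 1
    return counts


# Hypothetical songs, each tagged with a key we pretend to know.
songs = [(["D", "A", "Bm"], "D"), (["C", "G", "Am"], "C")]
normalized = [[to_roman(c, key) for c in chords] for chords, key in songs]
print(ngram_counts(normalized, 2))
print(ngram_counts(normalized, 3))
```

After normalization, D-A-Bm and C-G-Am land on the same key, so they add up as one progression instead of being counted separately. Only major keys are handled here.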
> Isn’t OP analyzing frequencies of individual chords, not chord progressions?
Not according to the other comments, which say that the data set strips chords that follow identical chords, as if "too" was one of the most common words in written English.
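If the data set really does drop chords that repeat the previous chord (that's per the other comments, not something checked here), a quick sketch of what that preprocessing does to single-chord counts. The sample sequence is invented:

```python
from collections import Counter
from itertools import groupby


def collapse_repeats(chords):
    """Drop any chord that is identical to the chord immediately before it."""
    return [chord for chord, _ in groupby(chords)]


raw = ["G", "G", "G", "C", "G", "G", "D"]
collapsed = collapse_repeats(raw)
print(Counter(raw))
print(Counter(collapsed))
```

A chord held over several bars counts once after collapsing, so the counts reflect chord changes rather than time spent on each chord.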
The raw chord data is at https://huggingface.co/datasets/ailsntua/Chordonomicon/tree/.... It consists of one row per song containing a list of chord names in song order (no timing information) and Spotify IDs for track and artist. It seems like Spotify has a different ID for every released version, so it's really hard to search for particular songs in the data.
To normalise across key signatures, you need to know what key the song is in (at each point), and the data doesn't contain that. For many genres, though, the key could be guessed reasonably accurately from the chords.
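One crude way to guess, sketched under loud assumptions: only major keys, one key per song (no modulations), and chord symbols that start with a root like "C", "F#" or "Bb". The scoring rule is my own heuristic, not anything from the article or the data set: pick the key whose I, ii, iii, IV, V and vi triads cover the most chords, breaking ties by how often the candidate tonic chord appears.

```python
# Pitch classes for chord roots (assumed chord-symbol format: root first).
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
PC_TO_NAME = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
# (semitones above tonic, is_minor) for the I, ii, iii, IV, V and vi triads of a major key.
DIATONIC = {(0, False), (2, True), (4, True), (5, False), (7, False), (9, True)}


def parse_chord(name):
    """Return (pitch_class, is_minor), or None if the root is not recognised."""
    cut = 2 if len(name) > 1 and name[1] in "#b" else 1
    root, rest = name[:cut], name[cut:]
    if root not in NOTE_TO_PC:
        return None
    return NOTE_TO_PC[root], rest.startswith("m") and not rest.startswith("maj")


def guess_major_key(chords):
    """Pick the major key whose common diatonic triads cover the most chords."""
    parsed = [p for p in (parse_chord(c) for c in chords) if p is not None]

    def score(tonic):
        fits = sum(((pc - tonic) % 12, minor) in DIATONIC for pc, minor in parsed)
        tonic_hits = sum(pc == tonic and not minor for pc, minor in parsed)  # tie-breaker
        return fits, tonic_hits

    return PC_TO_NAME[max(range(12), key=score)]


print(guess_major_key(["C", "G", "Am", "F"]))  # -> C
print(guess_major_key(["D", "A", "Bm", "G"]))  # -> D
```

It can't tell a major key from its relative minor (C major vs A minor use the same chords), and it ignores key changes within a song, so it's only a starting point.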
I know. I was so disappointed reading that article. I had gone in expecting an analysis of progressions, e.g. VI-IV-I-V.
Instead I got a page of chord analysis.