MuSe : The Musical Sentiment Dataset

The MuSe (Music Sentiment) dataset contains sentiment information for 90,001 songs. We computed scores for the affective dimensions of valence, dominance, and arousal, based on the user-generated tags that are available for each song via Last.fm. In addition, we provide artist, title and genre metadata, and a MusicBrainz ID and a Spotify ID, which allow researchers to extend the dataset with further metadata.

Christopher Akiki


(2) METHOD
This section provides an overview of the basic steps and resources that were involved in creating the dataset; for more details on each of the steps, see Akiki and Burghardt (2020).

STEPS

1.
Seeding: In this step, we used 279 mood labels from AllMusic (https://www.allmusic.com/moods, last scraped Sep. 2019) as seeds to collect song objects from the Last.fm API (https://www.last.fm/api). The AllMusic mood labels that were used are documented in a "seeds" column in the dataset.

2.
Expansion: For each of the 279 seed moods, we collected up to 1,000 songs, which is currently the official limit of the Last.fm API. As we did not retrieve the maximum of 1,000 songs for each of the seed labels, we ended up with a total of 96,499 songs (see Figure 1).
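The seeding and expansion steps can be sketched as a paginated loop over Last.fm's tag.getTopTracks endpoint. The helper below is a minimal illustration, not the original collection script; the response fields and the API key handling are assumptions based on the public Last.fm API documentation.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://ws.audioscrobbler.com/2.0/"  # Last.fm API endpoint

def build_query(mood, api_key, page=1, limit=50):
    """Query string for tag.gettoptracks; Last.fm caps results at 1,000 per tag."""
    return urllib.parse.urlencode({
        "method": "tag.gettoptracks",
        "tag": mood,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
    })

def fetch_songs_for_mood(mood, api_key, max_songs=1000):
    """Page through the API until max_songs tracks are collected or the tag runs out."""
    songs, page = [], 1
    while len(songs) < max_songs:
        with urllib.request.urlopen(API_ROOT + "?" + build_query(mood, api_key, page)) as r:
            tracks = json.load(r).get("tracks", {}).get("track", [])
        if not tracks:
            break
        songs += [{"artist": t["artist"]["name"], "title": t["name"]} for t in tracks]
        page += 1
    return songs[:max_songs]
```

Running this once per seed mood (with a valid API key) reproduces the expansion step, modulo whatever the API returns today.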

3.
Filtering: Next, we filter the more than 261k unique user-generated tags by using the WordNet-Affect list (Strapparava & Valitutti, 2004). This step is inspired by related work from Hu, Downie, and Ehmann (2009) and Delbouys, Hennequin, Piccoli, Royo-Letelier, and Moussallam (2018), and leaves us with a list of songs that contain at least one mood-related tag.
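As a minimal sketch of this filtering step, assume the WordNet-Affect terms are available as a plain set of lowercase strings; the toy AFFECT_TERMS below is a stand-in for the real lexicon, and the function names are illustrative, not from the original pipeline.

```python
# Toy stand-in for the WordNet-Affect list (Strapparava & Valitutti, 2004).
AFFECT_TERMS = {"sad", "happy", "angry", "calm", "melancholy"}

def mood_tags(song_tags, affect_terms=AFFECT_TERMS):
    """Return the subset of a song's tags that appear in the affect lexicon."""
    return [t for t in song_tags if t.lower() in affect_terms]

def filter_songs(songs, affect_terms=AFFECT_TERMS):
    """Keep only songs that carry at least one mood-related tag.

    songs: mapping of song identifier -> list of user-generated tags.
    """
    return {s: tags for s, tags in songs.items() if mood_tags(tags, affect_terms)}
```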

4.
Mapping: Now, we map the identified mood tags to Russell's (1980) "circumplex model" of valence and arousal and extend it by the third dimension of dominance, as suggested by Scherer (2004). The mapping is achieved by using the wordlist by Warriner, Kuperman, and Brysbaert (2013), which contains crowdsourced values for valence, arousal and dominance (V-A-D) for a total of 13,915 lemmas. Table 1 provides some statistics for the distribution of the V-A-D tags within our dataset.
As songs often have multiple mood tags, we calculate the weighted average for V-A-D (see Figure 2). The weights for each tag are derived from Last.fm, where higher values indicate an increased relevance of a tag.
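The weighted average described above can be sketched as follows. The tag weights are the Last.fm relevance values, and the lexicon maps a tag to its V-A-D triple from Warriner, Kuperman, and Brysbaert (2013); the numbers in the usage example are made up for illustration.

```python
def weighted_vad(weighted_tags, lexicon):
    """Weighted average of V-A-D scores over a song's mood tags.

    weighted_tags: [(tag, weight), ...] as returned by Last.fm.
    lexicon: tag -> (valence, arousal, dominance).
    Tags absent from the lexicon are ignored.
    """
    scored = [(w, lexicon[t]) for t, w in weighted_tags if t in lexicon]
    total = sum(w for w, _ in scored)
    return tuple(
        sum(w * vad[i] for w, vad in scored) / total
        for i in range(3)
    )
```

For example, a song tagged "happy" (weight 100) and "calm" (weight 50) receives a valence of (100·8.0 + 50·6.0) / 150 ≈ 7.33 under the toy lexicon values used in the test below.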

Figure 1
The sampling process is guided by 279 seed mood labels from AllMusic, which were used to retrieve songs with basic mood labels via the Last.fm API. Since some of these songs do not contain a tag that matches the V-A-D wordlist by Warriner, Kuperman, and Brysbaert (2013), a total of 90,408 songs remain after the mapping step.

5.
Metadata: As a final step, we add information that allows researchers to extend the dataset with further metadata. From the Last.fm API, we collect the MusicBrainz ID (mbid; https://musicbrainz.org/doc/Developer_Resources) for each song for which it is available (this is the case for about two-thirds of the dataset). The mbid also allows us to remove duplicates, which occur when the artist name or song title is spelled differently. After removing 407 duplicates among the 61,624 songs with an mbid, we are left with a total of 90,001 songs in our dataset. There may be a few more duplicates among the songs that do not come with an mbid. To showcase a potential enhancement of the dataset, we also collected the Spotify ID for a total of 61,630 songs, which enables researchers to add further metadata from the Spotify API (https://developer.spotify.com/documentation/web-api). More details on how we added the Spotify ID are described in the reuse section of this paper.
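The duplicate removal can be sketched as a single pass that keeps the first song seen for each mbid; songs without an mbid cannot be checked, which is why a few duplicates may remain among them. This is an illustrative reimplementation, not the original script.

```python
def deduplicate(songs):
    """Drop songs whose mbid was already seen.

    songs: list of dicts, each with an optional 'mbid' key. Songs sharing
    an mbid are duplicates (e.g., spelling variants of artist or title);
    the first occurrence is kept. Songs without an mbid pass through.
    """
    seen, unique = set(), []
    for s in songs:
        mbid = s.get("mbid")
        if mbid and mbid in seen:
            continue  # duplicate of an earlier entry
        if mbid:
            seen.add(mbid)
        unique.append(s)
    return unique
```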

(3) DATASET DESCRIPTION
The dataset contains the following types of information for each song object (see Table 2) and is available for download in CSV format. (Please note that the current dataset is in version 3, as it contains slightly different metadata than originally described in Akiki and Burghardt (2020); further adjustments were made following the peer review process of this paper.)

A note on genre: The user-generated Last.fm tags not only contain emotion tags, but also a vast amount of genre tags. We extract these genre tags by filtering the weighted list of tags against a hardcoded list of musical genres, which essentially leaves us with a list of weighted genres describing each song. We then assume the genre label with the highest weight to be the most likely representative of a given song's genre and include that in the dataset. Using this method, we were able to label 76,321 songs with genre information.
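The genre assignment can be sketched in a few lines; the toy GENRES set below stands in for the hardcoded genre list, and the function name is illustrative.

```python
# Toy stand-in for the hardcoded list of musical genres.
GENRES = {"rock", "pop", "jazz", "blues", "metal"}

def top_genre(weighted_tags, genres=GENRES):
    """Pick the most likely genre for a song.

    weighted_tags: [(tag, weight), ...] from Last.fm. Returns the genre tag
    with the highest weight, or None if no tag matches the genre list.
    """
    candidates = [(w, t) for t, w in weighted_tags if t.lower() in genres]
    return max(candidates)[1] if candidates else None
```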

A note on missing release years: Information about the release year of a song is unfortunately not available via the Last.fm API.

(4) REUSE POTENTIAL
With our current MuSe dataset, we provide a resource that enables different kinds of research questions that take into account the relationship between the V-A-D dimensions and other metadata, such as artist, title, and genre. As the mood tags themselves cannot be included in the dataset for copyright reasons, we provide a Jupyter notebook via the Kaggle repository that demonstrates how to fetch the tags of a given song from the Last.fm API (https://www.kaggle.com/cakiki/muse-dataset-using-the-last-fm-api).
To illustrate the reuse potential of the dataset in terms of extensibility, we also provide the Spotify ID whenever we were able to find one in an unambiguous way for the songs in our collection (see Akiki & Burghardt, 2020). In the end, we were able to track down a Spotify ID for a total of 61,630 songs. Via the Spotify ID, researchers may append any additional information to the dataset that is available via the Spotify API, for instance:

• further metadata: release date, popularity, available markets, etc.
• mid-level audio features: acousticness, danceability, tempo, energy, valence, etc.

With this additional information, research questions such as the following could be investigated: Is there a correlation between a song's popularity rank and its sentiment? Are there genre-specific effects, i.e., does negative sentiment help the popularity of songs in certain genres (e.g., blues or black metal) but tend to negatively affect the popularity of songs in other genres (e.g., pop and dance)?
In the Kaggle repository of the dataset, we provide another Jupyter notebook (https://www.kaggle.com/cakiki/muse-dataset-using-the-spotify-api) that demonstrates how to enrich the dataset with audio features using various endpoints of the Spotify API (as showcased in Akiki & Burghardt, 2020). Another way to extend the dataset with further metadata is provided by means of the MusicBrainz ID, which we gathered directly from the Last.fm API for about two-thirds of the songs. MusicBrainz provides additional information for each song, for instance, the respective cover art, which allows for future studies on the relationship of musical sentiment and the cover art design. Furthermore, artist and title information may be used to add lyrics information to analyze the relation of lyrics and musical sentiment. In addition to Spotify and MusicBrainz, other sources of metadata, most notably Discogs (https://www.discogs.com/), might be added to the dataset.
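As a sketch of such an enrichment, the helper below queries Spotify's audio-features endpoint for a batch of track IDs. The bearer token is assumed to come from the standard OAuth client-credentials flow, and the endpoint and field names follow the public Spotify Web API documentation at the time of writing; this is not the companion notebook's code.

```python
import json
import urllib.request

def audio_features_request(track_ids, token):
    """Build a request against the /v1/audio-features endpoint, which accepts
    up to 100 comma-separated Spotify track IDs per call."""
    url = "https://api.spotify.com/v1/audio-features?ids=" + ",".join(track_ids)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

def fetch_audio_features(track_ids, token):
    """Return one audio-features dict (danceability, energy, tempo, valence, ...)
    per requested track ID."""
    with urllib.request.urlopen(audio_features_request(track_ids, token)) as resp:
        return json.load(resp)["audio_features"]
```

Joining the returned features back onto the dataset by Spotify ID is then a straightforward merge.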
We believe that MuSe will be a good starting point for musical sentiment data that can be extended in several directions. All in all, we hope the MuSe dataset will help to advance the field of computational musicology and thus provide an incentive for more quantitative studies on the function and role of emotions in music.