Reviews on ICML 2019 workshop on Music Discovery

Notes on some of the interesting research talks from the ICML 2019 ML4MD workshop.

NPR : Neural Personalized Ranking for Song Selection

Motivation : Need for a recommendation system that handles both the range of possible queries and the user’s personal taste (an item relevant to one user might not be relevant to another).

Solution : A neural network system that encodes a user’s taste jointly with corresponding free-form text (query).

Key references

  • Attentional Factorization Machines (Xiao et al., 2017)
  • Latent Cross operation (Beutel et al., 2018).
  • Attention module : a weighted sum of pairwise interactions between embeddings of the query text and of each item in the user’s history
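A minimal sketch of the attention module described above, in plain Python. It assumes the pairwise interaction is an element-wise product (as in Attentional Factorization Machines) and, to keep it short, scores each interaction by summing its entries instead of using a learned attention network. The function names are hypothetical, and this is not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_history(query_vec, history_vecs):
    """Weighted sum of query-item interactions over a user's history.

    query_vec:    embedding of the free-form query text
    history_vecs: embeddings of the items in the user's history
    """
    # Pairwise interaction: element-wise product of query and item (assumption).
    interactions = [[q * h for q, h in zip(query_vec, item)]
                    for item in history_vecs]
    # Attention score per interaction: sum of its entries, i.e. the dot product.
    # A learned attention network would replace this line (assumption).
    scores = [sum(inter) for inter in interactions]
    weights = softmax(scores)
    # Weighted sum of the interaction vectors, one value per dimension.
    dim = len(query_vec)
    pooled = [sum(w * inter[d] for w, inter in zip(weights, interactions))
              for d in range(dim)]
    return pooled, weights


pooled, weights = attend_history([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy call the first history item matches the query direction, so it receives the larger attention weight.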

Dataset cleaning

The paper describes the playlist and title data used, along with the quality constraints applied, such as:

  • avoid album dumps
  • limit the number of playlists per user.
  • reasonable number of tracks (?)
  • limit playlists per exact title (?)
  • only retain words and tracks that occur in at least 15 playlists / artists that occur in at least 10 user histories.

Each training example consists of ..

  • the title of a playlist
  • 10 positive tracks sampled
  • 50 randomly sampled negative tracks
  • up to 200 of the user’s most frequently listened artists.
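The structure of one training example could be assembled roughly as below. This is only a sketch: the function and field names are hypothetical, and excluding the playlist's own tracks from the negative pool and sampling without replacement are my assumptions, not details from the paper.

```python
import random
from collections import Counter


def build_example(title, playlist_tracks, all_tracks, listened_artists, rng,
                  n_pos=10, n_neg=50, max_artists=200):
    """Assemble one training example from a playlist and a listening history."""
    # Positive tracks: sampled from the playlist itself.
    positives = rng.sample(playlist_tracks, min(n_pos, len(playlist_tracks)))
    # Negative tracks: sampled at random from tracks outside the playlist (assumption).
    in_playlist = set(playlist_tracks)
    candidates = [t for t in all_tracks if t not in in_playlist]
    negatives = rng.sample(candidates, min(n_neg, len(candidates)))
    # User taste: the most frequently listened artists, capped at max_artists.
    top_artists = [a for a, _ in Counter(listened_artists).most_common(max_artists)]
    return {"title": title, "positives": positives,
            "negatives": negatives, "artists": top_artists}


rng = random.Random(0)
tracks = ["t%d" % i for i in range(100)]
example = build_example("rainy day jazz", tracks[:20], tracks,
                        ["a", "a", "a", "b", "b", "c"], rng)
```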

Quantitative evaluation

  • performed on a test split of 5000 randomly chosen playlists.
  • used ranking metrics (MRR / mAP / typical rank)
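For reference, the two standard ranking metrics mentioned above can be computed as follows. This uses the usual textbook definitions of reciprocal rank and average precision; I have not checked how the paper defines "typical rank", so it is left out. The function names are my own.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at the rank k of each relevant item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_over_playlists(metric, results):
    """Average a per-playlist metric over (ranked, relevant) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in results) / len(results)


results = [(["a", "b"], {"a"}), (["a", "b"], {"b"})]
mrr = mean_over_playlists(reciprocal_rank, results)      # MRR
m_ap = mean_over_playlists(average_precision, results)   # mAP
```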

My thoughts

  • Got especially interested, as I’ve been searching for a proper playlist embedding for title generation.
  • Neat approach to combine two different sources of information, kind of a multi-modal approach.
  • The artist list was the only data they used to represent user taste. Would there be other possibilities?
  • In previous experiments, I’ve mostly used a Siamese network with a pairwise ranking loss and negative sampling. Would binary CE give a better result?
  • Still not convinced by the quality of playlist titles filtered with the described method, but it’ll be worth trying!
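To make the loss question above concrete, here is a small sketch of the two options on raw model scores. The margin-based hinge form of the pairwise loss is an assumption (it is one common choice, not necessarily what either system uses), and the function names are hypothetical.

```python
import math


def hinge_pairwise_loss(pos_score, neg_score, margin=1.0):
    """Pairwise ranking loss: zero once the positive beats the negative by the margin."""
    return max(0.0, margin - (pos_score - neg_score))


def bce_with_logits(score, label):
    """Binary cross-entropy on a raw score (logit), with label 1.0 or 0.0.

    Numerically stable form of -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))].
    """
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


# Pairwise: looks only at the gap between one positive and one negative score.
pair_loss = hinge_pairwise_loss(pos_score=0.3, neg_score=0.1)
# Pointwise: treats every (query, track) pair as an independent yes/no label.
point_loss = bce_with_logits(0.3, 1.0) + bce_with_logits(0.1, 0.0)
```

The pairwise loss constrains only relative order within a pair, while binary CE pushes each score toward an absolute target, which is the practical difference behind the question.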

Personalization at Amazon Music

The talk presented three topics in personalized music recommendation:

  • Generating a dataset of result pairs for given queries, then training a pairwise ranking model on it.
  • A contextual multi-armed bandit model that chooses a strategy based on contextual signals about the customer and the request.
  • Embedding models:
    1. Collaborative filtering
    2. Content-based embeddings (again, a Siamese network with pairwise ranking loss) as building blocks for different use cases
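As a rough illustration of the contextual bandit idea, here is an epsilon-greedy sketch that keeps a running mean reward per (context, strategy) pair. The talk did not say which bandit algorithm is used, so epsilon-greedy, the discrete context keys, and all names below are assumptions for illustration only.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Chooses a strategy per context from running mean rewards."""

    def __init__(self, strategies, epsilon=0.1, seed=0):
        self.strategies = list(strategies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, strategy) -> number of updates
        self.values = defaultdict(float)  # (context, strategy) -> mean reward

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known strategy.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.values[(context, s)])

    def update(self, context, strategy, reward):
        # Incremental update of the mean reward for this (context, strategy).
        key = (context, strategy)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


bandit = EpsilonGreedyContextualBandit(["station", "playlist"], epsilon=0.0)
bandit.update("voice_morning", "playlist", 1.0)  # e.g. the user kept listening
bandit.update("voice_morning", "station", 0.0)   # e.g. the user skipped
choice = bandit.choose("voice_morning")
```

A production system would more likely use a model over continuous context features; the table lookup here only shows the choose/update loop.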

My thoughts

  • There is certainly demand for research on handling queries via voice assistants. What are the most frequent queries?
  • The value of large-scale user feedback data.
  • Reinforcement learning can actually be a very useful approach to handling the interaction with users. Are there other MIR problems suitable for RL?
  • Each ‘action’ is a playlist-level (not track-level) recommendation. Handling each playlist as a single unit is another possibility for some tasks (maybe with discretized playlist prototypes).
  • CF still seems to dominate.

Making efficient use of musical annotation

The keynote speaker’s research interest has moved from music recommendation to the music audio itself.

To better solve MIR problems with a machine learning approach, richly annotated datasets are essential but too expensive.

–> An alternative approach is to use music domain knowledge to set up a proper structure for the machine learning model.

  • exploring invariance / equivariance : e.g. learning a pitch-invariant convolutional kernel
  • exploring within-task structure : designing loss functions
  • exploring cross-task structure : transferring knowledge from other resources

A few examples :

ex 1. Chord recognition problem

–> This type of encoding can also be useful for bass tracking or hierarchical segmentation.

ex 2. Multiple F0 estimation

–> Solution? Use multi-F0 estimation as a pre-conditioner for other tasks.

–> To overcome the small size of the dataset for each task, leverage multiple datasets for relevant (but slightly different) tasks to better pre-condition the model for the desired task.

ex 3. Instrument recognition

–> Designed (or adapted?) a trainable pooling operator that fits the specific task exactly.
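I did not note down the exact operator from the talk, so the following is only one possible form of a trainable pooling operator: a softmax-weighted average over frame-level predictions, where a single parameter `alpha` interpolates between mean pooling (alpha = 0) and max pooling (large alpha). In a real model `alpha` would be learned with the other weights; here it is passed in by hand, and the function name is hypothetical.

```python
import math


def soft_pool(frame_probs, alpha):
    """Softmax-weighted average of frame-level predictions.

    alpha = 0 gives the mean; as alpha grows the result approaches the max.
    """
    # Subtract the largest exponent for numerical stability.
    m = max(alpha * p for p in frame_probs)
    weights = [math.exp(alpha * p - m) for p in frame_probs]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, frame_probs)) / total


frames = [0.1, 0.2, 0.9]               # per-frame probability that an instrument is active
clip_mean = soft_pool(frames, 0.0)     # behaves like mean pooling
clip_max = soft_pool(frames, 100.0)    # behaves like max pooling
```

The appeal for weakly labeled data is that the model can settle on how sharply to focus on the most confident frames when only clip-level labels are available.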

Take away messages

My thoughts

  • Motivated !
  • Inspired !
  • Especially about using datasets for different tasks to transfer better pre-conditioning knowledge. –> how can it improve my current problem?.. hmm..
