COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

Abstract

We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of stems (or their combinations) composing music tracks and allows the objective evaluation of compositional models for music in the task of accompaniment generation. We also introduce a new baseline for compositional music generation called CompoNet, based on ControlNet, generalizing the tasks of MSDM, and quantify it against the latter using COCOLA. We release all models trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).
