By Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Hao-Wen Dong
ISMIR Late-Breaking Demos, Nov 2024, San Francisco, United States
Abstract: Recent years have seen many audio-domain text-to-music generation models that rely on large amounts of text-audio pairs for training. However, similar attempts at symbolic-domain controllable music generation have been hindered by the lack of a large-scale symbolic music dataset with extensive metadata and captions. In this paper, we introduce MetaScore, a novel dataset of 963K musical scores, along with extensive metadata collected from an online music forum. Additionally, we provide machine-generated captions for each score. With MetaScore, we explore controllable symbolic music generation and showcase the potential of our proposed dataset to enable symbolic music generation from free-form natural language.