MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

3 August 2023 (last updated 1 March 2024) · Gerard Assayag · 0 comments

Chen, K., Wu, Y., Liu, H., Nezhurina, M., Berg-Kirkpatrick, T., and Dubnov, S., MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies, arXiv preprint, 2023. doi:10.48550/arXiv.2308.01546.

Abstract: Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to the limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts the Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the HiFi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of the training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embedding space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on the CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
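To make the beat-synchronous audio mixup idea more concrete, here is a minimal Python sketch. It assumes that downbeat times (in seconds) and tempo estimates (in BPM) have already been produced by some beat tracker, which is not shown. The function name `beat_sync_mixup`, the 5 BPM tempo tolerance and the Beta(5, 5) prior on the mixing weight are illustrative assumptions, not values taken from the paper. Waveforms are plain Python lists of samples to keep the example dependency-free.

```python
import random
from typing import List, Optional, Sequence


def beat_sync_mixup(
    x1: Sequence[float],
    x2: Sequence[float],
    downbeats1: Sequence[float],
    downbeats2: Sequence[float],
    tempo1: float,
    tempo2: float,
    sr: int,
    lam: Optional[float] = None,
    tempo_tol: float = 5.0,  # hypothetical threshold in BPM
    rng: Optional[random.Random] = None,
) -> Optional[List[float]]:
    """Mix two waveforms after aligning them on their first downbeats.

    Returns None when the tempos differ by more than `tempo_tol`, since
    clips with mismatched tempos would drift out of beat after alignment.
    """
    if not downbeats1 or not downbeats2:
        raise ValueError("each clip needs at least one detected downbeat")
    if abs(tempo1 - tempo2) > tempo_tol:
        return None
    if lam is None:
        # Assumption: sample the mixing weight from a symmetric Beta prior.
        rng = rng or random.Random()
        lam = rng.betavariate(5.0, 5.0)
    # Trim each clip so that it starts exactly on its first downbeat.
    a = x1[round(downbeats1[0] * sr):]
    b = x2[round(downbeats2[0] * sr):]
    n = min(len(a), len(b))
    # Convex combination of the aligned samples, truncated to the shorter clip.
    return [lam * a[i] + (1.0 - lam) * b[i] for i in range(n)]


if __name__ == "__main__":
    clip_a = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # first downbeat at sample 2
    clip_b = [0.0, 3.0, 3.0, 3.0, 3.0, 3.0]  # first downbeat at sample 1
    print(beat_sync_mixup(clip_a, clip_b, [2.0], [1.0], 120.0, 121.0, sr=1, lam=0.5))
```

Under the same assumptions, the latent variant described in the abstract would apply the same convex combination, λ·z1 + (1 - λ)·z2, to the latent embeddings of two aligned clips instead of to raw samples. Because λ lies in [0, 1], each mixed value stays between the two inputs, which is the "interpolate within the convex hull of the training data" behaviour the abstract describes.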

Gerard Assayag

Conferences, Publications

Multitrack Music Transformer

5 May 2023 (last updated 9 March 2024) · Gerard Assayag

Hao-Wen Dong, K. Chen, S. Dubnov, J. McAuley and T. Berg-Kirkpatrick, "Multitrack Music Transformer," ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094628. Abstract: Existing approaches for generating multitrack music with transformer models have been limited in terms of the […]

Publications

A New Dataset for Tag- and Text-based Controllable Symbolic Music Generation

7 November 2024 (last updated 9 December 2024) · reachedit

By Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Hao-Wen Dong. ISMIR Late-Breaking Demos, Nov 2024, San Francisco, United States. Abstract: Recent years have seen many audio-domain text-to-music generation models that rely on large amounts of text-audio pairs for training. However, similar attempts at symbolic-domain controllable music generation have been hindered due to […]

Publications

Discovering Repeated Patterns From the Onsets in a Multidimensional Representation of Music

7 February 2024 (last updated 4 March 2024) · Gerard Assayag

Paul Lascabettes & Isabelle Bloch. Discovering Repeated Patterns from Onsets, Third International Conference on Discrete Geometry and Mathematical Morphology (DGMM), Firenze, Italy, accepted. Abstract: This article deals with the discovery of repeated patterns in a multidimensional representation of music using the theory of mathematical morphology. The main idea proposed here is […]
