{"id":2259,"date":"2023-08-04T09:17:00","date_gmt":"2023-08-04T07:17:00","guid":{"rendered":"https:\/\/reach.ircam.fr\/?p=2259"},"modified":"2024-10-07T09:19:51","modified_gmt":"2024-10-07T07:19:51","slug":"towards-improving-harmonic-sensitivity-and-prediction-stability-for-singing-melody-extraction-2","status":"publish","type":"post","link":"https:\/\/reach.ircam.fr\/index.php\/2023\/08\/04\/towards-improving-harmonic-sensitivity-and-prediction-stability-for-singing-melody-extraction-2\/","title":{"rendered":"Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction"},"content":{"rendered":"\n<p><a href=\"https:\/\/arxiv.org\/abs\/2308.02723\">Read the full publication.<\/a><\/p>\n\n\n\n<p><strong>Abstract<\/strong>: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model&rsquo;s sensitivity to the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using the discrete z-transform. Second, vocal and non-vocal segments of extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that prevents the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Read the full publication. 
Abstract: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2260,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[46],"tags":[],"class_list":["post-2259","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-publications-research"],"aioseo_notices":[],"blog_post_layout_featured_media_urls":{"thumbnail":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-150x150.png",150,150,true],"full":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917.png",656,518,false]},"categories_names":{"46":{"name":"Publications","link":"https:\/\/reach.ircam.fr\/index.php\/category\/research\/publications-research\/"}},"tags_names":[],"comments_number":"0","wpmagazine_modules_lite_featured_media_urls":{"thumbnail":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-150x150.png",150,150,true],"cvmm-medium":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-300x300.png",300,300,true],"cvmm-medium-plus":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-305x207.png",305,207,true],"cvmm-portrait":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-400x518.png",400,518,true],"cvmm-medium-square":["https:\/\/reach.ircam.fr
\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-600x518.png",600,518,true],"cvmm-large":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917.png",656,518,false],"cvmm-small":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917-130x95.png",130,95,true],"full":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/10\/Screenshot-2024-10-07-091917.png",656,518,false]},"_links":{"self":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts\/2259","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/comments?post=2259"}],"version-history":[{"count":1,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts\/2259\/revisions"}],"predecessor-version":[{"id":2261,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts\/2259\/revisions\/2261"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/media\/2260"}],"wp:attachment":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/media?parent=2259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/categories?post=2259"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/tags?post=2259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}