{"id":1036,"date":"2023-11-04T10:36:11","date_gmt":"2023-11-04T09:36:11","guid":{"rendered":"https:\/\/reach.ircam.fr\/?p=1036"},"modified":"2024-03-01T10:41:38","modified_gmt":"2024-03-01T09:41:38","slug":"towards-improving-harmonic-sensitivity-and-prediction-stability-for-singing-melody-extraction","status":"publish","type":"post","link":"https:\/\/reach.ircam.fr\/index.php\/2023\/11\/04\/towards-improving-harmonic-sensitivity-and-prediction-stability-for-singing-melody-extraction\/","title":{"rendered":"Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"1036\" class=\"elementor elementor-1036\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-7ddf585 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7ddf585\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0e04ff3\" data-id=\"0e04ff3\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a97d755 elementor-widget elementor-widget-text-editor\" data-id=\"a97d755\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, &amp; Shlomo Dubnov. (2023). Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction. Proceedings of the 24th International Society for Music Information Retrieval Conference, 657\u2011663.<\/p><p><a href=\"https:\/\/zenodo.org\/records\/10265373\">Full publication<\/a><\/p><p><a href=\"https:\/\/zenodo.org\/records\/10265373\/files\/000078.pdf?download=1\">Download publication<\/a><\/p><p><strong>Abstract<\/strong>: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model&rsquo;s sensitivity on the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using discrete z-transform. Second, the vocal and non-vocal segments with extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that prevents the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, &amp; Shlomo Dubnov. (2023). Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction. Proceedings of the 24th International Society for Music Information Retrieval Conference, 657\u2011663. Full publication Download publication Abstract: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":1038,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[46],"tags":[],"class_list":["post-1036","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-publications-research"],"aioseo_notices":[],"blog_post_layout_featured_media_urls":{"thumbnail":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-150x150.png",150,150,true],"full":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829.png",1859,874,false]},"categories_names":{"46":{"name":"Publications","link":"https:\/\/reach.ircam.fr\/index.php\/category\/research\/publications-research\/"}},"tags_names":[],"comments_number":"0","wpmagazine_modules_lite_featured_media_urls":{"thumbnail":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-150x150.png",150,150,true],"cvmm-medium":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-300x300.png",300,300,true],"cvmm-medium-plus":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-305x207.png",305,207,true],"cvmm-portrait":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-400x600.png",400,600,true],"cvmm-medium-square":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-600x600.png",600,600,true],"cvmm-large":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-1024x874.png",1024,874,true],"cvmm-small":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829-130x95.png",130,95,true],"full":["https:\/\/reach.ircam.fr\/wp-content\/uploads\/2024\/03\/Capture-decran-2024-03-01-103829.png",1859,874,false]},"_links":{"self":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts\/1036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/comments?post=1036"}],"version-history":[{"count":7,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts\/1036\/revisions"}],"predecessor-version":[{"id":1044,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/posts\/1036\/revisions\/1044"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/media\/1038"}],"wp:attachment":[{"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/media?parent=1036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/categories?post=1036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/reach.ircam.fr\/index.php\/wp-json\/wp\/v2\/tags?post=1036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}