Spatial Upsampling of Sparse Head Related Transfer Functions – A VQ-VAE & Transformer based Approach

Devansh Zurale, Shlomo Dubnov, "Spatial Upsampling of Sparse Head Related Transfer Functions – A VQ-VAE & Transformer based Approach," Audio Engineering Society: AES 2023 International Conference on Spatial and Immersive Audio, University of Huddersfield, U.K., 2023.

Read the full publication: https://www.aes.org/e-lib/browse.cfm?elib=22202

Abstract: With the increasing demand for AR/VR technologies, accurate reproduction of binaural spatial audio through individualized Head Related Transfer Functions (HRTFs) has become a high-priority subject of research. Meanwhile, recent developments in generative AI have achieved substantial success in several domains, including audio, language, and images. In this work, we propose a framework that uses a 3D Convolutional Neural Network (CNN) based Vector-Quantized Variational AutoEncoder (VQ-VAE) to first learn a regularized latent representation of the HRTFs, leveraging both spatial and spectral correlations between neighboring magnitude HRTFs. We then use the Transformer architecture to map latent sequences derived from spatially sparse HRTF measurements to the latent sequences defining HRTFs at high spatial resolution. We thereby predict HRTFs at 1440 locations given sparse HRTF measurements from 25 locations, while also allowing freedom in the sampling locations of the sparse HRTFs. We achieve a mean Log Spectral Distortion (LSD) error of 4.5 dB, and demonstrate a contrived but informative case that reaches a mean LSD of 3 dB when evaluated over 10 validation subjects.
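The abstract reports results in terms of Log Spectral Distortion (LSD). As a point of reference, the standard per-location definition of LSD can be sketched in NumPy as below; the function name and the exact averaging over frequency bins are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def log_spectral_distortion(h_ref, h_pred):
    """Log Spectral Distortion (dB) between two magnitude spectra.

    Standard definition: RMS over frequency bins of the difference
    between the two log-magnitude spectra, expressed in dB.
    Illustrative sketch, not the authors' exact evaluation code.
    """
    h_ref = np.asarray(h_ref, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    # Per-bin log-magnitude difference in dB
    diff_db = 20.0 * np.log10(h_ref / h_pred)
    # RMS over frequency bins
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

Identical spectra give 0 dB, and a uniform magnitude error of a factor of 10 gives 20 dB; a mean LSD of 4.5 dB thus corresponds to an average per-bin magnitude deviation of roughly a factor of 1.7.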