Devansh Zurale and Shlomo Dubnov, "Learning Sub-Dimensional HRTF Representations Towards Individualization Applications: Traditional and Deep Learning Approaches," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Mohonk Mountain House, New Paltz, NY, USA, 2023.
Abstract: Individualized Head-Related Transfer Functions (HRTFs) are indispensable for accurately reproducing spatial audio over headphones. Encoding high-dimensional HRTFs into a lower-dimensional (sub-dimensional) space has proven useful in many previous efforts to predict individualized HRTFs. In this work, we present a comparative study of traditional methods, such as Principal Component Analysis (PCA) and Multi-Layer Perceptron (MLP)-based autoencoders, and more recent generative deep learning approaches, such as a Convolutional Neural Network (CNN)-based Vector Quantized Variational Autoencoder (VQ-VAE), for learning HRTF representations. We further demonstrate the benefits of using 3D-CNNs for this task, which learn correlations between neighboring HRTFs along both the spatial and frequency dimensions. We provide evidence suggesting that such a 3D-CNN-based approach enables the learned latent space to encode features that better represent the individuality of the HRTFs, while also making the representations significantly more compact. Finally, we explore the advantages of these robust representations for the downstream task of predicting individualized HRTFs.
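To make the "traditional" PCA baseline concrete, here is a minimal sketch of encoding HRTF spectra into a low-dimensional subspace and decoding them back. It assumes one common setup that is not taken from the paper: each row is the log-magnitude spectrum of one HRTF (one subject and direction), and the data are synthetic. The array sizes, the rank of 8, and the helper names `fit_pca`, `encode`, and `decode` are all hypothetical.

```python
import numpy as np

def fit_pca(X: np.ndarray, k: int):
    """Fit PCA on rows of X (n_samples x n_features); return mean, top-k components, singular values."""
    mean = X.mean(axis=0)
    # Economy SVD of the centered data; rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k], s

def encode(X: np.ndarray, mean: np.ndarray, comps: np.ndarray) -> np.ndarray:
    # Project centered spectra onto the k principal directions (the latent code)
    return (X - mean) @ comps.T

def decode(Z: np.ndarray, mean: np.ndarray, comps: np.ndarray) -> np.ndarray:
    # Map latent codes back to the full spectral dimension
    return Z @ comps + mean

rng = np.random.default_rng(0)
n_subjects, n_dirs, n_freqs = 10, 50, 128  # hypothetical dataset sizes

# Synthetic stand-in for log-magnitude HRTFs: low-rank structure plus small noise
basis = rng.standard_normal((8, n_freqs))
weights = rng.standard_normal((n_subjects * n_dirs, 8))
X = weights @ basis + 0.01 * rng.standard_normal((n_subjects * n_dirs, n_freqs))

mean, comps, s = fit_pca(X, k=8)
Z = encode(X, mean, comps)      # 500 spectra x 8 latent coefficients
X_hat = decode(Z, mean, comps)  # reconstruction from the 8-D code
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(Z.shape, rel_err)
```

Because PCA is linear and treats each spectrum independently, it cannot exploit correlations between neighboring directions. This limitation is the one the abstract's 3D-CNN approach targets by convolving jointly over the spatial and frequency axes.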