Reconstruction of High-Dimensional Audio Features with Generative Neural Networks

Master Thesis

Content
Despite the rapid development of modern computers, crucial applications still run on embedded devices (EDs) with strict limitations on memory and power. Performing complex tasks, such as acoustic scene classification (ASC), on these EDs is usually only possible with low-dimensional audio features. To extract reliable information from such limited features (e.g., after transmission to a cloud-based server), the goal of this work is to reconstruct log-mel spectrograms from the low-dimensional feature vectors using generative deep neural networks (DNNs), such as diffusion models.
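To illustrate the two representations involved, the following minimal numpy-only sketch computes a log-mel spectrogram from a waveform and then pools it down to a low-dimensional feature vector per frame. All parameter values (sample rate, FFT size, number of mel bands, output dimension) are illustrative assumptions, not the setup of this thesis, and the average-pooling step is only a crude stand-in for whatever feature extractor an actual ED would use.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular mel filters spanning 0 Hz .. Nyquist.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, apply a Hann window, take magnitude spectra.
    n_frames = 1 + (len(x) - n_fft) // hop
    win = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)

def low_dim_features(logmel, n_out=8):
    # Crude ED surrogate: average-pool the mel bands down to n_out values per frame.
    n_frames, n_mels = logmel.shape
    return logmel[:, : n_mels - n_mels % n_out].reshape(n_frames, n_out, -1).mean(axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)      # 1 s of noise as a stand-in for real audio
S = log_mel_spectrogram(x)          # high-dimensional target: (61, 64)
z = low_dim_features(S)             # low-dimensional input:  (61, 8)
print(S.shape, z.shape)
```

The reconstruction task then amounts to learning the inverse mapping, i.e., generating a plausible `S` given only `z`.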

Task Description
Start by implementing and evaluating existing baseline DNN architectures. Afterwards, build upon the baselines and develop a set of adaptations (e.g., to the loss function or architecture) aiming to improve model performance. Finally, evaluate the generative capability of the model using low-dimensional input features and test the performance of the generated spectrograms in an ASC task.
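Since diffusion models are named as one candidate model family, here is a minimal numpy sketch of the closed-form forward noising process they are built on, assuming the standard linear beta schedule; the schedule length and data shape are illustrative assumptions. A trained model would learn to invert this process, conditioned on the low-dimensional features.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    # Closed-form q(x_t | x_0): sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # common linear schedule over 1000 steps
x0 = rng.standard_normal((64, 64))      # stand-in for a log-mel spectrogram patch
xT, eps = forward_diffuse(x0, 999, betas, rng)
# At the final step, alpha_bar is tiny, so x_T is close to pure Gaussian noise.
print(xT.shape, float(np.cumprod(1.0 - betas)[-1]))
```

During training, the network typically receives `x_t`, the step index `t`, and the conditioning features, and is optimized to predict the noise `eps`.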

Requirements

  • Strong Python (or comparable) programming skills
  • Interest in acoustic signal processing and DNNs
  • Experience in training DNNs is helpful
  • The thesis can be written in English or German
Contact