PhD defence by Nicolai Fernández Pedersen

On Tuesday 22 February, Nicolai Fernández Pedersen will defend his PhD thesis: "Audiovisual speech analysis with deep learning"

Time: Tuesday 22 February, at 13:00

Place: Building 303, aud. 43 

- Zoom sign-up:

Please be aware that the PhD defence may be recorded; this will also be announced at the beginning of the defence.


Principal supervisor: Senior Researcher Jens Hjortkjær
Co-supervisor: Professor Torsten Dau
Co-supervisor: Professor Lars Kai Hansen


Examiners:

Associate Professor Tobias May, DTU Health Tech
Professor Zheng-Hua Tan, Aalborg University
Professor Hani Camille Yehia, Federal University of Minas Gerais

Chairperson at defence:

Associate Professor Jeremy Marozeau


It is well known that seeing a talker's face can improve comprehension of auditory speech compared to listening without visual input. This is especially observed in noisy settings such as "cocktail-party" scenarios. However, little is known about which audiovisual (AV) cues drive this benefit and how the two modalities are related. This thesis aimed to contribute to a better understanding of the relationship between the auditory and visual cues created during speech production. Recent advances in computer vision and data-driven methods made it possible to analyze the AV relationship across thousands of speakers. Using canonical correlation analysis, we identified two primary temporal ranges of envelope fluctuations related to facial motion across speakers. Using a self-supervised learning approach, we trained interpretable nonlinear neural networks to extract highly correlated AV features. Lastly, we presented an AV speech separation model that used visual cues to perform acoustic source separation.
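For readers unfamiliar with canonical correlation analysis (CCA), the idea can be sketched in a few lines: given two feature sets (e.g., an acoustic envelope representation and facial-motion features), CCA finds linear projections of each that are maximally correlated. The sketch below uses only NumPy and synthetic placeholder data; it is not the thesis's actual pipeline, feature set, or implementation.

```python
import numpy as np

def cca_first_correlation(X, Y):
    """Leading canonical correlation between two views X (n, p) and Y (n, q).

    Computed as the largest singular value of the whitened cross-covariance
    Cxx^{-1/2} Cxy Cyy^{-1/2}, the standard CCA formulation.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    def inv_sqrt(C, eps=1e-8):
        # Inverse matrix square root via eigendecomposition,
        # with a small regularizer for numerical stability.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T

    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)[0]

# Toy illustration: both views are noisy copies of one shared latent signal
# (a stand-in for, say, a speech envelope driving lip motion).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))                      # shared latent signal
X = z @ np.ones((1, 2)) + 0.1 * rng.normal(size=(500, 2))
Y = z @ np.ones((1, 3)) + 0.1 * rng.normal(size=(500, 3))
r = cca_first_correlation(X, Y)                    # close to 1 for linked views
```

Because both toy views are driven by the same latent signal, the leading canonical correlation comes out close to 1; for unrelated views it would be near 0.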

Overall, this thesis provided new insights into how auditory and visual speech cues are related and showed their usefulness in AV speech separation.


Tue 22 Feb 2022
13:00 - 16:00


DTU Health Tech

