Wed 6th December 2023 12:30 – 13:30. This seminar will be ONLINE ONLY at this Teams link.
Abstract
Difficulties in following speech on TV due to loud background sounds are a common issue in broadcasting. Object-based audio (OBA) systems like MPEG-H Audio can address this problem by providing a personalized speech level. Recently, international broadcasters have employed dialogue enhancement (DE) together with OBA, providing customization and improved accessibility to their audiences, e.g., during the football World Cup 2022. To add customizable dialogues to material produced without OBA, deep neural networks (DNNs) can be applied to separate dialogues from the music and effects of the final audio mix. One of the technologies used for this is MPEG-H Dialog+, which has recently been adopted for the new “Clear Speech” service of the on-demand platform of the German public broadcaster ARD. This talk reviews the current state of DE, detailing real-world adoptions, with particular focus on the MPEG-H Audio system. This is followed by a brief overview of DNN-based dialogue separation that enables DE for a wide range of broadcast material.

Daniela Rieger specializes in audiovisual media and immersive and object-based audio and has been working at Fraunhofer IIS since 2020. As a sound engineer and research associate, she works on topics such as MPEG-H Audio and immersive and object-based audio production. She continues to be involved in research on AI-based Dialogue Enhancement technologies for accessible audio content, being responsible for the Deep Neural Network training data. In 2020 she received the M.Eng. degree from the Stuttgart Media University with a master’s thesis on object-based music production, which won 2nd prize at the ARD/ZDF promotion prize “Women + Media Technologies” in 2021. Since 2022, Daniela Rieger has been on the board of the Verband Deutscher Tonmeister e.V. (VDT), the German association of sound engineers, and in 2023 she was elected as Vice-President of VDT.
Mhd Modar Halimeh received (with distinction) the B.Sc. degree in communications and electronic engineering in 2013 and the M.Sc. degree in communications and multimedia engineering in 2016. Since 2017, he has been working towards the Dr.-Ing. degree at the Friedrich-Alexander-Universität Erlangen-Nürnberg. He is currently with the conversational AI research department at Fraunhofer IIS. His research interests include nonlinear system identification, acoustic echo control, Bayesian filtering, and speech enhancement.