Munich Neuroscience Calendar

Event:

02.07.2018, 10:00 TUM Institute for Medical Engineering (room changed!)
until 11:30
Event Type: Talk
Speaker: Birger Kollmeier
Institute: Medizinische Physik & Cluster of Excellence Hearing4all - Carl von Ossietzky Universität Oldenburg

Title: Don't believe the elders: Modelling speech recognition with and without hearing aids using machine learning

Location:
Bibliothek, Raum N0116 - see map
Theresienstraße 90 - N1 Nordbau
80333 München

Host: HöReN Research Network
Host Email: miguel.obando@tum.de
Please note the room:
Bibliothek, Raum N0116
Theresienstraße 90 (TUM N1 Nordbau)

Map: http://go.tum.de/415641

Abstract:

This talk gives an overview of some of the work in our Cluster of Excellence Hearing4all (Oldenburg/Hannover), with a focus on recognizing speech in noise: the classical "Cocktail Party Problem", which becomes more acute with increasing hearing loss and age.
While some years ago we assumed that automatic speech recognizers (ASR) need hearing aids to perform as well as human listeners in challenging acoustic situations, recent advances in ASR backends (such as deep neural networks, DNN) and ASR frontends (such as Gabor features) have helped to close the gap between human and machine speech recognition performance. Hence, understanding Human Speech Recognition (HSR) is no longer instrumental in improving ASR. Conversely, ASR may help to better model and understand HSR, especially for hearing-impaired listeners and for predicting the effect of a hearing aid for the individual user.

Here, the Framework for Auditory Discrimination Experiments (FADE; Schädler et al., JASA 2016) is used to predict patient performance on the German Matrix sentence test (a test format available for 20 major languages; see Kollmeier et al., Int. J. Audiol. 2015). It is compared with a DNN-based ASR system that uses an open-set sentence recognition test. FADE predicts the average individual performance with different (binaural) noise reduction algorithms in cafeteria noise well, reaching an R² of about 0.9 against the individual empirical data from Völker et al. (2015). Using a simple approach to include suprathreshold performance deficits in the model (Kollmeier et al., 2017), the benefit from a hearing device can be predicted with high precision.

The analysis shows that several of the "classical" auditory-model-based assumptions and theories about hearing must be modified. For example, the negative effect of increased auditory filter widths in hearing-impaired listeners is much less pronounced than currently believed. Also, a very simple combination of a filterbank frontend with some amplitude modulation filter properties can already explain the observed speech recognition thresholds in strongly fluctuating noise much better than current standard models (SII, ESII, STOI, mr-sEPSM; Schubotz et al., 2016; Spille et al., 2018). Hence, there is still much to learn about our ears, and machines definitely help.

References
Kollmeier, B., Warzybok, A., Hochmuth, S., Zokoll, M. A., Uslar, V., Brand, T., & Wagener, K. C. (2015). The multilingual matrix test: Principles, applications, and comparison across languages: A review. International Journal of Audiology, 54(sup2), 3-16.

Kollmeier, B., Schädler, M. R., Warzybok, A., Meyer, B. T., & Brand, T. (2016). Sentence recognition prediction for hearing-impaired listeners in stationary and fluctuation noise with FADE: Empowering the Attenuation and Distortion concept by Plomp with a quantitative processing model. Trends in Hearing, 20, 2331216516655795.

Schädler, M. R., Warzybok, A., Ewert, S. D., & Kollmeier, B. (2016). A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception. The Journal of the Acoustical Society of America, 139(5), 2708–2722.

Schubotz, W., Brand, T., Kollmeier, B., & Ewert, S. D. (2016). Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features. The Journal of the Acoustical Society of America, 140, 524–540.

Spille, C., Kollmeier, B., & Meyer, B. T. (2018). Predicting speech intelligibility with deep neural networks. Computer Speech & Language, 48, 51–66. doi:10.1016/j.csl.2017.10.004

Völker, C., Warzybok, A., & Ernst, S. M. (2015). Comparing binaural pre-processing strategies III: Speech intelligibility of normal-hearing and hearing-impaired listeners. Trends in Hearing, 19, 2331216515618609.