To allow for easy and comfortable use, many devices nowadays provide some kind of handsfree system or handsfree mode in which users can interact with the device from a distance, give speech commands, or make telephone calls without using their hands or even touching the device. Such devices include, for example, mobile phones, in-vehicle communication systems, and home assistants. All of them typically involve different aspects of digital signal processing for speech enhancement, such as beamforming, echo cancellation, or noise reduction. Our group offers modern approaches and research in these topics to ensure high-quality telecommunication independent of the (acoustic) surroundings.
Beamforming is the procedure of digitally processing the signals of an array of multiple microphones to adapt the directivity pattern of the resulting microphone beam. The array can thereby focus on different speaker positions or even capture the speech of a moving speaker with the steered beam. Besides the orientation of the microphone array, disturbing ambient noise can be a severe difficulty during telecommunication, impeding speech intelligibility between the calling parties. Starting from traditional signal processing algorithms, we develop state-of-the-art approaches for noise reduction. With the Fully Convolutional Recurrent Neural Network (FCRN) [1], which is able to model temporal context, we achieved, in cooperation with Goodix Technology (Belgium), 2nd place in the non-realtime track of Microsoft's Deep Noise Suppression Challenge at INTERSPEECH 2020 with a realtime model.
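The core idea of steering a microphone array can be illustrated with the classic delay-and-sum beamformer: each channel is delayed so that the desired source arrives time-aligned in all microphones, and the channels are then averaged. The sketch below implements the delays as phase shifts in the frequency domain; the steering delays themselves are hypothetical inputs that would, in practice, be derived from the array geometry and the desired look direction (this is a minimal illustration, not the group's published method).

```python
import numpy as np

def delay_and_sum(mic_signals, delays, fs):
    """Delay-and-sum beamformer.

    mic_signals: (num_mics, num_samples) array of time-domain channels
    delays:      per-microphone steering delays in seconds (assumed given;
                 normally computed from array geometry and look direction)
    fs:          sampling rate in Hz
    """
    num_mics, num_samples = mic_signals.shape
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    out_spec = np.zeros(num_samples // 2 + 1, dtype=complex)
    for m in range(num_mics):
        spec = np.fft.rfft(mic_signals[m])
        # A time delay corresponds to a linear phase shift in frequency.
        out_spec += spec * np.exp(-2j * np.pi * freqs * delays[m])
    # Average the aligned channels and return to the time domain.
    return np.fft.irfft(out_spec / num_mics, n=num_samples)
```

Signal components arriving from the look direction add coherently, while noise from other directions adds incoherently and is attenuated, which shapes the directivity pattern toward the speaker.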
If a system or device operates in handsfree mode, acoustic coupling between its loudspeaker and microphone can occur: when the so-called far-end speaker's signal is played via the loudspeaker (or multiple loudspeakers) on the other side, the microphone on that side may capture the respective signal(s) again. This is audible to the far-end speaker as an acoustic echo and can hinder proper communication. To avoid this issue, we perform research on traditional, hybrid, and purely neural network-based acoustic echo cancellation approaches [2, 3], which remove the disturbing echo component and provide an ideally echo-free microphone signal. A typical final stage of speech enhancement is a so-called postfilter, which is often applied to the outgoing microphone signal in the frequency domain [3, 4]. Allowing for a frequency-dependent modification of the signal, a postfilter can be used for residual echo suppression, noise reduction, speech reconstruction, or equalization.
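A traditional echo canceller of the kind mentioned above adaptively identifies the echo path between loudspeaker and microphone and subtracts the estimated echo from the microphone signal. The sketch below uses the normalized least mean squares (NLMS) algorithm, a textbook adaptive filter; filter length and step size are illustrative assumptions, not the configurations published in [2, 3].

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=128, step=0.5, eps=1e-8):
    """NLMS adaptive FIR echo canceller (illustrative parameters).

    far_end: loudspeaker (reference) signal played on this side
    mic:     microphone signal containing the acoustic echo
    Returns the echo-reduced error signal.
    """
    w = np.zeros(filter_len)        # estimated echo-path impulse response
    x_buf = np.zeros(filter_len)    # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_est = w @ x_buf        # estimate of the echo component
        e = mic[n] - echo_est       # error = mic minus estimated echo
        # Normalized update: step size scaled by the reference power.
        w += step * e * x_buf / (x_buf @ x_buf + eps)
        out[n] = e
    return out
```

As the filter converges toward the true echo path, the error signal approaches the echo-free microphone signal; any residual echo is what a subsequent frequency-domain postfilter would then suppress.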
[1] M. Strake, B. Defraene, K. Fluyt, W. Tirry, and T. Fingscheidt, "INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising," in Proc. of INTERSPEECH, Shanghai, China, Oct. 2020, pp. 2467–2471.
[2] J. Franzen, E. Seidel, and T. Fingscheidt, "AEC in a NetShell: On Target and Topology Choices for FCRN Acoustic Echo Cancellation," in Proc. of ICASSP, Toronto, Canada, Jun. 2021, pp. 156–160.
[3] E. Seidel, J. Franzen, and T. Fingscheidt, "Y2-Net FCRN for Acoustic Echo and Noise Suppression," in Proc. of INTERSPEECH, Brno, Czechia, Sep. 2021, pp. 1–5.
[4] J. Franzen and T. Fingscheidt, "An Efficient Residual Echo Suppression for Multi-Channel Acoustic Echo Cancellation Based on the Frequency-Domain Adaptive Kalman Filter," in Proc. of ICASSP, Calgary, AB, Canada, Apr. 2018, pp. 226–230.