Supervised speech separation combined with adaptive beamforming

Šarić, Zoran; Subotić M.; Bilibajkic R.; Barjaktarovic, Marko; Stojanovic, Jasmina

Please use this identifier to cite or link to this item: https://scidar.kg.ac.rs/handle/123456789/15026

Title:	Supervised speech separation combined with adaptive beamforming
Authors:	Šarić, Zoran; Subotić M.; Bilibajkic R.; Barjaktarovic, Marko; Stojanovic, Jasmina
Issue Date:	2022
Abstract:	Microphone arrays are a powerful tool for ambient noise suppression. A multi-channel minimum mean square error (MMSE) solution can be factorized into a minimum variance distortionless response (MVDR) beamformer followed by a single-channel Wiener post-filter. The MVDR beamformer, like its equivalent form, the generalized sidelobe canceller (GSC), often does not provide sufficient noise reduction because of its limited ability to reduce diffuse noise and reverberation. Steering and calibration errors also degrade the performance of both MVDR and GSC beamformers. The post-filter can be realized by any single-channel noise reduction method. A modern and promising approach to single-channel noise reduction is supervised speech separation (SSS), in which a supervised learning algorithm, typically a deep neural network (DNN), is trained to learn a mapping from the noisy features to a time-frequency representation of the target of interest. In this paper, we combine SSS with adaptive beamforming. Adaptive beamforming is realized by a simplified GSC (S-GSC), whose equivalence with the MVDR beamformer is also proved in the paper. In the proposed S-GSC beamformer, the conventional beamformer is replaced by the central microphone signal. Steering towards the target speaker requires no direction of arrival (DOA) estimation. The trained DNN of the SSS module estimates the ideal ratio mask (IRM), which is used for adaptation of the blocking matrix, calibration of the microphones, adaptation of the adaptive noise canceller, and post-filtering. The proposed method was tested on 720 utterances of the TIMIT database used as target speech. The reverberant room was simulated with acoustic impulse responses recorded in a real room. Performance was evaluated with the PESQ, STOI, and SDR measures. The test results showed that the proposed combined method outperforms the individual SSS and S-GSC methods.
URI:	https://scidar.kg.ac.rs/handle/123456789/15026
Type:	article
DOI:	10.1016/j.csl.2022.101409
ISSN:	0885-2308
SCOPUS:	2-s2.0-85131449965
Appears in Collections:	Faculty of Medical Sciences, Kragujevac
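
The ideal ratio mask (IRM) named in the abstract can be illustrated with a minimal sketch. The square-root form below (exponent `beta = 0.5`) is a common convention and is an assumption here, since the abstract does not give the paper's exact definition. In the paper the mask is estimated by a trained DNN; this sketch instead computes it directly from known speech and noise powers. The function names `ideal_ratio_mask` and `apply_mask` are hypothetical.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Return the IRM for each time-frequency bin.

    speech_power, noise_power: nested lists [frame][bin] of non-negative
    power values. beta=0.5 is an assumed, commonly used exponent.
    eps guards against division by zero in silent bins.
    """
    return [
        [(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(noisy_stft, mask):
    """Scale each complex STFT bin by its real-valued mask gain."""
    return [
        [m * x for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(noisy_stft, mask)
    ]


# Toy example: 2 frames x 2 frequency bins.
speech = [[1.0, 0.0], [3.0, 1.0]]
noise = [[1.0, 1.0], [1.0, 3.0]]
mask = ideal_ratio_mask(speech, noise)

noisy = [[2 + 2j, 1j], [4 + 0j, 2 - 2j]]
enhanced = apply_mask(noisy, mask)
```

Mask values lie between 0 and 1: bins dominated by noise are attenuated, and bins dominated by speech pass nearly unchanged. Used this way the mask acts as a single-channel post-filter gain; the other uses listed in the abstract (blocking-matrix adaptation, microphone calibration, noise-canceller adaptation) are not shown.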

Files in This Item:

File	Description	Size	Format
PaperMissing.pdf Restricted Access		29.85 kB	Adobe PDF


SCIDAR - A Digital Archive of the University of Kragujevac
