Supervised speech separation combined with adaptive beamforming

Šarić, Zoran; Subotić M.; Bilibajkic R.; Barjaktarovic, Marko; Stojanovic, Jasmina

Please use this identifier to cite or link to this item: https://scidar.kg.ac.rs/handle/123456789/15026

Title:	Supervised speech separation combined with adaptive beamforming
Authors:	Šarić, Zoran; Subotić M.; Bilibajkic R.; Barjaktarovic, Marko; Stojanovic, Jasmina
Issue date:	2022
Abstract:	Microphone arrays are a powerful tool for ambient noise suppression. A multi-channel minimum mean square error (MMSE) solution can be factorized into a minimum variance distortionless response (MVDR) beamformer followed by a single-channel Wiener post-filter. The MVDR beamformer, as well as its equivalent form, the generalized sidelobe canceller (GSC), often does not provide sufficient noise reduction because of its limited ability to reduce diffuse noise and reverberation. Steering and calibration errors also degrade the performance of both MVDR and GSC beamformers. The post-filter can be realized by any single-channel noise reduction method. A modern and promising approach to single-channel noise reduction is supervised speech separation (SSS), in which a supervised learning algorithm, typically a deep neural network (DNN), is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. In this paper, we combine the SSS and adaptive beamforming approaches. Adaptive beamforming is realized by a simplified GSC (S-GSC), whose equivalence with the MVDR beamformer is also proved in the paper. In the proposed S-GSC beamformer, the conventional beamformer is replaced by the central microphone signal. Steering towards the target speaker requires no direction of arrival (DOA) estimation. The trained DNN of the SSS module estimates the ideal ratio mask (IRM), which is used for adaptation of the blocking matrix, calibration of the microphones, adaptation of the adaptive noise canceller, and post-filtering. The proposed method was tested on 720 utterances from the TIMIT database used as target speech. The reverberant room was simulated using acoustic impulse responses recorded in a real room. Performance was analyzed with the PESQ, STOI, and SDR measures. The test results showed that the proposed combined method outperforms the individual SSS and S-GSC methods.
URI:	https://scidar.kg.ac.rs/handle/123456789/15026
Type:	article
DOI:	10.1016/j.csl.2022.101409
ISSN:	0885-2308
SCOPUS:	2-s2.0-85131449965
Appears in collections:	Faculty of Medical Sciences, Kragujevac
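
The abstract describes an ideal ratio mask (IRM) that, among other uses, serves as the single-channel post-filter. As a rough illustration only, the sketch below computes an oracle IRM from known speech and noise power spectrograms and applies it to a noisy spectrogram. It assumes the commonly used definition IRM = (S / (S + N))^beta with beta = 0.5; the abstract does not state the paper's exact definition, and in the paper the mask is estimated by a trained DNN rather than computed from known signals. The function names `ideal_ratio_mask` and `apply_mask` and the random test spectrograms are hypothetical.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Oracle IRM per time-frequency bin: (S / (S + N)) ** beta.

    beta=0.5 is a common choice and is an assumption here, not a value
    taken from the paper. eps avoids division by zero in silent bins.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return (speech_power / (speech_power + noise_power + eps)) ** beta


def apply_mask(noisy_stft, mask):
    """Post-filtering step: scale each complex STFT bin by the real-valued mask."""
    return mask * noisy_stft


# Synthetic complex "spectrograms" (frequency bins x frames), for illustration only.
rng = np.random.default_rng(0)
n_freq, n_frames = 5, 4
speech = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
noise = 0.5 * (rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames)))
noisy = speech + noise

irm = ideal_ratio_mask(np.abs(speech) ** 2, np.abs(noise) ** 2)
enhanced = apply_mask(noisy, irm)
```

Because the mask lies between 0 and 1, it only attenuates bins and leaves the noisy phase unchanged. Bins dominated by speech pass almost unchanged, while bins dominated by noise are suppressed.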


Files in this item:

File	Description	Size	Format
PaperMissing.pdf (restricted access)		29.85 kB	Adobe PDF


Items in SCIDAR are protected by copyright, with all rights reserved, unless otherwise indicated.

SCIDAR - Digital Archive of the University of Kragujevac