GAN-Based Multi-Microphone Spatial Target Speaker Extraction

Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

Fraunhofer IIS
International Audio Laboratories Erlangen, Am Wolfsmantel 33, 91058 Erlangen, Germany

{shrishti.saha.shetu, emanuel.habets, andreas.brendel}@iis.fraunhofer.de

Abstract

Spatial target speaker extraction isolates a desired speaker’s voice in multi-speaker environments using spatial cues, such as the direction of arrival (DoA). Although recent deep neural network (DNN)-based discriminative methods have shown significant performance improvements, the potential of generative approaches, such as generative adversarial networks (GANs), remains largely unexplored for this problem. In this work, we demonstrate that a GAN can effectively leverage both noisy mixtures and spatial information to extract and generate the target speaker’s speech. By conditioning the GAN on intermediate features of a discriminative spatial filtering model in addition to the DoA, we enable steerable target extraction with a high spatial resolution of 5°, outperforming state-of-the-art discriminative methods.
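To make the idea of a 5° steering resolution concrete, the sketch below quantizes a target angle onto a uniform 5° grid and encodes it as a one-hot vector. This is only an illustration: the abstract does not state how the DoA is encoded, so the one-hot form, the full 360° azimuth range, and the function name doa_to_one_hot are all assumptions rather than the authors' method.

```python
def doa_to_one_hot(doa_deg, resolution_deg=5.0):
    """Map an azimuth angle in degrees to a one-hot vector on a uniform grid.

    Assumption: the DoA is an azimuth on a full 360-degree circle and is
    encoded as a one-hot vector. The source does not specify the encoding.
    """
    num_bins = int(round(360.0 / resolution_deg))
    # Wrap the angle into [0, 360) and snap it to the nearest grid point.
    index = int(round((doa_deg % 360.0) / resolution_deg)) % num_bins
    one_hot = [0.0] * num_bins
    one_hot[index] = 1.0
    return one_hot


# A target at 47 degrees falls into the bin centred at 45 degrees.
vec = doa_to_one_hot(47.0)
nearest_grid_angle = vec.index(1.0) * 5.0
```

With a 5° grid the vector has 72 entries, and angles close to 360° wrap around to the first bin.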

Evaluation Scenarios

We compare our proposed method against state-of-the-art (SOTA) generative and discriminative deep-learning-based noise reduction methods in various SNR scenarios.

Below are samples processed with the different methods:
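For readers unfamiliar with how an SNR scenario is typically constructed, the following sketch scales an interfering signal so that the mixture reaches a chosen SNR. It is a generic illustration with hypothetical function names and toy signals, and it does not describe how the datasets on this page were actually generated.

```python
import math


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech-to-noise power ratio equals `snr_db`.

    Returns the mixture and the gain applied to the noise.
    """
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise signal has zero power")
    # SNR = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (noise_power * target_ratio))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


# Toy signals (hypothetical values, for illustration only).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

mixture, gain = mix_at_snr(speech, noise, snr_db=-5.0)
scaled_noise = [gain * n for n in noise]
achieved_snr_db = 10.0 * math.log10(mean_power(speech) / mean_power(scaled_noise))
```

A lower target SNR yields a larger noise gain, which is what makes a low-SNR scenario harder for the extraction methods.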

1. Low SNR Dataset: see the samples below

2. Real Recordings: will be added soon

3. DNS Challenge No-Reverb: will be added soon


1. Low SNR Dataset

Item 1 (Speaker: Female)

Item 2 (Speaker: Female)

Item 3 (Speaker: Female)

Item 4 (Speaker: Female)

Item 5 (Speaker: Male)

Item 6 (Speaker: Male)

Item 7 (Speaker: Male)

Item 8 (Speaker: Male)

Item 9 (Speaker: Male)

Item 10 (Speaker: Male)



Conditions of Use

1. Fraunhofer IIS generated this sound material based on material that is publicly available in the VCTK dataset, the DNS Challenge, and ESC-50.

2. The content has been processed using generally accepted rules of technology as well as scientific care, but not actual attainment of any expected feature.

3. With the exception of willful intent or gross negligence, Fraunhofer IIS shall not be liable that Open Source software or other third-party software is free from any error or claim or its fitness for a particular purpose, even if included within the Sound Material.

4. The Sound Material shall only be used for testing and appreciating noise reduction techniques and shall not be copied, publicly transmitted, distributed, lent or modified for any other reason.

5. No representation or warranties are made or implied regarding the accuracy, non-infringement, or fitness for a particular purpose of Sound Material.

6. Copyright and Permission notice shall be duplicated whenever Sound Material is copied, distributed, or publicly transmitted.

7. The Sound Material cannot be distributed with charge.