GAN-Based Multi-Microphone Spatial Target Speaker Extraction

Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

International Audio Laboratories Erlangen, Am Wolfsmantel 33, 91058 Erlangen, Germany

{shrishti.saha.shetu, emanuel.habets, andreas.brendel}@iis.fraunhofer.de

Abstract

Spatial target speaker extraction isolates a desired speaker’s voice in multi-speaker environments using spatial cues, such as the direction of arrival (DoA). Although recent deep neural network (DNN)-based discriminative methods have shown significant performance improvements, the potential of generative approaches, such as generative adversarial networks (GANs), remains largely unexplored for this problem. In this work, we demonstrate that a GAN can effectively leverage both noisy mixtures and spatial information to extract and generate the target speaker’s speech. By conditioning the GAN on intermediate features of a discriminative spatial filtering model in addition to the DoA, we enable steerable target extraction with a high spatial resolution of 5 degrees, outperforming state-of-the-art discriminative methods.
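As a rough illustration of what a 5-degree DoA grid implies, the sketch below maps a target angle onto one of 72 angular bins and builds a one-hot vector from it. The one-hot encoding, the bin layout, and the names `doa_to_bin` and `doa_one_hot` are assumptions made for illustration only; the abstract does not state how the DoA is encoded or fed to the GAN.

```python
RESOLUTION_DEG = 5                 # angular resolution quoted in the abstract
NUM_BINS = 360 // RESOLUTION_DEG   # 72 bins covering the full circle


def doa_to_bin(angle_deg: float) -> int:
    """Map an angle in degrees (e.g. in [-180, 180]) to a bin index in [0, NUM_BINS)."""
    wrapped = angle_deg % 360.0    # wrap negative angles into [0, 360)
    # Round to the nearest grid point; the final modulo folds 360 back to bin 0.
    return int(round(wrapped / RESOLUTION_DEG)) % NUM_BINS


def doa_one_hot(angle_deg: float) -> list:
    """Hypothetical conditioning vector: 1.0 at the target bin, 0.0 elsewhere."""
    vec = [0.0] * NUM_BINS
    vec[doa_to_bin(angle_deg)] = 1.0
    return vec


if __name__ == "__main__":
    # Target angles taken from the steerable-target sample list on this page.
    for angle in (-110, 65, 170):
        print(angle, "->", doa_to_bin(angle))
```

With this layout, target angles such as -110 and 65 fall exactly on grid points, so no quantization error is introduced for the samples listed on this page.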

Evaluation Scenarios

We evaluate the proposed methods against the baseline methods in two separate evaluation scenarios.

Below are processed samples from the different methods:

1. Fixed-target scenario

2. Steerable-target scenario

1. Fixed-target scenario

Item 1 (Speaker: Male)

Item 2 (Speaker: Male)

Item 3 (Speaker: Female)

Item 4 (Speaker: Female)

Item 5 (Speaker: Female)

Item 6 (Speaker: Male)

Item 7 (Speaker: Female)

Item 8 (Speaker: Male)

Item 9 (Speaker: Male)

Item 10 (Speaker: Female)

2. Steerable-target scenario

Item 1 (Target Angle: -110°)

Item 2 (Target Angle: 65°)

Item 3 (Target Angle: 170°)

Item 4 (Target Angle: -125°)

Item 5 (Target Angle: -170°)

Item 6 (Target Angle: 130°)

Item 7 (Target Angle: -70°)

Item 8 (Target Angle: -175°)

Item 9 (Target Angle: -65°)

Item 10 (Target Angle: -150°)

Conditions of Use

1. Fraunhofer IIS generated this sound material based on material that is publicly available in the VCTK dataset, the DNS Challenge, and ESC-50.

2. The content has been processed using generally accepted rules of technology as well as scientific care, but not actual attainment of any expected feature.

3. With the exception of willful intent or gross negligence, Fraunhofer IIS shall not be liable that Open Source software or other third-party software is free from any error or claim or its fitness for a particular purpose, even if included within the Sound Material.

4. The Sound Material shall only be used for testing and appreciating noise reduction techniques and shall not be copied, publicly transmitted, distributed, lent or modified for any other reason.

5. No representation or warranties are made or implied regarding the accuracy, non-infringement, or fitness for a particular purpose of Sound Material.

6. Copyright and Permission notice shall be duplicated whenever Sound Material is copied, distributed, or publicly transmitted.

7. The Sound Material cannot be distributed with charge.