Sound Demo of Cross-Talk Reduction on CHiME-7


This webpage provides a sound demo for the following paper:

Z.-Q. Wang, A. Kumar, and S. Watanabe, "Cross-Talk Reduction", in International Joint Conference on Artificial Intelligence (IJCAI), 2024.

This demo presents the results of CTRnet on the close-talk mixtures of CHiME-7, which exhibit severe cross-talk.
Note that the provided wave files are the segmented short utterances used for ASR decoding. That is:
  • For CTRnet, after we obtain the separated signal of each speaker (which has the same length as the recording session), we use oracle speaker timestamps to cut the long signal into individual short utterances, which are fed to ASR models for decoding. This corresponds to row 3 of Table 2 in the paper.
  • For GSS, it first splits the long mixtures into short utterances, then augments each utterance with abundant left and right context, and finally applies guided spatial clustering and beamforming for separation. This corresponds to row 4 of Table 2 in the paper.
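
The cutting step described for CTRnet can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper. It assumes that the utterance IDs listed below follow a `session-speaker-start_end` layout (inferred from the IDs themselves), that the start and end values count hundredths of a second, and that the audio is sampled at 16 kHz. The names `parse_utterance_id`, `cut_utterance`, and `ticks_per_second` are hypothetical.

```python
from typing import NamedTuple, Sequence


class Utterance(NamedTuple):
    session: str
    speaker: str
    start: int  # start time, in the tick unit used by the ID
    end: int    # end time, in the same tick unit


def parse_utterance_id(utt_id: str) -> Utterance:
    """Split an ID such as 'S21-P48-776376_777082' into its fields.

    Assumes the layout session-speaker-start_end.
    """
    session, speaker, span = utt_id.split("-")
    start, end = span.split("_")
    return Utterance(session, speaker, int(start), int(end))


def cut_utterance(
    signal: Sequence[float],
    utt: Utterance,
    sample_rate: int = 16000,     # assumed sampling rate
    ticks_per_second: int = 100,  # assumed time unit of the ID (1/100 s)
) -> Sequence[float]:
    """Return the slice of a session-long separated signal for one utterance."""
    first = utt.start * sample_rate // ticks_per_second
    last = utt.end * sample_rate // ticks_per_second
    return signal[first:last]


utt = parse_utterance_id("S21-P48-776376_777082")
print(utt.session, utt.speaker, utt.end - utt.start)
```

If the time unit or sampling rate differs from these assumptions, only the two keyword arguments of `cut_utterance` need to change.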


Utterance ID = S21-P48-776376_777082 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P04-819770_820264 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P01-786906_787403 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P02-942311_943384 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S20-P50-382760_383432 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P01-700345_700777 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P01-811883_812404 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S19-P49-488504_489128 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S21-P46-390873_391316 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S19-P51-812763_813279 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P03-158380_159040 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S21-P46-575737_576977 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S21-P47-278934_279956 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S19-P52-772311_772719 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S19-P52-137782_138266 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S21-P46-483436_483959 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P02-868324_868751 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P03-692316_692792 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P04-658210_658742 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P02-792194_792701 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S01-P04-939074_939543 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S21-P47-242559_243544 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet



Utterance ID = S19-P51-083779_084747 Target Speaker Log-compressed Power Spectrogram
Mixture
GSS
CTRnet