Sound Demo of USDnet on WSJ0CAM-DEREVERB Corpus


This webpage provides a sound demo on the test set of the WSJ0CAM-DEREVERB corpus. For technical details, please refer to the following paper:

Z.-Q. Wang, "USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering", in submission, 2024.

This demo presents the dereverberation results for:
1. 10 randomly selected test signals with 1-channel input
2. 10 randomly selected test signals with 8-channel input
Firefox is recommended.

Monaural Input

Utterance ID=000237, T60=0.7533 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.38 0.429 -5.6
USDnet (row 3a of Table VIII) 2.38 0.792 1.8
WPE (row 4a of Table VIII) 1.56 0.548 -4.2
Clean Anechoic - - -



Utterance ID=000686, T60=1.0576 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.41 0.280 -10.6
USDnet (row 3a of Table VIII) 2.26 0.689 -2.3
WPE (row 4a of Table VIII) 1.56 0.358 -8.7
Clean Anechoic - - -



Utterance ID=000298, T60=0.6795 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.59 0.458 -5.2
USDnet (row 3a of Table VIII) 2.53 0.835 2.1
WPE (row 4a of Table VIII) 1.82 0.631 -2.1
Clean Anechoic - - -



Utterance ID=001829, T60=0.5240 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 2.06 0.720 2.4
USDnet (row 3a of Table VIII) 3.22 0.918 8.9
WPE (row 4a of Table VIII) 2.31 0.848 6.4
Clean Anechoic - - -



Utterance ID=001457, T60=0.6059 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.96 0.718 1.3
USDnet (row 3a of Table VIII) 2.99 0.909 5.9
WPE (row 4a of Table VIII) 2.37 0.851 4.9
Clean Anechoic - - -



Utterance ID=001058, T60=1.1142 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.53 0.400 -7.1
USDnet (row 3a of Table VIII) 2.27 0.692 0.6
WPE (row 4a of Table VIII) 1.63 0.468 -5.1
Clean Anechoic - - -



Utterance ID=000865, T60=0.4620 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.77 0.624 0.5
USDnet (row 3a of Table VIII) 2.73 0.873 6.6
WPE (row 4a of Table VIII) 1.93 0.733 2.6
Clean Anechoic - - -



Utterance ID=001314, T60=0.4671 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 2.15 0.788 3.5
USDnet (row 3a of Table VIII) 3.67 0.935 8.4
WPE (row 4a of Table VIII) 2.94 0.924 8.1
Clean Anechoic - - -



Utterance ID=001563, T60=0.6401 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.78 0.559 2.5
USDnet (row 3a of Table VIII) 2.09 0.696 6.1
WPE (row 4a of Table VIII) 1.82 0.599 4.5
Clean Anechoic - - -



Utterance ID=000185, T60=0.7235 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.52 0.532 -1.3
USDnet (row 3a of Table VIII) 2.50 0.796 3.7
WPE (row 4a of Table VIII) 1.63 0.635 1.9
Clean Anechoic - - -

Eight-Channel Input

Utterance ID=002474, T60=0.8400 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.41 0.434 -3.7
USDnet (row 3b of Table VIII) 2.13 0.782 2.9
WPE (row 4b of Table VIII) 1.53 0.612 1.1
Clean Anechoic - - -



Utterance ID=002052, T60=0.2988 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 2.13 0.726 0.4
USDnet (row 3b of Table VIII) 3.16 0.871 2.9
WPE (row 4b of Table VIII) 2.83 0.861 4.3
Clean Anechoic - - -



Utterance ID=000361, T60=0.9619 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.31 0.318 -10.2
USDnet (row 3b of Table VIII) 2.02 0.681 -2.5
WPE (row 4b of Table VIII) 1.47 0.462 -4.8
Clean Anechoic - - -



Utterance ID=000087, T60=0.9660 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.56 0.439 -3.5
USDnet (row 3b of Table VIII) 2.37 0.706 3.1
WPE (row 4b of Table VIII) 1.94 0.621 1.4
Clean Anechoic - - -



Utterance ID=000181, T60=1.0325 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.35 0.319 -7.2
USDnet (row 3b of Table VIII) 1.83 0.649 -1.3
WPE (row 4b of Table VIII) 1.47 0.497 -2.0
Clean Anechoic - - -



Utterance ID=000718, T60=0.5799 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.73 0.412 -6.2
USDnet (row 3b of Table VIII) 2.61 0.734 -1.1
WPE (row 4b of Table VIII) 2.19 0.699 -0.8
Clean Anechoic - - -



Utterance ID=002570, T60=0.7649 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.43 0.483 -9.4
USDnet (row 3b of Table VIII) 2.31 0.804 -1.4
WPE (row 4b of Table VIII) 1.68 0.631 -6.3
Clean Anechoic - - -



Utterance ID=000194, T60=0.3476 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.77 0.619 -0.6
USDnet (row 3b of Table VIII) 2.78 0.815 3.4
WPE (row 4b of Table VIII) 2.37 0.819 4.4
Clean Anechoic - - -



Utterance ID=001377, T60=1.1099 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 1.33 0.406 -2.5
USDnet (row 3b of Table VIII) 1.72 0.686 4.7
WPE (row 4b of Table VIII) 1.42 0.566 3.3
Clean Anechoic - - -



Utterance ID=002021, T60=0.2937 Target Speaker Log-compressed Power Spectrogram PESQ eSTOI SI-SDR (dB)
Mixture 2.33 0.729 1.0
USDnet (row 3b of Table VIII) 3.32 0.864 3.6
WPE (row 4b of Table VIII) 3.20 0.878 4.8
Clean Anechoic - - -
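
Note on reading the SI-SDR (dB) column: the sketch below shows one common way to compute scale-invariant signal-to-distortion ratio between an estimated signal and a clean reference. It is an illustration only and is not the evaluation code behind the tables above; the paper may use a different variant (for example, with mean removal or a different reference). The function name si_sdr and the toy signals are assumptions made for this example.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (hypothetical helper, common textbook form).

    The estimate is projected onto the reference; the projection counts as
    the target component and the residual counts as distortion.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Optimal scaling of the reference that best matches the estimate.
    alpha = sum(e * r for e, r in zip(estimate, reference)) / ref_energy

    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference

    return 10.0 * math.log10(target_energy / residual_energy)


# Toy signals (assumed for illustration, not taken from the corpus).
reference = [1.0, 0.0, 1.0, 0.0]
estimate = [1.0, 0.1, 1.0, -0.1]
score = si_sdr(estimate, reference)
rescaled_score = si_sdr([2.0 * e for e in estimate], reference)
```

Because the reference is rescaled before the residual is measured, multiplying the estimate by any nonzero gain leaves the score unchanged, so score and rescaled_score agree up to floating-point error. Higher values indicate that the estimate is closer to the reference.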