This webpage provides a sound demo based on the test set of the SMS-WSJ-FF-CT corpus. For technical details, please refer to the following paper:
Z.-Q. Wang, "Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation", in arXiv preprint arXiv:2402.09313, 2024.
Utterance ID: 1046_440c0414_444c040n | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
0.7 | -0.6 | ||||
M2M (row 1a) | ![]() |
![]() |
15.3 | 15.0 | |||
UNSSOR (row 4a) | ![]() |
![]() |
11.6 | 11.2 | |||
PIT (row 4b) | ![]() |
![]() |
16.9 | 17.0 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
3.8 (ref: close-talk speech) | 6.9 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 1257_441c0412_442c040j | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
2.0 | -2.0 | ||||
M2M (row 1a) | ![]() |
![]() |
20.0 | 18.5 | |||
UNSSOR (row 4a) | ![]() |
![]() |
18.3 | 17.0 | |||
PIT (row 4b) | ![]() |
![]() |
20.6 | 19.1 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
15.0 (ref: close-talk speech) | 7.8 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 279_445c040l_444c0408 | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
0.1 | 0.0 | ||||
M2M (row 1a) | ![]() |
![]() |
23.3 | 23.1 | |||
UNSSOR (row 4a) | ![]() |
![]() |
19.7 | 20.1 | |||
PIT (row 4b) | ![]() |
![]() |
26.1 | 25.3 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
18.1 (ref: close-talk speech) | 19.6 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 442_440c040i_442c0405 | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
-1.5 | 1.4 | ||||
M2M (row 1a) | ![]() |
![]() |
19.6 | 20.8 | |||
UNSSOR (row 4a) | ![]() |
![]() |
15.9 | 18.3 | |||
PIT (row 4b) | ![]() |
![]() |
21.4 | 22.1 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
18.9 (ref: close-talk speech) | 17.4 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 315_447c0404_442c040o | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
-0.4 | 0.3 | ||||
M2M (row 1a) | ![]() |
![]() |
23.7 | 22.6 | |||
UNSSOR (row 4a) | ![]() |
![]() |
20.5 | 20.0 | |||
PIT (row 4b) | ![]() |
![]() |
26.3 | 24.4 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
17.7 (ref: close-talk speech) | 24.3 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 158_440c040e_442c040k | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
0.5 | -0.5 | ||||
M2M (row 1a) | ![]() |
![]() |
20.1 | 19.6 | |||
UNSSOR (row 4a) | ![]() |
![]() |
18.1 | 17.4 | |||
PIT (row 4b) | ![]() |
![]() |
21.5 | 21.1 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
19.9 (ref: close-talk speech) | 24.1 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 39_443c040s_444c040o | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
0.2 | -0.0 | ||||
M2M (row 1a) | ![]() |
![]() |
18.0 | 18.0 | |||
UNSSOR (row 4a) | ![]() |
![]() |
16.6 | 16.6 | |||
PIT (row 4b) | ![]() |
![]() |
18.8 | 19.1 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
19.9 (ref: close-talk speech) | 13.1 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 281_447c040c_445c0402 | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
-0.5 | 0.4 | ||||
M2M (row 1a) | ![]() |
![]() |
13.6 | 14.2 | |||
UNSSOR (row 4a) | ![]() |
![]() |
10.2 | 9.5 | |||
PIT (row 4b) | ![]() |
![]() |
15.4 | 15.9 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
7.1 (ref: close-talk speech) | 8.0 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 176_440c0416_442c040l | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
-1.0 | 0.9 | ||||
M2M (row 1a) | ![]() |
![]() |
11.1 | 12.3 | |||
UNSSOR (row 4a) | ![]() |
![]() |
6.0 | 8.5 | |||
PIT (row 4b) | ![]() |
![]() |
11.7 | 13.1 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
2.8 (ref: close-talk speech) | 8.6 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - | |||
Utterance ID: 948_444c0403_446c040r | Spk 1 | Log Mag. of Spk 1 | Spk 2 | Log Mag. of Spk 2 | SDR (dB) of Spk 1 | SDR (dB) of Spk 2 | |
Far-Field Mixture (row 0) | ![]() |
-1.4 | 1.4 | ||||
M2M (row 1a) | ![]() |
![]() |
18.7 | 21.6 | |||
UNSSOR (row 4a) | ![]() |
![]() |
16.6 | 18.6 | |||
PIT (row 4b) | ![]() |
![]() |
21.2 | 23.5 | |||
Far-Field Clean Speaker Image | ![]() |
![]() |
- | - | |||
Close-Talk Mixture | ![]() |
![]() |
18.4 (ref: close-talk speech) | 25.0 (ref: close-talk speech) | |||
Close-Talk Clean Speaker Image | ![]() |
![]() |
- | - |