Spatialized WSJ0-2mix
The recipes/sp_wsj02mix directory provides an end-to-end pipeline for the
Spatialized WSJ0-2mix speech separation benchmark. The recipe automates data
generation, preprocessing, model training, and evaluation on noisy,
reverberant two-speaker mixtures rendered with spatial room impulse responses.
The Makefile is organized into six GNU Make stages (stage0–stage5). Each
stage writes .done stamp files on completion, so an interrupted run can resume
from intermediate results. Run a single stage with make stageN, or execute the
complete pipeline with make all.
Important variables such as data, duration, and train_path can be
overridden on the command line. For example:
make stage2 duration=96000 train_path=models/nfca/unet
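The stamp-file mechanism that makes the stages resumable can be illustrated outside of Make. The following is a minimal Python sketch of the idea, not the recipe's implementation: the function `run_stage`, the stamp name `.<name>.done`, and the `workdir` argument are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable


def run_stage(name: str, action: Callable[[], None], workdir: Path) -> bool:
    """Run `action` unless a stamp file for `name` already exists.

    Returns True if the action ran and False if it was skipped.
    The stamp name `.<name>.done` is an assumption for illustration.
    """
    stamp = workdir / f".{name}.done"
    if stamp.exists():
        return False  # stage already completed in an earlier run
    action()  # if this raises, no stamp is written and the stage reruns next time
    stamp.touch()
    return True
```

Deleting a stamp file forces the corresponding stage to run again, which is the usual way to redo a step in a stamp-based pipeline.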
Configurable Make variables
You can inspect the full Makefile on GitHub:
recipes/sp_wsj02mix/Makefile.
| Variable | Default | Configurable | Description |
|---|---|---|---|
| | | ✓ | Selects which preprocessed data stream (e.g., dereverberated vs. reverberant) to pack into HDF5 files. |
| | | ✓ | Number of audio samples per excerpt when building the HDF5 dataset. |
| | | ✓ | Directory that stores the model training configuration, checkpoints, and job logs. |
| | | ✓ | Short identifier used when naming inference runs. |
| | | ✓ | Inference results directory name; automatically generated but can be overridden for resumption. |
| | | ✓ | Command executed during Stage 4 for inference; modify to plug in alternative pipelines. |
| | | ✓ | Output directory for separated signals, logs, and evaluation artifacts. |
| | inherited from | ✓ | Cluster submission command plus optional job arguments; default invokes |
Stage-by-stage guide
Stage 0: dataset preparation
make stage0 calls scripts/0_prepare_dataset.py to download/generate the
dry, noisy, and reverberant Spatialized WSJ0-2mix waveforms. Ensure the hdf5/
directory exists before launching this step.
Stage 1: preprocessing
make stage1 launches two commands per split (tr, cv, tt):
- scripts/1_add_noise.py adds diffuse noise to the clean mixtures
- scripts/1_dereverberate.py removes the simulated room effect to provide auxiliary dereverberated targets
Both commands are submitted through $(cmd) so they can fan out across job
slots on a cluster.
Stage 2: HDF5 generation
make stage2 converts the processed audio into chunked HDF5 datasets by
executing scripts/2_make_hdf5_unsupervised.py with the current data and
duration arguments. Only the tr and cv splits are packaged, as
defined by HDF5_SPLITS.
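To make the role of duration concrete, the sketch below computes where fixed-length excerpts would start and end within one recording. It is an illustration only: the function `excerpt_bounds` is hypothetical, and it assumes non-overlapping excerpts with any trailing remainder dropped, which may differ from what scripts/2_make_hdf5_unsupervised.py actually does.

```python
def excerpt_bounds(num_samples: int, duration: int) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive fixed-length excerpts.

    Assumption: excerpts do not overlap, and a trailing remainder shorter
    than `duration` is dropped; the recipe's script may behave differently.
    """
    if duration <= 0:
        raise ValueError("duration must be a positive number of samples")
    return [
        (start, start + duration)
        for start in range(0, num_samples - duration + 1, duration)
    ]


# A 250,000-sample recording with duration=96000 yields two full excerpts.
print(excerpt_bounds(250_000, 96_000))  # [(0, 96000), (96000, 192000)]
```

Under these assumptions a recording shorter than duration contributes no excerpts, so a larger duration reduces the number of training examples drawn from short utterances.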
Stage 3: model training
make stage3 triggers aiaccel.torch.apps.train with the Lightning/Aiaccel
configuration stored under $(train_path)/config.yaml. Before training kicks
off, existing checkpoints and logs are removed after an interactive
confirmation. The step finishes when $(train_path)/.train.done is created.
Stage 4: inference
make stage4 separates the cv and tt sets by running
python -m sbss.nfca.pipelines.separate batch with the trained checkpoint
specified by train_path. Outputs (and job logs) are stored in
results/<inference_name>. Each split produces a stamp file named
.4_inference.<split>.done under the corresponding results directory.
Stage 5: evaluation
make stage5 evaluates SDR, STOI, and PESQ via the scripts
scripts/5_evaluate_sdr.py, scripts/5_evaluate_stoi.py, and
scripts/5_evaluate_pesq.py. After all metric jobs succeed, the scores
target runs scripts/summarize_scores.py to aggregate the measurements for
the current inference_name.
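As a rough picture of what aggregating the measurements can look like, the sketch below averages per-utterance scores for each metric. It does not reproduce scripts/summarize_scores.py: the function `summarize`, the dictionary layout, and the numeric values are all hypothetical and serve only as an illustration.

```python
from statistics import mean


def summarize(scores: dict[str, list[float]]) -> dict[str, float]:
    """Average per-utterance scores for each metric, skipping empty metrics."""
    return {metric: mean(values) for metric, values in scores.items() if values}


# Hypothetical per-utterance values, for illustration only (not real results).
example = {"sdr": [10.0, 12.0], "stoi": [0.5, 1.0], "pesq": [2.0, 3.0]}
summary = summarize(example)
print(summary)  # {'sdr': 11.0, 'stoi': 0.75, 'pesq': 2.5}
```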
Configuration file
The dataset configuration is stored in
recipes/sp_wsj02mix/config.yaml,
which inherits recipes/globals.yaml. Key fields include:
| Key | Default | Configurable | Description |
|---|---|---|---|
| | | | Numbers of mixtures for the corresponding splits. |
| | | ✓ | Microphone channels assumed in STFTs and SCMs. |
| | | ✓ | STFT window size and hop length shared by encoders/decoders. |
| | | ✓ | Target SNR (dB) for adding white noise when generating mixtures. |
| | | ✓ | WPE dereverberation hyperparameters used in preprocessing scripts. |
| | list of WSJ0 filenames | ✓ | Mixtures excluded due to anomalies when rendering Spatialized WSJ0-2mix. |
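The target SNR setting can be understood through the standard noise-scaling relation: the noise gain is chosen so that the ratio of signal power to scaled noise power equals 10^(SNR/10). The sketch below applies this to a single-channel signal using only the standard library. It is an assumption-laden illustration, not the recipe's code: `add_white_noise` and its `seed` parameter are hypothetical, and the recipe operates on multichannel audio.

```python
import math
import random


def add_white_noise(signal: list[float], snr_db: float, seed: int = 0) -> list[float]:
    """Add Gaussian white noise scaled so the signal-to-noise ratio equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose gain so that signal_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]
```

Because the gain is derived from the measured power of the drawn noise, the realized SNR matches the target up to floating-point error rather than only in expectation.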