# setk

**Repository Path**: shirlim/setk

## Basic Information

- **Project Name**: setk
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-17
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## SETK: Speech Enhancement Tools integrated with Kaldi

Here are some speech enhancement/separation tools integrated with [Kaldi](https://github.com/kaldi-asr/kaldi). I use them for front-end's data processing.

### Python Scripts

* Supervised (mask-based) adaptive beamformer (GEVD/MVDR/MCWF...)
* Data convertion among MATLAB, Numpy and Kaldi
* Data visualization (TF-mask, spatial/spectral features, beam pattern...)
* Unified data and IO handlers for Kaldi's scripts, archives, wave and numpy's ndarray...
* Unsupervised mask estimation (CGMM/CACGMM)
* Spatial/Spectral feature computation
* DS (delay and sum) beamformer, SD (supper-directive) beamformer
* AuxIVA, WPE & WPD, FB (Fixed Beamformer)
* Mask computation (iam, irm, ibm, psm, crm)
* RIR simulation (1D/2D arrays)
* Single channel speech separation (TF spectral masking)
* Si-SDR/SDR/WER evaluation
* Pywebrtc vad wrapper
* Mask-based source localization
* Noise suppression
* Data simulation
* ...

Please check out the following instruction for usage of the scripts.

* [Adaptive Beamformer](doc/adaptive_beamformer)
* [Fixed Beamformer](doc/fixed_beamformer)
* [Sound Source Localization](doc/ssl)
* [Spectral Feature](doc/spectral_feature)
* [Spatial Feature](doc/spatial_feature)
* [VAD](doc/vad)
* [Noise Suppression](doc/ns)
* [Steer Vector](doc/steer_vector)
* [Room Impluse Response](doc/rir)
* [Spatial Clustering](doc/spatial_clustering)
* [WPE & WPD](doc/wpe)
* [Time-frequency Mask](doc/tf_mask)
* [Format Transform](doc/format_transform)
* [Data Simulation](doc/data_simu)

### Kaldi Commands

* Compute time-frequency masks (ibm, irm etc)
* Compute phase & magnitude spectrogram & complex STFT
* Seperate target component using input masks
* Wave reconstruction from enhanced spectral features
* Complex matrix/vector class
* MVDR/GEVD beamformer (depend on T-F mask, not very stable)
* Fixed beamformer
* Compute angular spectrogram based on SRP-PHAT
* RIR generator (reference from [RIR-Generator](https://github.com/ehabets/RIR-Generator))

To build the sources, you need to compile [Kaldi](https://github.com/kaldi-asr/kaldi) with `--shared` flags and patch `matrix/matrix-common.h` first
```c++
typedef enum {
    kTrans          = 112,  // CblasTrans
    kNoTrans        = 111,  // CblasNoTrans
    kConjTrans      = 113,  // CblasConjTrans
    kConjNoTrans    = 114   // CblasConjNoTrans
} MatrixTransposeType;
```

Then run
```bash
mkdir build
cd build
export KALDI_ROOT=/path/to/kaldi/root
export OPENFST_ROOT=/path/to/openfst/root
# if on UNIX, need compile kaldi with openblas
export OPENBLAS_ROOT=/path/to/openblas/root
cmake ..
make -j
```

***Now I mainly work on [sptk](scripts) package, development based on kaldi is stopped.***