pyannote-rs
Pyannote audio diarization in Rust
The project is written primarily in Rust, is distributed under the MIT License, and was first published in 2024. Key topics include: asr, diarization, onnxruntime, rust, speech-recognition.
Features
- Process 1 hour of audio in less than a minute on CPU.
- Faster performance with DirectML on Windows and CoreML on macOS.
- Accurate timestamps with Pyannote segmentation.
- Identify speakers with wespeaker embeddings.
Install
```console
cargo add pyannote-rs
```
Usage
See Building
Examples
See examples
<details>
<summary>How it works</summary>

pyannote-rs uses 2 models for speaker diarization:
- Segmentation: segmentation-3.0 identifies when speech occurs.
- Speaker Identification: wespeaker-voxceleb-resnet34-LM identifies who is speaking.
Inference is powered by onnxruntime.
- The segmentation model accepts at most 10 s of audio at a time, so longer recordings are processed with a sliding window (iterating over the audio in chunks).
- The embedding model takes filter-bank (fbank) features extracted from the audio with knf-rs.
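
The two preprocessing steps above (cutting audio into segmentation windows, and framing audio for fbank features) can be sketched as follows. This is a minimal illustration, not pyannote-rs's actual code: `sliding_windows` and `num_fbank_frames` are hypothetical helpers, the 16 kHz mono input is an assumption, zero-padding the last window is an assumed design choice, and the 25 ms frame length / 10 ms frame shift with `snip_edges = true` are Kaldi's defaults, assumed here to match how knf-rs is configured.

```rust
/// Hypothetical helper: split mono samples into fixed-length windows.
/// `window` and `step` are in samples. The final partial window is
/// zero-padded (an assumption) so every chunk has exactly `window` samples.
fn sliding_windows(samples: &[f32], window: usize, step: usize) -> Vec<Vec<f32>> {
    assert!(window > 0 && step > 0, "window and step must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < samples.len() {
        let end = (start + window).min(samples.len());
        let mut chunk = samples[start..end].to_vec();
        chunk.resize(window, 0.0); // zero-pad the tail of the last chunk
        chunks.push(chunk);
        if end == samples.len() {
            break; // this window already reached the end of the audio
        }
        start += step;
    }
    chunks
}

/// Hypothetical helper: number of fbank frames under Kaldi-style framing
/// with snip_edges = true (only frames that fit entirely in the signal).
fn num_fbank_frames(num_samples: usize, frame_len: usize, frame_shift: usize) -> usize {
    if num_samples < frame_len {
        0
    } else {
        1 + (num_samples - frame_len) / frame_shift
    }
}

fn main() {
    let sample_rate = 16_000usize; // assumption: 16 kHz mono input
    let samples = vec![0.0f32; 25 * sample_rate]; // 25 s of silence as dummy input
    let window = 10 * sample_rate; // 10 s segmentation window

    // Non-overlapping windows (step == window): 0-10 s, 10-20 s, 20-25 s padded.
    let chunks = sliding_windows(&samples, window, window);
    println!("{} chunks of {} samples", chunks.len(), window); // prints "3 chunks of 160000 samples"

    // Assumed Kaldi defaults: 25 ms frames (400 samples), 10 ms shift (160 samples).
    let frames = num_fbank_frames(sample_rate, sample_rate * 25 / 1000, sample_rate * 10 / 1000);
    println!("{frames} fbank frames per second of audio"); // prints "98 fbank frames per second of audio"
}
```

Passing a `step` smaller than `window` gives overlapping windows instead; whether and how much pyannote-rs overlaps its windows is not stated here.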
Speaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.
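
A minimal sketch of how cosine similarity can decide whether a new embedding belongs to a speaker heard before. `SpeakerRegistry`, its `assign` method, and the 0.5 threshold are hypothetical and are not the pyannote-rs API; keeping only the first embedding of each speaker as its reference is a simplifying assumption, and the 3-dimensional vectors stand in for real wespeaker embeddings.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // treat zero vectors as dissimilar to everything
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical registry: one reference embedding per known speaker.
struct SpeakerRegistry {
    threshold: f32,
    speakers: Vec<Vec<f32>>,
}

impl SpeakerRegistry {
    fn new(threshold: f32) -> Self {
        Self { threshold, speakers: Vec::new() }
    }

    /// Returns the id of the most similar known speaker if the similarity
    /// reaches the threshold; otherwise registers a new speaker.
    fn assign(&mut self, embedding: &[f32]) -> usize {
        let best = self
            .speakers
            .iter()
            .enumerate()
            .map(|(id, reference)| (id, cosine_similarity(reference, embedding)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((id, score)) if score >= self.threshold => id,
            _ => {
                self.speakers.push(embedding.to_vec());
                self.speakers.len() - 1
            }
        }
    }
}

fn main() {
    let mut registry = SpeakerRegistry::new(0.5); // hypothetical threshold
    let alice: [f32; 3] = [1.0, 0.1, 0.0];
    let bob: [f32; 3] = [0.0, 1.0, 0.2];
    let alice_again: [f32; 3] = [0.9, 0.2, 0.0];

    println!("{}", registry.assign(&alice)); // prints 0 (first speaker)
    println!("{}", registry.assign(&bob)); // prints 1 (similarity to Alice is about 0.1)
    println!("{}", registry.assign(&alice_again)); // prints 0 (similarity to Alice is about 0.99)
}
```

Raising the threshold makes the registry more likely to split one person into several speakers; lowering it makes it more likely to merge different people.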
</details>

Credits
Big thanks to pyannote-onnx and kaldi-native-fbank
