GitPedia

Pyannote rs

pyannote audio diarization in rust

From thewh1teagle · Updated June 14, 2026

pyannote-rs processes 1 hour of audio in less than a minute on CPU, runs faster with DirectML on Windows and CoreML on macOS, produces accurate timestamps with Pyannote segmentation, and identifies speakers with wespeaker embeddings. The project is written primarily in Rust, is distributed under the MIT License, and was first published in 2024. Key topics include asr, diarization, onnxruntime, rust, and speech-recognition.

Latest release: v0.3.0
December 13, 2024

pyannote-rs

Pyannote audio diarization in Rust

Features

  • Compute 1 hour of audio in less than a minute on CPU.
  • Faster performance with DirectML on Windows and CoreML on macOS.
  • Accurate timestamps with Pyannote segmentation.
  • Identify speakers with wespeaker embeddings.

Install

cargo add pyannote-rs

Usage

See Building

Examples

See examples

<details> <summary>How it works</summary>

pyannote-rs uses two models for speaker diarization:

  1. Segmentation: segmentation-3.0 identifies when speech occurs.
  2. Speaker Identification: wespeaker-voxceleb-resnet34-LM identifies who is speaking.

Inference is powered by onnxruntime.

  • The segmentation model processes up to 10 seconds of audio at a time, so longer recordings are handled with a sliding window (iterating in chunks).
  • The embedding model processes filter banks (audio features) extracted with knf-rs.
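The chunked iteration described above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the function names `windows` and `offset_seconds` are hypothetical, the windows are assumed not to overlap, and the 16 kHz sample rate mentioned in the usage note is an assumption, since the source does not state the window step or the sample rate.

```rust
/// Split `samples` into consecutive windows of at most `window_secs` seconds.
/// Each entry pairs the window's starting sample index with its slice, so that
/// timestamps relative to the whole recording can be recovered later.
/// Assumption: windows do not overlap; the last one may be shorter.
fn windows(samples: &[i16], sample_rate: usize, window_secs: usize) -> Vec<(usize, &[i16])> {
    let window_len = sample_rate * window_secs;
    assert!(window_len > 0, "sample_rate and window_secs must be non-zero");
    samples
        .chunks(window_len)
        .enumerate()
        .map(|(i, chunk)| (i * window_len, chunk))
        .collect()
}

/// Convert a sample offset into seconds from the start of the recording.
fn offset_seconds(start_sample: usize, sample_rate: usize) -> f64 {
    start_sample as f64 / sample_rate as f64
}
```

With 25 seconds of audio at an assumed 16 kHz, `windows(&samples, 16_000, 10)` yields three windows: two full 10-second windows and one 5-second remainder. How a short final window is handled (for example, by padding) is left open here.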

Speaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.
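As a rough sketch of that comparison, the snippet below computes cosine similarity between two embedding vectors and picks the closest known speaker. The names `cosine_similarity` and `match_speaker`, and the idea of a fixed similarity threshold, are illustrative assumptions and not the crate's API.

```rust
/// Cosine similarity between two embeddings: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero length (norm), to avoid dividing by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Return the index of the most similar known speaker, or `None` when no
/// known speaker reaches `threshold` (hypothetical parameter), in which case
/// the caller could register the embedding as a new speaker.
fn match_speaker(known: &[Vec<f32>], embedding: &[f32], threshold: f32) -> Option<usize> {
    known
        .iter()
        .enumerate()
        .map(|(i, k)| (i, cosine_similarity(k, embedding)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

In this sketch, an embedding close to a stored speaker's vector returns that speaker's index, while one below the threshold for every stored speaker returns `None`. A suitable threshold depends on the embedding model and would need to be tuned.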

</details>

Credits

Big thanks to pyannote-onnx and kaldi-native-fbank.

This article is auto-generated from thewh1teagle/pyannote-rs via the GitHub API. Last fetched: 6/15/2026