m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

30 Releases
v3.8.6 (Latest)
Barabazs · May 25, 2026

📋 What's Changed

  • Add Indonesian model to alignment.py by @aziib in https://github.com/m-bain/whisperX/pull/1400
  • fix: handle 'ignore' interpolation method in interpolate_nans (#1368) by @Barabazs in https://github.com/m-bain/whisperX/pull/1422
  • build(deps): bump nltk from 3.9.2 to 3.9.4 by @dependabot[bot] in https://github.com/m-bain/whisperX/pull/1421
  • ci: add zizmor workflow and harden existing workflows by @Barabazs in https://github.com/m-bain/whisperX/pull/1423
  • chore(deps): update exclude-newer settings by @Barabazs in https://github.com/m-bain/whisperX/pull/1424
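
The `interpolate_nans` fix above concerns an 'ignore' interpolation method. As a rough illustration only, the sketch below uses a hypothetical pure-Python helper, `fill_missing`, and assumes that 'ignore' means "leave missing timestamps untouched" while 'nearest' copies the closest known value. It is not whisperX's implementation.

```python
import math

def fill_missing(values, method="nearest"):
    """Fill NaN entries in a list of timestamps (hypothetical helper)."""
    if method == "ignore":
        # Assumption: "ignore" means no interpolation is attempted at all.
        return list(values)
    if method != "nearest":
        raise ValueError(f"unsupported method: {method}")
    # Indices of entries that already carry a real value.
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        return list(values)
    filled = []
    for i, v in enumerate(values):
        if math.isnan(v):
            # Copy the value from the closest known index (ties go to the earlier one).
            nearest = min(known, key=lambda k: abs(k - i))
            v = values[nearest]
        filled.append(v)
    return filled

nan = float("nan")
untouched = fill_missing([1.0, nan, 3.0], method="ignore")
nearest = fill_missing([1.0, nan, nan, 4.0], method="nearest")
```

The point of the special case is that 'ignore' is handled before any interpolation routine is called, so an unknown method name never reaches it.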

New Contributors

  • @aziib made their first contribution in https://github.com/m-bain/whisperX/pull/1400
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.5...v3.8.6
v3.8.5
Barabazs · April 1, 2026

📋 What's Changed

  • fix: pin torchvision and torchcodec for torch 2.8 compatibility by @Barabazs in https://github.com/m-bain/whisperX/pull/1397
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.4...v3.8.5
v3.8.4
Barabazs · March 25, 2026

📋 What's Changed

  • feat: add progress_callback to transcribe, align, and diarize by @Barabazs in https://github.com/m-bain/whisperX/pull/1371
  • fix: remove dead model_bytes read that leaked file handle by @Barabazs in https://github.com/m-bain/whisperX/pull/1381
  • fix: restore word-level timestamps for unalignable characters by @Barabazs in https://github.com/m-bain/whisperX/pull/1386
  • fix: require faster-whisper>=1.2.0 for use_auth_token support (#1385) by @Barabazs in https://github.com/m-bain/whisperX/pull/1388
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.2...v3.8.4
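
The `progress_callback` feature in this release is not documented here beyond its name, so the following is only a generic sketch of the pattern. The function `run_stage` and the assumption that the callback receives a percentage as a float are hypothetical and may not match whisperX's actual signature.

```python
from typing import Callable, Iterable, Optional

def run_stage(
    items: Iterable[str],
    progress_callback: Optional[Callable[[float], None]] = None,
) -> list[str]:
    """Process items one by one and report progress (hypothetical stage)."""
    items = list(items)
    results = []
    for done, item in enumerate(items, start=1):
        results.append(item.upper())  # stand-in for real work on one chunk
        if progress_callback is not None:
            # Assumption: progress is reported as a percentage in [0, 100].
            progress_callback(100.0 * done / len(items))
    return results

seen = []
output = run_stage(["a", "b", "c", "d"], progress_callback=seen.append)
```

Making the callback optional keeps existing call sites working, which is presumably why such a parameter can be added in a patch release.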
v3.7.9
Barabazs · March 25, 2026

🐛 Bug Fixes

  • Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986 which removed wildcard support entirely. Fixes #1372.

🧪 Testing

  • Add regression test for #1372 (da072d6)
  • Add pytest dev dependency and CI test workflow (f9a3f8f)
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.8...v3.7.9
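
To illustrate the "wildcard emission column" idea from the bug fix above, here is a minimal sketch. It assumes that a character missing from the alignment model's vocabulary is scored, in each frame, with the best non-blank log-probability of that frame. The function `token_scores` and the toy vocabulary are hypothetical and not taken from whisperX.

```python
def token_scores(emission, text, vocab, blank_id=0):
    """Per-frame log-prob score for every character of `text`.

    emission: list of frames, each a list of log-probs indexed by vocab id.
    Characters absent from `vocab` use a wildcard score instead.
    """
    scores = []
    for frame in emission:
        # Wildcard: the most likely non-blank token in this frame.
        wildcard = max(p for i, p in enumerate(frame) if i != blank_id)
        scores.append(
            [frame[vocab[ch]] if ch in vocab else wildcard for ch in text]
        )
    return scores

vocab = {"<blank>": 0, "a": 1, "b": 2}
emission = [
    [-0.1, -2.0, -3.0],
    [-3.0, -0.5, -1.5],
]
# "4" is not in the vocabulary, so it takes the wildcard score.
scores = token_scores(emission, "b4", vocab)
```

Because the wildcard always has a finite score, a word such as `4,9` can still be placed on the time axis instead of being dropped from alignment.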
v3.6.2
Barabazs · March 25, 2026

🐛 Bug Fixes

  • Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986 which removed wildcard support entirely. Fixes #1372.

🧪 Testing

  • Add regression test for #1372 (da072d6)
  • Add pytest dev dependency and CI test workflow (f9a3f8f)
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.6.1...v3.6.2
v3.5.2
Barabazs · March 25, 2026

🐛 Bug Fixes

  • Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986 which removed wildcard support entirely. Fixes #1372.

🧪 Testing

  • Add regression test for #1372 (da072d6)
  • Add pytest dev dependency and CI test workflow (f9a3f8f)
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.5.1...v3.5.2
v3.4.5
Barabazs · March 25, 2026

🐛 Bug Fixes

  • Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986 which removed wildcard support entirely. Fixes #1372.

🧪 Testing

  • Add regression test for #1372 (da072d6)
  • Add pytest dev dependency and CI test workflow (f9a3f8f)
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.4...v3.4.5
v3.3.6
Barabazs · March 25, 2026

🐛 Bug Fixes

  • Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch reverted PR #986 which removed wildcard support entirely. Fixes #1372.

🧪 Testing

  • Add regression test for #1372 (da072d6)
  • Add pytest dev dependency and CI test workflow (f9a3f8f)
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.3.5...v3.3.6
v3.8.2
Barabazs · March 10, 2026

📋 What's Changed

  • feat: expose avg_logprob per segment from ctranslate2 beam search by @Barabazs in https://github.com/m-bain/whisperX/pull/1350
  • fix: revert #986 wildcard alignment that broke word-level timestamps (#1220) by @Barabazs in https://github.com/m-bain/whisperX/pull/1367
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.1...v3.8.2
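
One possible use of a per-segment `avg_logprob` is flagging low-confidence segments for review. The sketch below assumes each segment is a dict with `text` and `avg_logprob` keys and that `avg_logprob` is a mean per-token log-probability. The helper name and the threshold are hypothetical.

```python
import math

def low_confidence_segments(segments, min_prob=0.5):
    """Return the text of segments whose mean token probability is below min_prob."""
    flagged = []
    for seg in segments:
        # exp(mean log-prob) is the geometric mean of the token probabilities.
        prob = math.exp(seg["avg_logprob"])
        if prob < min_prob:
            flagged.append(seg["text"])
    return flagged

segments = [
    {"text": "clear speech", "avg_logprob": -0.1},
    {"text": "mumbled text", "avg_logprob": -1.2},
]
flagged = low_confidence_segments(segments)
```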
v3.7.8
Barabazs · March 10, 2026

🐛 Bug Fixes

  • Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
  • Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.7...v3.7.8
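
The two fixes above can be illustrated with a simplified, pure-Python sketch of CTC forced alignment. It is written in the spirit of a trellis-and-backtrack approach, not copied from the PyTorch tutorial or from whisperX, and every token is assumed to be emitted in exactly one frame. The helper `find_blank_id` and its fallback to index 0 are assumptions; the key point is that the blank index is looked up rather than hardcoded.

```python
NEG_INF = float("-inf")

def find_blank_id(vocab):
    """Look up the blank token instead of assuming index 0 (hypothetical helper)."""
    return vocab.get("[pad]", 0)  # assumption: fall back to 0 if "[pad]" is absent

def get_trellis(emission, tokens, blank_id):
    """trellis[t][j]: best log-prob of having emitted j tokens after t frames."""
    num_frames, num_tokens = len(emission), len(tokens)
    trellis = [[NEG_INF] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        for j in range(num_tokens + 1):
            stay = trellis[t][j] + emission[t][blank_id]
            move = NEG_INF
            if j > 0:
                move = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, move)
    return trellis

def backtrack(trellis, emission, tokens, blank_id):
    """Walk back from the last frame and return the frame index of each token."""
    j = len(tokens)
    frames = [None] * len(tokens)
    for t in range(len(emission), 0, -1):
        if j == 0:
            break
        stay = trellis[t - 1][j] + emission[t - 1][blank_id]
        move = trellis[t - 1][j - 1] + emission[t - 1][tokens[j - 1]]
        if move > stay:
            j -= 1
            frames[j] = t - 1
    return frames

# Toy model whose blank token is "[pad]" at index 2, not index 0.
vocab = {"a": 0, "b": 1, "[pad]": 2}
blank_id = find_blank_id(vocab)
emission = [
    [-3.0, -3.0, -0.1],  # silence
    [-0.1, -3.0, -3.0],  # "a"
    [-3.0, -3.0, -0.1],  # silence
    [-3.0, -0.1, -3.0],  # "b"
]
tokens = [vocab[ch] for ch in "ab"]
trellis = get_trellis(emission, tokens, blank_id)
frames = backtrack(trellis, emission, tokens, blank_id)
```

In this toy input the tokens land on frames 1 and 3, the frames where speech actually occurs, rather than at the start of the window.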
v3.6.1
Barabazs · March 10, 2026

🐛 Bug Fixes

  • Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
  • Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.6.0...v3.6.1
v3.5.1
Barabazs · March 10, 2026

🐛 Bug Fixes

  • Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
  • Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.5.0...v3.5.1
v3.4.4
Barabazs · March 10, 2026

🐛 Bug Fixes

  • Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
  • Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.3...v3.4.4
v3.3.5
Barabazs · March 10, 2026

🐛 Bug Fixes

  • Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
  • Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.3.4...v3.3.5
v3.8.1
Barabazs · February 14, 2026

📋 What's Changed

  • Fix: Respect --model_dir and --model_cache_only during alignment by @MrPrayer in https://github.com/m-bain/whisperX/pull/1285
  • feat: forward --hf_token to WhisperModel for gated/private model support by @Barabazs in https://github.com/m-bain/whisperX/pull/1351

New Contributors

  • @MrPrayer made their first contribution in https://github.com/m-bain/whisperX/pull/1285
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.0...v3.8.1
v3.8.0
Barabazs · February 13, 2026

📋 What's Changed

  • feat: migrate to pyannote-audio v4 with speaker-diarization-community-1 by @Barabazs in https://github.com/m-bain/whisperX/pull/1349
  • Special thanks to @borgoat for taking the lead.
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.7...v3.8.0
v3.7.7
Barabazs · February 13, 2026

📋 What's Changed

  • Optimize assign_word_speakers with interval tree for 228x speedup by @Mr-Neutr0n in https://github.com/m-bain/whisperX/pull/1338
  • fix: pass no_repeat_ngram_size and repetition_penalty to CTranslate2 generate() by @RickSanchez93 in https://github.com/m-bain/whisperX/pull/1340
  • chore: update type hints by @1carlito in https://github.com/m-bain/whisperX/pull/1342 and https://github.com/m-bain/whisperX/pull/1343
  • fix: derive SRT/VTT cue times from word-level timestamps by @Barabazs in https://github.com/m-bain/whisperX/pull/1347
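
The `assign_word_speakers` speedup above comes from avoiding a scan over every diarization turn for every word. The sketch below is not an interval tree. It swaps in a simpler technique, a sorted list of turn starts searched with `bisect`, to show the same idea of narrowing the candidates before computing overlaps. The function name and the tuple layouts are hypothetical.

```python
from bisect import bisect_left

def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    words: list of (start, end, text); turns: list of (start, end, speaker).
    """
    turns = sorted(turns)
    starts = [t[0] for t in turns]
    # No turn is longer than this, so turns starting before
    # (word_start - max_len) cannot reach the word.
    max_len = max((end - start for start, end, _ in turns), default=0.0)
    labelled = []
    for w_start, w_end, text in words:
        lo = bisect_left(starts, w_start - max_len)
        hi = bisect_left(starts, w_end)  # turns starting at or after w_end cannot overlap
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns[lo:hi]:
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labelled.append((text, best_speaker))
    return labelled

turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 5.0, "SPEAKER_01")]
words = [(0.5, 0.9, "hello"), (1.8, 2.6, "there"), (6.0, 6.5, "bye")]
labelled = assign_speakers(words, turns)
```

A true interval tree gives tighter bounds when turn lengths vary widely; the bisect version is only meant to show why indexed lookup beats a full scan.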

New Contributors

  • @Mr-Neutr0n made their first contribution in https://github.com/m-bain/whisperX/pull/1338
  • @RickSanchez93 made their first contribution in https://github.com/m-bain/whisperX/pull/1340
  • @1carlito made their first contribution in https://github.com/m-bain/whisperX/pull/1342
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.6...v3.7.7
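
For the SRT/VTT fix in this release, the following sketch shows one way cue times could be derived from word-level timestamps. It assumes a segment dict with `start`, `end`, and a `words` list in which some words may lack timestamps. The fallback to segment times and both helper names are assumptions, not whisperX's code.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def cue_times(segment):
    """Prefer word-level times; fall back to segment times if no word has any."""
    words = segment.get("words", [])
    starts = [w["start"] for w in words if "start" in w]
    ends = [w["end"] for w in words if "end" in w]
    start = min(starts) if starts else segment["start"]
    end = max(ends) if ends else segment["end"]
    return srt_timestamp(start), srt_timestamp(end)

segment = {
    "start": 0.0,
    "end": 4.0,
    "words": [
        {"word": "Hello", "start": 1.25, "end": 1.5},
        {"word": "4,9"},  # a word without timestamps
        {"word": "world", "start": 1.75, "end": 2.5},
    ],
}
cue = cue_times(segment)
```

Here the cue spans the spoken words rather than the whole segment window, which may include leading or trailing silence.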
v3.7.6
Barabazs · January 27, 2026

📋 What's Changed

  • chore: drop python 3.9 support by @Barabazs in https://github.com/m-bain/whisperX/pull/1328
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.5...v3.7.6
v3.7.5
Barabazs · January 27, 2026

📋 What's Changed

  • docs: add cuDNN troubleshooting for common issues by @Barabazs in https://github.com/m-bain/whisperX/pull/1266
  • feat: add hotwords argument to CLI for improved recognition of rare terms by @Barabazs in https://github.com/m-bain/whisperX/pull/1268
  • Fix incorrect type annotations in get_writer function in utils.py by @JulianFP in https://github.com/m-bain/whisperX/pull/1144
  • [1246] feat: added language-aware sentence tokenization by @pplkit in https://github.com/m-bain/whisperX/pull/1269
  • fix: pin huggingface-hub<1.0.0 for pyannote-audio compatibility by @Barabazs in https://github.com/m-bain/whisperX/pull/1327

New Contributors

  • @JulianFP made their first contribution in https://github.com/m-bain/whisperX/pull/1144
  • @pplkit made their first contribution in https://github.com/m-bain/whisperX/pull/1269
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.4...v3.7.5
v3.7.4
Barabazs · October 16, 2025

  • chore: upgrade torch and torchaudio dependencies to 2.8.0
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.3...v3.7.4

v3.7.3
Barabazs · October 16, 2025

📋 What's Changed

  • feat: add Swedish alignment model by @Npahlfer in https://github.com/m-bain/whisperX/pull/1110
  • fix: lock down torch and torchaudio versions by @Barabazs in https://github.com/m-bain/whisperX/pull/1265

New Contributors

  • @Npahlfer made their first contribution in https://github.com/m-bain/whisperX/pull/1110
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.2...v3.7.3
v3.7.2
Barabazs · October 12, 2025

📋 What's Changed

  • chore: refine triton dependency to restrict installation to x86_64 Linux by @Barabazs in https://github.com/m-bain/whisperX/pull/1259
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.1...v3.7.2
v3.7.1
Barabazs · October 12, 2025

📋 What's Changed

  • chore: update numpy dependency constraints for Python 3.13 compatibility by @Barabazs in https://github.com/m-bain/whisperX/pull/1258
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.0...v3.7.1
v3.7.0
Barabazs · October 10, 2025

📋 What's Changed

  • feat: add support for python 3.13 by @Barabazs in https://github.com/m-bain/whisperX/pull/1256
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.6.0...v3.7.0
v3.6.0
Barabazs · October 10, 2025

📋 What's Changed

  • Update README.md to fix diarize code by @awan1 in https://github.com/m-bain/whisperX/pull/1192
  • Remove redundant variable & improve load_model function documentation by @3manifold in https://github.com/m-bain/whisperX/pull/1197
  • Update README.md to include --device cpu by @felagund in https://github.com/m-bain/whisperX/pull/1164
  • refactor: rename types.py to schema.py to avoid stdlib conflict by @Barabazs in https://github.com/m-bain/whisperX/pull/1252
  • feat: add centralized logging to replace ad-hoc print statements by @Barabazs in https://github.com/m-bain/whisperX/pull/1254
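
As a generic illustration of replacing ad-hoc `print` calls with centralized logging, the sketch below configures one named logger from the standard library. The logger name `whisperx_demo`, the format string, and the helper `get_logger` are hypothetical and not taken from the project.

```python
import logging

def get_logger(name="whisperx_demo"):
    """Return a shared logger, attaching a handler only on first use."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

log = get_logger()
# Instead of print("loaded alignment model for", lang):
log.info("loaded alignment model for language %s", "en")
```

Routing messages through one logger lets callers change verbosity or redirect output in a single place.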

New Contributors

  • @awan1 made their first contribution in https://github.com/m-bain/whisperX/pull/1192
  • @felagund made their first contribution in https://github.com/m-bain/whisperX/pull/1164
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.5.0...v3.6.0
v3.5.0
Barabazs · October 8, 2025

📋 What's Changed

  • Add jr, sr, and ph.d to punkt abbreviations by @alexcannan in https://github.com/m-bain/whisperX/pull/1053
  • feat: use pre-trained Punkt model instead of empty parameters by @Barabazs in https://github.com/m-bain/whisperX/pull/1245
  • Change the alignment model for Vietnamese language by @nguyenvulebinh in https://github.com/m-bain/whisperX/pull/776
  • build: bump torch to 2.7.1 and CUDA 12.8 support by @jim60105 in https://github.com/m-bain/whisperX/pull/1182

New Contributors

  • @alexcannan made their first contribution in https://github.com/m-bain/whisperX/pull/1053
  • @nguyenvulebinh made their first contribution in https://github.com/m-bain/whisperX/pull/776
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.3...v3.5.0
v3.4.3
Barabazs · October 1, 2025

📋 What's Changed

  • Remove unused code in Vad class by @3manifold in https://github.com/m-bain/whisperX/pull/1079
  • Fix VAD model load bug by @duj12 in https://github.com/m-bain/whisperX/pull/835
  • fix: restrict pyannote-audio version to avoid compatibility issues by @Barabazs in https://github.com/m-bain/whisperX/pull/1242

New Contributors

  • @duj12 made their first contribution in https://github.com/m-bain/whisperX/pull/835
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.2...v3.4.3
v3.4.2
Barabazs · June 27, 2025

📋 What's Changed

  • Fix: Ensure integer tensor indexing in get_wildcard_emission() to avoid IndexError by @HowardWhile in https://github.com/m-bain/whisperX/pull/1146

New Contributors

  • @HowardWhile made their first contribution in https://github.com/m-bain/whisperX/pull/1146
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.1...v3.4.2
v3.4.1
Barabazs · June 25, 2025

📋 What's Changed

  • fix: speaker embedding bug by @Barabazs in https://github.com/m-bain/whisperX/pull/1178
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.0...v3.4.1
v3.4.0
Barabazs · June 24, 2025

📋 What's Changed

  • chore: add lockfile check step to CI workflows by @Barabazs in https://github.com/m-bain/whisperX/pull/1130
  • docs: add common issue section for libcudnn dependencies in README by @Barabazs in https://github.com/m-bain/whisperX/pull/1161
  • feat: diarization model env config by @bgdnvk in https://github.com/m-bain/whisperX/pull/1101
  • docs: add missing torch import to Python usage example in README by @hammerill in https://github.com/m-bain/whisperX/pull/1168
  • feat: enhance diarization with optional output of speaker embeddings by @eek in https://github.com/m-bain/whisperX/pull/1085

New Contributors

  • @bgdnvk made their first contribution in https://github.com/m-bain/whisperX/pull/1101
  • @hammerill made their first contribution in https://github.com/m-bain/whisperX/pull/1168
  • @eek made their first contribution in https://github.com/m-bain/whisperX/pull/1085
  • Full Changelog: https://github.com/m-bain/whisperX/compare/v3.3.4...v3.4.0