m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
📋 What's Changed
- Add Indonesian model to alignment.py by @aziib in https://github.com/m-bain/whisperX/pull/1400
- fix: handle 'ignore' interpolation method in interpolate_nans (#1368) by @Barabazs in https://github.com/m-bain/whisperX/pull/1422
- build(deps): bump nltk from 3.9.2 to 3.9.4 by @dependabot[bot] in https://github.com/m-bain/whisperX/pull/1421
- ci: add zizmor workflow and harden existing workflows by @Barabazs in https://github.com/m-bain/whisperX/pull/1423
- chore(deps): update exclude-newer settings by @Barabazs in https://github.com/m-bain/whisperX/pull/1424
✨ New Contributors
- @aziib made their first contribution in https://github.com/m-bain/whisperX/pull/1400
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.5...v3.8.6
📋 What's Changed
- fix: pin torchvision and torchcodec for torch 2.8 compatibility by @Barabazs in https://github.com/m-bain/whisperX/pull/1397
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.4...v3.8.5
📋 What's Changed
- feat: add progress_callback to transcribe, align, and diarize by @Barabazs in https://github.com/m-bain/whisperX/pull/1371
- fix: remove dead model_bytes read that leaked file handle by @Barabazs in https://github.com/m-bain/whisperX/pull/1381
- fix: restore word-level timestamps for unalignable characters by @Barabazs in https://github.com/m-bain/whisperX/pull/1386
- fix: require faster-whisper>=1.2.0 for use_auth_token support (#1385) by @Barabazs in https://github.com/m-bain/whisperX/pull/1388
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.2...v3.8.4
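
The `progress_callback` entry above follows a common pattern: a long-running function accepts an optional callable and reports progress to it. These notes do not show the callback's signature, so the sketch below assumes a hypothetical one (a callable that receives a percentage as a float). `process_chunks` is an illustrative stand-in, not a whisperX function.

```python
from typing import Callable, Optional


def process_chunks(
    chunks: list[str],
    progress_callback: Optional[Callable[[float], None]] = None,
) -> list[str]:
    """Process chunks in order, reporting percent complete after each one."""
    results = []
    total = len(chunks)
    for done, chunk in enumerate(chunks, start=1):
        results.append(chunk.upper())  # stand-in for the real work
        if progress_callback is not None:
            # Assumed contract: the callback receives a float in (0, 100].
            progress_callback(100.0 * done / total)
    return results


# Usage: collect the reported percentages in a list.
seen: list[float] = []
out = process_chunks(["a", "b", "c", "d"], progress_callback=seen.append)
```

Keeping the callback optional and defaulting it to `None` means existing callers are unaffected.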
🐛 Bug Fixes
- Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986, and that revert removed wildcard support entirely. Fixes #1372.
🧪 Testing
- Add regression test for #1372 (da072d6)
- Add pytest dev dependency and CI test workflow (f9a3f8f)
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.8...v3.7.9
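
One plausible reading of the "wildcard emission column" mentioned above is an extra per-frame score that any out-of-vocabulary character can align against. The sketch below is an assumption, not the code from commit 39aa9f5: it appends one column per frame holding the best non-blank log-probability. The name `add_wildcard_column` and the toy emission values are hypothetical.

```python
def add_wildcard_column(emission: list[list[float]], blank_id: int) -> list[list[float]]:
    """Append a column to each frame with the best non-blank log-probability.

    emission: one row per audio frame, one log-probability per vocabulary entry.
    The new column's index (the old vocabulary size) can then serve as the
    token id for characters the alignment model has no label for.
    """
    extended = []
    for frame in emission:
        best_non_blank = max(
            score for idx, score in enumerate(frame) if idx != blank_id
        )
        extended.append(list(frame) + [best_non_blank])
    return extended


# Toy example: 2 frames, vocabulary of 3 labels, blank at index 0.
emission = [[-0.1, -2.0, -3.0], [-1.5, -0.3, -2.5]]
extended = add_wildcard_column(emission, blank_id=0)
wildcard_id = len(emission[0])  # index of the appended column
```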
🐛 Bug Fixes
- Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986, and that revert removed wildcard support entirely. Fixes #1372.
🧪 Testing
- Add regression test for #1372 (da072d6)
- Add pytest dev dependency and CI test workflow (f9a3f8f)
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.6.1...v3.6.2
🐛 Bug Fixes
- Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986, and that revert removed wildcard support entirely. Fixes #1372.
🧪 Testing
- Add regression test for #1372 (da072d6)
- Add pytest dev dependency and CI test workflow (f9a3f8f)
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.5.1...v3.5.2
🐛 Bug Fixes
- Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch (v3.X.Y) reverted PR #986, and that revert removed wildcard support entirely. Fixes #1372.
🧪 Testing
- Add regression test for #1372 (da072d6)
- Add pytest dev dependency and CI test workflow (f9a3f8f)
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.4...v3.4.5
🐛 Bug Fixes
- Restore timestamps for unalignable characters (39aa9f5): Words containing digits, symbols, or foreign script (e.g. `4,9`, `£13.60`) now get proper timestamps via a wildcard emission column. The previous patch reverted PR #986, and that revert removed wildcard support entirely. Fixes #1372.
🧪 Testing
- Add regression test for #1372 (da072d6)
- Add pytest dev dependency and CI test workflow (f9a3f8f)
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.3.5...v3.3.6
📋 What's Changed
- feat: expose avg_logprob per segment from ctranslate2 beam search by @Barabazs in https://github.com/m-bain/whisperX/pull/1350
- fix: revert #986 wildcard alignment that broke word-level timestamps (#1220) by @Barabazs in https://github.com/m-bain/whisperX/pull/1367
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.1...v3.8.2
🐛 Bug Fixes
- Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
- Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.7...v3.7.8
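
The `blank_id` fix above comes down to looking the CTC blank token up in the model's vocabulary instead of assuming it sits at index 0. A minimal sketch of that idea follows. The helper `find_blank_id`, the candidate token names, and the toy vocabulary are hypothetical, not taken from commit 636f298.

```python
def find_blank_id(vocab: dict[str, int], candidates=("[pad]", "<pad>")) -> int:
    """Return the index of the CTC blank token in a label-to-index vocabulary."""
    for token in candidates:
        if token in vocab:
            return vocab[token]
    raise KeyError(f"no blank token among {candidates} in vocabulary")


# Hypothetical vocabulary where the blank token is NOT at index 0.
hf_style_vocab = {"a": 0, "b": 1, "|": 2, "[pad]": 3}
blank_id = find_blank_id(hf_style_vocab)
```

Raising when no candidate is found, rather than falling back to 0, surfaces a mismatched vocabulary early instead of producing silently wrong alignments.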
🐛 Bug Fixes
- Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
- Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.6.0...v3.6.1
🐛 Bug Fixes
- Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
- Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.5.0...v3.5.1
🐛 Bug Fixes
- Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
- Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.3...v3.4.4
🐛 Bug Fixes
- Restore original CTC forced-alignment (f2609a6): PR #986 caused all words to anchor to the start of the segment window (silence) instead of actual speech. Reverts get_trellis/backtrack to the original PyTorch tutorial implementation. Fixes #1220.
- Fix blank_id hardcoded to 0 (636f298): Broke alignment for HuggingFace models where blank is [pad], not index 0.
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.3.4...v3.3.5
📋 What's Changed
- Fix: Respect --model_dir and --model_cache_only during alignment by @MrPrayer in https://github.com/m-bain/whisperX/pull/1285
- feat: forward --hf_token to WhisperModel for gated/private model support by @Barabazs in https://github.com/m-bain/whisperX/pull/1351
✨ New Contributors
- @MrPrayer made their first contribution in https://github.com/m-bain/whisperX/pull/1285
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.8.0...v3.8.1
📋 What's Changed
- feat: migrate to pyannote-audio v4 with speaker-diarization-community-1 by @Barabazs in https://github.com/m-bain/whisperX/pull/1349
- Special thanks to @borgoat for taking the lead.
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.7...v3.8.0
📋 What's Changed
- Optimize assign_word_speakers with interval tree for 228x speedup by @Mr-Neutr0n in https://github.com/m-bain/whisperX/pull/1338
- fix: pass no_repeat_ngram_size and repetition_penalty to CTranslate2 generate() by @RickSanchez93 in https://github.com/m-bain/whisperX/pull/1340
- chore: update type hints by @1carlito in https://github.com/m-bain/whisperX/pull/1342 and https://github.com/m-bain/whisperX/pull/1343
- fix: derive SRT/VTT cue times from word-level timestamps by @Barabazs in https://github.com/m-bain/whisperX/pull/1347
✨ New Contributors
- @Mr-Neutr0n made their first contribution in https://github.com/m-bain/whisperX/pull/1338
- @RickSanchez93 made their first contribution in https://github.com/m-bain/whisperX/pull/1340
- @1carlito made their first contribution in https://github.com/m-bain/whisperX/pull/1342
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.6...v3.7.7
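
The `assign_word_speakers` speedup above comes from replacing a scan over every diarization turn with an indexed overlap query. The sketch below illustrates the general idea with a simpler stand-in for an interval tree: turns sorted by start time, plus binary search to narrow the candidates for each word. It is not the code from PR #1338. The function name `assign_speakers` and the tuple layouts are assumptions.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def assign_speakers(words, turns):
    """Label each word with the speaker whose turns overlap it the most.

    words: list of (start, end) tuples in seconds.
    turns: list of (start, end, speaker) tuples; turns may overlap each other.
    Returns one speaker label (or None if nothing overlaps) per word.
    """
    turns = sorted(turns, key=lambda t: t[0])
    starts = [t[0] for t in turns]
    # The running maximum of end times is non-decreasing, so it can be bisected.
    max_ends = list(accumulate((t[1] for t in turns), max))

    labels = []
    for w_start, w_end in words:
        lo = bisect_right(max_ends, w_start)  # turns before lo end at or before w_start
        hi = bisect_left(starts, w_end)       # turns from hi on start at or after w_end
        overlap = {}
        for t_start, t_end, speaker in turns[lo:hi]:
            shared = min(t_end, w_end) - max(t_start, w_start)
            if shared > 0:
                overlap[speaker] = overlap.get(speaker, 0.0) + shared
        labels.append(max(overlap, key=overlap.get) if overlap else None)
    return labels


turns = [(0.0, 2.0, "A"), (1.5, 4.0, "B"), (4.0, 6.0, "A")]
words = [(0.2, 0.8), (1.6, 2.4), (5.0, 5.5), (7.0, 7.5)]
labels = assign_speakers(words, turns)
```

The candidate window `turns[lo:hi]` can still contain turns that do not overlap the word, which is why the loop checks `shared > 0`. A true interval tree gives tighter worst-case bounds.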
📋 What's Changed
- chore: drop python 3.9 support by @Barabazs in https://github.com/m-bain/whisperX/pull/1328
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.5...v3.7.6
📋 What's Changed
- docs: add cuDNN troubleshooting for common issues by @Barabazs in https://github.com/m-bain/whisperX/pull/1266
- feat: add hotwords argument to CLI for improved recognition of rare terms by @Barabazs in https://github.com/m-bain/whisperX/pull/1268
- Fix incorrect type annotations in get_writer function in utils.py by @JulianFP in https://github.com/m-bain/whisperX/pull/1144
- [1246] feat: added language-aware sentence tokenization by @pplkit in https://github.com/m-bain/whisperX/pull/1269
- fix: pin huggingface-hub<1.0.0 for pyannote-audio compatibility by @Barabazs in https://github.com/m-bain/whisperX/pull/1327
✨ New Contributors
- @JulianFP made their first contribution in https://github.com/m-bain/whisperX/pull/1144
- @pplkit made their first contribution in https://github.com/m-bain/whisperX/pull/1269
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.4...v3.7.5
📋 What's Changed
- chore: upgrade torch and torchaudio dependencies to 2.8.0
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.3...v3.7.4
📋 What's Changed
- feat: add Swedish alignment model by @Npahlfer in https://github.com/m-bain/whisperX/pull/1110
- fix: lock down torch and torchaudio versions by @Barabazs in https://github.com/m-bain/whisperX/pull/1265
✨ New Contributors
- @Npahlfer made their first contribution in https://github.com/m-bain/whisperX/pull/1110
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.2...v3.7.3
📋 What's Changed
- chore: refine triton dependency to restrict installation to x86_64 Linux by @Barabazs in https://github.com/m-bain/whisperX/pull/1259
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.1...v3.7.2
📋 What's Changed
- chore: update numpy dependency constraints for Python 3.13 compatibility by @Barabazs in https://github.com/m-bain/whisperX/pull/1258
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.7.0...v3.7.1
📋 What's Changed
- feat: add support for python 3.13 by @Barabazs in https://github.com/m-bain/whisperX/pull/1256
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.6.0...v3.7.0
📋 What's Changed
- Update README.md to fix diarize code by @awan1 in https://github.com/m-bain/whisperX/pull/1192
- Remove redundant variable & improve load_model function documentation by @3manifold in https://github.com/m-bain/whisperX/pull/1197
- Update README.md to include --device cpu by @felagund in https://github.com/m-bain/whisperX/pull/1164
- refactor: rename types.py to schema.py to avoid stdlib conflict by @Barabazs in https://github.com/m-bain/whisperX/pull/1252
- feat: add centralized logging to replace ad-hoc print statements by @Barabazs in https://github.com/m-bain/whisperX/pull/1254
✨ New Contributors
- @awan1 made their first contribution in https://github.com/m-bain/whisperX/pull/1192
- @felagund made their first contribution in https://github.com/m-bain/whisperX/pull/1164
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.5.0...v3.6.0
📋 What's Changed
- Add jr, sr, and ph.d to punkt abbreviations by @alexcannan in https://github.com/m-bain/whisperX/pull/1053
- feat: use pre-trained Punkt model instead of empty parameters by @Barabazs in https://github.com/m-bain/whisperX/pull/1245
- Change the alignment model for Vietnamese language by @nguyenvulebinh in https://github.com/m-bain/whisperX/pull/776
- build: bump torch to 2.7.1 and CUDA 12.8 support by @jim60105 in https://github.com/m-bain/whisperX/pull/1182
✨ New Contributors
- @alexcannan made their first contribution in https://github.com/m-bain/whisperX/pull/1053
- @nguyenvulebinh made their first contribution in https://github.com/m-bain/whisperX/pull/776
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.3...v3.5.0
📋 What's Changed
- Remove unused code in Vad class by @3manifold in https://github.com/m-bain/whisperX/pull/1079
- fix VAD model load bug by @duj12 in https://github.com/m-bain/whisperX/pull/835
- fix: restrict pyannote-audio version to avoid compatibility issues by @Barabazs in https://github.com/m-bain/whisperX/pull/1242
✨ New Contributors
- @duj12 made their first contribution in https://github.com/m-bain/whisperX/pull/835
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.2...v3.4.3
📋 What's Changed
- Fix: Ensure integer tensor indexing in get_wildcard_emission() to avoid IndexError by @HowardWhile in https://github.com/m-bain/whisperX/pull/1146
✨ New Contributors
- @HowardWhile made their first contribution in https://github.com/m-bain/whisperX/pull/1146
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.1...v3.4.2
📋 What's Changed
- fix: speaker embedding bug by @Barabazs in https://github.com/m-bain/whisperX/pull/1178
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.4.0...v3.4.1
📋 What's Changed
- chore: add lockfile check step to CI workflows by @Barabazs in https://github.com/m-bain/whisperX/pull/1130
- docs: add common issue section for libcudnn dependencies in README by @Barabazs in https://github.com/m-bain/whisperX/pull/1161
- feat: diarization model env config by @bgdnvk in https://github.com/m-bain/whisperX/pull/1101
- docs: add missing torch import to Python usage example in README by @hammerill in https://github.com/m-bain/whisperX/pull/1168
- feat: enhance diarization with optional output of speaker embeddings by @eek in https://github.com/m-bain/whisperX/pull/1085
✨ New Contributors
- @bgdnvk made their first contribution in https://github.com/m-bain/whisperX/pull/1101
- @hammerill made their first contribution in https://github.com/m-bain/whisperX/pull/1168
- @eek made their first contribution in https://github.com/m-bain/whisperX/pull/1085
- Full Changelog: https://github.com/m-bain/whisperX/compare/v3.3.4...v3.4.0