GitPedia

Subsai

๐ŸŽž๏ธ Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants ๐ŸŽž๏ธ

From absadiki · Updated June 13, 2026

Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants. The project is written primarily in Python, distributed under the GNU General Public License v3.0, and was first published in 2023. It has 1,668 stars and 140 forks on GitHub. Key topics include: cli, subtitles, subtitles-generator, webui, whisper.

๏ธ๐ŸŽž๏ธ Subs AI ๐ŸŽž๏ธ


Features

  • Supported Models

    • openai/whisper
      • Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

    • linto-ai/whisper-timestamped
      • Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    • ggerganov/whisper.cpp (using absadiki/pywhispercpp)
      • High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model

        • Plain C/C++ implementation without dependencies
        • Runs on the CPU
    • guillaumekln/faster-whisper
      • faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.

        This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

    • m-bain/whisperX
      • Fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.

        • ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
        • 🪶 faster-whisper backend, requires <8GB GPU memory for large-v2 with beam_size=5
        • 🎯 Accurate word-level timestamps using wav2vec2 alignment
        • 👯‍♂️ Multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
        • 🗣️ VAD preprocessing, reduces hallucination & batching with no WER degradation.
    • jianfch/stable-ts
      • Stabilizing Timestamps for Whisper: This library modifies Whisper to produce more reliable timestamps and extends its functionality.

    • Hugging Face Transformers
      • Hugging Face implementation of Whisper. Any speech recognition pretrained model from the Hugging Face hub can be used as well.

    • API/openai/whisper
      • OpenAI Whisper via the OpenAI API, or any other OpenAI-compatible Whisper API (e.g. speaches.ai)

  • Web UI

  • Command Line Interface

    • For simple or batch processing
  • Python package

    • In case you want to develop your own scripts
  • Supports different subtitle formats thanks to tkarabela/pysubs2

    • SubRip
    • WebVTT
    • SubStation Alpha
    • MicroDVD
    • MPL2
    • TMP
  • Supports audio and video files
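To make the difference between two of the listed formats concrete, here is a minimal standard-library sketch of how a single cue is written in SubRip and in WebVTT. It is an illustration only and not how subsai or pysubs2 write files; the helper names `format_timestamp`, `srt_cue`, and `vtt_cue` are hypothetical. The main visible difference is the millisecond separator: SubRip uses a comma, WebVTT uses a dot.

```python
def format_timestamp(ms: int, decimal_mark: str) -> str:
    """Render a millisecond offset as HH:MM:SS<mark>mmm."""
    if ms < 0:
        raise ValueError("timestamp must be non-negative")
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{decimal_mark}{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """One SubRip cue: a counter line, a timing line with commas, then the text."""
    timing = f"{format_timestamp(start_ms, ',')} --> {format_timestamp(end_ms, ',')}"
    return f"{index}\n{timing}\n{text}\n"


def vtt_cue(start_ms: int, end_ms: int, text: str) -> str:
    """One WebVTT cue: a timing line with dots, then the text."""
    timing = f"{format_timestamp(start_ms, '.')} --> {format_timestamp(end_ms, '.')}"
    return f"{timing}\n{text}\n"
```

A complete WebVTT file additionally starts with a `WEBVTT` header line, which this sketch leaves out.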

Installation

Quoted from the official openai/whisper installation instructions:

It requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

You may also need Rust installed, in case tokenizers does not provide a pre-built wheel for your platform. If you see installation errors during the pip install step, follow the Getting started page to install a Rust development environment. You may also need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', install setuptools_rust, e.g. by running:

bash
pip install setuptools-rust
  • Once ffmpeg is installed, install subsai
shell
pip install git+https://github.com/absadiki/subsai
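Since ffmpeg has to be reachable from the command line, a quick check before the first run can save some debugging. The following is a minimal sketch using only the Python standard library; the function names `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helpers and not part of subsai.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None if it is missing."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(ffmpeg_path: str) -> str:
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [ffmpeg_path, "-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0]
```

If `find_ffmpeg()` returns `None`, ffmpeg is either not installed or not on your PATH; otherwise pass the returned path to `ffmpeg_version_line` to see which build was found.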

Note:

  • It is recommended to use Python 3.10 or 3.11. Versions 3.12 or later may have compatibility issues.
  • If torch cannot detect your GPU while you are using subsai, even though you have a supported GPU, pip may have installed the CPU-only build of torch. You can install a build with CUDA support by following the "Get Started Locally" guide on the PyTorch website.
    For more information, see https://github.com/absadiki/subsai/issues/162.
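Both points in the note can be checked from Python. The sketch below is an assumption-laden helper, not part of subsai: `python_version_ok` and `torch_device_summary` are hypothetical names, and the set of recommended versions simply mirrors the note above. The torch import is wrapped so the check also works before torch is installed.

```python
import sys

# Versions recommended in the note above; adjust if the project's guidance changes.
RECOMMENDED_VERSIONS = {(3, 10), (3, 11)}


def python_version_ok(version_info=sys.version_info) -> bool:
    """True if the (major, minor) version is one of the recommended ones."""
    return (version_info[0], version_info[1]) in RECOMMENDED_VERSIONS


def torch_device_summary() -> str:
    """Describe whether the installed torch build can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    # torch.version.cuda is None for CPU-only builds.
    return (
        f"CUDA not available (torch {torch.__version__}, "
        f"built for CUDA: {torch.version.cuda})"
    )
```

A summary ending in `built for CUDA: None` suggests the CPU-only build was installed, which is the situation the note describes.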

Usage

Web-UI

To use the Web-UI, run the following command in the terminal:

shell
subsai-webui

A web page will open in your default browser. If it does not, navigate to one of the links printed by the command.

You can also run the Web-UI using Docker.

CLI

shell
usage: subsai [-h] [--version] [-m MODEL] [-mc MODEL_CONFIGS] [-f FORMAT] [-df DESTINATION_FOLDER]
              [-tm TRANSLATION_MODEL] [-tc TRANSLATION_CONFIGS] [-tsl TRANSLATION_SOURCE_LANG]
              [-ttl TRANSLATION_TARGET_LANG]
              media_file [media_file ...]

positional arguments:
  media_file            The path of the media file, a list of files, or a text file containing paths for batch processing.

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -m MODEL, --model MODEL
                        The transcription AI models. Available models: ['openai/whisper', 'linto-ai/whisper-timestamped']
  -mc MODEL_CONFIGS, --model-configs MODEL_CONFIGS
                        JSON configuration (path to a json file or a direct string)
  -f FORMAT, --format FORMAT, --subtitles-format FORMAT
                        Output subtitles format, available formats ['.srt', '.ass', '.ssa', '.sub', '.json', '.txt', '.vtt']
  -df DESTINATION_FOLDER, --destination-folder DESTINATION_FOLDER
                        The directory where the subtitles will be stored, default to the same folder where the media file(s) is stored.
  -tm TRANSLATION_MODEL, --translation-model TRANSLATION_MODEL
                        Translate subtitles using AI models, available models: ['facebook/m2m100_418M', 'facebook/m2m100_1.2B', 'facebook/mbart-large-50-many-to-many-mmt']
  -tc TRANSLATION_CONFIGS, --translation-configs TRANSLATION_CONFIGS
                        JSON configuration (path to a json file or a direct string)
  -tsl TRANSLATION_SOURCE_LANG, --translation-source-lang TRANSLATION_SOURCE_LANG
                        Source language of the subtitles
  -ttl TRANSLATION_TARGET_LANG, --translation-target-lang TRANSLATION_TARGET_LANG
                        Target language of the subtitles

A simple usage example:

shell
subsai ./assets/test1.mp4 --model openai/whisper --model-configs '{"model_type": "small"}' --format srt

Note: In Windows CMD, wrap the JSON in double quotes and escape the inner quotes instead:
subsai ./assets/test1.mp4 --model openai/whisper --model-configs "{\"model_type\": \"small\"}" --format srt
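Shell quoting of the JSON configuration differs between platforms, as the two commands above show. One way to sidestep it is to launch the CLI from Python with an argument list, so the JSON string is passed as a single argument without going through a shell. This is a minimal sketch: `build_subsai_command` and `run_subsai` are hypothetical helper names, and only the flags shown in the usage text above are used.

```python
import json
import subprocess
from typing import Dict, List


def build_subsai_command(
    media_file: str,
    model: str,
    model_configs: Dict[str, object],
    subtitle_format: str,
) -> List[str]:
    """Build the argument list for the subsai CLI; the configs dict is serialized to JSON."""
    return [
        "subsai",
        media_file,
        "--model", model,
        "--model-configs", json.dumps(model_configs),
        "--format", subtitle_format,
    ]


def run_subsai(command: List[str]) -> subprocess.CompletedProcess:
    """Run the command without a shell, so no manual quote escaping is needed."""
    return subprocess.run(command, check=True)
```

For example, `run_subsai(build_subsai_command("./assets/test1.mp4", "openai/whisper", {"model_type": "small"}, "srt"))` corresponds to the simple usage example above, assuming the `subsai` executable is on your PATH.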

You can also provide a plain text file for batch processing
(each line should contain the absolute path to a single media file):

shell
subsai media.txt --model openai/whisper --format srt
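A list file like media.txt can be generated rather than written by hand. The sketch below walks a folder and writes one absolute path per line; the function name `write_media_list` and the set of file extensions are assumptions chosen for illustration, so adjust them to the media types you actually have.

```python
from pathlib import Path
from typing import Iterable

# Assumed set of extensions for illustration; extend it to match your files.
MEDIA_EXTENSIONS = frozenset({".mp4", ".mkv", ".mp3", ".wav"})


def write_media_list(
    folder: str,
    list_file: str,
    extensions: Iterable[str] = MEDIA_EXTENSIONS,
) -> int:
    """Write the absolute path of every media file under `folder` to `list_file`.

    Returns the number of paths written.
    """
    wanted = {ext.lower() for ext in extensions}
    paths = sorted(
        path.resolve()
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
    Path(list_file).write_text(
        "".join(f"{path}\n" for path in paths), encoding="utf-8"
    )
    return len(paths)
```

After calling `write_media_list("/path/to/videos", "media.txt")`, the resulting file can be passed to the batch command shown above.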

From Python

To install:

  1. git clone https://github.com/absadiki/subsai
  2. cd subsai
  3. uv pip install -e .

Note: For a minimal install, or if you have trouble installing dependencies, you can comment out the dependencies for the backends you won't use in requirements.txt.

python
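from subsai import SubsAI

file = './assets/test1.mp4'

# Create the SubsAI entry point and load the chosen backend and model size
subs_ai = SubsAI()
model = subs_ai.create_model('openai/whisper', {'model_type': 'base'})

# Transcribe the media file and save the subtitles in SubRip format
subs = subs_ai.transcribe(file, model)
subs.save('test1.srt')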

For more advanced usage, read the documentation.
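The same calls can be wrapped in a loop to process several files with one loaded model. This is a minimal sketch that relies only on the `transcribe` and `save` calls shown above; the function `transcribe_files` and its parameters are hypothetical, and the `subs_ai` and `model` objects are passed in so the helper itself has no dependency on how they were created.

```python
from pathlib import Path
from typing import Iterable, List


def transcribe_files(
    subs_ai,
    model,
    media_files: Iterable[str],
    output_dir: str,
    extension: str = ".srt",
) -> List[Path]:
    """Transcribe each media file and save the subtitles under `output_dir`.

    Each output file takes the media file's base name plus `extension`.
    """
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for media_file in media_files:
        subs = subs_ai.transcribe(str(media_file), model)
        target = target_dir / (Path(media_file).stem + extension)
        subs.save(str(target))
        written.append(target)
    return written


# Usage (requires subsai to be installed):
#   from subsai import SubsAI
#   subs_ai = SubsAI()
#   model = subs_ai.create_model('openai/whisper', {'model_type': 'base'})
#   transcribe_files(subs_ai, model, ['./assets/test1.mp4'], './subtitles')
```

Creating the model once and reusing it avoids reloading the weights for every file. Note that two media files with the same base name in different folders would overwrite each other's output in this sketch.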

Examples

Simple examples can be found in the examples folder

  • VAD example: process long audio files using silero-vad. Open in Colab: https://colab.research.google.com/github/absadiki/subsai/blob/main/examples/subsai_vad.ipynb
  • Translation example: translate an existing subtitle file. Open in Colab: https://colab.research.google.com/github/absadiki/subsai/blob/main/examples/subsai_translation.ipynb

Docker

  • Make sure that you have Docker installed.

  • Prebuilt image

    1. docker pull absadiki/subsai:main
    2. docker run --gpus=all -p 8501:8501 -v /path/to/your/media_files/folder:/media_files absadiki/subsai:main
  • Build the image locally

    1. Clone and cd to the repository
    2. docker compose build
    3. docker compose run -p 8501:8501 -v /path/to/your/media_files/folder:/media_files subsai-webui # subsai-webui-cpu for cpu only
  • You can access your media files through the mounted media_files folder.
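Once a container is started with port 8501 published as in the commands above, you can confirm that something is listening before opening the browser. The following is a minimal standard-library sketch; `web_ui_reachable` is a hypothetical helper, and it only tests that a TCP connection succeeds, not that the page itself renders.

```python
import socket


def web_ui_reachable(host: str = "localhost", port: int = 8501, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns `False` shortly after `docker run`, the container may still be starting, or the `-p 8501:8501` mapping may be missing.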

Notes

  • If you have an NVIDIA graphics card, you may need to install CUDA to use the GPU.
  • AMD GPUs compatible with PyTorch should work as well (see issue #67).
  • Transcription time is shown in the terminal, so keep an eye on it while running the web UI.
  • If you don't like the dark-mode web UI, you can switch to light mode from Settings > Theme > Light.

Contributing

If you find a bug, have a suggestion or feedback, please open an issue for discussion.

License

This project is licensed under the GNU General Public License version 3 or later. You can modify or redistribute it under the conditions
of this license (see LICENSE for more information).


This article is auto-generated from absadiki/subsai via the GitHub API. Last fetched: 6/14/2026