GitPedia

Russian STT Text Normalization

Russian text normalization pipeline for speech-to-text and other applications based on tagging s2s networks

From snakers4 · Updated April 9, 2026 · Archived

The project is written primarily in Python, is distributed under the GNU General Public License v3.0, and was first published in 2020. Key topics include: python3, pytorch, russian-language, speech, speech-to-text.
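The description does not explain how the tagging stage works. One common design for text normalization first tags the tokens that need to be verbalized (dates, decimals, abbreviations) and then passes only those spans to a sequence-to-sequence model. The sketch below illustrates that tagging step, with regular expressions standing in for a trained network. The class names (`DATE`, `DECIMAL`, `CARDINAL`, `ABBR`, `PLAIN`), the abbreviation list and the functions `tag_text` and `spans_to_verbalize` are hypothetical and are not taken from this project.

```python
import re
from typing import List, Tuple

# Hypothetical rule-based stand-in for a neural tagger. It only illustrates the
# shape of a tag-then-verbalize pipeline, not how this project tags text.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+\.?|[^\w\s]")
DATE_RE = re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}")
DECIMAL_RE = re.compile(r"\d+,\d+")  # Russian uses a comma as the decimal separator
ABBREVIATIONS = {"г", "га"}  # illustrative: "г." = год (year), "га" = гектар (hectare)


def tag_token(token: str) -> str:
    """Assign a coarse class to a single token."""
    if DATE_RE.fullmatch(token):
        return "DATE"
    if DECIMAL_RE.fullmatch(token):
        return "DECIMAL"
    if token.isdigit():
        return "CARDINAL"
    if token.rstrip(".").lower() in ABBREVIATIONS:
        return "ABBR"
    return "PLAIN"


def tag_text(text: str) -> List[Tuple[str, str]]:
    """Split text into tokens (numbers, words, punctuation) and tag each one."""
    return [(tok, tag_token(tok)) for tok in TOKEN_RE.findall(text)]


def spans_to_verbalize(text: str) -> List[Tuple[str, str]]:
    """Keep only the tokens that a seq2seq model would need to rewrite."""
    return [(tok, tag) for tok, tag in tag_text(text) if tag != "PLAIN"]
```

On the README example sentence, `spans_to_verbalize` returns `12.01.1943` (DATE), `г.` and `га.` (ABBR) and `1785,5` (DECIMAL), and it leaves ordinary words untouched. The toy tokenizer also attaches the sentence-final period to `га.`. Hand-written rules handle this kind of ambiguity poorly.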


Requirements

  • Python >= 3.6
  • PyTorch >= 1.4 (for the s2s pipeline)
  • tqdm (for progress bars)

Install the dependencies with pip:
pip install torch
pip install tqdm
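You may want to verify these minimum versions before running the pipeline. The helper below is hypothetical: `parse_version`, `meets_minimum` and `check_requirements` are not part of the project, and the script only encodes the version numbers listed above.

```python
import re
import sys
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "1.10.0+cu113" into (1, 10, 0)."""
    core = version.split("+")[0]  # drop a local build suffix like "+cu113"
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # "0rc1" -> 0
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare numerically, so "1.10" correctly counts as newer than "1.4"."""
    return parse_version(version) >= minimum


def check_requirements() -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, (1, 4)):
            problems.append("PyTorch >= 1.4 is required, found " + str(torch.__version__))
    try:
        import tqdm  # noqa: F401 (only checking that it is importable)
    except ImportError:
        problems.append("tqdm is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

The script compares versions as integer tuples rather than as strings. A plain string comparison would wrongly rank "1.10" below "1.4".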

Usage

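Normalize a single string (the input reads "Since 12.01.1943 the area of the village council is 1785.5 ha."):

from normalizer import Normalizer

text = 'С 12.01.1943 г. площадь сельсовета — 1785,5 га.'
norm = Normalizer()            # create the normalizer
result = norm.norm_text(text)  # normalize one string
print(result)
# С двенадцатого января тысяча девятьсот сорок третьего года площадь сельсовета — тысяча семьсот восемьдесят пять целых и пять десятых гектара

The date, the decimal number 1785,5 and the abbreviations г. ("year") and га ("hectare") are all spelled out in words.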
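The README only shows one string being normalized. To process a whole transcript file with the tqdm progress bar listed in the requirements, you could wrap `norm_text` in a small helper. This is a sketch: `normalize_lines` and the file name `transcripts.txt` are hypothetical, and the helper assumes `norm_text` takes one string and returns one string, as in the example above.

```python
from typing import Callable, Iterable, List


def normalize_lines(
    lines: Iterable[str],
    normalize: Callable[[str], str],
    show_progress: bool = True,
) -> List[str]:
    """Apply a single-string normalizer to every line of a corpus."""
    iterator = lines
    if show_progress:
        try:
            from tqdm import tqdm  # optional progress bar

            iterator = tqdm(lines, desc="normalizing")
        except ImportError:
            pass  # fall back to a plain loop if tqdm is missing
    # Strip the trailing newline so the normalizer sees only the text itself.
    return [normalize(line.rstrip("\n")) for line in iterator]


# Example (assumes the project's `normalizer` module is importable and that
# "transcripts.txt" is a hypothetical UTF-8 file with one utterance per line):
#     from normalizer import Normalizer
#     norm = Normalizer()
#     with open("transcripts.txt", encoding="utf-8") as f:
#         normalized = normalize_lines(f, norm.norm_text)
```

The helper takes the normalizing function as an argument instead of constructing `Normalizer()` itself. This way any setup the constructor performs happens only once, and you can test the helper with a stand-in such as `str.upper`.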


This article is auto-generated from snakers4/russian_stt_text_normalization via the GitHub API. Last fetched: 6/21/2026