lipku/LiveTalking
Real time interactive streaming digital human
6 Releases
Latest: today
v2.0.4 (Latest)
Support YAML config

```shell
# Usage:
cp config.yaml.example config.yaml   # create your config
python app.py                        # start with config.yaml
python app.py -c config_prod.yaml    # use a different config file
python app.py -c ''                  # skip yaml, CLI only

# Priority: CLI args > YAML > argparse defaults
```
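The priority order above (CLI args > YAML > argparse defaults) can be reproduced with plain `argparse`. The following is a minimal sketch, not the project's actual loader: the `--tts` and `--fps` options, the `build_parser` and `resolve_options` helpers, and the `yaml_values` dict (standing in for an already-parsed `config.yaml`) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser whose argparse defaults form the lowest-priority layer."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config", default="config.yaml")
    # Hypothetical options, used only to illustrate the layering.
    parser.add_argument("--tts", default="edgetts")
    parser.add_argument("--fps", type=int, default=25)
    return parser


def resolve_options(argv, yaml_values):
    """Return options resolved as CLI args > YAML values > argparse defaults."""
    parser = build_parser()
    known = vars(parser.parse_args([]))  # option names the parser knows about
    # YAML values replace the argparse defaults; unknown keys are ignored here.
    parser.set_defaults(**{k: v for k, v in yaml_values.items() if k in known})
    # Anything passed explicitly on the command line still wins.
    return parser.parse_args(argv)


yaml_values = {"tts": "omnitts", "fps": 30}  # stands in for a parsed config.yaml
opts = resolve_options(["--fps", "50"], yaml_values)
print(opts.tts, opts.fps)  # omnitts 50
```

Passing an empty dict as `yaml_values` corresponds to the `-c ''` case, where only CLI arguments and argparse defaults apply.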
v2.0.3
✨ New Features
- OmniTTS: vLLM Omni Speech Adapter
  - Added `tts/omnitts.py`, a new TTS backend that calls the vLLM Omni OpenAI-compatible speech API (`POST /v1/audio/speech`). Supports all vLLM Omni models (Qwen3-TTS, Fish Speech S2, CosyVoice3, Voxtral, VoxCPM2, MOSS-TTS-Nano) with configurable source sample rate, automatic resampling to 16 kHz, and per-message parameter overrides (voice, language, speed, instructions, task type). See [usage](https://doc.livetalking.ai/docs/usage/#331-omni-tts).
- TTS Voice Manager Web UI
  - Added `web/tts/index.html` and `web/tts/index-en.html`, a full-featured voice management dashboard:
    - Voice list: browse preset and uploaded voices from the vLLM Omni server, with one-click selection and voice deletion.
    - Voice clone upload: upload audio samples to clone new voices with an auto-generated consent ID, required reference text, and a browser-native speech recognition button for auto-transcription.
    - Speech synthesis test: select a voice, enter text, and synthesize speech with adjustable speed, language (11 languages), output format (WAV/MP3/FLAC/AAC/Opus/PCM), and task type. Results play in-page and are downloadable.
  - Added quick-link cards to the main index pages pointing to the TTS manager.
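To make the per-message overrides and the 16 kHz resampling step more concrete, here is a rough sketch. It is not the code in `tts/omnitts.py`: the helper names are hypothetical, the field names follow the OpenAI speech request schema (`model`, `input`, `voice`, `speed`, `response_format`), which the server is assumed to accept, and the resampler is naive linear interpolation with no anti-aliasing filter, used here in place of whatever resampler the backend actually uses.

```python
# Hypothetical backend-wide defaults; the model name is a placeholder.
DEFAULTS = {
    "model": "hypothetical-tts-model",
    "voice": "default",
    "speed": 1.0,
    "response_format": "wav",
}


def build_speech_payload(text, defaults, overrides=None):
    """Merge per-message overrides onto the defaults for POST /v1/audio/speech."""
    payload = dict(defaults)
    # Overrides set to None mean "keep the default".
    payload.update({k: v for k, v in (overrides or {}).items() if v is not None})
    payload["input"] = text
    return payload


def resample_linear(samples, src_rate, dst_rate=16000):
    """Naively resample a mono sample list by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        lo = min(int(pos), last)           # sample at or before that position
        hi = min(lo + 1, last)             # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


payload = build_speech_payload("Hello", DEFAULTS, {"voice": "alice", "speed": None})
print(payload["voice"], payload["speed"])

one_second = [0.0] * 24000                 # one second of audio at 24 kHz
print(len(resample_linear(one_second, 24000)))
```

A production resampler would normally low-pass filter before downsampling; the sketch only shows where the rate conversion sits relative to building the request.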
v2.0.2
📋 Changes
- Created admin.html for managing configurations and active sessions with real-time updates.
- Developed avatar.html for submitting avatar generation tasks and displaying task status.
- Introduced index.html as the main entry point for the application, linking to avatar generation and admin dashboard.
v2.0.1
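Changed the `sessionid` type from int to str to make it easier to extend. **Note: every frontend API call that sends this field must update its type.**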
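For clients that still hold the session ID as an integer, the change amounts to sending it as a string. A minimal sketch, assuming a JSON request body; the `text` field and the `session_payload` helper are hypothetical and only illustrate the type change:

```python
import json


def session_payload(sessionid, text):
    """Build a JSON body in which sessionid is always sent as a string."""
    # str() covers callers that still keep the ID as an int.
    return json.dumps({"sessionid": str(sessionid), "text": text})


print(session_payload(12345, "hello"))
```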
v2.0
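1. Refactored the code: the digital human models and audio feature code moved to the `avatars` directory.
2. Digital human models, TTS, and transport methods are now integrated as plugins.
3. Transport methods were reorganized into separate classes under the `streamout` directory, and RTMP output was added.
4. The shared digital human inference and paste-back code was consolidated into `BaseAvatar`; each model only needs to implement its own model-specific parts.
5. The audio feature splitting code was consolidated into `BaseAsr`.
6. Added Alibaba Cloud qwentts as a TTS backend.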
v1.2
Added MuseTalk v1.5.
