Voice Cloning App
A Python/PyTorch app for easily synthesising human voices
**Voice Cloning App** is a Python/PyTorch app for easily synthesising human voices. The project is written primarily in Python, is distributed under the BSD 3-Clause "New" or "Revised" License, and was first published in 2021. It has 1,437 stars and 239 forks on GitHub. Key topics include deep-learning, python, pytorch, tacotron2, and text-to-speech.

Documentation
Discord Server
Video guide
Voice Sharing Hub
FAQs
System Requirements
- Windows 10 or Ubuntu 20.04+ operating system
- 5GB+ Disk space
- NVIDIA GPU with at least 4GB of memory & driver version 456.38+ (optional)
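The thresholds above can be checked before installing. The following is a minimal sketch and not part of the app: the function names `parse_driver_version` and `check_requirements` are hypothetical, and it assumes "GB" means 1024^3 bytes. It uses only the Python standard library, so GPU memory and driver version must be supplied by the caller (for example, read from the NVIDIA control panel or `nvidia-smi`).

```python
import shutil

# Thresholds taken from the System Requirements list.
# Assumption: "GB" is interpreted as 1024**3 bytes.
MIN_DISK_BYTES = 5 * 1024**3   # "5GB+ Disk space"
MIN_GPU_BYTES = 4 * 1024**3    # "at least 4GB of memory"
MIN_DRIVER = (456, 38)         # "driver version 456.38+"


def parse_driver_version(text):
    """Turn a driver string such as '456.38' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def check_requirements(free_disk_bytes, gpu_memory_bytes=None, driver_version=None):
    """Return a list of problems; an empty list means every check passed.

    The GPU is optional, so gpu_memory_bytes=None skips the GPU checks.
    """
    problems = []
    if free_disk_bytes < MIN_DISK_BYTES:
        problems.append("less than 5GB of free disk space")
    if gpu_memory_bytes is not None:
        if gpu_memory_bytes < MIN_GPU_BYTES:
            problems.append("GPU has less than 4GB of memory")
        if driver_version is not None and parse_driver_version(driver_version) < MIN_DRIVER:
            problems.append("NVIDIA driver is older than 456.38")
    return problems


if __name__ == "__main__":
    # Check free space on the drive holding the current directory; no GPU info given.
    free = shutil.disk_usage(".").free
    print(check_requirements(free) or "Disk space check passed (GPU checks skipped)")
```

Passing the GPU values explicitly, as in `check_requirements(free, 8 * 1024**3, "460.89")`, runs all three checks. Keeping the function free of hardware queries means it behaves the same on machines without an NVIDIA GPU.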
Key features
- Automatic dataset generation (with support for subtitles and audiobooks)
- Additional language support
- Local & remote training
- Easy train start/stop
- Data importing/exporting
- Multi GPU support
Manual Guides
Future Improvements
- Add support for TalkNet
- Add GTA alignment for HiFi-GAN
- Improved batch size estimation
- AMD GPU support
Other resources
- Remote training notebook
- Try out existing voices at uberduck.ai and Vocodes
- YouTube data fetching (created by Diskr33t#5880)
- Synthesize in Colab (created by mega b#6696)
- Generate YouTube transcription (created by mega b#6696)
- Wit.ai transcription
Acknowledgements
This project uses a reworked version of Tacotron2. All rights belong to NVIDIA and follow the requirements of their BSD-3 licence.
Additionally, the project uses DSAlign, Silero, DeepSpeech & HiFi-GAN.
Thank you to Dr. John Bustard at Queen's University Belfast for his support throughout the project.
Supported by uberduck.ai. Reach out to them for live model hosting.
Also a big thanks to the members of the VocalSynthesis subreddit for their feedback.
Finally, thank you to everyone raising issues and contributing to the project.
