Python WORLD
This is a line-by-line Python implementation of the WORLD vocoder (originally written in MATLAB and C++). It supports Python 3.0 and later. The project is written primarily in Python, distributed under a license listed as "Other", and was first published in 2018. Key topics include: manifold-learning, manifold-vocoder, pitch, pycharm, python.
PYTHON WORLD VOCODER:
For technical details, please check the website.
INSTALLATION
Python WORLD uses the following dependencies:
- numpy, scipy
- matplotlib
- numba
- simpleaudio (just for demonstration)
Install the Python dependencies:
pip install -r requirements.txt
Alternatively, import the project into PyCharm and open requirements.txt.
PyCharm will offer to install any missing libraries.
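To check that the installation worked, a quick sketch like the following reports which packages cannot be imported. It assumes the import names match the package names listed above, and it only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Import names assumed to match the packages listed above.
REQUIRED = ["numpy", "scipy", "matplotlib", "numba"]
OPTIONAL = ["simpleaudio"]  # only used for playback in the demo


def missing_modules(names):
    """Return the subset of module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    if missing_modules(OPTIONAL):
        print("simpleaudio not found; audio playback in the demo will be unavailable.")
```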
EXAMPLE
The easiest way to run the examples is to import the Python-WORLD folder into PyCharm.
example/prodosy.py contains an example of analysis, modification, and synthesis with the WORLD vocoder.
It includes examples of pitch, duration, and spectrum modification.
First, we read an audio file:
```python
from scipy.io.wavfile import read as wavread

fs, x_int16 = wavread(wav_path)
x = x_int16 / (2 ** 15 - 1)  # to float
```
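If your recordings are not all mono 16-bit PCM, a slightly more defensive loader can help. The sketch below is a hypothetical helper (load_mono_float is not part of Python-WORLD). It assumes the vocoder expects a one-dimensional float signal and, as an arbitrary choice, keeps only the first channel of multi-channel files.

```python
import os
import tempfile

import numpy as np
from scipy.io.wavfile import read as wavread, write as wavwrite


def load_mono_float(wav_path):
    """Read a 16-bit PCM WAV file; return (fs, x) with x a 1-D float array."""
    fs, x_int16 = wavread(wav_path)
    if x_int16.dtype != np.int16:
        raise ValueError("expected 16-bit PCM, got {}".format(x_int16.dtype))
    if x_int16.ndim > 1:
        # Arbitrary choice for this sketch: keep only the first channel.
        x_int16 = x_int16[:, 0]
    # Same scaling as the snippet above: int16 samples to roughly [-1, 1].
    x = x_int16 / (2 ** 15 - 1)
    return fs, x


# Usage: write a tiny synthetic stereo file, then load it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
wavwrite(demo_path, 16000, np.array([[0, 100], [32767, 0], [-32767, 5]], dtype=np.int16))
fs, x = load_mono_float(demo_path)
```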
Then, we create a vocoder and encode the audio:
```python
from world import main

vocoder = main.World()
# analysis
dat = vocoder.encode(fs, x, f0_method='harvest')
```
where fs is the sampling frequency and x is the speech signal.
dat is a dictionary that contains the pitch (F0), magnitude spectrum, and aperiodicity.
We can scale the pitch:
```python
dat = vocoder.scale_pitch(dat, 1.5)
```
Be careful when scaling the pitch, because F0 has upper and lower limits.
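One way to respect those limits is to clip the scaled F0 to a fixed range while leaving unvoiced frames (F0 = 0, the usual WORLD convention) untouched. scale_f0_clipped is a hypothetical helper, not the library's scale_pitch, and the bounds of 71 Hz and 800 Hz are illustrative values only; use the limits of your own analysis settings.

```python
import numpy as np


def scale_f0_clipped(f0, factor, f0_floor=71.0, f0_ceil=800.0):
    """Scale voiced F0 values by `factor` and clip them to [f0_floor, f0_ceil].

    Unvoiced frames (F0 == 0) stay at 0. Bounds are illustrative assumptions.
    """
    f0 = np.asarray(f0, dtype=float)
    scaled = f0 * factor  # new array; the input is not modified
    voiced = f0 > 0
    scaled[voiced] = np.clip(scaled[voiced], f0_floor, f0_ceil)
    return scaled


# Usage: 600 Hz * 1.5 = 900 Hz exceeds the ceiling and is clipped to 800 Hz.
f0 = np.array([0.0, 100.0, 200.0, 600.0])
raised = scale_f0_clipped(f0, 1.5)
```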
We can make speech faster or slower:
```python
dat = vocoder.scale_duration(dat, 2)
```
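Because WORLD parameters are sampled on a fixed frame grid (one frame every frame_period ms), changing the duration amounts to changing the number of frames while keeping the frame period. The sketch below does this with nearest-frame resampling; stretch_frames is a hypothetical helper and not necessarily how scale_duration is implemented. Nearest-frame selection is used instead of linear interpolation so that unvoiced F0 values (0) are never blended with voiced ones.

```python
import numpy as np


def stretch_frames(features, factor):
    """Resample frame-based features along time (axis 0) by `factor`.

    In this sketch, factor > 1 gives more frames (longer, slower speech) and
    factor < 1 gives fewer frames (shorter, faster speech).
    """
    features = np.asarray(features)
    n_in = features.shape[0]
    n_out = max(1, int(round(n_in * factor)))
    # Map each output frame back to the nearest input frame.
    idx = np.round(np.linspace(0, n_in - 1, n_out)).astype(int)
    return features[idx]


# Usage: doubling the duration of a 3-frame F0 track.
longer_f0 = stretch_frames(np.array([0.0, 100.0, 200.0]), 2)
```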
test/speed.py measures how long the analysis takes.
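When timing the analysis yourself, note that numba compiles a function on its first call, so the first run includes compilation time. A minimal sketch that reports the best of several runs; time_call is a hypothetical helper, not the code in test/speed.py.

```python
import time


def time_call(fn, *args, repeat=3, **kwargs):
    """Run fn(*args, **kwargs) `repeat` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Example with the vocoder (not run here):
# seconds, dat = time_call(vocoder.encode, fs, x, f0_method='harvest')
```

Taking the minimum rather than the mean reduces the influence of one-off costs such as JIT compilation and background load.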
To use the d4c_requiem analysis and requiem_synthesis from WORLD version 0.2.2, set is_requiem=True:
```python
# requiem analysis
dat = vocoder.encode(fs, x, f0_method='harvest', is_requiem=True)
```
To extract the log-filterbank, MCEP-40, and VAE-12 features described in the paper "Using a Manifold Vocoder for Spectral Voice and Style Conversion", see test/spectralFeatures.py. Extracting VAE-12 requires Keras 2.2.4 and TensorFlow 1.14.0.
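For readers unfamiliar with log-filterbank features, the following sketch applies a triangular mel filterbank to a spectrogram and takes the log. It is a generic illustration, not the code in test/spectralFeatures.py: the HTK-style mel formula, the 40 bands, and the assumption that sp is a (n_frames, fft_size // 2 + 1) power spectrogram are our choices and may differ from the setup used in the paper.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(fs, fft_size, n_filters=40):
    """Triangular filters of shape (n_filters, fft_size // 2 + 1), peaks <= 1."""
    n_bins = fft_size // 2 + 1
    bin_freqs = np.linspace(0.0, fs / 2.0, n_bins)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_filterbank(sp, fs, fft_size, n_filters=40, eps=1e-10):
    """Log filterbank energies of shape (n_frames, n_filters)."""
    fb = mel_filterbank(fs, fft_size, n_filters)
    return np.log(np.asarray(sp) @ fb.T + eps)
```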
Check out speech samples
NOTE:
- The vocoder uses pitch-synchronous analysis: the size of each window is determined by the fundamental frequency F0. The centers of the windows are equally spaced, frame_period ms apart.
- The Fourier transform size (fft_size) is determined automatically from the sampling frequency and the lowest F0 value, f0_floor.
When you want to specify your own fft_size, you have to use f0_floor = 3.0 * fs / fft_size.
If you decrease fft_size, f0_floor increases. However, a high f0_floor might not be suitable for analyzing male voices.
- F0 analysis: Harvest is the slowest method. It is sped up using numba and Python multiprocessing, so the more cores you have, the faster it runs. You can also use your own F0 analysis. We support three F0 analysis methods: DIO, HARVEST, and SWIPE'.
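The relationship between fft_size, f0_floor, and frame_period in the notes above can be sketched as follows. f0_floor_from_fft_size implements the stated rule directly; fft_size_from_f0_floor (the smallest power of two whose implied f0_floor does not exceed the requested one) and the 5 ms default frame period are assumptions, and the library's exact rounding may differ.

```python
import math

import numpy as np


def f0_floor_from_fft_size(fs, fft_size):
    """Lowest usable F0 for a given FFT size: f0_floor = 3.0 * fs / fft_size."""
    return 3.0 * fs / fft_size


def fft_size_from_f0_floor(fs, f0_floor):
    """Smallest power-of-two FFT size whose implied f0_floor is <= the requested one.

    Assumption: one consistent reading of the automatic rule, not the library code.
    """
    return 2 ** math.ceil(math.log2(3.0 * fs / f0_floor))


def frame_times(n_frames, frame_period_ms=5.0):
    """Centers of equally spaced analysis frames, in seconds."""
    return np.arange(n_frames) * frame_period_ms / 1000.0
```

For example, at fs = 16000 Hz, halving fft_size from 1024 to 512 raises f0_floor from 46.875 Hz to 93.75 Hz, which is why a small fft_size can hurt the analysis of low-pitched voices.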
CITATION:
If you find the code helpful and want to cite it, please use:
Dinh, T., Kain, A., & Tjaden, K. (2019). Using a manifold vocoder for spectral voice and style conversion. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September, 1388-1392.
CONTACT US
Post your questions, suggestions, and discussions to GitHub Issues.
