Text To Speech
Text to speech using the zsxkib/realistic-voice-cloning AI Model.
Query
The voice model to use.
Body
Upload your audio file here.
RVC model for a specific voice. If using a custom model, this should match the name of the downloaded model. If a ‘custom_rvc_model_download_url’ is provided, this will be automatically set to the name of the downloaded model.
URL to download a custom RVC model. If provided, the model will be downloaded (if it doesn’t already exist) and used for prediction, regardless of the ‘rvc_model’ value.
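The precedence between the two model options above can be sketched as follows. This is only an illustration, not the service's actual code: the function name `resolve_model_name` is hypothetical, and deriving the model name from the file name in the URL is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_model_name(rvc_model, custom_rvc_model_download_url=None):
    """Return the model name that would be used for prediction (hypothetical helper)."""
    if custom_rvc_model_download_url:
        # A custom URL takes priority regardless of the rvc_model value.
        # Assumption: the downloaded model is named after the file stem in the URL.
        path = urlparse(custom_rvc_model_download_url).path
        return PurePosixPath(path).stem
    # No custom URL: fall back to the named model.
    return rvc_model
```

With these assumptions, passing a download URL overrides whatever `rvc_model` was set to, which mirrors the "regardless of the 'rvc_model' value" rule.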
Adjust pitch of AI vocals. Options: no-change, male-to-female, female-to-male.
Control how much of the AI’s accent to leave in the vocals.
If >=3: apply median filtering to the harvested pitch results.
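To illustrate what median filtering does to a pitch track, here is a minimal sketch. It assumes the setting acts as a window size in frames and that values below 3 leave the pitch untouched; the function name `median_filter` is hypothetical, and the model's real implementation may differ.

```python
from statistics import median


def median_filter(pitch, size):
    """Smooth a list of pitch values (Hz) with a centered median window."""
    if size < 3:
        # Assumption: values below 3 disable filtering.
        return list(pitch)
    half = size // 2
    smoothed = []
    for i in range(len(pitch)):
        # Clamp the window at the edges of the track.
        window = pitch[max(0, i - half): i + half + 1]
        smoothed.append(median(window))
    return smoothed
```

For example, `median_filter([100, 100, 400, 100, 100], 3)` replaces the isolated 400 Hz spike with 100, which is the kind of pitch-detection glitch this filter is meant to suppress.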
Control how much to use the original vocal’s loudness (0) or a fixed loudness (1).
Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals).
When pitch_detection_algo is set to mangio-crepe, this controls how often it checks for pitch changes in milliseconds. Lower values lead to longer conversions and higher risk of voice cracks, but better pitch accuracy.
Control how much of the original vocals’ breath and voiceless consonants to leave in the AI vocals. Set 0.5 to disable.
Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels.
Control volume of backup AI vocals.
Control volume of the background music/instrumentals.
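The volume settings above are expressed in decibels. As a rough guide to what a given value means, the standard conversion from decibels to a linear amplitude factor is shown below; the helper name `db_to_gain` is hypothetical and is not part of the API.

```python
def db_to_gain(db):
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


# 0 dB leaves the signal unchanged; about -6 dB roughly halves the amplitude.
for change in (-6, -3, 0, 3):
    print(change, round(db_to_gain(change), 3))
```

A change of -3 therefore scales the amplitude to about 0.71 of the original, and +3 to about 1.41.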
Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly.
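For reference, a shift in semitones corresponds to a frequency ratio in equal temperament. The sketch below shows that relationship only; the helper name `semitone_ratio` is hypothetical, and how the model actually performs the shift is not specified here.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2 ** (semitones / 12)


# Shifting A4 (440 Hz) up by 2 semitones gives roughly 493.9 Hz (B4).
shifted_hz = 440 * semitone_ratio(2)
```

A shift of 12 doubles every frequency (one octave up), and -12 halves it.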
The larger the room, the longer the reverb time.
Level of AI vocals with reverb.
Level of AI vocals without reverb.
Absorption of high frequencies in the reverb.
wav for best quality and large file size, mp3 for decent quality and small file size.