POST /v1/replicate/tts

Query

model
string
default: "zsxkib/realistic-voice-cloning"
required

The voice model to use.

Body

song_input
file
required

The audio file to convert.

rvc_model
string
default: "Squidward"

RVC model for a specific voice. If using a custom model, this should match the name of the downloaded model. If a ‘custom_rvc_model_download_url’ is provided, this will be automatically set to the name of the downloaded model.

custom_rvc_model_download_url
string

URL to download a custom RVC model. If provided, the model will be downloaded (if it doesn’t already exist) and used for prediction, regardless of the ‘rvc_model’ value.

pitch_change
string
default: "no-change"

Adjust pitch of AI vocals. Options: no-change, male-to-female, female-to-male.

index_rate
float
default: "0.5"

Control how much of the AI’s accent to leave in the vocals.

filter_radius
int
default: "3"

If >= 3, apply median filtering to the harvested pitch results.
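As an illustration of what median filtering does to a pitch track, here is a minimal sketch in plain Python. The helper name `median_filter`, the edge handling (repeating the first and last values), and the choice to treat the parameter value as the window length are all assumptions made for this example; the exact filter the model applies is not specified in this reference.

```python
from statistics import median


def median_filter(values, window=3):
    """Replace each value with the median of its surrounding window.

    Hypothetical helper for illustration only. `window` must be odd;
    edges are padded by repeating the first and last values.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not values:
        return []
    half = window // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(values))]


# A pitch track (in Hz) with one spurious spike at index 2.
pitch_hz = [100, 101, 400, 102, 103]
print(median_filter(pitch_hz, window=3))  # [100, 101, 102, 103, 103]
```

The isolated spike is removed while the neighboring values are barely changed, which is why median filtering is a common choice for cleaning up pitch estimates.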

rms_mix_rate
float
default: "0.25"

Control how much to use the original vocal’s loudness (0) or a fixed loudness (1).

pitch_detection_algorithm
string
default: "rmvpe"

Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals).

crepe_hop_length
int
default: "128"

When pitch_detection_algorithm is set to mangio-crepe, this controls how often pitch changes are checked, in milliseconds. Lower values give better pitch accuracy, but lead to longer conversions and a higher risk of voice cracks.

protect
float
default: "0.33"

Control how much of the original vocals’ breath and voiceless consonants to leave in the AI vocals. Set to 0.5 to disable.

main_vocals_volume_change
float
default: "10.0"

Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels.
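To get a feel for what a decibel value means, the sketch below converts a decibel change into a linear amplitude multiplier. It assumes the usual amplitude convention (a change of d dB scales amplitude by 10^(d/20)); the helper name `db_to_gain` is hypothetical, and how the model applies the change internally is not described in this reference.

```python
def db_to_gain(db):
    """Convert a decibel change to a linear amplitude multiplier.

    Hypothetical helper; assumes the amplitude convention 10 ** (dB / 20).
    """
    return 10 ** (db / 20)


for db in (-3, 0, 3):
    print(f"{db:+d} dB -> x{db_to_gain(db):.3f}")
# -3 dB -> x0.708
# +0 dB -> x1.000
# +3 dB -> x1.413
```

Under this convention, a change of 3 dB scales the amplitude by roughly 1.41, and a change of -3 dB by roughly 0.71.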

backup_vocals_volume_change
float
default: "0.0"

Control volume of backup AI vocals.

instrumental_volume_change
float
default: "0.0"

Control volume of the background music/instrumentals.

pitch_change_all
float
default: "0.0"

Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly.
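For reference, a shift expressed in semitones maps to a frequency ratio of 2^(n/12) in twelve-tone equal temperament. The sketch below shows that arithmetic; the helper name `semitones_to_ratio` is hypothetical and the code only illustrates the unit, not how the model performs the shift.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2 ** (semitones / 12)


print(semitones_to_ratio(12))            # 2.0  (one octave up)
print(semitones_to_ratio(-12))           # 0.5  (one octave down)
print(round(semitones_to_ratio(1), 4))   # 1.0595
```

A value of 2.0 therefore raises every frequency by about 12%, while the default of 0.0 leaves the key unchanged.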

reverb_size
float
default: "0.15"

The larger the room, the longer the reverb time.

reverb_wetness
float
default: "0.2"

Level of AI vocals with reverb.

reverb_dryness
float
default: "0.8"

Level of AI vocals without reverb.

reverb_damping
float
default: "0.7"

Absorption of high frequencies in the reverb.

output_format
string
default: "mp3"

Use wav for best quality at a large file size, or mp3 for decent quality at a small file size.
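A minimal sketch of preparing a request for this endpoint, using only the Python standard library. The base URL `https://api.example.com` and the helper name `build_tts_request` are placeholders rather than part of this reference, and authentication is left out because it is not described here. The function only builds the URL (with `model` as a query parameter) and the form fields; attaching the audio file as `song_input` in a multipart body and sending the request is left to the HTTP client of your choice.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the host your provider documents.
BASE_URL = "https://api.example.com"

# Allowed values as listed in the parameter descriptions.
PITCH_CHANGE_OPTIONS = {"no-change", "male-to-female", "female-to-male"}
OUTPUT_FORMATS = {"mp3", "wav"}


def build_tts_request(model="zsxkib/realistic-voice-cloning", **fields):
    """Return (url, form_fields) for POST /v1/replicate/tts.

    `fields` holds optional body parameters such as rvc_model or
    output_format. The song_input file is not handled here.
    """
    pitch = fields.get("pitch_change", "no-change")
    if pitch not in PITCH_CHANGE_OPTIONS:
        raise ValueError(f"pitch_change must be one of {sorted(PITCH_CHANGE_OPTIONS)}")
    fmt = fields.get("output_format", "mp3")
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")

    # `model` travels in the query string; "/" is percent-encoded.
    url = f"{BASE_URL}/v1/replicate/tts?{urlencode({'model': model})}"
    # Multipart form values are sent as text, so convert numbers to strings.
    form_fields = {name: str(value) for name, value in fields.items()}
    return url, form_fields


url, form = build_tts_request(
    rvc_model="Squidward",
    pitch_change="no-change",
    index_rate=0.5,
    output_format="wav",
)
print(url)
print(form)
```

Validating the enumerated options before sending keeps obvious mistakes from costing a round trip; parameters that are omitted fall back to the defaults listed above.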