POST /v1/replicate/tts
curl --request POST \
  --url 'http://localhost:9000/api/v1/replicate/tts?model=zsxkib/realistic-voice-cloning' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'song_input=@<path-to-audio-file>' \
  --form 'rvc_model=<string>' \
  --form 'custom_rvc_model_download_url=<string>' \
  --form 'pitch_change=<string>' \
  --form index_rate=123 \
  --form filter_raidus=123 \
  --form rms_mix_rate=123 \
  --form 'pitch_detection_algorithm=<string>' \
  --form crepe_hop_length=123 \
  --form protect=123 \
  --form main_vocals_volume_change=123 \
  --form backup_vocals_volume_change=123 \
  --form instrumental_volume_change=123 \
  --form pitch_change_all=123 \
  --form reverb_size=123 \
  --form reverb_wetness=123 \
  --form reverb_dryness=123 \
  --form reverb_damping=123 \
  --form 'output_format=<string>'

Query

model
string
default:
"zsxkib/realistic-voice-cloning"
required

The voice model to use.
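
Because `model` is a query parameter rather than a form field, it belongs on the URL. The sketch below shows one way to build that URL in shell, assuming the base URL from the request example above. `build_url` is a hypothetical helper, not part of the API. The `/` in the default value is legal inside a query string, so it is left unencoded; values containing spaces or `&` would need percent-encoding, which this sketch does not do.

```shell
# Hypothetical helper: append the `model` query parameter to the endpoint URL.
# Falls back to the documented default when no model is given.
build_url() {
  base="$1"
  model="${2:-zsxkib/realistic-voice-cloning}"
  printf '%s?model=%s\n' "$base" "$model"
}

# Prints the endpoint URL with the default model attached.
build_url 'http://localhost:9000/api/v1/replicate/tts'
```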

Body

song_input
file
required

Upload your audio file here.
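
With curl, a file field is sent by prefixing the path with `@` in `--form`. The sketch below wraps that in a function that checks the file first, so a bad path fails locally instead of producing an incomplete request. `check_audio_file` and `upload_song` are hypothetical names, and `TOKEN` is assumed to be an environment variable holding the bearer token.

```shell
# Hypothetical helper: succeed only if the path is a readable regular file.
check_audio_file() {
  [ -f "$1" ] && [ -r "$1" ]
}

# Hypothetical wrapper around the documented endpoint.
# Assumes TOKEN is set by the caller; all other fields use server defaults.
upload_song() {
  check_audio_file "$1" || { echo "cannot read audio file: $1" >&2; return 1; }
  curl --request POST \
    --url 'http://localhost:9000/api/v1/replicate/tts?model=zsxkib/realistic-voice-cloning' \
    --header "Authorization: Bearer $TOKEN" \
    --form "song_input=@$1"
}
```

A call would look like `TOKEN=<token> upload_song ./song.mp3`. Paths containing commas or semicolons need extra quoting in `--form`, which this sketch does not handle.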

rvc_model
string
default:
"Squidward"

RVC model for a specific voice. If using a custom model, this should match the name of the downloaded model. If a ‘custom_rvc_model_download_url’ is provided, this will be automatically set to the name of the downloaded model.

custom_rvc_model_download_url
string

URL to download a custom RVC model. If provided, the model will be downloaded (if it doesn’t already exist) and used for prediction, regardless of the ‘rvc_model’ value.
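
Since a download URL takes precedence over `rvc_model`, a client only needs to send one of the two. A minimal sketch of that choice follows. `model_form_field` is a hypothetical helper, and the URL in the usage line is a placeholder, not a real model location.

```shell
# Hypothetical helper: print the single form field that selects the voice.
# $1 = model name (documented default "Squidward"), $2 = optional download URL.
model_form_field() {
  name="${1:-Squidward}"
  url="${2:-}"
  if [ -n "$url" ]; then
    # Per the description above, the URL wins regardless of rvc_model.
    printf 'custom_rvc_model_download_url=%s\n' "$url"
  else
    printf 'rvc_model=%s\n' "$name"
  fi
}

model_form_field                                         # prints rvc_model=Squidward
model_form_field ignored 'https://example.com/voice.zip' # prints the URL field only
```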

pitch_change
string
default:
"no-change"

Adjust pitch of AI vocals. Options: no-change, male-to-female, female-to-male.
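
Because `pitch_change` accepts only the three listed values, it can be worth validating it client-side before sending. The following is a small sketch; `valid_pitch_change` is a hypothetical helper, and how the server responds to other values is not stated here.

```shell
# Hypothetical helper: return 0 only for the three documented options.
valid_pitch_change() {
  case "$1" in
    no-change|male-to-female|female-to-male) return 0 ;;
    *) return 1 ;;
  esac
}

if valid_pitch_change 'male-to-female'; then
  echo 'ok'
fi
```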

index_rate
float
default:
"0.5"

Control how much of the AI’s accent to leave in the vocals.

filter_raidus
int
default:
"3"

If >= 3, apply median filtering to the harvested pitch results.

rms_mix_rate
float
default:
"0.25"

Control how much to use the original vocal’s loudness (0) or a fixed loudness (1).

pitch_detection_algorithm
string
default:
"rmvpe"

Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals).

crepe_hop_length
int
default:
"128"

When pitch_detection_algorithm is set to mangio-crepe, this controls how often it checks for pitch changes, in milliseconds. Lower values lead to longer conversions and a higher risk of voice cracks, but better pitch accuracy.
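
Since `crepe_hop_length` only matters for mangio-crepe, a client can send it only in that case. The sketch below prints the relevant form fields one per line. `pitch_form_fields` is a hypothetical helper; whether the server ignores the field for other algorithms is an assumption, not something stated here.

```shell
# Hypothetical helper: print the pitch-detection form fields, one per line.
# $1 = algorithm (documented default "rmvpe"), $2 = hop length (default 128).
pitch_form_fields() {
  algo="${1:-rmvpe}"
  hop="${2:-128}"
  printf 'pitch_detection_algorithm=%s\n' "$algo"
  # The hop length is only meaningful for mangio-crepe, so skip it otherwise.
  if [ "$algo" = "mangio-crepe" ]; then
    printf 'crepe_hop_length=%s\n' "$hop"
  fi
}

pitch_form_fields mangio-crepe 64
```

Each printed line can be passed to curl as the value of a `--form` option.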

protect
float
default:
"0.33"

Control how much of the original vocals’ breath and voiceless consonants to leave in the AI vocals. Set to 0.5 to disable.

main_vocals_volume_change
float
default:
"10.0"

Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels.
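
Decibels are logarithmic, so these values do not scale loudness linearly: by the standard definition, a change of d dB multiplies signal amplitude by 10^(d/20). The sketch below performs that conversion with awk to give a feel for the numbers. `db_to_gain` is a hypothetical helper, and how the server applies the gain internally is not stated here.

```shell
# Hypothetical helper: convert a decibel change to an amplitude multiplier.
db_to_gain() {
  awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (db / 20) }'
}

db_to_gain 3    # prints 1.413
db_to_gain -3   # prints 0.708
db_to_gain 10   # prints 3.162
```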

backup_vocals_volume_change
float
default:
"0.0"

Control volume of backup AI vocals.

instrumental_volume_change
float
default:
"0.0"

Control volume of the background music/instrumentals.

pitch_change_all
float
default:
"0.0"

Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly.
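
In equal temperament, shifting by n semitones multiplies every frequency by 2^(n/12), so 12 semitones is one octave. The sketch below computes that ratio. `semitone_ratio` is a hypothetical helper for illustration; the server's actual pitch-shifting method is not described here.

```shell
# Hypothetical helper: frequency ratio for a shift of n semitones.
semitone_ratio() {
  awk -v n="$1" 'BEGIN { printf "%.4f\n", 2 ^ (n / 12) }'
}

semitone_ratio 1     # prints 1.0595
semitone_ratio 12    # prints 2.0000
semitone_ratio -2    # prints 0.8909
```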

reverb_size
float
default:
"0.15"

The larger the room, the longer the reverb time.

reverb_wetness
float
default:
"0.2"

Level of AI vocals with reverb.

reverb_dryness
float
default:
"0.8"

Level of AI vocals without reverb.

reverb_damping
float
default:
"0.7"

Absorption of high frequencies in the reverb.

output_format
string
default:
"mp3"

Use wav for best quality and a large file size, or mp3 for decent quality and a small file size.