Text To Speech

curl --request POST \
  --url 'http://localhost:9000/api/v1/replicate/tts?model=zsxkib/realistic-voice-cloning' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'song_input=@<path-to-audio-file>' \
  --form 'rvc_model=<string>' \
  --form 'custom_rvc_model_download_url=<string>' \
  --form 'pitch_change=<string>' \
  --form index_rate=123 \
  --form filter_raidus=123 \
  --form rms_mix_rate=123 \
  --form 'pitch_detection_algorithm=<string>' \
  --form crepe_hop_length=123 \
  --form protect=123 \
  --form main_vocals_volume_change=123 \
  --form backup_vocals_volume_change=123 \
  --form instrumental_volume_change=123 \
  --form pitch_change_all=123 \
  --form reverb_size=123 \
  --form reverb_wetness=123 \
  --form reverb_dryness=123 \
  --form reverb_damping=123 \
  --form 'output_format=<string>'

POST /api/v1/replicate/tts

Query

model

string

default:"zsxkib/realistic-voice-cloning"

required

The voice model to be used.

Body

song_input

file

required

Upload your audio file here.
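
Since song_input and the model query parameter are both marked required, a request needs a file part and a query string. The sketch below assembles the curl arguments for such a request without sending anything. The helper name build_curl_command and the file name song.mp3 are illustrative assumptions, not part of the API; the URL, header, and field names are taken from this page. The Content-Type header is left out because curl sets it itself when --form is used.

```python
import shlex
from urllib.parse import urlencode

# Endpoint taken from the curl example on this page; adjust host and port as needed.
BASE_URL = "http://localhost:9000/api/v1/replicate/tts"


def build_curl_command(song_path, token, model="zsxkib/realistic-voice-cloning", **fields):
    """Return the argv list for a curl call that uploads `song_path`.

    `fields` are extra form fields (e.g. rvc_model="Squidward"). Anything
    omitted is assumed to fall back to the documented default.
    """
    # urlencode percent-encodes the "/" in the model name.
    url = f"{BASE_URL}?{urlencode({'model': model})}"
    argv = [
        "curl", "--request", "POST",
        "--url", url,
        "--header", f"Authorization: Bearer {token}",
        # The leading "@" makes curl send the file's contents, not the literal path.
        "--form", f"song_input=@{song_path}",
    ]
    for name, value in fields.items():
        argv += ["--form", f"{name}={value}"]
    return argv


# Hypothetical inputs; nothing is sent, the command is only printed.
cmd = build_curl_command("song.mp3", "<token>", rvc_model="Squidward", output_format="wav")
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run as is, or printed and pasted into a shell.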

rvc_model

string

default:"Squidward"

RVC model for a specific voice. If using a custom model, this should match the name of the downloaded model. If a ‘custom_rvc_model_download_url’ is provided, this will be automatically set to the name of the downloaded model.

custom_rvc_model_download_url

string

URL to download a custom RVC model. If provided, the model will be downloaded (if it doesn’t already exist) and used for prediction, regardless of the ‘rvc_model’ value.

pitch_change

string

default:"no-change"

Adjust pitch of AI vocals. Options: no-change, male-to-female, female-to-male.

index_rate

float

default:"0.5"

Control how much of the AI’s accent to leave in the vocals.

filter_raidus

int

default:"3"

If >=3: apply median filtering to the harvested pitch results.

rms_mix_rate

float

default:"0.25"

Control how much to use the original vocal’s loudness (0) or a fixed loudness (1).

pitch_detection_algorithm

string

default:"rmvpe"

Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals).

crepe_hop_length

int

default:"128"

When pitch_detection_algorithm is set to mangio-crepe, this controls how often it checks for pitch changes in milliseconds. Lower values lead to longer conversions and higher risk of voice cracks, but better pitch accuracy.

protect

float

default:"0.33"

Control how much of the original vocals’ breath and voiceless consonants to leave in the AI vocals. Set 0.5 to disable.

main_vocals_volume_change

float

default:"10.0"

Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels.
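
To get a feel for what a decibel value means, the sketch below converts a decibel change into a linear amplitude multiplier. It assumes the usual amplitude convention (20 * log10 of the ratio); this page does not say how the server applies the change, and db_to_gain is an illustrative helper, not part of the API.

```python
def db_to_gain(db):
    """Convert a decibel change to a linear amplitude multiplier.

    Assumes the amplitude convention: db = 20 * log10(gain).
    """
    return 10 ** (db / 20)


for db in (-3, 0, 3, 10):
    print(f"{db:+d} dB -> x{db_to_gain(db):.3f}")
```

Under that assumption, 3 dB is roughly a 1.41x increase in amplitude and -3 dB roughly a 0.71x reduction.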

backup_vocals_volume_change

float

default:"0.0"

Control volume of backup AI vocals.

instrumental_volume_change

float

default:"0.0"

Control volume of the background music/instrumentals.

pitch_change_all

float

default:"0.0"

Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly.
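
Because pitch_change_all is expressed in semitones, it may help to see the frequency ratio a given value implies. The sketch below uses the standard equal-temperament relation, where one semitone multiplies frequency by 2^(1/12). The helper name semitone_ratio is illustrative, and how the server performs the shift is not documented here.

```python
def semitone_ratio(semitones):
    """Frequency multiplier for a shift of `semitones` in equal temperament."""
    return 2 ** (semitones / 12)


# Example: shifting A4 (440 Hz) up by two semitones.
shifted_hz = 440.0 * semitone_ratio(2)
print(f"{shifted_hz:.2f} Hz")
```

A value of 12 doubles every frequency (one octave up), and -12 halves it.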

reverb_size

float

default:"0.15"

The larger the room, the longer the reverb time.

reverb_wetness

float

default:"0.2"

Level of AI vocals with reverb.

reverb_dryness

float

default:"0.8"

Level of AI vocals without reverb.

reverb_damping

float

default:"0.7"

Absorption of high frequencies in the reverb.

output_format

string

default:"mp3"

wav for best quality and large file size, mp3 for decent quality and small file size.
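
Several string parameters above list specific values. A client could check them before uploading a large audio file, as in the minimal sketch below. The ALLOWED table contains only the values named on this page; whether the server accepts others is not stated, so treat the check as a convenience rather than the API's own validation. The name check_fields is illustrative.

```python
# Values named in the parameter descriptions on this page (an assumption
# that these are the only accepted ones).
ALLOWED = {
    "pitch_change": {"no-change", "male-to-female", "female-to-male"},
    "pitch_detection_algorithm": {"rmvpe", "mangio-crepe"},
    "output_format": {"mp3", "wav"},
}


def check_fields(fields):
    """Return a list of problems found in `fields`; an empty list means nothing was flagged."""
    problems = []
    for name, allowed in ALLOWED.items():
        if name in fields and fields[name] not in allowed:
            problems.append(f"{name}={fields[name]!r} is not one of {sorted(allowed)}")
    return problems


print(check_fields({"pitch_change": "male-to-female", "output_format": "flac"}))
```

Fields that are absent are skipped, on the assumption that the documented defaults apply.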
