Speech To Text

curl --request POST \
  --url http://localhost:9000/api/v1/replicate/stt \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'url=<string>' \
  --form 'task=<string>' \
  --form batch_size=123 \
  --form 'timestamp=<string>'

POST /replicate/stt

Query

model

string

default:"turian/insanely-fast-whisper-with-video"

required

The model to use.
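Because model is a query parameter rather than a form field, it belongs in the URL. A minimal sketch of building that URL, assuming the host and port from the curl sample; the helper name build_stt_url is hypothetical and not part of the API:

```python
from urllib.parse import urlencode

# Host, port, and path are taken from the curl sample; adjust for your deployment.
BASE_URL = "http://localhost:9000/api/v1/replicate/stt"
DEFAULT_MODEL = "turian/insanely-fast-whisper-with-video"


def build_stt_url(model: str = DEFAULT_MODEL, base_url: str = BASE_URL) -> str:
    """Return the endpoint URL with `model` attached as a query parameter."""
    # urlencode percent-encodes the "/" in the model name as %2F.
    return f"{base_url}?{urlencode({'model': model})}"


print(build_stt_url())
# prints http://localhost:9000/api/v1/replicate/stt?model=turian%2Finsanely-fast-whisper-with-video
```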

Body

audio

file

required

Audio file. Either this or url must be provided.

url

string

Video URL for yt-dlp to download the audio from. Either this or audio must be provided.

task

string

default:"transcribe"

Task to perform: transcribe, or translate to another language.

batch_size

int

default:"64"

Number of parallel batches to compute. Reduce this value if you run into out-of-memory (OOM) errors.

timestamp

string

default:"chunk"

Timestamp granularity. Whisper supports both chunk-level and word-level timestamps.
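The body fields above can be assembled into a multipart/form-data request with the Python standard library alone. The sketch below mirrors the curl sample: it sends url rather than an uploaded audio file and applies the documented defaults for task, batch_size, and timestamp. The function names, the example.com video URL, and the choice to build the request without sending it are all illustrative assumptions, not part of the API.

```python
import urllib.request
import uuid

# Endpoint from the curl sample; adjust host and port for your deployment.
ENDPOINT = "http://localhost:9000/api/v1/replicate/stt"


def build_form_fields(url: str, task: str = "transcribe",
                      batch_size: int = 64, timestamp: str = "chunk") -> dict:
    """Collect the text form fields, applying the documented defaults."""
    if not url:
        # This sketch only covers the `url` variant, not an `audio` file upload.
        raise ValueError("url must be a non-empty string")
    return {"url": url, "task": task,
            "batch_size": str(batch_size), "timestamp": timestamp}


def encode_multipart(fields: dict) -> tuple:
    """Encode text fields as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "",
                  value]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(token: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) the authenticated POST request."""
    body, content_type = encode_multipart(fields)
    headers = {"Authorization": f"Bearer {token}",
               "Content-Type": content_type}
    return urllib.request.Request(ENDPOINT, data=body,
                                  headers=headers, method="POST")


# "<token>" and the video URL are placeholders.
request = build_request("<token>", build_form_fields("https://example.com/video"))
print(request.get_method(), request.full_url)
```

To actually send the request, pass the object to urllib.request.urlopen(request). The response format is not described on this page, so the sketch stops before parsing it.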
