Speech To Text
Convert speech to text using the turian/insanely-fast-whisper-with-video AI model.
POST
Query
The model to use.
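The model is passed in the query string, while the inputs below go in the request body. The sketch below shows how the query part of the POST URL might be built with the standard library. The base URL and the query-parameter name "model" are hypothetical assumptions, because this page does not state them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real host and path are not given on this page.
BASE_URL = "https://api.example.com/speech-to-text"


def build_request_url(model: str, base_url: str = BASE_URL) -> str:
    """Return the POST URL with the model passed as a query parameter.

    The parameter name "model" is an assumption, not a documented name.
    urlencode percent-encodes the "/" in the model identifier.
    """
    return f"{base_url}?{urlencode({'model': model})}"


print(build_request_url("turian/insanely-fast-whisper-with-video"))
```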
Body
Audio file. Either this or url must be provided.
Video URL for yt-dlp to download the audio from. Either this or audio must be provided.
Task to perform: transcribe the audio, or translate it into English (Whisper's translate task outputs English). Default: transcribe.
Number of batches to compute in parallel. Reduce this value if you run into out-of-memory (OOM) errors. Default: 64.
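One way to act on the advice to reduce the batch count after an OOM is to retry with a halved value until the call succeeds or a floor is reached. This is a minimal sketch: `OutOfMemoryError` and the `run` callable are hypothetical stand-ins for however your client surfaces an OOM failure and sends the request with a given batch size.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class OutOfMemoryError(Exception):
    """Hypothetical error representing a request that failed with an OOM."""


def with_batch_backoff(
    run: Callable[[int], T], batch_size: int = 64, min_batch_size: int = 1
) -> T:
    """Call run(batch_size), halving batch_size after each OOM failure.

    Starts from the documented default of 64 and re-raises the last
    OutOfMemoryError once batch_size cannot shrink any further.
    """
    while True:
        try:
            return run(batch_size)
        except OutOfMemoryError:
            if batch_size <= min_batch_size:
                raise  # already at the floor; give up
            batch_size = max(min_batch_size, batch_size // 2)


# Usage: a fake request that only fits in memory at 16 or fewer batches.
def fake_run(batch_size: int) -> int:
    if batch_size > 16:
        raise OutOfMemoryError(batch_size)
    return batch_size


print(with_batch_backoff(fake_run))  # tries 64, 32, then succeeds at 16
```

Halving keeps the number of retries logarithmic in the starting value; a gentler decrement would waste more failed requests before finding a size that fits.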
Timestamp granularity. Whisper supports both chunk-level and word-level timestamps. Default: chunk.
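The body parameters above can be assembled client-side with the documented defaults and the rule that either audio or url must be provided. This is a sketch under assumptions: the field names "task", "batch_size" and "timestamp" and the value "word" are guesses, since this page names only audio and url. It only checks that at least one source is given, because the page does not say whether sending both is rejected.

```python
from typing import Optional

ALLOWED_TASKS = {"transcribe", "translate"}
ALLOWED_TIMESTAMPS = {"chunk", "word"}  # "word" is an assumed value name


def build_body(
    audio: Optional[bytes] = None,
    url: Optional[str] = None,
    task: str = "transcribe",   # documented default
    batch_size: int = 64,       # documented default
    timestamp: str = "chunk",   # documented default
) -> dict:
    """Validate the inputs and return the request-body fields as a dict."""
    if audio is None and url is None:
        raise ValueError("either audio or url must be provided")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if timestamp not in ALLOWED_TIMESTAMPS:
        raise ValueError(f"timestamp must be one of {sorted(ALLOWED_TIMESTAMPS)}")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    body = {"task": task, "batch_size": batch_size, "timestamp": timestamp}
    if audio is not None:
        body["audio"] = audio  # raw audio bytes, to be sent as a file upload
    if url is not None:
        body["url"] = url      # video URL that yt-dlp downloads audio from
    return body


print(build_body(url="https://example.com/video", timestamp="word"))
```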