Instant cloning
Clone a custom voice from a short audio sample and use it like any built-in voice in subsequent text-to-speech calls.
Upload a WAV or MP3 file (max 25 MB). The server extracts a speaker embedding and returns a custom voice ID with the uc_ prefix that can be passed directly to POST /v1/text-to-speech (and any other endpoint that accepts a voice_id). The original audio is uploaded to S3 in the background after the response is returned.
Limits
- Audio file: max 25 MB. Supported formats: WAV, MP3.
- Audio duration: 5 to 150 seconds.
- Voice name: 1-30 characters.
- Model: ssfm-v21 or ssfm-v30. The cloned voice is bound to this engine model.
- Each plan has a maximum number of active custom voices (the custom_voice_slot). Use DELETE /v1/voices/{voice_id} to free a slot.
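The limits above can be checked on the client before uploading, which avoids a round trip for requests that would be rejected anyway. The following is a minimal sketch, not part of the API: the helper names check_clone_limits and wav_duration_seconds are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and name length is assumed to be counted in Unicode code points. The duration helper only handles WAV; measuring an MP3 would need a third-party decoder.

```python
import wave
from typing import List

MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
ALLOWED_EXTENSIONS = {".wav", ".mp3"}
ALLOWED_MODELS = {"ssfm-v21", "ssfm-v30"}


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (stdlib only, WAV only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clone_limits(size_bytes: int, extension: str, duration_s: float,
                       name: str, model: str) -> List[str]:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    if extension.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be WAV or MP3")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 25 MB")
    if not 5 <= duration_s <= 150:
        problems.append("duration must be 5 to 150 seconds")
    if not 1 <= len(name) <= 30:  # assumption: length counted in code points
        problems.append("name must be 1-30 characters")
    if model not in ALLOWED_MODELS:
        problems.append("model must be ssfm-v21 or ssfm-v30")
    return problems
```

A call such as check_clone_limits(os.path.getsize(path), ".wav", wav_duration_seconds(path), "My voice", "ssfm-v30") returns an empty list when every limit is met. The server remains the authority on these limits, and the plan's custom voice slot count cannot be checked locally.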
Typical flow
1. POST /v1/voices/clone with the sample audio → receive voice_id (e.g. uc_64a1b2...).
2. POST /v1/text-to-speech with voice_id set to the cloned ID.
3. DELETE /v1/voices/{voice_id} when you no longer need the voice.
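The three steps can be sketched as a single function. This is an illustration under stated assumptions rather than an official client: send is a hypothetical transport callable (for example a thin wrapper around an HTTP library that adds the base URL and API key and returns parsed JSON), and the request field names file, name, model, and text are assumptions, since this page does not list the request body fields. Only the endpoint paths, the voice_id value, and the uc_ prefix come from the text above.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: send(method, path, **kwargs) -> parsed JSON as a dict.
Send = Callable[..., Dict[str, Any]]


def clone_speak_delete(send: Send, audio: bytes, name: str,
                       model: str, text: str) -> Dict[str, Any]:
    """Clone a voice, synthesize one utterance with it, then free the slot."""
    # Step 1: upload the sample; the response carries the uc_-prefixed voice_id.
    voice = send(
        "POST", "/v1/voices/clone",
        files={"file": audio},                 # field name is an assumption
        data={"name": name, "model": model},   # field names are assumptions
    )
    voice_id = voice["voice_id"]
    if not voice_id.startswith("uc_"):
        raise ValueError(f"unexpected custom voice ID: {voice_id!r}")

    try:
        # Step 2: use the cloned ID like any built-in voice.
        speech = send(
            "POST", "/v1/text-to-speech",
            json={"voice_id": voice_id, "text": text, "model": model},  # body fields assumed
        )
    finally:
        # Step 3: delete the voice so it stops occupying a custom voice slot.
        send("DELETE", f"/v1/voices/{voice_id}")
    return speech
```

Deleting in a finally block frees the slot even if synthesis fails, which suits one-off use. When the same cloned voice serves many text-to-speech calls, step 3 would instead run once at the end of the voice's useful life.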
Authorizations
API key for authentication. You can obtain an API key from the Typecast API Console.
Body
Response
Successful Response - Custom voice created
Response of POST /v1/voices/clone: custom voice metadata returned by instant cloning.
Custom voice identifier with the uc_ prefix. Use this value as voice_id in POST /v1/text-to-speech and other endpoints that accept voice_id.
"uc_64a1b2c3d4e5f6a7b8c9d0e1"
Human-readable voice name (1-30 characters).
Engine model the voice was cloned for (ssfm-v21 or ssfm-v30).
Allowed values: ssfm-v30, ssfm-v21