Generate speech from text using the specified voice model. Supports emotion, volume, pitch, and tempo customization.
First, list the available voice models with the GET /v2/voices endpoint, then pass a voice_id from that response to this endpoint to generate speech. Each voice model has its own characteristics. See Listing all voices for available voices.
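The two-step workflow can be sketched with the standard library. This is a minimal sketch, not the official client: the base URL `https://api.typecast.ai` and the `X-API-KEY` header name are assumptions, and the parser assumes the GET /v2/voices body is a JSON array of objects that each carry a `voice_id` field. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumptions (not stated in this reference): base URL and header name.
BASE_URL = "https://api.typecast.ai"  # hypothetical base URL
API_KEY_HEADER = "X-API-KEY"          # hypothetical header name


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v2/voices request."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/voices",
        headers={API_KEY_HEADER: api_key},
        method="GET",
    )


def first_voice_id(voices_json: str) -> str:
    """Return the voice_id of the first voice in a GET /v2/voices response.

    Assumes a JSON array of objects with a "voice_id" key; adjust this if
    the real payload is shaped differently.
    """
    voices = json.loads(voices_json)
    if not voices:
        raise ValueError("no voices returned")
    return voices[0]["voice_id"]
```

To actually send the request, pass it to `urllib.request.urlopen(req)` and feed the decoded body to `first_voice_id`.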
API key for authentication. You can obtain an API key from the Typecast dashboard.
Voice ID in the format 'tc_' followed by a unique identifier (e.g., 'tc_60e5426de8b95f1d3000d7b5'). Case-sensitive: must use lowercase (tc_xxx). See Listing all voices for available voices.
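A client-side pre-flight check can catch malformed IDs before a request is made. This is a hypothetical helper: the pattern mirrors the example ID (lowercase `tc_` plus lowercase alphanumerics), and the reference does not pin down the identifier alphabet, so relax it if real IDs contain other characters.

```python
import re

# Hypothetical pattern based on the example ID; the exact alphabet of the
# identifier part is an assumption.
VOICE_ID_PATTERN = re.compile(r"tc_[0-9a-z]+")


def looks_like_voice_id(voice_id: str) -> bool:
    """Return True if voice_id has the lowercase 'tc_' prefix and a non-empty ID."""
    return VOICE_ID_PATTERN.fullmatch(voice_id) is not None
```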
"tc_60e5426de8b95f1d3000d7b5"
Text to convert to speech. Minimum 1 character, maximum 2000 characters. Credits are consumed based on text length. Supports multiple languages, including English, Korean, Japanese, and Chinese. Special characters and punctuation are handled automatically.
Length: 1 - 2000 characters
"Everything is so incredibly perfect that I feel like I'm dreaming."
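A minimal length check for the documented 1 to 2000 character range. Note that Python's `len()` counts Unicode code points; whether the service counts emoji or combining marks the same way is an assumption, so treat this as a pre-flight check rather than a guarantee.

```python
MIN_TEXT_CHARS = 1
MAX_TEXT_CHARS = 2000


def check_text(text: str) -> str:
    """Return text unchanged, or raise ValueError if it is outside 1-2000 characters."""
    n = len(text)  # counts code points, which may differ from the server's count
    if not MIN_TEXT_CHARS <= n <= MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-2000 characters, got {n}")
    return text
```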
Voice model to use for speech synthesis.
Available options: ssfm-v30, ssfm-v21
"ssfm-v30"
Language code following the ISO 639-3 standard. Case-insensitive (both "ENG" and "eng" are accepted). If omitted, the language is auto-detected from the text content.
Supported languages for ssfm-v30:

| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| ARA | Arabic | IND | Indonesian | POR | Portuguese |
| BEN | Bengali | ITA | Italian | RON | Romanian |
| BUL | Bulgarian | JPN | Japanese | RUS | Russian |
| CES | Czech | KOR | Korean | SLK | Slovak |
| DAN | Danish | MSA | Malay | SPA | Spanish |
| DEU | German | NAN | Min Nan | SWE | Swedish |
| ELL | Greek | NLD | Dutch | TAM | Tamil |
| ENG | English | NOR | Norwegian | TGL | Tagalog |
| FIN | Finnish | PAN | Punjabi | THA | Thai |
| FRA | French | POL | Polish | TUR | Turkish |
| HIN | Hindi | UKR | Ukrainian | VIE | Vietnamese |
| HRV | Croatian | YUE | Cantonese | ZHO | Chinese |
| HUN | Hungarian | | | | |

Supported languages for ssfm-v21:

| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| ARA | Arabic | IND | Indonesian | RON | Romanian |
| BUL | Bulgarian | ITA | Italian | RUS | Russian |
| CES | Czech | JPN | Japanese | SLK | Slovak |
| DAN | Danish | KOR | Korean | SPA | Spanish |
| DEU | German | MSA | Malay | SWE | Swedish |
| ELL | Greek | NLD | Dutch | TAM | Tamil |
| ENG | English | POL | Polish | TGL | Tagalog |
| FIN | Finnish | POR | Portuguese | UKR | Ukrainian |
| FRA | French | HRV | Croatian | ZHO | Chinese |
"eng"
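The language rules (ISO 639-3, case-insensitive, optional) can be applied client-side as below. The code sets are transcribed from the two tables above; mapping the 37-code table to ssfm-v30 and the 27-code table to ssfm-v21 is an assumption. Returning `None` means the field is omitted so the API can auto-detect.

```python
from typing import Optional

# Transcribed from the tables above. Assumption: first table -> ssfm-v30,
# second table -> ssfm-v21.
SUPPORTED_LANGUAGES = {
    "ssfm-v30": set(
        "ara ben bul ces dan deu ell eng fin fra hin hrv hun ind ita jpn kor "
        "msa nan nld nor pan pol por ron rus slk spa swe tam tgl tha tur ukr "
        "vie yue zho".split()
    ),
    "ssfm-v21": set(
        "ara bul ces dan deu ell eng fin fra hrv ind ita jpn kor msa nld pol "
        "por ron rus slk spa swe tam tgl ukr zho".split()
    ),
}


def normalize_language(code: Optional[str], model: str) -> Optional[str]:
    """Lowercase and validate an ISO 639-3 code; None means 'let the API auto-detect'."""
    supported = SUPPORTED_LANGUAGES.get(model)
    if supported is None:
        raise ValueError(f"unknown model: {model}")
    if code is None:
        return None
    normalized = code.strip().lower()  # the API accepts either case
    if normalized not in supported:
        raise ValueError(f"{code!r} is not listed for {model}")
    return normalized
```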
Emotion and style settings for the generated speech.
{
"emotion_type": "smart",
"previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
"next_text": "I am literally bursting with happiness and I never want this feeling to end!"
}
Audio output settings: volume (0-200), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (wav or mp3). Together these control the characteristics of the final audio.
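The documented ranges can be enforced before sending a request. This sketch checks only the ranges listed above; the dictionary key names (`volume`, `pitch`, `tempo`, `format`) are hypothetical placeholders, since this reference gives the ranges but not the schema's field names.

```python
def build_output_settings(volume: int, pitch: int, tempo: float, audio_format: str) -> dict:
    """Validate the documented ranges and return an output-settings dict."""
    if not 0 <= volume <= 200:
        raise ValueError("volume must be in 0-200")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be in -12..+12 semitones")
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("tempo must be in 0.5x-2.0x")
    if audio_format not in ("wav", "mp3"):
        raise ValueError("format must be 'wav' or 'mp3'")
    # Hypothetical key names; replace with the schema's actual field names.
    return {"volume": volume, "pitch": pitch, "tempo": tempo, "format": audio_format}
```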
Random seed for controlling speech generation variations. Use any integer value to influence the output.
42
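Putting the request parameters together, a JSON body might be assembled as below. Only `voice_id` and the `emotion_type`/`previous_text`/`next_text` prompt keys appear in this reference; the top-level keys `text`, `model`, `language`, `prompt`, `output`, and `seed` are assumed names for illustration. Optional fields are left out when `None`, so the API can auto-detect the language.

```python
import json
from typing import Optional


def build_tts_body(
    voice_id: str,
    text: str,
    model: str,
    language: Optional[str] = None,
    prompt: Optional[dict] = None,
    output: Optional[dict] = None,
    seed: Optional[int] = None,
) -> str:
    """Serialize a text-to-speech request body (top-level key names are assumptions)."""
    body = {"voice_id": voice_id, "text": text, "model": model}
    optional = {"language": language, "prompt": prompt, "output": output, "seed": seed}
    # Drop unset fields; "is not None" keeps a legitimate seed of 0.
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)
```

For example, `build_tts_body("tc_60e5426de8b95f1d3000d7b5", "Hi", "ssfm-v30", prompt={"emotion_type": "smart"}, seed=42)` produces a body with the emotion settings and seed but no `language` key.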
Success - Returns audio file
WAV audio file binary data. Uncompressed PCM audio with 16-bit depth, mono channel, 44100 Hz sample rate.
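When the WAV format is returned, the documented properties (16-bit, mono, 44100 Hz) can be confirmed with Python's standard `wave` module. The helper names below are hypothetical; the input is the raw response body as bytes.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read format fields from WAV bytes using the stdlib wave module."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def matches_documented_format(data: bytes) -> bool:
    """True if the audio is mono, 16-bit, 44100 Hz as described above."""
    info = describe_wav(data)
    return (info["channels"], info["bits_per_sample"], info["sample_rate"]) == (1, 16, 44100)
```

To keep the audio, write the same bytes to a file opened in binary mode, e.g. `open("speech.wav", "wb")`.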