Generate speech from text using real-time streaming, allowing audio playback to begin before the entire synthesis is complete.
This endpoint streams audio data in chunks, enabling low-latency audio playback for applications requiring immediate feedback.
Streaming Format:
Use Cases:
Request Parameters:
Uses the same TTSRequest schema as the standard TTS endpoint. Set output.audio_format to “wav” or “mp3” to control the streaming format.
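As a rough illustration of the request body, the sketch below assembles a dictionary shaped like the TTSRequest described on this page. Only `output.audio_format` is named explicitly above; the top-level keys `voice_id`, `text`, `model`, `language`, and `seed` are assumed names for the parameters documented here, and `build_stream_request` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, Optional


def build_stream_request(
    voice_id: str,
    text: str,
    audio_format: str = "wav",
    model: str = "ssfm-v30",
    language: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request body for the streaming endpoint (key names are assumptions)."""
    if not voice_id.startswith("tc_"):
        raise ValueError("voice_id must start with lowercase 'tc_'")
    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")
    if seed is not None and seed < 0:
        raise ValueError("seed must be an unsigned integer")

    body: Dict[str, Any] = {
        "voice_id": voice_id,
        "text": text,
        "model": model,
        "output": {"audio_format": audio_format},
    }
    # Optional fields are left out entirely when not supplied.
    if language is not None:
        body["language"] = language
    if seed is not None:
        body["seed"] = seed
    return body
```

Calling `build_stream_request("tc_60e5426de8b95f1d3000d7b5", "Hello", audio_format="mp3")` would yield a body that requests an MP3 stream.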
API key for authentication. You can obtain an API key from the Typecast API Console.
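A minimal sketch of sending an authenticated request and reading the response in chunks, using only the Python standard library. The endpoint URL and the `X-API-KEY` header name are assumptions that do not appear on this page, so check them against the API Console before use. `build_http_request` and `iter_audio_chunks` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, Iterator

# Assumed endpoint URL; not stated on this page.
STREAM_URL = "https://api.typecast.ai/v1/text-to-speech/stream"


def build_http_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST request carrying the JSON body and the API key."""
    return urllib.request.Request(
        STREAM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-KEY": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_audio_chunks(
    api_key: str, payload: Dict[str, Any], chunk_size: int = 4096
) -> Iterator[bytes]:
    """Yield audio bytes as they arrive instead of waiting for the full body."""
    with urllib.request.urlopen(build_http_request(api_key, payload)) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Each chunk yielded by `iter_audio_chunks` can be handed to an audio player or written to disk as soon as it arrives.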
Text-to-speech streaming request parameters
Voice ID in format 'tc_' followed by a unique identifier (e.g., 'tc_60e5426de8b95f1d3000d7b5'). Case-sensitive: must use lowercase (tc_xxx). See Listing all voices for available voices.
"tc_60e5426de8b95f1d3000d7b5"
Text to convert to speech. Minimum 1 character, maximum 2000 characters. Credits are consumed based on text length. Supports multiple languages, including English, Korean, Japanese, and Chinese. Special characters and punctuation are handled automatically.
Length: 1 - 2000 characters
"Everything is so incredibly perfect that I feel like I'm dreaming."
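Input longer than the 2000-character limit has to be split across several requests. The sketch below shows one possible client-side approach: a greedy split at the last space before the limit. `split_text` is a hypothetical helper, and it assumes the API counts characters the same way Python's `len` does, which this page does not confirm.

```python
from typing import List


def split_text(text: str, limit: int = 2000) -> List[str]:
    """Split text into pieces of 1..limit characters, preferring word boundaries."""
    text = text.strip()
    if not text:
        raise ValueError("text must contain at least 1 character")

    pieces: List[str] = []
    while len(text) > limit:
        # Look for the last space that keeps the piece within the limit.
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: fall back to a hard cut
        pieces.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    pieces.append(text)
    return pieces
```

Each returned piece can then be sent as the text of its own request. Splitting mid-sentence may affect prosody, so a sentence-aware splitter could be a better fit for long passages.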
Voice model to use for speech synthesis.
Available options: ssfm-v30, ssfm-v21
"ssfm-v30"
Language code following the ISO 639-3 standard. Case-insensitive (both "ENG" and "eng" are accepted). If omitted, the language is auto-detected from the text content.
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| ARA | Arabic | IND | Indonesian | POR | Portuguese |
| BEN | Bengali | ITA | Italian | RON | Romanian |
| BUL | Bulgarian | JPN | Japanese | RUS | Russian |
| CES | Czech | KOR | Korean | SLK | Slovak |
| DAN | Danish | MSA | Malay | SPA | Spanish |
| DEU | German | NAN | Min Nan | SWE | Swedish |
| ELL | Greek | NLD | Dutch | TAM | Tamil |
| ENG | English | NOR | Norwegian | TGL | Tagalog |
| FIN | Finnish | PAN | Punjabi | THA | Thai |
| FRA | French | POL | Polish | TUR | Turkish |
| HIN | Hindi | UKR | Ukrainian | VIE | Vietnamese |
| HRV | Croatian | YUE | Cantonese | ZHO | Chinese |
| HUN | Hungarian | | | | |
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| ARA | Arabic | IND | Indonesian | RON | Romanian |
| BUL | Bulgarian | ITA | Italian | RUS | Russian |
| CES | Czech | JPN | Japanese | SLK | Slovak |
| DAN | Danish | KOR | Korean | SPA | Spanish |
| DEU | German | MSA | Malay | SWE | Swedish |
| ELL | Greek | NLD | Dutch | TAM | Tamil |
| ENG | English | POL | Polish | TGL | Tagalog |
| FIN | Finnish | POR | Portuguese | UKR | Ukrainian |
| FRA | French | HRV | Croatian | ZHO | Chinese |
"eng"
Emotion and style settings for the generated speech.
{
"emotion_type": "smart",
"previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
"next_text": "I am literally bursting with happiness and I never want this feeling to end!"
}
Audio output settings including pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (wav/mp3). Note: volume and target_lufs are not available in streaming mode.
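The limits above can be checked on the client before a request is sent. This is a sketch only: `audio_format`, `volume`, and `target_lufs` are named on this page, while `audio_pitch` and `audio_tempo` are assumed key names for the pitch and tempo settings, and `validate_stream_output` is a hypothetical helper.

```python
from typing import Any, Dict

ALLOWED_FORMATS = {"wav", "mp3"}
STREAMING_UNSUPPORTED = {"volume", "target_lufs"}


def validate_stream_output(output: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if the output settings fall outside the documented limits."""
    blocked = sorted(STREAMING_UNSUPPORTED & set(output))
    if blocked:
        raise ValueError("not available in streaming mode: " + ", ".join(blocked))

    # Only settings that are present are checked; defaults are left to the server.
    if "audio_pitch" in output and not -12 <= output["audio_pitch"] <= 12:
        raise ValueError("pitch must be between -12 and +12 semitones")
    if "audio_tempo" in output and not 0.5 <= output["audio_tempo"] <= 2.0:
        raise ValueError("tempo must be between 0.5x and 2.0x")
    if "audio_format" in output and output["audio_format"] not in ALLOWED_FORMATS:
        raise ValueError("format must be 'wav' or 'mp3'")
    return output
```

Rejecting `volume` and `target_lufs` locally gives a clearer error than waiting for the server to refuse the request.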
Unsigned integer seed for reproducible speech generation. The same seed with the same input parameters will produce identical audio output.
Range: x >= 0
42
Success - Returns streaming audio data in chunks
Chunked WAV audio stream (16-bit, mono, 32000 Hz). First chunk includes WAV header with size 0xFFFFFFFF (indicating streaming), followed by raw PCM data. Subsequent chunks contain only PCM data.
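Because the streamed header carries the placeholder size 0xFFFFFFFF, a file made by simply concatenating the chunks may not be accepted by every player. The sketch below shows one way to handle this, assuming the whole header (up to and including the `data` sub-chunk marker and its 4-byte size) arrives inside the first chunk as described above. It strips the header, passes the raw PCM through, and rewrites it with Python's `wave` module so the final sizes are correct. `iter_pcm` and `save_stream_as_wav` are hypothetical helper names.

```python
import wave
from typing import BinaryIO, Iterable, Iterator, Union

SAMPLE_RATE = 32000  # Hz, as documented for the WAV stream
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def iter_pcm(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield raw PCM bytes, dropping the WAV header found in the first chunk."""
    first = True
    for chunk in chunks:
        if first:
            first = False
            if chunk[:4] != b"RIFF":
                raise ValueError("first chunk does not start with a RIFF header")
            marker = chunk.find(b"data")
            if marker < 0:
                raise ValueError("no 'data' sub-chunk in the first chunk")
            # Skip the 4-byte 'data' tag and the 4-byte placeholder size.
            chunk = chunk[marker + 8:]
        if chunk:
            yield chunk


def save_stream_as_wav(chunks: Iterable[bytes], target: Union[str, BinaryIO]) -> float:
    """Write the stream to a finalized WAV file and return its duration in seconds."""
    total_bytes = 0
    with wave.open(target, "wb") as out:
        out.setnchannels(CHANNELS)
        out.setsampwidth(SAMPLE_WIDTH)
        out.setframerate(SAMPLE_RATE)
        for pcm in iter_pcm(chunks):
            out.writeframes(pcm)
            total_bytes += len(pcm)
    return total_bytes / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

For live playback, the bytes from `iter_pcm` can instead be fed directly to an audio output configured for 16-bit mono at 32000 Hz.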