POST /v1/text-to-speech/stream
cURL (stream + play)
# Pipe streaming audio directly into ffplay for real-time playback.
# Requires: ffmpeg (brew/choco/apt install ffmpeg)
curl -N -s --request POST \
  --url https://api.typecast.ai/v1/text-to-speech/stream \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <api-key>' \
  --data @- <<EOF | ffplay -autoexit -nodisp -loglevel error -i pipe:0
{
  "voice_id": "tc_60e5426de8b95f1d3000d7b5",
  "text": "Thanks for reaching out. Your reservation has been confirmed for Friday at 7 PM.",
  "model": "ssfm-v30"
}
EOF
"[Binary audio stream - WAV chunks]"

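For readers who prefer Python to cURL, the request above can be sketched with the standard library alone. This is a minimal illustration rather than an official client: the helper names `build_stream_request` and `stream_to_file` are hypothetical, and only the URL, headers, and the three required body fields come from this page.

```python
import json
import urllib.request

API_URL = "https://api.typecast.ai/v1/text-to-speech/stream"


def build_stream_request(
    api_key: str, voice_id: str, text: str, model: str = "ssfm-v30"
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the streaming endpoint."""
    body = json.dumps({"voice_id": voice_id, "text": text, "model": model})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
    )


def stream_to_file(
    request: urllib.request.Request, path: str, chunk_size: int = 4096
) -> int:
    """Send the request and write audio bytes to `path` as they arrive.

    Returns the number of bytes written. Raises urllib.error.HTTPError
    on a non-2xx response.
    """
    written = 0
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

A call such as `stream_to_file(build_stream_request("<api-key>", "tc_60e5426de8b95f1d3000d7b5", "Hello"), "out.wav")` would save the stream to disk. Because the header is written for streaming, some players may not report a duration for the saved file.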
Documentation Index

Fetch the complete documentation index at: https://typecast.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Authorizations

X-API-KEY
string
header
required

API key for authentication. You can obtain an API key from the Typecast API Console.

Body

application/json

Text-to-speech streaming request parameters

voice_id
string
required

Voice ID in the format 'tc_' followed by a unique identifier (e.g., 'tc_60e5426de8b95f1d3000d7b5'). The ID is case-sensitive and must be lowercase (tc_xxx). See "Listing all voices" for the available voices.

Example:

"tc_60e5426de8b95f1d3000d7b5"

text
string
required

Text to convert to speech. Minimum 1 character, maximum 2000 characters. Credits are consumed based on text length. Multiple languages are supported, including English, Korean, Japanese, and Chinese. Special characters and punctuation are handled automatically.

Required string length: 1 - 2000
Example:

"Everything is so incredibly perfect that I feel like I'm dreaming."

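Input longer than the 2000-character limit has to be split across several requests. The sketch below shows one possible client-side approach: it breaks text at sentence ends where it can and hard-wraps any single sentence that is still too long. The function name `split_text` and the sentence-boundary rule are assumptions made for illustration. Python's `len` counts Unicode code points, which may not match how the server counts characters.

```python
import re

MAX_CHARS = 2000  # upper bound stated for the `text` field


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # A sentence ends at . ! ? (or full-width forms) plus trailing whitespace,
    # or at the end of the input.
    sentences = re.findall(r".+?(?:[.!?。！？]+\s*|$)", text.strip(), flags=re.S)
    chunks: list[str] = []
    current = ""

    def flush() -> None:
        nonlocal current
        if current.strip():
            chunks.append(current.strip())
        current = ""

    for sentence in sentences:
        if len(current) + len(sentence) > limit:
            flush()
        # Hard-wrap a single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            piece = sentence[:limit].strip()
            if piece:
                chunks.append(piece)
            sentence = sentence[limit:]
        current += sentence
    flush()
    return chunks
```

Each returned piece can then be sent as the `text` of its own request. Audio from separate requests is generated independently, so joins between pieces may be audible.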
model
enum<string>
required

Voice model to use for speech synthesis.

  • ssfm-v30: Latest model with improved prosody and additional emotion presets (recommended)
  • ssfm-v21: Stable production model with reliable quality
Available options: ssfm-v30, ssfm-v21
Example:

"ssfm-v30"

language
string

Language code following the ISO 639-3 standard. Case-insensitive (both "ENG" and "eng" are accepted). If omitted, the language is auto-detected from the text content.

ssfm-v30 Supported Languages (37)
Code  Language     Code  Language     Code  Language
ARA   Arabic       IND   Indonesian   POR   Portuguese
BEN   Bengali      ITA   Italian      RON   Romanian
BUL   Bulgarian    JPN   Japanese     RUS   Russian
CES   Czech        KOR   Korean       SLK   Slovak
DAN   Danish       MSA   Malay        SPA   Spanish
DEU   German       NAN   Min Nan      SWE   Swedish
ELL   Greek        NLD   Dutch        TAM   Tamil
ENG   English      NOR   Norwegian    TGL   Tagalog
FIN   Finnish      PAN   Punjabi      THA   Thai
FRA   French       POL   Polish       TUR   Turkish
HIN   Hindi        UKR   Ukrainian    VIE   Vietnamese
HRV   Croatian     YUE   Cantonese    ZHO   Chinese
HUN   Hungarian
ssfm-v21 Supported Languages (27)
Code  Language     Code  Language     Code  Language
ARA   Arabic       IND   Indonesian   RON   Romanian
BUL   Bulgarian    ITA   Italian      RUS   Russian
CES   Czech        JPN   Japanese     SLK   Slovak
DAN   Danish       KOR   Korean       SPA   Spanish
DEU   German       MSA   Malay        SWE   Swedish
ELL   Greek        NLD   Dutch        TAM   Tamil
ENG   English      POL   Polish       TGL   Tagalog
FIN   Finnish      POR   Portuguese   UKR   Ukrainian
FRA   French       HRV   Croatian     ZHO   Chinese
Example:

"eng"

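Because the two models support different language sets, a client may want to check a code before sending it. The following sketch transcribes the two tables above into sets and normalizes case. The function name `normalize_language` is hypothetical, and returning `None` to mean "omit the field and let the server auto-detect" is a design choice of this example, not something the API prescribes.

```python
from typing import Optional

V30_LANGUAGES = frozenset(
    "ara ben bul ces dan deu ell eng fin fra hin hrv hun "
    "ind ita jpn kor msa nan nld nor pan pol ukr yue "
    "por ron rus slk spa swe tam tgl tha tur vie zho".split()
)
V21_LANGUAGES = frozenset(
    "ara bul ces dan deu ell eng fin fra "
    "ind ita jpn kor msa nld pol por hrv "
    "ron rus slk spa swe tam tgl ukr zho".split()
)
SUPPORTED = {"ssfm-v30": V30_LANGUAGES, "ssfm-v21": V21_LANGUAGES}


def normalize_language(code: Optional[str], model: str) -> Optional[str]:
    """Return the lowercase ISO 639-3 code, or None to request auto-detection."""
    if code is None:
        return None
    if model not in SUPPORTED:
        raise ValueError(f"unknown model: {model!r}")
    normalized = code.strip().lower()
    if normalized not in SUPPORTED[model]:
        raise ValueError(f"{code!r} is not listed for {model}")
    return normalized
```

For example, `normalize_language("ENG", "ssfm-v21")` yields `"eng"`, while `"hin"` is accepted only for ssfm-v30.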
prompt
SmartPrompt (ssfm-v30) · object

Emotion and style settings for the generated speech.

Example:
{
  "emotion_type": "smart",
  "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
  "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
}
output
OutputStream · object

Audio output settings including pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (wav/mp3). Note: volume and target_lufs are not available in streaming mode.
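The ranges above can be checked on the client before a request is sent. The sketch below is illustrative only: this page does not list the field names inside the `output` object, so the keys `audio_pitch`, `audio_tempo`, and `audio_format` are assumptions, as is treating pitch as a whole number of semitones. Confirm both against the OutputStream schema before relying on them.

```python
def build_output(pitch: int = 0, tempo: float = 1.0, fmt: str = "wav") -> dict:
    """Validate streaming output settings and return them as a dict.

    Key names in the returned dict are assumed, not taken from this page.
    """
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be between -12 and +12 semitones")
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("tempo must be between 0.5x and 2.0x")
    if fmt not in ("wav", "mp3"):
        raise ValueError("format must be 'wav' or 'mp3'")
    # volume and target_lufs are deliberately absent: the page states they
    # are not available in streaming mode.
    return {"audio_pitch": pitch, "audio_tempo": tempo, "audio_format": fmt}
```

Rejecting out-of-range values locally gives a clearer error than waiting for the server to refuse the request.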

seed
integer<uint32>

Unsigned integer seed for reproducible speech generation. The same seed with the same input parameters will produce identical audio output.

  • Must be a non-negative integer (≥ 0). Negative values are not accepted.
  • If omitted, the server generates a random seed each time, producing slight variations.
Required range: x >= 0
Example:

42
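Since the type is `uint32`, a valid seed lies between 0 and 4294967295 inclusive. A small client-side check, sketched here with the hypothetical helper names `validate_seed` and `with_seed`, can catch bad values early and leave the field out entirely when no seed is wanted.

```python
from typing import Optional

UINT32_MAX = 2**32 - 1  # largest value representable as uint32


def validate_seed(seed: Optional[int]) -> Optional[int]:
    """Return the seed unchanged if it fits in uint32; None passes through."""
    if seed is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= UINT32_MAX:
        raise ValueError("seed must be in the range 0..4294967295")
    return seed


def with_seed(payload: dict, seed: Optional[int]) -> dict:
    """Return a copy of the request payload, adding `seed` only when given."""
    result = dict(payload)
    checked = validate_seed(seed)
    if checked is not None:
        result["seed"] = checked
    return result
```

Omitting the key, rather than sending a placeholder, leaves the server free to pick a random seed as described above.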

Response

Success - Returns streaming audio data in chunks

Chunked WAV audio stream (16-bit, mono, 32000 Hz). First chunk includes WAV header with size 0xFFFFFFFF (indicating streaming), followed by raw PCM data. Subsequent chunks contain only PCM data.
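
A client that consumes the stream directly needs to separate the WAV header from the PCM data in the first chunk. The sketch below walks the RIFF chunks to find where PCM begins and flags the 0xFFFFFFFF streaming marker. It assumes the whole header arrives within the first chunk, as described above. Which size field carries the marker is not specified on this page, so the code checks both the RIFF size and the data-chunk size. The names `parse_wav_header` and `pcm_seconds` are hypothetical.

```python
import struct

STREAMING_SIZE = 0xFFFFFFFF  # placeholder size used when the length is unknown


def parse_wav_header(first_chunk: bytes) -> dict:
    """Read format fields from a RIFF/WAVE header at the start of `first_chunk`."""
    if len(first_chunk) < 12 or first_chunk[:4] != b"RIFF" or first_chunk[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    riff_size = struct.unpack_from("<I", first_chunk, 4)[0]
    offset, fmt = 12, None
    while offset + 8 <= len(first_chunk):
        chunk_id, chunk_size = struct.unpack_from("<4sI", first_chunk, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            if body + 16 > len(first_chunk):
                raise ValueError("truncated fmt chunk")
            fmt = struct.unpack_from("<HHIIHH", first_chunk, body)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appears before fmt chunk")
            _, channels, sample_rate, _, _, bits = fmt
            return {
                "channels": channels,
                "sample_rate": sample_rate,
                "bits_per_sample": bits,
                "streaming": STREAMING_SIZE in (riff_size, chunk_size),
                "pcm_offset": body,  # PCM samples start at this byte
            }
        # RIFF chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found in the first chunk")


def pcm_seconds(num_bytes: int, sample_rate: int = 32000,
                channels: int = 1, bits_per_sample: int = 16) -> float:
    """Duration of raw PCM; defaults match the format stated above."""
    return num_bytes / (sample_rate * channels * bits_per_sample // 8)
```

With the stated format (16-bit, mono, 32000 Hz), one second of audio is 64000 bytes of PCM, so `pcm_seconds` can be used to track playback position as chunks arrive.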