Streaming Text To Speech

Authorizations

X-API-KEY

string

header

required

API key for authentication. You can obtain an API key from the Typecast API Console.
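
As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name `auth_headers` and the environment variable `TYPECAST_API_KEY` are illustrative assumptions, not part of the API. The `Content-Type` value follows the `application/json` body type listed for this endpoint.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build headers carrying the API key in X-API-KEY.

    Falls back to the TYPECAST_API_KEY environment variable
    (a hypothetical name chosen for this sketch).
    """
    key = api_key or os.environ.get("TYPECAST_API_KEY")
    if not key:
        raise ValueError("an API key is required")
    return {"X-API-KEY": key, "Content-Type": "application/json"}
```

Reading the key from the environment keeps it out of source control; any other secret store would work just as well.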

Body

application/json

Text-to-speech streaming request parameters

voice_id

string

required

Voice identifier. Two prefixes are supported:

tc_ — Built-in Typecast voices (e.g., tc_60e5426de8b95f1d3000d7b5). See Listing all voices for available IDs.
uc_ — Custom voices created via Instant cloning (e.g., uc_64a1b2c3d4e5f6a7b8c9d0e1). Only the owner of a cloned voice can use it.

The prefix is case-sensitive and must be lowercase.
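
The prefix rules can be checked on the client before a request is sent. The following is a sketch only: `voice_kind` is a hypothetical helper, and it checks the prefix alone, not whether the ID exists or belongs to the caller.

```python
def voice_kind(voice_id: str) -> str:
    """Classify a voice ID by its lowercase prefix."""
    if voice_id.startswith("tc_"):
        return "built-in"
    if voice_id.startswith("uc_"):
        return "custom"
    raise ValueError(f"voice_id must start with 'tc_' or 'uc_': {voice_id!r}")
```

Because the comparison is case-sensitive, an ID such as `TC_60e5426de8b95f1d3000d7b5` is rejected rather than silently lowercased.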

Example:

"tc_60e5426de8b95f1d3000d7b5"

text

string

required

Text to convert to speech, from 1 to 2000 characters. Credits are consumed based on text length. Multiple languages are supported, including English, Korean, Japanese, and Chinese. Special characters and punctuation are handled automatically.

Required string length: 1 - 2000
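
Longer input has to be split across several requests. The sketch below shows one way to do that, breaking at a space where possible. It assumes the limit counts Python string characters (code points), which this page does not specify, and `split_text` is a hypothetical helper.

```python
MAX_CHARS = 2000  # upper bound from "Required string length: 1 - 2000"


def split_text(text: str, limit: int = MAX_CHARS) -> list:
    """Split text into pieces of at most `limit` characters.

    Prefers to cut at the last space within the limit and falls back
    to a hard cut when a run of text contains no space.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Splitting on sentence boundaries would likely give more natural prosody at the joins; spaces are used here only to keep the example short.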

Example:

"Everything is so incredibly perfect that I feel like I'm dreaming."

model

enum<string>

required

Voice model to use for speech synthesis.

ssfm-v30: Latest model with improved prosody and additional emotion presets (recommended)
ssfm-v21: Stable production model with reliable quality

Available options:

ssfm-v30,

ssfm-v21

Example:

"ssfm-v30"

language

string

Language code following the ISO 639-3 standard. Case-insensitive (both "ENG" and "eng" are accepted). If omitted, the language is auto-detected from the text content.

ssfm-v30 Supported Languages (37)

Code	Language	Code	Language	Code	Language
ARA	Arabic	IND	Indonesian	POR	Portuguese
BEN	Bengali	ITA	Italian	RON	Romanian
BUL	Bulgarian	JPN	Japanese	RUS	Russian
CES	Czech	KOR	Korean	SLK	Slovak
DAN	Danish	MSA	Malay	SPA	Spanish
DEU	German	NAN	Min Nan	SWE	Swedish
ELL	Greek	NLD	Dutch	TAM	Tamil
ENG	English	NOR	Norwegian	TGL	Tagalog
FIN	Finnish	PAN	Punjabi	THA	Thai
FRA	French	POL	Polish	TUR	Turkish
HIN	Hindi	UKR	Ukrainian	VIE	Vietnamese
HRV	Croatian	YUE	Cantonese	ZHO	Chinese
HUN	Hungarian

ssfm-v21 Supported Languages (27)

Code	Language	Code	Language	Code	Language
ARA	Arabic	IND	Indonesian	RON	Romanian
BUL	Bulgarian	ITA	Italian	RUS	Russian
CES	Czech	JPN	Japanese	SLK	Slovak
DAN	Danish	KOR	Korean	SPA	Spanish
DEU	German	MSA	Malay	SWE	Swedish
ELL	Greek	NLD	Dutch	TAM	Tamil
ENG	English	POL	Polish	TGL	Tagalog
FIN	Finnish	POR	Portuguese	UKR	Ukrainian
FRA	French	HRV	Croatian	ZHO	Chinese
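
The two tables can be encoded as sets for a client-side check before sending a request. This is a sketch: the codes are copied from the tables above, while `SUPPORTED` and `supports` are hypothetical names.

```python
# Codes from the ssfm-v21 table (27 languages).
V21 = frozenset(
    "ARA BUL CES DAN DEU ELL ENG FIN FRA HRV IND ITA JPN KOR MSA "
    "NLD POL POR RON RUS SLK SPA SWE TAM TGL UKR ZHO".split()
)
# ssfm-v30 covers the same codes plus ten more (37 languages).
V30 = V21 | frozenset("BEN HIN HUN NAN NOR PAN THA TUR VIE YUE".split())

SUPPORTED = {"ssfm-v21": V21, "ssfm-v30": V30}


def supports(model: str, language: str) -> bool:
    """Return True if `model` lists `language`, ignoring case."""
    return language.upper() in SUPPORTED[model]
```

For example, `supports("ssfm-v21", "tha")` is false because Thai appears only in the ssfm-v30 table.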

Example:

"eng"

prompt

SmartPrompt (ssfm-v30) · object

Emotion and style settings for the generated speech. The object takes one of the following forms, depending on the model:

SmartPrompt (ssfm-v30)
PresetPrompt (ssfm-v30)
Prompt (ssfm-v21)

Example:

{
  "emotion_type": "smart",
  "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
  "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
}
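
Putting the body fields together, a request payload might be assembled as in the sketch below. `build_payload` is a hypothetical helper. It leaves out the `output` object because its child attribute names are not shown on this page, and it omits optional fields that are not set instead of sending nulls.

```python
import json
from typing import Optional

MODELS = ("ssfm-v30", "ssfm-v21")


def build_payload(voice_id: str, text: str, model: str = "ssfm-v30",
                  language: Optional[str] = None,
                  prompt: Optional[dict] = None,
                  seed: Optional[int] = None) -> dict:
    """Assemble the JSON body, skipping optional fields left as None."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}")
    payload = {"voice_id": voice_id, "text": text, "model": model}
    optional = {"language": language, "prompt": prompt, "seed": seed}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


smart_prompt = {
    "emotion_type": "smart",
    "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
    "next_text": "I am literally bursting with happiness and I never want this feeling to end!",
}
body = json.dumps(build_payload(
    "tc_60e5426de8b95f1d3000d7b5",
    "Everything is so incredibly perfect that I feel like I'm dreaming.",
    prompt=smart_prompt,
    seed=42,
))
```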

output

OutputStream · object

Audio output settings including pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (wav/mp3). Note: volume and target_lufs are not available in streaming mode.
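
The ranges above can be checked before a request is built. In this sketch the parameter names `pitch`, `tempo`, and `audio_format` are placeholders: the actual child attribute names of the output object are not shown on this page.

```python
def check_output_settings(pitch: float = 0, tempo: float = 1.0,
                          audio_format: str = "wav") -> list:
    """Return a list of problems; an empty list means all values are in range."""
    problems = []
    if not -12 <= pitch <= 12:
        problems.append("pitch must be between -12 and +12 semitones")
    if not 0.5 <= tempo <= 2.0:
        problems.append("tempo must be between 0.5x and 2.0x")
    if audio_format not in ("wav", "mp3"):
        problems.append("format must be 'wav' or 'mp3'")
    return problems
```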

seed

integer<uint32>

Unsigned integer seed for reproducible speech generation. The same seed with the same input parameters will produce identical audio output.

Must be a non-negative integer (≥ 0). Negative values are not accepted.
If omitted, the server generates a random seed each time, producing slight variations.

Required range: 0 <= x <= 4294967295

Example:

42
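
A client can enforce the uint32 range locally. The sketch below also shows one possible way to derive a stable seed from a label so that a given take can be regenerated later. Both helper names are hypothetical, and the use of CRC-32 is a choice made for this example, not something the API prescribes.

```python
import zlib

UINT32_MAX = 4294967295  # upper bound from "Required range"


def validate_seed(seed: int) -> int:
    """Return the seed unchanged if it is an integer in 0..UINT32_MAX."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= UINT32_MAX:
        raise ValueError(f"seed must be between 0 and {UINT32_MAX}")
    return seed


def seed_from_label(label: str) -> int:
    """Map a label to a repeatable uint32 seed using CRC-32."""
    return zlib.crc32(label.encode("utf-8")) & 0xFFFFFFFF
```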

Response

Success - Returns streaming audio data in chunks

Chunked WAV audio stream (16-bit, mono, 32000 Hz). The first chunk includes a WAV header whose size field is set to 0xFFFFFFFF (indicating streaming), followed by raw PCM data. Subsequent chunks contain only PCM data.
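
Because the header carries a placeholder size, some tools may not accept the raw stream as a finished file. The sketch below collects the PCM data from the received chunks and rewrites it as a WAV file with correct sizes. It assumes the whole header arrives in the first chunk and follows the standard RIFF sub-chunk layout. The function names are hypothetical, and `chunks` stands for whatever iterable of bytes the HTTP client yields.

```python
import io
import struct
import wave
from typing import Iterable

SAMPLE_RATE = 32000  # Hz, from the response description
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def data_offset(first_chunk: bytes) -> int:
    """Walk the RIFF sub-chunks and return where the PCM payload starts."""
    if first_chunk[:4] != b"RIFF" or first_chunk[8:12] != b"WAVE":
        raise ValueError("first chunk does not start with a RIFF/WAVE header")
    offset = 12
    while offset + 8 <= len(first_chunk):
        chunk_id = first_chunk[offset:offset + 4]
        if chunk_id == b"data":
            return offset + 8  # skip the ID and the placeholder size
        size = struct.unpack_from("<I", first_chunk, offset + 4)[0]
        offset += 8 + size + (size & 1)  # sub-chunks are padded to even length
    raise ValueError("no data sub-chunk found in the first chunk")


def pcm_from_stream(chunks: Iterable[bytes]) -> bytes:
    """Concatenate the PCM bytes, dropping the header from the first chunk."""
    iterator = iter(chunks)
    first = next(iterator, b"")
    pcm = bytearray(first[data_offset(first):])
    for chunk in iterator:
        pcm.extend(chunk)
    return bytes(pcm)


def to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container with real size fields."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

For low-latency playback the PCM can instead be passed to an audio device as each chunk arrives; buffering the whole stream as shown here is only appropriate when the goal is a saved file.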
