Synthesis - Typecast Documentation

Feature map

Need	Use
Immediate local playback	`cast "text"`
Reusable audio file	`--out file.wav` or `--out file.mp3 --format mp3`
Realtime-feeling agent response	Default playback without `--out`
Timestamp JSON	`--timestamps-out file.json`
SRT or WebVTT subtitles	`--timestamps-out file.srt` or `--timestamps-out file.vtt`
Custom cloned voice	`--voice-id uc_xxx` after `cast voices clone`

Basic usage

# Play immediately
cast "Hello, world!"

# Use a specific voice
cast "Hello, world!" --voice-id tc_xxx

# Save to WAV file
cast "Hello, world!" --out hello.wav

# Save to MP3 file
cast "Hello, world!" --out hello.mp3 --format mp3

# Save audio with SRT subtitles
cast "Hello, world. This is a test." --out hello.wav --timestamps-out hello.srt

By default, cast plays audio immediately. Use --out to save a WAV or MP3 file instead.
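
Since the MP3 example pairs an `.mp3` file name with `--format mp3`, a wrapper script can derive one from the other. The sketch below is only an illustration: `audio_format_for` is a hypothetical helper name, not something the CLI provides, and it assumes lowercase file extensions.

```shell
# Hypothetical helper (not part of the cast CLI): derive the --format value
# from the output file's extension so the two cannot drift apart.
audio_format_for() {
  case "$1" in
    *.mp3) echo "mp3" ;;
    *.wav) echo "wav" ;;
    *) echo "unsupported audio extension: $1" >&2; return 1 ;;
  esac
}

# Usage (requires the cast CLI to be installed):
#   out="hello.mp3"
#   cast "Hello, world!" --out "$out" --format "$(audio_format_for "$out")"
```

The helper fails on any other extension, so a script can stop before calling the CLI with a mismatched file name.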

The CLI's immediate playback is the fastest terminal workflow for local, realtime feedback. For API-level chunked streaming (POST /v1/text-to-speech/stream), see Streaming TTS and the SDK docs.

Options

Flag	Description	Default
`--voice-id`	Voice ID	`tc_60e5426de8b95f1d3000d7b5`
`--model`	Model (`ssfm-v30`, `ssfm-v21`)	`ssfm-v30`
`--language`	Language code (ISO 639-3)	auto-detected
`--emotion`	Emotion type: `smart`, `preset`
`--emotion-preset`	Preset emotion (requires `--emotion preset`)
`--emotion-intensity`	Emotion intensity 0.0-2.0 (requires `--emotion preset`)	`1.0`
`--prev-text`	Previous sentence for context (`--emotion smart` only)
`--next-text`	Next sentence for context (`--emotion smart` only)
`--volume`	Volume (0-200)	`100`
`--pitch`	Pitch in semitones (-12 to +12)	`0`
`--tempo`	Tempo multiplier (0.5-2.0)	`1.0`
`--format`	Output format (`wav`, `mp3`)	`wav`
`--seed`	Unsigned integer seed for reproducible output (`>= 0`)
`--out`	Save to file instead of playing
`--timestamps-out`	Save timestamp output to JSON, SRT, or WebVTT
`--timestamps-format`	Timestamp output format (`json`, `srt`, `vtt`)	inferred from `--timestamps-out`
`--granularity`	Timestamp granularity (`word`, `char`, `both`)	server default
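
The numeric ranges in the table above can be checked in a script before the CLI is called. This is a minimal sketch under stated assumptions: `in_range` and `check_cast_options` are hypothetical helper names, and decimal values are assumed acceptable for every flag checked, which the table does not confirm for `--volume` or `--pitch`.

```shell
# Hypothetical helper (not part of the cast CLI): succeed only when $1 is a
# number between $2 and $3 inclusive. awk handles the decimal comparison.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    if (v !~ /^[-+]?[0-9]*\.?[0-9]+$/) exit 1   # reject non-numeric input
    exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0)
  }'
}

# Ranges taken from the options table above.
check_cast_options() {
  volume="$1"; pitch="$2"; tempo="$3"
  in_range "$volume" 0 200 || { echo "volume must be 0-200" >&2; return 1; }
  in_range "$pitch" -12 12 || { echo "pitch must be -12 to +12" >&2; return 1; }
  in_range "$tempo" 0.5 2.0 || { echo "tempo must be 0.5-2.0" >&2; return 1; }
}

# Usage (requires the cast CLI to be installed):
#   check_cast_options 120 2 1.25 &&
#     cast "Hello, world!" --volume 120 --pitch 2 --tempo 1.25 --seed 42
```

The same `in_range` call works for `--emotion-intensity` with the bounds 0.0 and 2.0.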

Models

Model	Languages	Emotions	Latency
`ssfm-v30`	37	7 presets + smart emotion	Standard
`ssfm-v21`	27	4 presets: normal, happy, sad, angry	Low

cast "Hello, world!" --model ssfm-v21

Emotions

Smart Emotion

AI automatically infers the appropriate emotion from the text. Smart emotion is available with ssfm-v30.

cast "I just got promoted!" --emotion smart

Provide surrounding sentences for better context:

cast "I just got promoted!" --emotion smart \
  --prev-text "I have been working so hard this year." \
  --next-text "Let's celebrate tonight!"

Preset Emotion

Choose a specific emotion with --emotion-preset, and control its strength with --emotion-intensity.

Model	Available Presets
`ssfm-v30`	`normal`, `happy`, `sad`, `angry`, `whisper`, `toneup`, `tonedown`
`ssfm-v21`	`normal`, `happy`, `sad`, `angry`

cast "Hello, world!" --emotion preset --emotion-preset happy
cast "Hello, world!" --emotion preset --emotion-preset happy --emotion-intensity 2.0
cast "Hello, world!" --emotion preset --emotion-preset whisper --emotion-intensity 0.5
cast "Hello, world!" --model ssfm-v21 --emotion preset --emotion-preset sad
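
Because the two models accept different presets, a script that takes the model and preset as input may want to check the pairing first. The sketch below hard-codes the preset lists from the table above; `model_supports_preset` is a hypothetical helper name, and what the CLI itself does with an unsupported pairing is not covered here.

```shell
# Hypothetical helper (not part of the cast CLI): succeed when the given model
# lists the given preset in the table above.
model_supports_preset() {
  model="$1"; preset="$2"
  case "$model" in
    ssfm-v30) presets="normal happy sad angry whisper toneup tonedown" ;;
    ssfm-v21) presets="normal happy sad angry" ;;
    *) return 1 ;;
  esac
  # Pad with spaces so only whole preset names match.
  case " $presets " in
    *" $preset "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires the cast CLI to be installed):
#   model="ssfm-v21"; preset="whisper"
#   if model_supports_preset "$model" "$preset"; then
#     cast "Hello, world!" --model "$model" --emotion preset --emotion-preset "$preset"
#   else
#     echo "$preset is not available on $model" >&2
#   fi
```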
