Feature map

| Need | Use |
| --- | --- |
| Immediate local playback | cast "text" |
| Reusable audio file | --out file.wav or --out file.mp3 --format mp3 |
| Realtime-feeling agent response | Default playback without --out |
| Timestamp JSON | --timestamps-out file.json |
| SRT or WebVTT subtitles | --timestamps-out file.srt or --timestamps-out file.vtt |
| Custom cloned voice | --voice-id uc_xxx after cast voices clone |

Basic usage

# Play immediately
cast "Hello, world!"

# Use a specific voice
cast "Hello, world!" --voice-id tc_xxx

# Save to WAV file
cast "Hello, world!" --out hello.wav

# Save to MP3 file
cast "Hello, world!" --out hello.mp3 --format mp3

# Save audio with SRT subtitles
cast "Hello, world. This is a test." --out hello.wav --timestamps-out hello.srt
By default, cast plays audio immediately. Use --out to save a WAV or MP3 file instead.
The CLI's immediate playback is the fastest way to get realtime audio feedback in a local terminal. For API-level chunked streaming (POST /v1/text-to-speech/stream), see Streaming TTS and the SDK docs.
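The save-to-file flags compose naturally into a small batch job. The following is a minimal sketch rather than a feature of the CLI: render_lines is a hypothetical helper, and the CAST_BIN override and the line-NNN file naming are assumptions made for illustration. It relies only on the --out and --timestamps-out flags shown in this section.

```shell
# Sketch: render each non-empty line of a text file to its own WAV + SRT pair.
# Assumes `cast` is on PATH. CAST_BIN is a hypothetical override for dry runs.
# Written for bash or any sh that supports `local`.

render_lines() {
  local script_file="$1" out_dir="$2" cast_bin="${CAST_BIN:-cast}"
  local n=0 line base

  mkdir -p "$out_dir" || return 1

  # `|| [ -n "$line" ]` keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    n=$((n + 1))
    base=$(printf '%s/line-%03d' "$out_dir" "$n")
    # stdin is redirected so the command cannot consume the rest of the file.
    "$cast_bin" "$line" --out "$base.wav" --timestamps-out "$base.srt" \
      < /dev/null || return 1
  done < "$script_file"
}

# Example: render_lines script.txt out/
```

Setting CAST_BIN=echo prints the commands instead of running them, which is a cheap way to check the generated file names before synthesizing anything.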

Options

| Flag | Description | Default |
| --- | --- | --- |
| --voice-id | Voice ID | tc_60e5426de8b95f1d3000d7b5 |
| --model | Model (ssfm-v30, ssfm-v21) | ssfm-v30 |
| --language | Language code (ISO 639-3) | auto-detected |
| --emotion | Emotion type: smart, preset | |
| --emotion-preset | Preset emotion (requires --emotion preset) | |
| --emotion-intensity | Emotion intensity 0.0-2.0 (requires --emotion preset) | 1.0 |
| --prev-text | Previous sentence for context (--emotion smart only) | |
| --next-text | Next sentence for context (--emotion smart only) | |
| --volume | Volume (0-200) | 100 |
| --pitch | Pitch in semitones (-12 to +12) | 0 |
| --tempo | Tempo multiplier (0.5-2.0) | 1.0 |
| --format | Output format (wav, mp3) | wav |
| --seed | Unsigned integer seed for reproducible output (>= 0) | |
| --out | Save to file instead of playing | |
| --timestamps-out | Save timestamp output to JSON, SRT, or WebVTT | |
| --timestamps-format | Timestamp output format (json, srt, vtt) | inferred from --timestamps-out |
| --granularity | Timestamp granularity (word, char, both) | server default |
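The numeric ranges listed for --volume, --pitch, and --tempo can be checked locally before a request is made. This is a minimal sketch under stated assumptions: check_range and cast_tuned are hypothetical helpers, CAST_BIN is a made-up override for dry runs, and the source does not say how the CLI itself reacts to out-of-range values.

```shell
# Sketch: validate tuning values against the documented ranges, then call cast.
# check_range, cast_tuned and CAST_BIN are illustrative names, not part of the CLI.

# check_range VALUE MIN MAX: succeeds when VALUE is a plain decimal number
# and MIN <= VALUE <= MAX.
check_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    ok = (v ~ /^[-+]?[0-9]+(\.[0-9]+)?$/) && (v + 0 >= lo + 0) && (v + 0 <= hi + 0)
    exit !ok
  }'
}

# cast_tuned TEXT [VOLUME] [PITCH] [TEMPO]; defaults mirror the options table.
cast_tuned() {
  local text="$1" volume="${2:-100}" pitch="${3:-0}" tempo="${4:-1.0}"

  check_range "$volume" 0 200 || { echo "volume must be 0-200" >&2; return 2; }
  check_range "$pitch" -12 12 || { echo "pitch must be -12 to +12" >&2; return 2; }
  check_range "$tempo" 0.5 2.0 || { echo "tempo must be 0.5-2.0" >&2; return 2; }

  "${CAST_BIN:-cast}" "$text" --volume "$volume" --pitch "$pitch" --tempo "$tempo"
}

# Example: cast_tuned "Hello, world!" 120 -2 1.25
```

Failing fast in the wrapper keeps a typo such as a tempo of 20 from reaching the service at all.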

Models

| Model | Languages | Emotions | Latency |
| --- | --- | --- | --- |
| ssfm-v30 | 37 | 7 presets + smart emotion | Standard |
| ssfm-v21 | 27 | 4 presets: normal, happy, sad, angry | Low |
cast "Hello, world!" --model ssfm-v21
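To hear the trade-off between the two models, the same text can be rendered once per model. This is a sketch only: compare_models is a hypothetical helper, CAST_BIN is a made-up override for dry runs, and the compare-MODEL.wav file names are arbitrary. The fixed --seed is there so each model's output is repeatable between runs; it is not expected to make the two models sound alike.

```shell
# Sketch: render the same text with both documented models for A/B listening.
# compare_models and CAST_BIN are illustrative names, not part of the CLI.

compare_models() {
  local text="$1" seed="${2:-42}" model

  for model in ssfm-v30 ssfm-v21; do
    "${CAST_BIN:-cast}" "$text" --model "$model" --seed "$seed" \
      --out "compare-$model.wav" || return 1
  done
}

# Example: compare_models "Hello, world!"
```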

Emotions

With --emotion smart, the model infers the appropriate emotion from the text automatically. Smart emotion is available with ssfm-v30.
cast "I just got promoted!" --emotion smart
Provide surrounding sentences for better context:
cast "I just got promoted!" --emotion smart \
  --prev-text "I have been working so hard this year." \
  --next-text "Let's celebrate tonight!"
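For a multi-sentence script, the neighbouring sentences can be supplied as context automatically. The sketch below assumes one sentence per line in the input file; cast_sentence, smart_narrate, the CAST_BIN override, and the numbered output names are all hypothetical. It uses a one-line lookahead so every sentence is sent with its predecessor as --prev-text and its successor as --next-text, omitting either flag at the edges of the file.

```shell
# Sketch: narrate a file (one sentence per line) with smart emotion, passing
# the surrounding sentences as context. All helper names are illustrative.

# cast_sentence OUT CUR PREV NEXT: PREV or NEXT may be empty.
cast_sentence() {
  local out="$1" cur="$2" prev="$3" next="$4"

  # Build the argument list in the positional parameters (no arrays needed).
  set -- "$cur" --emotion smart
  if [ -n "$prev" ]; then set -- "$@" --prev-text "$prev"; fi
  if [ -n "$next" ]; then set -- "$@" --next-text "$next"; fi

  "${CAST_BIN:-cast}" "$@" --out "$out" < /dev/null
}

smart_narrate() {
  local file="$1" out_dir="$2"
  local prev="" cur="" line n=0

  mkdir -p "$out_dir" || return 1

  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    if [ -n "$cur" ]; then
      # `line` is the lookahead: it becomes the next-text for `cur`.
      n=$((n + 1))
      cast_sentence "$(printf '%s/%03d.wav' "$out_dir" "$n")" \
        "$cur" "$prev" "$line" || return 1
      prev="$cur"
    fi
    cur="$line"
  done < "$file"

  # The final sentence has no following context.
  if [ -n "$cur" ]; then
    n=$((n + 1))
    cast_sentence "$(printf '%s/%03d.wav' "$out_dir" "$n")" "$cur" "$prev" "" || return 1
  fi
}

# Example: smart_narrate story.txt out/
```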