Feature map
| Need | Use |
|---|---|
| Immediate local playback | cast "text" |
| Reusable audio file | --out file.wav or --out file.mp3 --format mp3 |
| Realtime-feeling agent response | Default playback without --out |
| Timestamp JSON | --timestamps-out file.json |
| SRT or WebVTT subtitles | --timestamps-out file.srt or --timestamps-out file.vtt |
| Custom cloned voice | --voice-id uc_xxx after cast voices clone |
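
The file and subtitle rows above can be combined in one call. The following is a minimal sketch, not part of the CLI: the helper name save_with_timestamps is hypothetical, and it assumes cast is on PATH, accepts flags after the text argument, and allows --out and --timestamps-out in the same invocation.

```shell
# save_with_timestamps TEXT BASENAME [EXT]
# Hypothetical helper: writes BASENAME.wav plus BASENAME.EXT,
# where EXT is json, srt, or vtt (default: srt).
save_with_timestamps() {
  ext=${3:-srt}
  case "$ext" in
    json|srt|vtt) ;;  # the three timestamp outputs listed in the table
    *) echo "unsupported timestamp extension: $ext" >&2; return 1 ;;
  esac
  # Assumption: audio and timestamps can be requested in a single call.
  cast "$1" --out "$2.wav" --timestamps-out "$2.$ext"
}
```

Calling save_with_timestamps "Welcome back." intro vtt would then request intro.wav and intro.vtt from a single synthesis.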
Basic usage
cast plays audio immediately. Use --out to save a WAV or MP3 file instead.
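
As a rough illustration of the two modes, the sketch below wraps them in one shell function. The name speak is hypothetical, and the sketch assumes cast is on PATH and accepts flags after the text argument.

```shell
# speak TEXT [FILE]: play TEXT, or save it to FILE when one is given.
# Hypothetical helper, not part of the CLI.
speak() {
  text=$1
  file=${2:-}
  case "$file" in
    "")    cast "$text" ;;                             # immediate playback
    *.mp3) cast "$text" --out "$file" --format mp3 ;;  # MP3 needs --format mp3
    *)     cast "$text" --out "$file" ;;               # WAV is the default format
  esac
}
```

With this in place, speak "Build finished." plays the sentence, while speak "Build finished." done.mp3 saves it as an MP3 instead.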
The CLI’s immediate playback is the fastest terminal workflow for local realtime feedback. For API-level chunked streaming (POST /v1/text-to-speech/stream), see Streaming TTS and the SDK docs.
Options
| Flag | Description | Default |
|---|---|---|
| --voice-id | Voice ID | tc_60e5426de8b95f1d3000d7b5 |
| --model | Model (ssfm-v30, ssfm-v21) | ssfm-v30 |
| --language | Language code (ISO 639-3) | auto-detected |
| --emotion | Emotion type: smart, preset | |
| --emotion-preset | Preset emotion (requires --emotion preset) | |
| --emotion-intensity | Emotion intensity 0.0-2.0 (requires --emotion preset) | 1.0 |
| --prev-text | Previous sentence for context (--emotion smart only) | |
| --next-text | Next sentence for context (--emotion smart only) | |
| --volume | Volume (0-200) | 100 |
| --pitch | Pitch in semitones (-12 to +12) | 0 |
| --tempo | Tempo multiplier (0.5-2.0) | 1.0 |
| --format | Output format (wav, mp3) | wav |
| --seed | Unsigned integer seed for reproducible output (>= 0) | |
| --out | Save to file instead of playing | |
| --timestamps-out | Save timestamp output to JSON, SRT, or WebVTT | |
| --timestamps-format | Timestamp output format (json, srt, vtt) | inferred from --timestamps-out |
| --granularity | Timestamp granularity (word, char, both) | server default |
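
The numeric ranges in the table can be checked locally before a request is sent. The sketch below is one way to do that in POSIX shell; in_range and speak_tuned are hypothetical helper names, the check is an illustration rather than the CLI's own validation, and it assumes cast is on PATH and accepts flags after the text argument.

```shell
# in_range VALUE MIN MAX: succeed when VALUE is a number with MIN <= VALUE <= MAX.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    ok = (v ~ /^[-+]?[0-9]+(\.[0-9]+)?$/) && (v + 0 >= lo + 0) && (v + 0 <= hi + 0)
    exit !ok
  }'
}

# speak_tuned TEXT [VOLUME] [PITCH] [TEMPO]
# Hypothetical helper: falls back to the documented defaults (100, 0, 1.0).
speak_tuned() {
  text=$1
  volume=${2:-100}
  pitch=${3:-0}
  tempo=${4:-1.0}
  in_range "$volume" 0 200 || { echo "volume must be 0-200" >&2; return 1; }
  in_range "$pitch" -12 12 || { echo "pitch must be -12 to +12" >&2; return 1; }
  in_range "$tempo" 0.5 2.0 || { echo "tempo must be 0.5-2.0" >&2; return 1; }
  cast "$text" --volume "$volume" --pitch "$pitch" --tempo "$tempo"
}
```

For example, speak_tuned "Slow down." 120 -2 0.8 would pass all three checks, while a volume of 250 is rejected before cast is invoked.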
Models
| Model | Languages | Emotions | Latency |
|---|---|---|---|
| ssfm-v30 | 37 | 7 presets + smart emotion | Standard |
| ssfm-v21 | 27 | 4 presets: normal, happy, sad, angry | Low |
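
One possible way to act on this table is to prefer the lower-latency model whenever the requested preset is one it supports. The sketch below does that; pick_model and speak_with_preset are hypothetical helpers, the selection policy is an illustration rather than a recommendation from the CLI, and it assumes cast is on PATH and accepts flags after the text argument.

```shell
# pick_model PRESET: print the model to use for a preset emotion.
pick_model() {
  case "$1" in
    normal|happy|sad|angry) echo ssfm-v21 ;;  # the four presets listed for ssfm-v21
    *)                      echo ssfm-v30 ;;  # any other preset is left to ssfm-v30
  esac
}

# speak_with_preset TEXT PRESET
speak_with_preset() {
  cast "$1" --model "$(pick_model "$2")" --emotion preset --emotion-preset "$2"
}
```

Language coverage also differs between the two models (27 versus 37), so a real policy would likely need to consider --language as well.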
Emotions
- Smart Emotion
- Preset Emotion
AI automatically infers the appropriate emotion from the text. Smart emotion is available with ssfm-v30. Provide surrounding sentences with --prev-text and --next-text for better context.
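
A smart-emotion call with surrounding context might look like the sketch below. The helper name speak_in_context is hypothetical; the flags are the ones listed under Options, and the sketch assumes cast is on PATH and accepts flags after the text argument.

```shell
# speak_in_context PREV TEXT [NEXT]
# Hypothetical helper: speaks TEXT with smart emotion, passing the neighbouring
# sentences as context. An empty PREV or NEXT is simply left out.
speak_in_context() {
  prev=$1
  text=$2
  next=${3:-}
  set -- "$text" --model ssfm-v30 --emotion smart
  if [ -n "$prev" ]; then set -- "$@" --prev-text "$prev"; fi
  if [ -n "$next" ]; then set -- "$@" --next-text "$next"; fi
  cast "$@"
}
```

For instance, speak_in_context "I just got the results." "I passed!" "Let's celebrate tonight." speaks only the middle sentence and sends the other two as context.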