The CLI can call Typecast Timestamp TTS and save alignment data alongside the generated audio. Use this when an agent needs subtitles for Shorts, caption timing for social video, karaoke-style highlights, or lip-sync metadata.
Generate subtitles
# Save audio and SRT subtitles
cast "Hello, world. This is a test." \
--out hello.wav \
--timestamps-out hello.srt
# Save audio and WebVTT subtitles
cast "Hello, world. This is a test." \
--out hello.wav \
--timestamps-out hello.vtt \
--timestamps-format vtt
When --timestamps-format is omitted, the CLI infers srt or vtt from the --timestamps-out file extension; otherwise it falls back to json.
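The inference rule can be pictured as a simple match on the file extension. The sketch below is only an illustration of that rule: `infer_timestamps_format` is a hypothetical helper, not part of the CLI, and details such as case sensitivity are assumptions.

```shell
# Hypothetical helper mirroring the documented inference rule.
# Not the CLI's implementation; the function name and the
# case-sensitive matching are assumptions for illustration.
infer_timestamps_format() {
  case "$1" in
    *.srt) echo "srt" ;;
    *.vtt) echo "vtt" ;;
    *)     echo "json" ;;   # any other extension falls back to JSON
  esac
}

infer_timestamps_format hello.srt              # prints: srt
infer_timestamps_format hello.vtt              # prints: vtt
infer_timestamps_format hello.timestamps.json  # prints: json
```

Passing --timestamps-format explicitly, as in the WebVTT example, avoids relying on the extension at all.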
Save raw timestamp JSON
cast "Hello, world. This is a test." \
--out hello.wav \
--timestamps-out hello.timestamps.json
JSON is useful when another tool will create captions, animate text, or align visuals manually.
Choose granularity
cast "Hello, world." \
--out hello.wav \
--timestamps-out hello.srt \
--granularity both
For languages that do not separate words with whitespace, such as Japanese (jpn) or Chinese (zho), use character-level timestamps to get usable subtitle timing:
cast "こんにちは。世界。" \
--language jpn \
--out hello.wav \
--timestamps-out hello.srt
Caption workflow for agents
Create narration audio and captions from script.txt.
Use the CLI.
Write audio to ./video/voiceover.wav.
Write subtitles to ./video/voiceover.srt.
Keep the subtitle file next to the audio file.
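When the same workflow is run from a script instead of a prompt, it might look like the sketch below. `make_voiceover` is a hypothetical wrapper, not part of the CLI. It uses only the flags shown in this guide and assumes the full contents of script.txt can be passed as a single text argument.

```shell
# Hypothetical wrapper around the documented cast flags.
# Assumption: the whole script fits in one positional text argument.
make_voiceover() {
  local script_file="$1"   # e.g. script.txt
  local out_dir="$2"       # e.g. ./video
  local base="$out_dir/voiceover"

  # Fail early if the script cannot be read.
  [ -r "$script_file" ] || { echo "cannot read $script_file" >&2; return 1; }
  mkdir -p "$out_dir" || return 1

  # A single call writes both files, so the subtitles sit next to
  # the audio and come from the same synthesis.
  cast "$(cat "$script_file")" \
    --out "$base.wav" \
    --timestamps-out "$base.srt"
}
```

Calling `make_voiceover script.txt ./video` would then request ./video/voiceover.wav and ./video/voiceover.srt in one step.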
Output choices
| Output | Use when |
|---|---|
| .srt | Video editors, Shorts/Reels/TikTok caption import |
| .vtt | Web video players and browser-based previews |
| .json | Custom rendering, karaoke highlights, lip-sync, downstream automation |
For social video, generate captions in the same step as the audio. This keeps the narration and the subtitle timing tied to the same synthesis result.
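If both a video editor import and a web preview are needed, one way to keep a single synthesis result is to derive the .vtt from the generated .srt instead of synthesizing twice. The sketch below assumes the SRT output follows the common layout (cue index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then text). `srt_to_vtt` is a hypothetical helper, not a CLI feature, and it does not handle byte-order marks or styling tags.

```shell
# Hypothetical helper: derive WebVTT from an SRT file.
# Assumption: timing lines look like "00:00:00,000 --> 00:00:01,200".
srt_to_vtt() {
  # WebVTT files start with a "WEBVTT" header followed by a blank line.
  printf 'WEBVTT\n\n'
  # Only timing lines contain "-->"; swap the millisecond comma for a dot.
  sed -E '/-->/ s/([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})/\1.\2/g' "$1"
}
```

For example, `srt_to_vtt ./video/voiceover.srt > ./video/voiceover.vtt` writes a WebVTT copy whose cue times match the audio the SRT was generated with.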