The CLI can call Typecast Timestamp TTS and save alignment data alongside the generated audio. Use this when an agent needs subtitles for Shorts, caption timing for social video, karaoke-style highlights, or lip-sync metadata.

Generate subtitles

# Save audio and SRT subtitles
cast "Hello, world. This is a test." \
  --out hello.wav \
  --timestamps-out hello.srt

# Save audio and WebVTT subtitles
cast "Hello, world. This is a test." \
  --out hello.wav \
  --timestamps-out hello.vtt \
  --timestamps-format vtt
When --timestamps-format is omitted, the CLI infers srt or vtt from the --timestamps-out file extension. For any other extension it falls back to json.
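The inference rule can be mirrored in a wrapper script, for example to predict which format a given output path will produce before calling the CLI. The function below is only a sketch of the documented behavior: `infer_timestamps_format` is a hypothetical helper name, not part of the CLI, and the case-sensitive extension match is an assumption.

```shell
# Hypothetical helper (not part of the CLI): mirrors the documented rule
# that .srt and .vtt are inferred from the extension and anything else is json.
# Assumption: matching is case-sensitive; the real CLI may differ.
infer_timestamps_format() {
  case "$1" in
    *.srt) echo "srt" ;;
    *.vtt) echo "vtt" ;;
    *)     echo "json" ;;
  esac
}
```

For example, `infer_timestamps_format hello.timestamps.json` prints `json`, matching the fallback described above.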

Save raw timestamp JSON

cast "Hello, world. This is a test." \
  --out hello.wav \
  --timestamps-out hello.timestamps.json
JSON is useful when another tool will create captions, animate text, or align visuals manually.

Choose granularity

cast "Hello, world." \
  --out hello.wav \
  --timestamps-out hello.srt \
  --granularity both
For languages without whitespace between words, such as Japanese (jpn) or Chinese (zho), word boundaries cannot be read off the text, so use character-level timestamps to get usable subtitle timing:
cast "こんにちは。世界。" \
  --language jpn \
  --out hello.wav \
  --timestamps-out hello.srt

Caption workflow for agents

Create narration audio and captions from script.txt.
Use the CLI.
Write audio to ./video/voiceover.wav.
Write subtitles to ./video/voiceover.srt.
Keep the subtitle file next to the audio file.
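An agent following the prompt above might end up running something like the sketch below. It uses only the `--out` and `--timestamps-out` flags shown earlier and passes the script text as the positional argument. The function name `make_voiceover` and the `CAST_BIN` override are assumptions made for this illustration, and whether the CLI accepts long multi-line text as a single argument is not confirmed here.

```shell
# Hypothetical wrapper: reads the script file, then writes audio and
# subtitles side by side in the output directory.
# CAST_BIN is an assumed override for this sketch only; it defaults to `cast`.
make_voiceover() {
  script_file=$1
  out_dir=$2

  # Fail early if the script cannot be read.
  [ -r "$script_file" ] || { echo "cannot read $script_file" >&2; return 1; }
  mkdir -p "$out_dir" || return 1

  # One call produces both files, so timing matches the generated audio.
  "${CAST_BIN:-cast}" "$(cat "$script_file")" \
    --out "$out_dir/voiceover.wav" \
    --timestamps-out "$out_dir/voiceover.srt"
}
```

Calling `make_voiceover script.txt ./video` would then write ./video/voiceover.wav and ./video/voiceover.srt in a single synthesis step.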

Output choices

| Output | Use when |
| --- | --- |
| .srt | Video editors, Shorts/Reels/TikTok caption import |
| .vtt | Web video players and browser-based previews |
| .json | Custom rendering, karaoke highlights, lip-sync, downstream automation |
For social video, generate captions in the same step as audio. Doing so keeps the final narration and the subtitle timing tied to the same synthesis result.
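One way to keep the two outputs paired is to derive the subtitle path from the audio path instead of typing both by hand. The helper below is a minimal sketch; `captions_path_for` is a hypothetical name introduced here, not something the CLI provides.

```shell
# Hypothetical helper: build a subtitle path that shares the audio file's
# directory and base name. The second argument is the extension (default: srt).
captions_path_for() {
  audio_path=$1
  ext=${2:-srt}
  dir=$(dirname "$audio_path")
  base=$(basename "$audio_path")
  # Strip the last extension from the file name only, never from the directory.
  printf '%s/%s.%s\n' "$dir" "${base%.*}" "$ext"
}
```

With `audio=./video/voiceover.wav`, passing `--out "$audio"` and `--timestamps-out "$(captions_path_for "$audio")"` to cast would place voiceover.srt next to voiceover.wav.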