New Endpoint: POST /v1/text-to-speech/with-timestamps
Returns the synthesized audio together with word- and character-level alignment data in a single response, ideal for auto-subtitling, karaoke highlights, and lip-sync animations.
POST /v1/text-to-speech/with-timestamps
Request Schema:
{
  "voice_id": "tc_60e5426de8b95f1d3000d7b5",
  "text": "Hello.",
  "model": "ssfm-v30"
}
Response Schema (summary):
{
  "audio": "<base64>",
  "audio_format": "wav",
  "audio_duration": 0.52,
  "words": [
    { "text": "Hello.", "start": 0.0, "end": 0.52 }
  ],
  "characters": [
    { "text": "H", "start": 0.0, "end": 0.08 },
    { "text": "e", "start": 0.08, "end": 0.18 },
    { "text": "l", "start": 0.18, "end": 0.28 },
    { "text": "l", "start": 0.28, "end": 0.36 },
    { "text": "o", "start": 0.36, "end": 0.48 },
    { "text": ".", "start": 0.48, "end": 0.52 }
  ]
}
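As a rough illustration of consuming this response, the sketch below decodes the base64 `audio` field and looks up which word or character is being spoken at a given time. The helper names `decode_audio` and `unit_at` are hypothetical, and the audio bytes in the sample are a placeholder rather than real WAV data.

```python
import base64

# Sample payload shaped like the response summary above. The audio bytes are a
# placeholder for illustration, not real WAV data.
sample_response = {
    "audio": base64.b64encode(b"placeholder-audio").decode("ascii"),
    "audio_format": "wav",
    "audio_duration": 0.52,
    "words": [{"text": "Hello.", "start": 0.0, "end": 0.52}],
    "characters": [
        {"text": "H", "start": 0.0, "end": 0.08},
        {"text": "e", "start": 0.08, "end": 0.18},
        {"text": "l", "start": 0.18, "end": 0.28},
        {"text": "l", "start": 0.28, "end": 0.36},
        {"text": "o", "start": 0.36, "end": 0.48},
        {"text": ".", "start": 0.48, "end": 0.52},
    ],
}


def decode_audio(response: dict) -> bytes:
    """Decode the base64 `audio` field into raw bytes (hypothetical helper)."""
    return base64.b64decode(response["audio"])


def unit_at(units: list, t: float):
    """Return the text of the word or character spoken at time t, else None."""
    for unit in units:
        if unit["start"] <= t < unit["end"]:
            return unit["text"]
    return None
```

For example, `unit_at(sample_response["characters"], 0.2)` returns `"l"`, which is the kind of lookup a karaoke highlight would perform on each frame.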
granularity parameter:
granularity is optional. If omitted, the API returns both word- and character-level alignment in a single response.

| Value | Description |
|---|---|
| word | Per-word alignment. Recommended for all languages that separate words with whitespace. |
| char | Per-character alignment. Required for Japanese (jpn) and Chinese (zho). |

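One way to apply the table above is to pick the granularity from the language code. The sketch below is only illustrative: it assumes `granularity` and `language` are sent as top-level fields of the JSON body (the request example in this entry does not show either), and `build_timestamp_request` is a hypothetical helper.

```python
# Languages the table lists as requiring per-character alignment.
CHAR_LEVEL_LANGUAGES = {"jpn", "zho"}


def build_timestamp_request(voice_id: str, text: str, language: str,
                            model: str = "ssfm-v30") -> dict:
    """Build a request body, choosing granularity by language.

    Assumption: `granularity` and `language` are top-level body fields.
    """
    granularity = "char" if language in CHAR_LEVEL_LANGUAGES else "word"
    return {
        "voice_id": voice_id,
        "text": text,
        "model": model,
        "language": language,
        "granularity": granularity,
    }
```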
Captioning rules: Captions are split on sentence terminators (. ? ! 。 ？ ！) with a hard cap of 7 s / 42 characters per cue (BBC/Netflix subtitle guidelines).
SDK Updates: Timestamp TTS added to all 11 SDKs
| SDK | Version | Method |
|---|---|---|
| Python | 0.3.0 | text_to_speech_with_timestamps() |
| JavaScript | 0.4.0 | textToSpeechWithTimestamps() |
| Go | v0.3.0 | TextToSpeechWithTimestamps() |
| Rust | 0.3.0 | text_to_speech_with_timestamps() |
| Swift | v0.3.0 | textToSpeechWithTimestamps() |
| C# | 0.3.0 | TextToSpeechWithTimestampsAsync() |
| Java | 1.2.0 | textToSpeechWithTimestamps() |
| Kotlin | 1.2.0 | textToSpeechWithTimestamps() |
| C | 1.2.0 | typecast_text_to_speech_with_timestamps() |
| Zig | v0.2.0 | textToSpeechWithTimestamps() |
| PHP | v0.1.0 | textToSpeechWithTimestamps() |
All SDK response objects include toSrt() / toVtt() subtitle export helpers and a saveAudio(path) / audio_bytes() convenience method.
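To give a feel for what a subtitle export does with the alignment data, here is a minimal sketch that turns a `words` array into SRT text using the captioning rules stated for this endpoint (split on sentence terminators, with a 7 s / 42-character cap per cue). It is not the SDKs' `toSrt()` implementation; the function names and the exact grouping logic are assumptions made for illustration.

```python
TERMINATORS = (".", "?", "!", "。", "？", "！")
MAX_SECONDS = 7.0
MAX_CHARS = 42


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_cues(words: list) -> list:
    """Group word entries into cues, honoring terminators and the hard caps."""
    cues, current = [], []
    for word in words:
        candidate = current + [word]
        text = " ".join(w["text"] for w in candidate)
        too_long = len(text) > MAX_CHARS
        too_slow = candidate[-1]["end"] - candidate[0]["start"] > MAX_SECONDS
        if current and (too_long or too_slow):
            cues.append(current)   # close the cue before it exceeds a cap
            current = [word]
        else:
            current = candidate
        if word["text"].endswith(TERMINATORS):
            cues.append(current)   # a sentence terminator always ends the cue
            current = []
    if current:
        cues.append(current)
    return cues


def to_srt(words: list) -> str:
    """Render cues as SRT blocks: index, time range, text."""
    blocks = []
    for index, cue in enumerate(words_to_cues(words), start=1):
        text = " ".join(w["text"] for w in cue)
        start, end = srt_time(cue[0]["start"]), srt_time(cue[-1]["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Joining words with a space suits whitespace-delimited languages; for character-level alignment (Japanese, Chinese) the join string would need to be empty.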
New Endpoint: POST /v1/text-to-speech/stream
Added a low-latency streaming endpoint that delivers audio chunks as they are generated, enabling real-time playback without waiting for full synthesis.
POST /v1/text-to-speech/stream
Key Differences from /v1/text-to-speech:

| Feature | Standard | Streaming |
|---|---|---|
| Response | Complete audio file | Chunked audio stream |
| Latency | Wait for full synthesis | First chunk in ~200 ms |
| volume / target_lufs | Supported | Not supported |
| Output settings | Output | OutputStream (pitch, tempo, format only) |

Request Schema:
{
  "voice_id": "tc_xxxxx",
  "text": "Thanks for reaching out. Your reservation has been confirmed for Friday at 7 PM.",
  "model": "ssfm-v30",
  "language": "eng",
  "output": {
    "audio_pitch": 0,
    "audio_tempo": 1.0,
    "audio_format": "wav"
  }
}
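Because the streaming output settings accept only pitch, tempo, and format, a client may want to catch unsupported keys such as `volume` or `target_lufs` before sending. The sketch below shows one way to do that; `build_stream_request` is a hypothetical helper, and rejecting the keys client-side is a design choice of this example, not documented API behavior.

```python
# Output keys the streaming endpoint is described as supporting.
ALLOWED_STREAM_OUTPUT_KEYS = {"audio_pitch", "audio_tempo", "audio_format"}


def build_stream_request(voice_id: str, text: str, model: str = "ssfm-v30",
                         language: str = "eng", output: dict = None) -> dict:
    """Build a streaming request body, refusing unsupported output settings."""
    output = dict(output or {})
    unsupported = set(output) - ALLOWED_STREAM_OUTPUT_KEYS
    if unsupported:
        raise ValueError(
            f"not supported by the streaming endpoint: {sorted(unsupported)}"
        )
    payload = {
        "voice_id": voice_id,
        "text": text,
        "model": model,
        "language": language,
    }
    if output:
        payload["output"] = output
    return payload
```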
Response: Chunked binary stream (audio/wav or audio/mpeg).
New Endpoint: GET /v1/users/me/subscription
Retrieve the authenticated user’s plan tier, credit usage, and concurrency limits.
GET /v1/users/me/subscription
Response Schema:
{
  "plan": "lite",
  "credits": {
    "plan_credits": 200000,
    "used_credits": 157300
  },
  "limits": {
    "concurrency_limit": 5
  }
}
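A typical use of this response is checking how much of the plan is left before queuing more synthesis jobs. The helpers below are a small sketch over the response shape shown above; the function names are hypothetical.

```python
def remaining_credits(subscription: dict) -> int:
    """Credits left on the plan: plan_credits minus used_credits."""
    credits = subscription["credits"]
    return credits["plan_credits"] - credits["used_credits"]


def usage_ratio(subscription: dict) -> float:
    """Fraction of plan credits already used, between 0 and 1."""
    credits = subscription["credits"]
    return credits["used_credits"] / credits["plan_credits"]


sample_subscription = {
    "plan": "lite",
    "credits": {"plan_credits": 200000, "used_credits": 157300},
    "limits": {"concurrency_limit": 5},
}
```

With the sample values, `remaining_credits(sample_subscription)` is 42700.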
SDK Updates
All 9 official SDKs have been updated with streaming and subscription support:

| SDK | Version | Streaming Method |
|---|---|---|
| Python | 0.2.0 | text_to_speech_stream() (sync + async) |
| JavaScript | 0.3.0 | textToSpeechStream() → ReadableStream |
| Go | v0.2.0 | TextToSpeechStream() → io.ReadCloser |
| Rust | 0.2.0 | text_to_speech_stream() → Stream<Bytes> |
| Swift | v0.2.0 | textToSpeechStream() → AsyncThrowingStream |
| C# | 0.2.0 | TextToSpeechStreamAsync() → Stream |
| Java | 1.1.0 | textToSpeechStream() → InputStream |
| Kotlin | 1.1.0 | textToSpeechStream() → InputStream |
| C | 1.1.0 | typecast_text_to_speech_stream() (callback) |
New Model: ssfm-v30
Added support for the new ssfm-v30 model with improved speech quality and expanded capabilities.
New Features:
- Smart Emotion - Context-aware emotion inference using SmartPrompt
- 7 Emotion Presets - Added whisper, toneup, tonedown presets
- Universal Emotion Support - All emotions available across all voices
- 37 Languages - Added 10 new languages
New Languages: Bengali, Cantonese, Hindi, Hungarian, Min Nan, Norwegian, Punjabi, Thai, Turkish, Vietnamese
Request Schema Changes:
// ssfm-v30 with SmartPrompt (context-aware emotion)
{
  "model": "ssfm-v30",
  "prompt": {
    "emotion_type": "smart",
    "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
    "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
  }
}
// ssfm-v30 with PresetPrompt (manual emotion selection)
{
  "model": "ssfm-v30",
  "prompt": {
    "emotion_type": "preset",
    "emotion_preset": "happy",
    "emotion_intensity": 1.0
  }
}
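The two prompt shapes can be produced by small builder functions, sketched below. The names `smart_prompt` and `preset_prompt` are hypothetical, and treating `previous_text` and `next_text` as optional is an assumption; the preset names are the seven listed for ssfm-v30 in this changelog.

```python
# The seven ssfm-v30 presets named in this changelog.
V30_EMOTION_PRESETS = {
    "normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown",
}


def smart_prompt(previous_text: str = None, next_text: str = None) -> dict:
    """Build a SmartPrompt; context fields are included only when given.

    Assumption: previous_text and next_text may be omitted.
    """
    prompt = {"emotion_type": "smart"}
    if previous_text is not None:
        prompt["previous_text"] = previous_text
    if next_text is not None:
        prompt["next_text"] = next_text
    return prompt


def preset_prompt(emotion_preset: str, emotion_intensity: float = 1.0) -> dict:
    """Build a PresetPrompt, rejecting preset names not listed for ssfm-v30."""
    if emotion_preset not in V30_EMOTION_PRESETS:
        raise ValueError(f"unknown ssfm-v30 emotion preset: {emotion_preset!r}")
    return {
        "emotion_type": "preset",
        "emotion_preset": emotion_preset,
        "emotion_intensity": emotion_intensity,
    }
```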
New Endpoint: GET /v2/voices
Added enhanced voice listing endpoint with model-grouped emotions and additional metadata.
Query Parameters:

| Parameter | Type | Description |
|---|---|---|
| model | string | Filter by model (ssfm-v21, ssfm-v30) |
| gender | string | Filter by gender (male, female) |
| age | string | Filter by age group (child, teenager, young_adult, middle_age, elder) |
| use_cases | string | Filter by use case (Audiobook, Game, E-learning, etc.) |

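The filters above are ordinary query parameters, so a request URL can be assembled with the standard library. In this sketch the base URL is a placeholder (the changelog does not state the API host), and `voices_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not from the changelog
ALLOWED_FILTERS = {"model", "gender", "age", "use_cases"}


def voices_url(**filters) -> str:
    """Build a GET /v2/voices URL from the documented filter parameters."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    # Drop filters left as None so they are not sent at all.
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params)
    return f"{BASE_URL}/v2/voices" + (f"?{query}" if query else "")
```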
Response Schema:
[
  {
    "voice_id": "tc_xxxxx",
    "voice_name": "Voice Name",
    "models": [
      {
        "version": "ssfm-v30",
        "emotions": ["normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"]
      },
      {
        "version": "ssfm-v21",
        "emotions": ["normal", "happy", "sad"]
      }
    ],
    "gender": "female",
    "age": "young_adult",
    "use_cases": ["Audiobook", "E-learning"]
  }
]
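Since emotions are grouped per model in this response, checking whether a voice supports an emotion means looking inside the matching `models` entry. A minimal sketch, with `voices_supporting` as a hypothetical helper over the response shape above:

```python
def voices_supporting(voices: list, model: str, emotion: str) -> list:
    """Return voice_ids whose entry for `model` lists `emotion`."""
    matches = []
    for voice in voices:
        for entry in voice["models"]:
            if entry["version"] == model and emotion in entry["emotions"]:
                matches.append(voice["voice_id"])
                break  # one matching model entry is enough for this voice
    return matches


sample_voices = [
    {
        "voice_id": "tc_xxxxx",
        "voice_name": "Voice Name",
        "models": [
            {"version": "ssfm-v30",
             "emotions": ["normal", "happy", "sad", "angry",
                          "whisper", "toneup", "tonedown"]},
            {"version": "ssfm-v21", "emotions": ["normal", "happy", "sad"]},
        ],
        "gender": "female",
        "age": "young_adult",
        "use_cases": ["Audiobook", "E-learning"],
    }
]
```

With the sample data, `whisper` matches under `ssfm-v30` but not under `ssfm-v21`.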
Deprecated: Voice Management Endpoints
The following endpoints have been deprecated and removed:

| Endpoint | Status |
|---|---|
| POST /v1/voices | Removed |
| GET /v1/voices/{voice_id} | Removed |

Use GET /v2/voices for listing voices with enhanced metadata.
Initial Release: ssfm-v21
Launched the Typecast Text-to-Speech API with the ssfm-v21 model.
Endpoints:

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/text-to-speech | Generate speech from text |
| GET | /v1/voices | List available voices |

Features:
- Low latency speech synthesis
- 4 Emotion presets: normal, happy, sad, angry
- Emotion availability varies by voice
- 27 languages supported
Supported Languages: English, Korean, Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, German, Greek, Indonesian, Italian, Japanese, Malay, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Tagalog, Tamil, Ukrainian
Request Schema:
{
  "voice_id": "tc_xxxxx",
  "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
  "model": "ssfm-v21",
  "language": "eng",
  "prompt": {
    "emotion_preset": "normal",
    "emotion_intensity": 1.0
  },
  "output": {
    "volume": 100,
    "audio_pitch": 0,
    "audio_tempo": 1.0,
    "audio_format": "wav"
  }
}