Documentation Index

Fetch the complete documentation index at: https://typecast.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Package

Typecast Python SDK

Source Code

Typecast Python SDK Source Code

Installation

Install the Typecast Python SDK using pip:
pip install --upgrade typecast-python
The package is installed as typecast-python, but imported as typecast.
Make sure you have version 0.3.0 or higher installed. You can check your version with pip show typecast-python. If you have an older version, run pip install --upgrade typecast-python to update.
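If you prefer checking the installed version from Python rather than from pip, the standard library's importlib.metadata can do it. A small sketch (note that the distribution name is typecast-python, not typecast):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version("typecast-python"))
```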

Quick Start

Here’s a simple example to convert text to speech:
from typecast import Typecast
from typecast.models import TTSRequest

# Initialize client
client = Typecast(api_key="YOUR_API_KEY")

# Convert text to speech
response = client.text_to_speech(TTSRequest(
    text="Hello there! I'm your friendly text-to-speech agent.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee"
))

# Save audio file
with open('output.wav', 'wb') as f:
    f.write(response.audio_data)

print(f"Duration: {response.duration}s, Format: {response.format}")

Features

The Typecast Python SDK provides powerful features for text-to-speech conversion:
  • Multiple Voice Models: Support for ssfm-v30 (latest) and ssfm-v21 AI voice models
  • Multi-language Support: 37 languages including English, Korean, Spanish, Japanese, Chinese, and more
  • Emotion Control: Preset emotions (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
  • Audio Customization: Control loudness (LUFS -70 to 0), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
  • Async Support: Built-in async client for high-performance applications
  • Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
  • Type Hints: Full type annotations with Pydantic models
  • Timestamp TTS: Word- and character-level alignment data for subtitles, karaoke, and lip-sync
  • Streaming: Real-time chunked audio delivery for low-latency playback

Configuration

You can configure the API key using environment variables or pass it directly to the client:
export TYPECAST_API_KEY="your-api-key-here"
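With the environment variable set, the client can be constructed without arguments; passing api_key= explicitly takes precedence. A minimal sketch of the lookup (the Typecast() calls are commented out so the snippet stands alone):

```python
import os

# Demo only: in practice, export TYPECAST_API_KEY in your shell or .env file.
os.environ.setdefault("TYPECAST_API_KEY", "your-api-key-here")

api_key = os.environ.get("TYPECAST_API_KEY")

# client = Typecast()                 # reads TYPECAST_API_KEY automatically
# client = Typecast(api_key=api_key)  # or pass the key explicitly

print(api_key)
```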

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset (a fixed emotion chosen from the list in Features) and Smart. In Smart mode, the model infers the emotion from the text itself and from optional surrounding context:
from typecast import Typecast
from typecast.models import TTSRequest, SmartPrompt

client = Typecast()

response = client.text_to_speech(TTSRequest(
    text="Everything is going to be okay.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    prompt=SmartPrompt(
        emotion_type="smart",
        previous_text="I just got the best news!",  # Optional context
        next_text="I can't wait to celebrate!"      # Optional context
    )
))
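Preset mode pins a fixed emotion instead of inferring one. The exact SDK model class for preset prompts is not shown in this page, so the sketch below expresses the request as a plain payload dict; the `emotion_preset` field name is an assumption to be checked against typecast.models:

```python
# Hypothetical preset-emotion payload; the "emotion_preset" key is an assumption.
PRESET_EMOTIONS = {"normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"}

payload = {
    "text": "I can't believe we won!",
    "model": "ssfm-v30",
    "voice_id": "tc_672c5f5ce59fac2a48faeaee",
    "prompt": {"emotion_preset": "happy"},  # must be one of PRESET_EMOTIONS
}

assert payload["prompt"]["emotion_preset"] in PRESET_EMOTIONS
```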

Audio Customization

Control loudness, pitch, tempo, and output format:
from typecast import Typecast
from typecast.models import TTSRequest, Output

client = Typecast()

response = client.text_to_speech(TTSRequest(
    text="Customized audio output!",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    output=Output(
        target_lufs=-14.0,   # Range: -70 to 0 (LUFS)
        audio_pitch=2,       # Range: -12 to +12 semitones
        audio_tempo=1.2,     # Range: 0.5x to 2.0x
        audio_format="mp3"   # Options: wav, mp3
    ),
    seed=42                  # Unsigned seed for reproducible results
))

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:
from typecast import Typecast
from typecast.models import VoicesV2Filter, TTSModel, GenderEnum, AgeEnum

client = Typecast()

# Get all voices
voices = client.voices_v2()

# Filter by criteria
filtered = client.voices_v2(VoicesV2Filter(
    model=TTSModel.SSFM_V30,
    gender=GenderEnum.FEMALE,
    age=AgeEnum.YOUNG_ADULT
))

# Display voice info
for voice in voices:
    print(f"ID: {voice.voice_id}, Name: {voice.voice_name}")
    print(f"Gender: {voice.gender}, Age: {voice.age}")
    print(f"Models: {', '.join(m.version.value for m in voice.models)}")
    print(f"Use cases: {voice.use_cases}")

Async Client

For high-performance applications, use the async client:
import asyncio
from typecast import AsyncTypecast
from typecast.models import TTSRequest

async def main():
    async with AsyncTypecast() as client:
        response = await client.text_to_speech(TTSRequest(
            text="Hello from async!",
            model="ssfm-v30",
            voice_id="tc_672c5f5ce59fac2a48faeaee"
        ))

        with open('async_output.wav', 'wb') as f:
            f.write(response.audio_data)

asyncio.run(main())
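The async client pays off when synthesizing many lines concurrently with asyncio.gather. The sketch below uses a stand-in coroutine in place of client.text_to_speech() so it runs without credentials; the pattern carries over unchanged to the real client:

```python
import asyncio

async def synthesize(text: str) -> bytes:
    # Stand-in for: await client.text_to_speech(TTSRequest(text=text, ...))
    await asyncio.sleep(0.01)  # simulate network latency
    return text.encode()

async def synthesize_all(lines):
    # Requests run concurrently; total time is roughly one request, not the sum.
    return await asyncio.gather(*(synthesize(line) for line in lines))

clips = asyncio.run(synthesize_all(["First line.", "Second line.", "Third line."]))
print(len(clips))
```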

Streaming

Stream audio chunks in real-time for low-latency playback:
# pip install requests sounddevice
import sounddevice as sd
from typecast import Typecast
from typecast.models import TTSRequestStream, OutputStream

client = Typecast()

request = TTSRequestStream(
    text="Stream this text as audio in real time.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    output=OutputStream(audio_format="wav")
)

with sd.RawOutputStream(samplerate=32000, channels=1, dtype="int16") as player:
    buf, first = bytearray(), True
    for chunk in client.text_to_speech_stream(request):
        if first:
            chunk = chunk[44:]  # Skip 44-byte WAV header
            first = False
        buf.extend(chunk)
        n = len(buf) - (len(buf) % 2)  # int16 alignment
        if n:
            player.write(bytes(buf[:n]))
            del buf[:n]
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. For MP3: 320 kbps, 44100 Hz, each chunk is independently decodable. The streaming endpoint does not support volume or target_lufs.
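The 44-byte header mentioned above is a standard RIFF/WAVE header. A small sketch that builds one locally and reads back the fields the stream advertises (32000 Hz, mono, 16-bit, data size 0xFFFFFFFF):

```python
import struct

def make_wav_header(sample_rate=32000, channels=1, bits=16, data_size=0xFFFFFFFF):
    """Build a canonical 44-byte PCM WAV header (local illustration only)."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 0xFFFFFFFF, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )

header = make_wav_header()
channels, sample_rate = struct.unpack_from("<HI", header, 22)
bits = struct.unpack_from("<H", header, 34)[0]
print(len(header), sample_rate, channels, bits)
```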

Timestamp TTS

text_to_speech_with_timestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word and per-character alignment data — useful for karaoke highlights, subtitle generation, and lip-sync applications.

Basic Usage

from typecast import Typecast
from typecast.models import TTSRequestWithTimestamps

client = Typecast(api_key="YOUR_API_KEY")

response = client.text_to_speech_with_timestamps(TTSRequestWithTimestamps(
    text="Hello. How are you?",
    model="ssfm-v30",
    voice_id="tc_60e5426de8b95f1d3000d7b5",
))

# Save audio
with open("output.wav", "wb") as f:
    f.write(response.audio_bytes())

print(f"Duration: {response.audio_duration}s")
for word in response.words:
    print(f"  [{word.start_time:.3f}s – {word.end_time:.3f}s] {word.text}")
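Under the hood, SRT timestamps use the form HH:MM:SS,mmm. A small sketch of converting the float start_time/end_time values into that form, in case you build custom captions instead of using the built-in exporters:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3.5))       # 00:00:03,500
print(srt_timestamp(3725.042))  # 01:02:05,042
```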

Granularity

Pass granularity="word" (default) or granularity="char" to control the alignment unit.
# Character-level alignment — required for Japanese / Chinese
response = client.text_to_speech_with_timestamps(TTSRequestWithTimestamps(
    text="Hello. How are you?",
    model="ssfm-v30",
    voice_id="tc_60e5426de8b95f1d3000d7b5",
    granularity="char",
))

for char in response.characters:
    print(f"  [{char.start_time:.3f}s – {char.end_time:.3f}s] {char.text}")

Subtitle Export

The response object includes helpers that convert alignment data to SRT or WebVTT captions. Captions are split on sentence terminators (. ? ! 。 ? !) and capped at 7 seconds / 42 characters per cue (BBC/Netflix subtitle guidelines).
# Export SRT captions
srt_text = response.to_srt()
with open("output.srt", "w", encoding="utf-8") as f:
    f.write(srt_text)

# Export WebVTT captions
vtt_text = response.to_vtt()
with open("output.vtt", "w", encoding="utf-8") as f:
    f.write(vtt_text)
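The splitting policy described above (break at sentence terminators, then cap cue length) can be sketched in plain Python. This illustrates the character-cap rule only, not the SDK's exact implementation; the 7-second cap additionally needs the word timestamps and is omitted here:

```python
import re

MAX_CHARS = 42  # per-cue character cap (BBC/Netflix guideline)

def split_cues(text: str):
    """Split text into caption cues at sentence terminators, then cap length."""
    sentences = [s.strip() for s in re.split(r"(?<=[.?!。？！])\s*", text) if s.strip()]
    cues = []
    for sentence in sentences:
        while len(sentence) > MAX_CHARS:
            # Break at the last space before the cap, or hard-split if none.
            cut = sentence.rfind(" ", 0, MAX_CHARS)
            cut = cut if cut > 0 else MAX_CHARS
            cues.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        cues.append(sentence)
    return cues

print(split_cues("Hello. This is a fairly long sentence that will not fit in one cue."))
```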

Save Audio Helper

# Equivalent to writing audio_bytes() to a file
response.save_audio("output.wav")
Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use granularity="char" for these languages to get character-level alignment.

Supported Languages

Recommended: Use the LanguageCode enum for type-safe language selection. You can also pass the ISO 639-3 code as a string (e.g., "eng"). The SDK supports 37 languages with ISO 639-3 codes:
Language     Code    Language     Code    Language     Code
English      eng     Japanese     jpn     Ukrainian    ukr
Korean       kor     Greek        ell     Indonesian   ind
Spanish      spa     Tamil        tam     Danish       dan
German       deu     Tagalog      tgl     Swedish      swe
French       fra     Finnish      fin     Malay        msa
Italian      ita     Chinese      zho     Czech        ces
Polish       pol     Slovak       slk     Portuguese   por
Dutch        nld     Arabic       ara     Bulgarian    bul
Russian      rus     Croatian     hrv     Romanian     ron
Bengali      ben     Hindi        hin     Hungarian    hun
Hokkien      nan     Norwegian    nor     Punjabi      pan
Thai         tha     Turkish      tur     Vietnamese   vie
Cantonese    yue
For example:
from typecast.models import TTSRequest, LanguageCode

response = client.text_to_speech(TTSRequest(
    text="Hello",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    language=LanguageCode.ENG
))

Error Handling

The SDK provides specific exceptions for different HTTP status codes:
from typecast import (
    Typecast,
    TypecastError,
    BadRequestError,
    UnauthorizedError,
    PaymentRequiredError,
    NotFoundError,
    UnprocessableEntityError,
    RateLimitError,
    InternalServerError,
)

try:
    response = client.text_to_speech(request)
except UnauthorizedError:
    print("Invalid API key")
except PaymentRequiredError:
    print("Insufficient credits")
except RateLimitError:
    print("Rate limit exceeded - please retry later")
except TypecastError as e:
    print(f"Error {e.status_code}: {e.message}")
Exception                  Status Code   Description
BadRequestError            400           Invalid request parameters
UnauthorizedError          401           Invalid or missing API key
PaymentRequiredError       402           Insufficient credits
NotFoundError              404           Resource not found
UnprocessableEntityError   422           Validation error
RateLimitError             429           Rate limit exceeded
InternalServerError        500           Server error
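A common pattern on 429 responses is retrying with exponential backoff. The sketch below is SDK-agnostic: it retries any callable on a given exception type, demonstrated with a local stand-in error; in real code you would pass the SDK's RateLimitError instead:

```python
import time

class FakeRateLimitError(Exception):
    """Local stand-in for typecast.RateLimitError (demo only)."""

def with_retries(fn, retry_on=FakeRateLimitError, attempts=4, base_delay=0.01):
    """Call fn(), retrying on retry_on with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky():
    # Fails twice with a rate-limit error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError("slow down")
    return "ok"

print(with_retries(flaky))  # ok
```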