The package is installed as typecast-python, but imported as typecast.
Make sure you have version 0.1.5 or higher installed. You can check your version with pip show typecast-python. If you have an older version, run pip install --upgrade typecast-python to update.
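The same version check can be done from Python. The following is a minimal sketch using the standard library's importlib.metadata; the helper names (parse_version, meets_minimum, installed_version) are illustrative and not part of the SDK, and the parser only handles plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "0.1.5"  # minimum version stated above


def parse_version(text):
    """Turn '0.1.5' into (0, 1, 5); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # e.g. '5rc1': ignore it and everything after it
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(dist_name="typecast-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("typecast-python is not installed")
    elif meets_minimum(found):
        print(f"typecast-python {found} is new enough")
    else:
        print(f"typecast-python {found} is too old; upgrade with pip")
```

Note that the lookup uses the distribution name (typecast-python), not the import name (typecast).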
ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart Mode
Let the AI infer emotion from context:

    from typecast import Typecast
    from typecast.models import TTSRequest, SmartPrompt

    client = Typecast()

    response = client.text_to_speech(TTSRequest(
        text="Everything is going to be okay.",
        model="ssfm-v30",
        voice_id="tc_672c5f5ce59fac2a48faeaee",
        prompt=SmartPrompt(
            emotion_type="smart",
            previous_text="I just got the best news!",  # Optional context
            next_text="I can't wait to celebrate!"  # Optional context
        )
    ))

Preset Mode
Explicitly set emotion with preset values:

    from typecast import Typecast
    from typecast.models import TTSRequest, PresetPrompt

    client = Typecast()

    response = client.text_to_speech(TTSRequest(
        text="I am so excited to show you these features!",
        model="ssfm-v30",
        voice_id="tc_672c5f5ce59fac2a48faeaee",
        prompt=PresetPrompt(
            emotion_type="preset",
            emotion_preset="happy",  # normal, happy, sad, angry, whisper, toneup, tonedown
            emotion_intensity=1.5  # Range: 0.0 to 2.0
        )
    ))
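Because the preset names and the intensity range are fixed, it can help to validate them before sending a request. This is a minimal sketch in plain Python; preset_prompt_kwargs is a hypothetical helper, not an SDK function, and the default intensity of 1.0 is an assumption rather than a documented default.

```python
# Preset names and intensity range as listed in the Preset Mode example.
ALLOWED_PRESETS = {"normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"}
MIN_INTENSITY, MAX_INTENSITY = 0.0, 2.0


def preset_prompt_kwargs(emotion_preset, emotion_intensity=1.0):
    """Validate preset settings and return them as keyword arguments.

    Hypothetical helper: the 1.0 default is an assumption, not SDK behavior.
    """
    if emotion_preset not in ALLOWED_PRESETS:
        raise ValueError(
            f"unknown emotion_preset {emotion_preset!r}; "
            f"expected one of {sorted(ALLOWED_PRESETS)}"
        )
    if not MIN_INTENSITY <= emotion_intensity <= MAX_INTENSITY:
        raise ValueError(
            f"emotion_intensity {emotion_intensity} is outside "
            f"{MIN_INTENSITY} to {MAX_INTENSITY}"
        )
    return {
        "emotion_type": "preset",
        "emotion_preset": emotion_preset,
        "emotion_intensity": float(emotion_intensity),
    }
```

Assuming the field names shown in the Preset Mode example, the result could be passed as PresetPrompt(**preset_prompt_kwargs("happy", 1.5)).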
Stream audio chunks in real-time for low-latency playback:

    # pip install requests sounddevice
    import sounddevice as sd
    from typecast import Typecast
    from typecast.models import TTSRequestStream, OutputStream

    client = Typecast()

    request = TTSRequestStream(
        text="Stream this text as audio in real time.",
        model="ssfm-v30",
        voice_id="tc_672c5f5ce59fac2a48faeaee",
        output=OutputStream(audio_format="wav")
    )

    with sd.RawOutputStream(samplerate=32000, channels=1, dtype="int16") as player:
        buf, first = bytearray(), True
        for chunk in client.text_to_speech_stream(request):
            if first:
                chunk = chunk[44:]  # Skip 44-byte WAV header
                first = False
            buf.extend(chunk)
            n = len(buf) - (len(buf) % 2)  # int16 alignment
            if n:
                player.write(bytes(buf[:n]))
                del buf[:n]

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. For MP3: 320 kbps, 44100 Hz, each chunk is independently decodable. The streaming endpoint does not support volume or target_lufs.
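The header and alignment handling described above can be factored into a small generator that does not depend on the SDK. This is a sketch under the stated format (44-byte header, 16-bit mono PCM at 32000 Hz); iter_pcm and duration_seconds are illustrative names, and the code additionally assumes the header might be split across chunks, which the source does not say can happen.

```python
WAV_HEADER_BYTES = 44  # header size stated above
SAMPLE_WIDTH = 2       # 16-bit PCM
SAMPLE_RATE = 32000    # Hz, mono


def iter_pcm(chunks, header_bytes=WAV_HEADER_BYTES, sample_width=SAMPLE_WIDTH):
    """Yield sample-aligned PCM bytes from an iterable of streamed WAV chunks.

    The header is skipped even if it spans several chunks. A trailing
    partial sample at the end of the stream is dropped.
    """
    to_skip = header_bytes
    buf = bytearray()
    for chunk in chunks:
        if to_skip:
            dropped = min(to_skip, len(chunk))
            chunk = chunk[dropped:]
            to_skip -= dropped
        buf.extend(chunk)
        usable = len(buf) - (len(buf) % sample_width)  # whole samples only
        if usable:
            yield bytes(buf[:usable])
            del buf[:usable]


def duration_seconds(pcm_byte_count, sample_rate=SAMPLE_RATE,
                     sample_width=SAMPLE_WIDTH, channels=1):
    """Convert a PCM byte count to playback time for the stated format."""
    return pcm_byte_count / (sample_rate * sample_width * channels)
```

Each yielded block could be written to a raw int16 output stream as in the playback example; at this format, 64000 bytes of PCM correspond to one second of audio.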
Recommended: Use the LanguageCode enum for type-safe language selection. You can also pass the ISO 639-3 code as a string (e.g., "eng").

The SDK supports 37 languages with ISO 639-3 codes:

| Language | Code | Language | Code | Language | Code |
|---|---|---|---|---|---|
| English | eng | Japanese | jpn | Ukrainian | ukr |
| Korean | kor | Greek | ell | Indonesian | ind |
| Spanish | spa | Tamil | tam | Danish | dan |
| German | deu | Tagalog | tgl | Swedish | swe |
| French | fra | Finnish | fin | Malay | msa |
| Italian | ita | Chinese | zho | Czech | ces |
| Polish | pol | Slovak | slk | Portuguese | por |
| Dutch | nld | Arabic | ara | Bulgarian | bul |
| Russian | rus | Croatian | hrv | Romanian | ron |
| Bengali | ben | Hindi | hin | Hungarian | hun |
| Hokkien | nan | Norwegian | nor | Punjabi | pan |
| Thai | tha | Turkish | tur | Vietnamese | vie |
| Cantonese | yue | | | | |

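When passing the code as a plain string, a quick check against the table above catches typos before a request is made. This is a minimal sketch in plain Python; SUPPORTED_CODES and normalize_language are illustrative names rather than SDK members, and lowercasing the input is an assumption about how you may want to treat user-supplied codes.

```python
# The 37 ISO 639-3 codes from the table above.
SUPPORTED_CODES = frozenset("""
eng jpn ukr kor ell ind spa tam dan deu tgl swe fra fin msa ita zho ces
pol slk por nld ara bul rus hrv ron ben hin hun nan nor pan tha tur vie yue
""".split())


def normalize_language(code):
    """Return the cleaned, lowercased code, or raise ValueError if unsupported.

    Hypothetical helper: lowercasing is an assumption, not SDK behavior.
    """
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return cleaned
```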
Use the LanguageCode enum for type-safe language selection: