The official Go library for the Typecast API. Convert text to lifelike speech using AI-powered voices.
Compatible with Go 1.21 and later versions. Zero external dependencies - uses only the Go standard library.
Installation
go get github.com/neosapience/typecast-sdk/typecast-go
Latest registered version: typecast-go/v0.3.1 via Go modules. Make sure you have Go 1.21 or higher installed. Check your version with go version.
Quick Start
package main

import (
	"context"
	"os"

	typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
	// Initialize the client
	client := typecast.NewClient(&typecast.ClientConfig{
		APIKey: "YOUR_API_KEY",
	})
	ctx := context.Background()

	// Convert text to speech
	response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
		VoiceID: "tc_672c5f5ce59fac2a48faeaee",
		Text:    "Hello there! I'm your friendly text-to-speech agent.",
		Model:   typecast.ModelSSFMV30,
	})
	if err != nil {
		panic(err)
	}

	// Save the audio file, checking the write error instead of discarding it
	if err := os.WriteFile("output.wav", response.AudioData, 0644); err != nil {
		panic(err)
	}
	println("Audio saved! Format:", string(response.Format))
}
Features
The Typecast Go SDK provides powerful features for text-to-speech conversion:
Multiple Voice Models: Support for ssfm-v30 (latest) and ssfm-v21 AI voice models
Multi-language Support: 37 languages including English, Korean, Spanish, Japanese, Chinese, and more
Emotion Control: Preset emotions (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
Audio Customization: Control loudness (LUFS -70 to 0), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
Instant Voice Cloning: Upload a WAV/MP3 sample and create a custom voice ID
Context Support: Full context.Context support for cancellation and timeouts
Timestamp TTS: Word- and character-level alignment data for subtitles, karaoke, and lip-sync
Zero Dependencies: Uses only the Go standard library
Streaming: Real-time chunked audio delivery for low-latency playback
Configuration
Set your API key via an environment variable or pass it directly. The three snippets below are alternatives; use one of them:
import (
	"time"

	typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

// Option 1: read the key from the environment (recommended)
// export TYPECAST_API_KEY="your-api-key-here"
client := typecast.NewClient(nil)

// Option 2: pass the key directly
client := typecast.NewClient(&typecast.ClientConfig{
	APIKey: "your-api-key-here",
})

// Option 3: custom settings
client := typecast.NewClient(&typecast.ClientConfig{
	APIKey:  "your-api-key-here",
	BaseURL: "https://api.typecast.ai", // optional
	Timeout: 60 * time.Second,          // optional
})
Environment Variables
| Variable | Description |
| --- | --- |
| TYPECAST_API_KEY | Your Typecast API key |
| TYPECAST_API_HOST | Custom API base URL (optional) |
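How the client itself orders these sources is not spelled out here, so the following is only a sketch of how an application might resolve the same two variables before constructing a client. The names `settings` and `resolveSettings` are hypothetical and not part of the SDK; the default base URL is the one shown in the configuration example above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// settings is a hypothetical holder for the two configurable values;
// it is not an SDK type.
type settings struct {
	APIKey  string
	BaseURL string
}

// resolveSettings prefers an explicitly passed key and falls back to
// TYPECAST_API_KEY. TYPECAST_API_HOST, when set, overrides the base URL.
func resolveSettings(explicitKey string) (settings, error) {
	s := settings{APIKey: explicitKey, BaseURL: "https://api.typecast.ai"}
	if s.APIKey == "" {
		s.APIKey = os.Getenv("TYPECAST_API_KEY")
	}
	if s.APIKey == "" {
		return settings{}, errors.New("no API key: set TYPECAST_API_KEY or pass one explicitly")
	}
	if host := os.Getenv("TYPECAST_API_HOST"); host != "" {
		s.BaseURL = host
	}
	return s, nil
}

func main() {
	s, err := resolveSettings("")
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("using base URL:", s.BaseURL)
}
```

Failing early with a clear message tends to be easier to debug than a 401 returned by the first request.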
Advanced Usage
Emotion Control (ssfm-v30)
ssfm-v30 offers two emotion control modes: Smart and Preset.
Smart: let the AI infer emotion from context:
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
	VoiceID: "tc_672c5f5ce59fac2a48faeaee",
	Text:    "Everything is going to be okay.",
	Model:   typecast.ModelSSFMV30,
	Prompt: &typecast.SmartPrompt{
		EmotionType:  "smart",
		PreviousText: "I just got the best news!",  // Optional context
		NextText:     "I can't wait to celebrate!", // Optional context
	},
})
Preset: explicitly set the emotion with a preset value:
intensity := 1.5
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
	VoiceID: "tc_672c5f5ce59fac2a48faeaee",
	Text:    "I am so excited to show you these features!",
	Model:   typecast.ModelSSFMV30,
	Prompt: &typecast.PresetPrompt{
		EmotionType:      "preset",
		EmotionPreset:    typecast.EmotionHappy, // normal, happy, sad, angry, whisper, toneup, tonedown
		EmotionIntensity: &intensity,            // Range: 0.0 to 2.0
	},
})
Audio Customization
Control loudness, pitch, tempo, and output format:
lufs := -14.0
pitch := 2
tempo := 1.2
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
	Text:    "Customized audio output!",
	Model:   typecast.ModelSSFMV30,
	VoiceID: "tc_672c5f5ce59fac2a48faeaee",
	Output: &typecast.Output{
		TargetLUFS:  &lufs,                   // Range: -70 to 0 (LUFS)
		AudioPitch:  &pitch,                  // Range: -12 to +12 semitones
		AudioTempo:  &tempo,                  // Range: 0.5x to 2.0x
		AudioFormat: typecast.AudioFormatMP3, // Options: wav, mp3
	},
})
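When these values come from user input, it can help to clamp them to the documented ranges before building the request. The sketch below uses a hypothetical generic helper named `clamp`; it does not call the SDK, and whether the API rejects or silently adjusts out-of-range values is not stated here.

```go
package main

import "fmt"

// clamp limits v to the closed interval [lo, hi].
func clamp[T int | float64](v, lo, hi T) T {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	lufs := clamp(-80.0, -70.0, 0.0) // loudness: -70 to 0 LUFS
	pitch := clamp(15, -12, 12)      // pitch: -12 to +12 semitones
	tempo := clamp(1.2, 0.5, 2.0)    // tempo: 0.5x to 2.0x
	fmt.Println(lufs, pitch, tempo)  // prints: -70 12 1.2
}
```

The clamped variables can then be passed by address to the Output fields shown above.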
Voice Discovery (V2 API)
List and filter available voices with enhanced metadata:
// Get all voices
voices, err := client.GetVoicesV2(ctx, nil)

// Filter by criteria
voices, err := client.GetVoicesV2(ctx, &typecast.VoicesV2Filter{
	Model:  typecast.ModelSSFMV30,
	Gender: typecast.GenderFemale,
	Age:    typecast.AgeYoungAdult,
})

// Display voice info
for _, voice := range voices {
	fmt.Printf("ID: %s, Name: %s\n", voice.VoiceID, voice.VoiceName)
	if voice.Gender != nil {
		fmt.Printf("Gender: %s\n", *voice.Gender)
	}
	if voice.Age != nil {
		fmt.Printf("Age: %s\n", *voice.Age)
	}
	for _, model := range voice.Models {
		fmt.Printf("Model: %s, Emotions: %v\n", model.Version, model.Emotions)
	}
}

// Get specific voice details
voice, err := client.GetVoiceV2(ctx, "tc_672c5f5ce59fac2a48faeaee")
Multilingual Content
The SDK supports 37 languages with automatic language detection:
// Auto-detect language (recommended)
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
	VoiceID: "tc_672c5f5ce59fac2a48faeaee",
	Text:    "こんにちは。お元気ですか。",
	Model:   typecast.ModelSSFMV30,
})

// Or specify the language explicitly
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
	VoiceID:  "tc_672c5f5ce59fac2a48faeaee",
	Text:     "안녕하세요. 반갑습니다.",
	Model:    typecast.ModelSSFMV30,
	Language: "kor", // ISO 639-3 language code
})
Streaming
Stream audio chunks in real-time for low-latency playback:
// Stream and extract raw PCM (skip the 44-byte WAV header)
reader, err := client.TextToSpeechStream(context.Background(), request)
if err != nil {
	panic(err)
}
defer reader.Close()

buf := make([]byte, 4096)
skip := 44 // WAV header bytes still to discard
for {
	n, err := reader.Read(buf)
	if n > 0 {
		data := buf[:n]
		if skip > 0 {
			// The header may span more than one read, so drop it incrementally
			drop := min(skip, len(data))
			data = data[drop:]
			skip -= drop
		}
		// data is raw 16-bit mono PCM at 32000 Hz
		// Feed it to your audio output (e.g. oto, portaudio)
		_ = data
	}
	if err != nil {
		break // io.EOF at the end of the stream, or a read error
	}
}
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. For MP3: 320 kbps, 44100 Hz, each chunk is independently decodable.
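The format figures above are enough to work out how much audio a given number of PCM bytes represents. The sketch below uses only the standard library and a fake in-memory stream; `skipWAVHeader` and `pcmSeconds` are hypothetical helpers, not SDK functions. It uses `io.CopyN` so the header is discarded correctly even if it arrives split across several reads.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const (
	wavHeaderSize  = 44    // bytes, per the format note above
	sampleRate     = 32000 // Hz
	bytesPerSample = 2     // 16-bit mono
)

// skipWAVHeader discards exactly wavHeaderSize bytes from r. It returns
// io.EOF if the stream ends before a full header has been read.
func skipWAVHeader(r io.Reader) error {
	_, err := io.CopyN(io.Discard, r, wavHeaderSize)
	return err
}

// pcmSeconds converts a raw PCM byte count into playback seconds.
func pcmSeconds(n int64) float64 {
	return float64(n) / float64(sampleRate*bytesPerSample)
}

func main() {
	// Stand-in for a streamed response: a header plus one second of silence.
	stream := bytes.NewReader(make([]byte, wavHeaderSize+sampleRate*bytesPerSample))
	if err := skipWAVHeader(stream); err != nil {
		fmt.Println("short stream:", err)
		return
	}
	n, _ := io.Copy(io.Discard, stream)
	fmt.Printf("%d PCM bytes = %.2f s\n", n, pcmSeconds(n)) // 64000 PCM bytes = 1.00 s
}
```

At 32000 Hz and 2 bytes per sample, one second of audio is 64000 bytes, which is a useful figure when sizing playback buffers.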
Timestamp TTS
TextToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word and per-character alignment data — useful for karaoke highlights, subtitle generation, and lip-sync applications.
Basic Usage
package main

import (
	"context"
	"fmt"
	"os"

	typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
	client := typecast.NewClient(&typecast.ClientConfig{APIKey: "YOUR_API_KEY"})
	ctx := context.Background()

	result, err := client.TextToSpeechWithTimestamps(ctx, &typecast.TTSRequestWithTimestamps{
		VoiceID: "tc_60e5426de8b95f1d3000d7b5",
		Text:    "Hello. How are you?",
		Model:   typecast.ModelSSFMV30,
	})
	if err != nil {
		panic(err)
	}

	os.WriteFile("output.wav", result.AudioBytes(), 0644)
	fmt.Printf("Duration: %.3fs\n", result.AudioDuration)
	for _, w := range result.Words {
		fmt.Printf("  [%.3fs – %.3fs] %s\n", w.StartTime, w.EndTime, w.Text)
	}
}
Granularity
Pass Granularity: typecast.GranularityWord (default) or Granularity: typecast.GranularityChar to control the alignment unit.
// Character-level alignment: required for Japanese / Chinese
result, err := client.TextToSpeechWithTimestamps(ctx, &typecast.TTSRequestWithTimestamps{
	VoiceID:     "tc_60e5426de8b95f1d3000d7b5",
	Text:        "Hello. How are you?",
	Model:       typecast.ModelSSFMV30,
	Granularity: typecast.GranularityChar,
})
Subtitle Export
srt, _ := result.ToSrt()
os.WriteFile("output.srt", []byte(srt), 0644)

vtt, _ := result.ToVtt()
os.WriteFile("output.vtt", []byte(vtt), 0644)
Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use GranularityChar for these languages to get character-level alignment.
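For custom subtitle layouts, the alignment data can also be formatted by hand. The following sketch shows the general SubRip (SRT) cue shape built from start and end times in seconds. The `cue` type and both helpers are hypothetical; this is not a description of what ToSrt() does internally.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// cue is a hypothetical stand-in for one aligned unit (times in seconds).
type cue struct {
	Start, End float64
	Text       string
}

// srtTimestamp renders seconds as HH:MM:SS,mmm, rounded to the millisecond.
func srtTimestamp(seconds float64) string {
	ms := time.Duration(seconds * float64(time.Second)).Round(time.Millisecond).Milliseconds()
	h := ms / 3600000
	ms %= 3600000
	m := ms / 60000
	ms %= 60000
	s := ms / 1000
	ms %= 1000
	return fmt.Sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
}

// buildSRT numbers cues from 1 and ends each cue with a blank line.
func buildSRT(cues []cue) string {
	var b strings.Builder
	for i, c := range cues {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			i+1, srtTimestamp(c.Start), srtTimestamp(c.End), c.Text)
	}
	return b.String()
}

func main() {
	fmt.Print(buildSRT([]cue{
		{Start: 0.0, End: 0.42, Text: "Hello."},
		{Start: 0.55, End: 1.3, Text: "How are you?"},
	}))
}
```

Rounding to whole milliseconds before splitting into fields avoids results such as 00:00:00,419 caused by floating-point representation.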
Instant Voice Cloning
Clone a custom voice from a short audio sample, then pass the returned uc_ voice ID directly to TTS.
package main

import (
	"context"
	"os"

	typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
	client := typecast.NewClient(&typecast.ClientConfig{APIKey: "YOUR_API_KEY"})
	ctx := context.Background()

	audioBytes, err := os.ReadFile("sample.wav")
	if err != nil {
		panic(err)
	}

	voice, err := client.CloneVoice(ctx, audioBytes, "sample.wav", "MyVoice", "ssfm-v30")
	if err != nil {
		panic(err)
	}

	response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
		VoiceID: voice.VoiceID,
		Text:    "Hello from my cloned voice!",
		Model:   typecast.ModelSSFMV30,
	})
	if err != nil {
		panic(err)
	}

	os.WriteFile("output.wav", response.AudioData, 0644)
	client.DeleteVoice(ctx, voice.VoiceID)
}
Voice cloning audio must be 25 MB or smaller, the audio duration must be 5-150 seconds, and the custom voice name must be 1-30 characters.
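These limits can be checked locally before uploading, which avoids a round trip for obviously invalid input. The sketch below is a hypothetical pre-flight check, not an SDK function. It assumes "25 MB" means 25,000,000 bytes (the stricter reading), that name length is counted in Unicode characters, and that the caller measures the duration with an audio decoder of their choice.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Limits quoted in the note above. maxSampleBytes assumes decimal megabytes.
const (
	maxSampleBytes = 25_000_000
	minSeconds     = 5.0
	maxSeconds     = 150.0
	maxNameChars   = 30
)

// validateCloneInput returns nil when the sample, its duration in seconds,
// and the voice name all fall within the documented limits.
func validateCloneInput(audio []byte, durationSec float64, name string) error {
	if len(audio) == 0 {
		return errors.New("audio sample is empty")
	}
	if len(audio) > maxSampleBytes {
		return fmt.Errorf("audio sample is %d bytes; limit is %d", len(audio), maxSampleBytes)
	}
	if durationSec < minSeconds || durationSec > maxSeconds {
		return fmt.Errorf("duration %.1fs is outside %.0f-%.0fs", durationSec, minSeconds, maxSeconds)
	}
	if n := utf8.RuneCountInString(name); n < 1 || n > maxNameChars {
		return fmt.Errorf("voice name has %d characters; must be 1-%d", n, maxNameChars)
	}
	return nil
}

func main() {
	sample := make([]byte, 1024) // placeholder bytes standing in for sample.wav
	fmt.Println(validateCloneInput(sample, 12.5, "MyVoice"))
	fmt.Println(validateCloneInput(sample, 3, "MyVoice"))
}
```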
Supported Languages
The SDK supports 37 languages with automatic language detection:
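| Code | Language | Code | Language | Code | Language |
| --- | --- | --- | --- | --- | --- |
| eng | English | jpn | Japanese | ukr | Ukrainian |
| kor | Korean | ell | Greek | ind | Indonesian |
| spa | Spanish | tam | Tamil | dan | Danish |
| deu | German | tgl | Tagalog | swe | Swedish |
| fra | French | fin | Finnish | msa | Malay |
| ita | Italian | zho | Chinese | ces | Czech |
| pol | Polish | slk | Slovak | por | Portuguese |
| nld | Dutch | ara | Arabic | bul | Bulgarian |
| rus | Russian | hrv | Croatian | ron | Romanian |
| ben | Bengali | hin | Hindi | hun | Hungarian |
| nan | Hokkien | nor | Norwegian | pan | Punjabi |
| tha | Thai | tur | Turkish | vie | Vietnamese |
| yue | Cantonese | | | | |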
If not specified, the language will be automatically detected from the input text.
Error Handling
The SDK provides an APIError type with helper methods for handling specific errors:
import typecast "github.com/neosapience/typecast-sdk/typecast-go"

response, err := client.TextToSpeech(ctx, request)
if err != nil {
	if apiErr, ok := err.(*typecast.APIError); ok {
		fmt.Printf("Error %d: %s\n", apiErr.StatusCode, apiErr.Message)
		// Handle specific errors
		switch {
		case apiErr.IsUnauthorized():
			// 401: Invalid API key
		case apiErr.IsForbidden():
			// 403: Access denied
		case apiErr.IsPaymentRequired():
			// 402: Insufficient credits
		case apiErr.IsNotFound():
			// 404: Resource not found
		case apiErr.IsValidationError():
			// 422: Validation error
		case apiErr.IsRateLimited():
			// 429: Rate limit exceeded
		case apiErr.IsServerError():
			// 5xx: Server error
		case apiErr.IsBadRequest():
			// 400: Bad request
		}
	}
}
Error Types
| Method | Status Code | Description |
| --- | --- | --- |
| IsBadRequest() | 400 | Invalid request parameters |
| IsUnauthorized() | 401 | Invalid or missing API key |
| IsPaymentRequired() | 402 | Insufficient credits |
| IsForbidden() | 403 | Access denied |
| IsNotFound() | 404 | Resource not found |
| IsValidationError() | 422 | Validation error |
| IsRateLimited() | 429 | Rate limit exceeded |
| IsServerError() | 5xx | Server error |
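Of the errors listed above, rate limits (429) and server errors (5xx) are the ones usually worth retrying. The sketch below shows one possible retry loop with exponential backoff. It does not import the SDK: `transient` is a hypothetical interface that mirrors two of the helper methods above, and `fakeAPIError` exists only to make the example runnable. Whether the SDK already retries internally is not stated here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// transient matches any error that exposes these two helper methods.
type transient interface {
	IsRateLimited() bool
	IsServerError() bool
}

// fakeAPIError is a stand-in error type used only by this sketch.
type fakeAPIError struct{ StatusCode int }

func (e *fakeAPIError) Error() string       { return fmt.Sprintf("status %d", e.StatusCode) }
func (e *fakeAPIError) IsRateLimited() bool { return e.StatusCode == 429 }
func (e *fakeAPIError) IsServerError() bool { return e.StatusCode >= 500 }

// retry calls fn up to attempts times, doubling the delay after each
// retryable failure and stopping early if ctx is cancelled.
func retry(ctx context.Context, attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		var t transient
		if !errors.As(err, &t) || !(t.IsRateLimited() || t.IsServerError()) {
			return err // not retryable, e.g. 401 or 422
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
			delay *= 2
		}
	}
	return err
}

// demo simulates two rate-limited calls followed by a success.
func demo() (int, error) {
	calls := 0
	err := retry(context.Background(), 4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return &fakeAPIError{StatusCode: 429}
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println("calls:", calls, "err:", err) // calls: 3 err: <nil>
}
```

In real use, fn would wrap a call such as client.TextToSpeech and return its error unchanged.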
Context and Timeouts
The SDK fully supports Go’s context.Context for cancellation and timeouts:
// With timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

response, err := client.TextToSpeech(ctx, request)
if err != nil {
	if ctx.Err() == context.DeadlineExceeded {
		fmt.Println("Request timed out")
	}
}

// With cancellation
ctx, cancel := context.WithCancel(context.Background())
go func() {
	time.Sleep(5 * time.Second)
	cancel() // Cancel after 5 seconds
}()
response, err := client.TextToSpeech(ctx, request)
API Reference
Client Methods
| Method | Description |
| --- | --- |
| TextToSpeech(ctx, request) | Convert text to speech audio |
| CloneVoice(ctx, audio, filename, name, model) | Create a custom voice via instant cloning |
| DeleteVoice(ctx, voiceID) | Delete a custom cloned voice |
| GetVoicesV2(ctx, filter) | Get available voices with filtering |
| GetVoiceV2(ctx, voiceID) | Get a specific voice by ID |
| GetVoices(ctx, model) | Get voices (V1 API, deprecated) |
| GetVoice(ctx, voiceID, model) | Get voice (V1 API, deprecated) |
TTSRequest Fields
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| VoiceID | string | ✓ | Voice ID (format: tc_* or uc_*) |
| Text | string | ✓ | Text to synthesize (max 2000 chars) |
| Model | TTSModel | ✓ | TTS model (ModelSSFMV21 or ModelSSFMV30) |
| Language | string | | ISO 639-3 code (auto-detected if omitted) |
| Prompt | *Prompt / *PresetPrompt / *SmartPrompt | | Emotion settings |
| Output | *Output | | Audio output settings |
| Seed | *uint32 | | Unsigned integer seed for reproducibility (≥ 0) |
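Because Text is limited to 2000 characters per request, longer input has to be split across several calls. The following is a minimal sketch of one way to do that, cutting after sentence-ending punctuation where possible. `splitText` is a hypothetical helper, not part of the SDK, and it assumes the limit is counted in Unicode characters rather than bytes.

```go
package main

import (
	"fmt"
	"strings"
)

const maxChars = 2000 // per-request text limit from the table above

// splitText breaks text into chunks of at most limit characters (runes).
// It prefers to cut just after sentence-ending punctuation and falls back
// to a hard cut when a single sentence is longer than limit.
// limit must be greater than zero.
func splitText(text string, limit int) []string {
	var chunks []string
	runes := []rune(strings.TrimSpace(text))
	for len(runes) > limit {
		cut := limit
		for i := limit - 1; i > 0; i-- {
			if strings.ContainsRune(".!?。！？", runes[i]) {
				cut = i + 1
				break
			}
		}
		chunks = append(chunks, strings.TrimSpace(string(runes[:cut])))
		runes = []rune(strings.TrimSpace(string(runes[cut:])))
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

func main() {
	long := strings.Repeat("This is a sentence. ", 150) // 3000 characters
	for i, c := range splitText(long, maxChars) {
		fmt.Printf("chunk %d: %d chars\n", i+1, len([]rune(c)))
	}
}
```

Each chunk can then be sent as its own TTSRequest and the resulting audio concatenated by the caller.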
TTSResponse Fields
| Field | Type | Description |
| --- | --- | --- |
| AudioData | []byte | Generated audio data |
| Duration | float64 | Audio duration in seconds |
| Format | AudioFormat | Audio format (wav or mp3) |
Constants
Models
| Constant | Value | Description |
| --- | --- | --- |
| ModelSSFMV30 | ssfm-v30 | Latest model with improved prosody |
| ModelSSFMV21 | ssfm-v21 | Stable production model |
Emotion Presets
| Constant | ssfm-v21 | ssfm-v30 |
| --- | --- | --- |
| EmotionNormal | ✓ | ✓ |
| EmotionHappy | ✓ | ✓ |
| EmotionSad | ✓ | ✓ |
| EmotionAngry | ✓ | ✓ |
| EmotionWhisper | ✗ | ✓ |
| EmotionToneUp | ✗ | ✓ |
| EmotionToneDown | ✗ | ✓ |
Audio Formats
| Constant | Value | Description |
| --- | --- | --- |
| AudioFormatWAV | wav | Uncompressed PCM audio |
| AudioFormatMP3 | mp3 | Compressed MP3 audio |