The official Go library for the Typecast API. Convert text to lifelike speech using AI-powered voices. Compatible with Go 1.21 and later. Zero external dependencies: the library uses only the Go standard library.
ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart Mode
Let the AI infer emotion from context:

    response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
        VoiceID: "tc_672c5f5ce59fac2a48faeaee",
        Text:    "Everything is going to be okay.",
        Model:   typecast.ModelSSFMV30,
        Prompt: &typecast.SmartPrompt{
            EmotionType:  "smart",
            PreviousText: "I just got the best news!",  // Optional context
            NextText:     "I can't wait to celebrate!", // Optional context
        },
    })

Preset Mode
Explicitly set emotion with preset values:

    intensity := 1.5
    response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
        VoiceID: "tc_672c5f5ce59fac2a48faeaee",
        Text:    "I am so excited to show you these features!",
        Model:   typecast.ModelSSFMV30,
        Prompt: &typecast.PresetPrompt{
            EmotionType:      "preset",
            EmotionPreset:    typecast.EmotionHappy, // normal, happy, sad, angry, whisper, toneup, tonedown
            EmotionIntensity: &intensity,            // Range: 0.0 to 2.0
        },
    })
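
Because the preset example documents an intensity range of 0.0 to 2.0, it can help to clamp user-supplied values before building the request. The following is a minimal sketch only: `clampIntensity` is a hypothetical helper, not part of the SDK, and whether the SDK or API already validates the range is not stated here.

```go
package main

import "fmt"

// Bounds taken from the "Range: 0.0 to 2.0" comment in the preset example.
const (
	minIntensity = 0.0
	maxIntensity = 2.0
)

// clampIntensity is a hypothetical helper (not part of the SDK) that forces
// a requested intensity into the documented range.
func clampIntensity(v float64) float64 {
	if v < minIntensity {
		return minIntensity
	}
	if v > maxIntensity {
		return maxIntensity
	}
	return v
}

func main() {
	for _, v := range []float64{-0.5, 1.5, 3.0} {
		fmt.Printf("%.1f -> %.1f\n", v, clampIntensity(v))
	}
}
```

The clamped value can then be assigned to a variable and passed by pointer as `EmotionIntensity`, as in the preset example.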
Stream audio chunks in real time for low-latency playback:

    // Stream and extract raw PCM (skip the 44-byte WAV header).
    reader, err := client.TextToSpeechStream(context.Background(), request)
    if err != nil {
        log.Fatal(err)
    }
    defer reader.Close()

    // Discard the header up front. A single Read may return fewer than
    // 44 bytes, so slicing the first chunk with data[44:] could panic.
    if _, err := io.CopyN(io.Discard, reader, 44); err != nil {
        log.Fatal(err)
    }

    buf := make([]byte, 4096)
    for {
        n, err := reader.Read(buf)
        if n > 0 {
            data := buf[:n]
            // data is raw 16-bit mono PCM at 32000 Hz.
            // Feed it to your audio output (e.g. oto, portaudio).
            _ = data
        }
        if err != nil {
            break // io.EOF marks the normal end of the stream
        }
    }
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. MP3 streaming format: 320 kbps, 44100 Hz; each chunk is independently decodable. The streaming endpoint does not support Volume or TargetLUFS.
TextToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word and per-character alignment data, which is useful for karaoke highlights, subtitle generation, and lip-sync applications.
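
To illustrate the subtitle use case, the sketch below groups word-level timings into short subtitle cues. Everything in it is an assumption for illustration: `wordTiming`, `cue`, and `groupCues` are hypothetical, the SDK's actual response types and field names are not shown in this section, and the timestamps are assumed to be in seconds.

```go
package main

import (
	"fmt"
	"strings"
)

// wordTiming is a hypothetical stand-in for one word-level alignment entry.
// The SDK's real response types and field names may differ.
type wordTiming struct {
	Text  string
	Start float64 // assumed to be seconds
	End   float64 // assumed to be seconds
}

// cue is one subtitle line with its display interval.
type cue struct {
	Text  string
	Start float64
	End   float64
}

// groupCues packs consecutive words into cues of at most maxWords words.
// Each cue starts with its first word and ends with its last word.
func groupCues(words []wordTiming, maxWords int) []cue {
	if maxWords < 1 {
		maxWords = 1
	}
	var cues []cue
	for i := 0; i < len(words); i += maxWords {
		end := i + maxWords
		if end > len(words) {
			end = len(words)
		}
		group := words[i:end]
		texts := make([]string, len(group))
		for j, w := range group {
			texts[j] = w.Text
		}
		cues = append(cues, cue{
			Text:  strings.Join(texts, " "),
			Start: group[0].Start,
			End:   group[len(group)-1].End,
		})
	}
	return cues
}

func main() {
	words := []wordTiming{
		{"Everything", 0.0, 0.5}, {"is", 0.5, 0.6}, {"going", 0.6, 0.9},
		{"to", 0.9, 1.0}, {"be", 1.0, 1.1},
	}
	for _, c := range groupCues(words, 2) {
		fmt.Printf("%.1f-%.1f %s\n", c.Start, c.End, c.Text)
	}
}
```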
Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use GranularityChar for these languages to get character-level alignment.
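
The following standard-library sketch shows why whitespace-based word splitting breaks down for such text: a Japanese sentence yields a single "word", while iterating over runes yields one entry per character. It only illustrates the tokenization problem and does not reproduce the API's own segmentation; `whitespaceTokens` and `characters` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// whitespaceTokens splits text on whitespace, as a naive word splitter would.
func whitespaceTokens(text string) []string {
	return strings.Fields(text)
}

// characters splits text into one string per Unicode code point.
func characters(text string) []string {
	runes := []rune(text)
	out := make([]string, len(runes))
	for i, r := range runes {
		out[i] = string(r)
	}
	return out
}

func main() {
	english := "Everything is going to be okay."
	japanese := "今日は良い天気です"

	fmt.Println(len(whitespaceTokens(english)), "whitespace tokens in the English sentence")
	fmt.Println(len(whitespaceTokens(japanese)), "whitespace token in the Japanese sentence")
	fmt.Println(len(characters(japanese)), "characters in the Japanese sentence")
}
```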
The SDK provides an APIError type with helper methods for handling specific errors:

    import (
        "errors"
        "fmt"

        typecast "github.com/neosapience/typecast-sdk/typecast-go"
    )

    response, err := client.TextToSpeech(ctx, request)
    if err != nil {
        // errors.As also matches an *APIError that has been wrapped.
        var apiErr *typecast.APIError
        if errors.As(err, &apiErr) {
            fmt.Printf("Error %d: %s\n", apiErr.StatusCode, apiErr.Message)

            // Handle specific errors
            switch {
            case apiErr.IsBadRequest():
                // 400: Bad request
            case apiErr.IsUnauthorized():
                // 401: Invalid API key
            case apiErr.IsPaymentRequired():
                // 402: Insufficient credits
            case apiErr.IsForbidden():
                // 403: Access denied
            case apiErr.IsNotFound():
                // 404: Resource not found
            case apiErr.IsValidationError():
                // 422: Validation error
            case apiErr.IsRateLimited():
                // 429: Rate limit exceeded
            case apiErr.IsServerError():
                // 5xx: Server error
            }
        }
    }