Go - Typecast Documentation

The official Go library for the Typecast API. Convert text to lifelike speech using AI-powered voices. Compatible with Go 1.21 and later versions. Zero external dependencies - uses only the Go standard library.

Installation

go get github.com/neosapience/typecast-sdk/typecast-go

Make sure you have Go 1.21 or higher installed. Check your version with `go version`.

Quick Start

package main

import (
    "context"
    "os"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
    // Initialize client
    client := typecast.NewClient(&typecast.ClientConfig{
        APIKey: "YOUR_API_KEY",
    })

    ctx := context.Background()

    // Convert text to speech
    response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
        VoiceID: "tc_672c5f5ce59fac2a48faeaee",
        Text:    "Hello there! I'm your friendly text-to-speech agent.",
        Model:   typecast.ModelSSFMV30,
    })
    if err != nil {
        panic(err)
    }

    // Save audio file
    if err := os.WriteFile("output.wav", response.AudioData, 0644); err != nil {
        panic(err)
    }

    println("Audio saved! Format:", string(response.Format))
}

Features

The Typecast Go SDK provides powerful features for text-to-speech conversion:

Multiple Voice Models: Support for ssfm-v30 (latest) and ssfm-v21 AI voice models
Multi-language Support: 37 languages including English, Korean, Spanish, Japanese, Chinese, and more
Emotion Control: Preset emotions (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
Audio Customization: Control loudness (LUFS -70 to 0), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
Context Support: Full context.Context support for cancellation and timeouts
Timestamp TTS: Word- and character-level alignment data for subtitles, karaoke, and lip-sync
Zero Dependencies: Uses only the Go standard library
Streaming: Real-time chunked audio delivery for low-latency playback

Configuration

Set your API key via environment variable or pass directly:

import (
    "time"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

// The three snippets below are alternatives; use one of them.

// Option 1: read the key from the environment (recommended)
// export TYPECAST_API_KEY="your-api-key-here"
client := typecast.NewClient(nil)

// Option 2: pass the key directly
client := typecast.NewClient(&typecast.ClientConfig{
    APIKey: "your-api-key-here",
})

// Option 3: pass the key with custom settings
client := typecast.NewClient(&typecast.ClientConfig{
    APIKey:  "your-api-key-here",
    BaseURL: "https://api.typecast.ai",  // optional
    Timeout: 60 * time.Second,           // optional
})

Environment Variables

Variable	Description
`TYPECAST_API_KEY`	Your Typecast API key
`TYPECAST_API_HOST`	Custom API base URL (optional)

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset and Smart.

Smart Mode

Let the AI infer emotion from context:

response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "Everything is going to be okay.",
    Model:   typecast.ModelSSFMV30,
    Prompt: &typecast.SmartPrompt{
        EmotionType:  "smart",
        PreviousText: "I just got the best news!",  // Optional context
        NextText:     "I can't wait to celebrate!", // Optional context
    },
})

Preset Mode

Explicitly set emotion with preset values:

intensity := 1.5

response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "I am so excited to show you these features!",
    Model:   typecast.ModelSSFMV30,
    Prompt: &typecast.PresetPrompt{
        EmotionType:      "preset",
        EmotionPreset:    typecast.EmotionHappy,  // normal, happy, sad, angry, whisper, toneup, tonedown
        EmotionIntensity: &intensity,             // Range: 0.0 to 2.0
    },
})

Audio Customization

Control loudness, pitch, tempo, and output format:

lufs := -14.0
pitch := 2
tempo := 1.2
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    Text:    "Customized audio output!",
    Model:   typecast.ModelSSFMV30,
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Output: &typecast.Output{
        TargetLUFS:  &lufs,                   // Range: -70 to 0 (LUFS)
        AudioPitch:  &pitch,                  // Range: -12 to +12 semitones
        AudioTempo:  &tempo,                  // Range: 0.5x to 2.0x
        AudioFormat: typecast.AudioFormatMP3, // Options: wav, mp3
    },
})
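
The ranges in the comments above can be checked locally before a request is sent. This is a minimal sketch under the assumption that out-of-range values would otherwise be rejected by the API; `validateOutput` is a hypothetical helper and not part of the SDK.

```go
package main

import "fmt"

// validateOutput is a hypothetical pre-flight check (not an SDK function).
// It mirrors the documented ranges: LUFS -70 to 0, pitch -12 to +12
// semitones, tempo 0.5x to 2.0x.
func validateOutput(lufs float64, pitch int, tempo float64) error {
	if lufs < -70 || lufs > 0 {
		return fmt.Errorf("target LUFS %.1f out of range [-70, 0]", lufs)
	}
	if pitch < -12 || pitch > 12 {
		return fmt.Errorf("pitch %d out of range [-12, 12]", pitch)
	}
	if tempo < 0.5 || tempo > 2.0 {
		return fmt.Errorf("tempo %.2f out of range [0.5, 2.0]", tempo)
	}
	return nil
}

func main() {
	fmt.Println(validateOutput(-14.0, 2, 1.2))  // valid settings
	fmt.Println(validateOutput(-14.0, 13, 1.2)) // pitch too high
}
```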

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:

// Get all voices
voices, err := client.GetVoicesV2(ctx, nil)

// Filter by criteria
voices, err := client.GetVoicesV2(ctx, &typecast.VoicesV2Filter{
    Model:  typecast.ModelSSFMV30,
    Gender: typecast.GenderFemale,
    Age:    typecast.AgeYoungAdult,
})

// Display voice info
for _, voice := range voices {
    fmt.Printf("ID: %s, Name: %s\n", voice.VoiceID, voice.VoiceName)
    
    if voice.Gender != nil {
        fmt.Printf("Gender: %s\n", *voice.Gender)
    }
    if voice.Age != nil {
        fmt.Printf("Age: %s\n", *voice.Age)
    }
    
    for _, model := range voice.Models {
        fmt.Printf("Model: %s, Emotions: %v\n", model.Version, model.Emotions)
    }
}

// Get specific voice details
voice, err := client.GetVoiceV2(ctx, "tc_672c5f5ce59fac2a48faeaee")

Multilingual Content

The SDK supports 37 languages with automatic language detection:

// Auto-detect language (recommended)
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "こんにちは。お元気ですか。",
    Model:   typecast.ModelSSFMV30,
})

// Or specify language explicitly
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID:  "tc_672c5f5ce59fac2a48faeaee",
    Text:     "안녕하세요. 반갑습니다.",
    Model:    typecast.ModelSSFMV30,
    Language: "kor",  // ISO 639-3 language code
})

Streaming

Stream audio chunks in real-time for low-latency playback:

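// Stream and extract raw PCM (skip the 44-byte WAV header)
reader, err := client.TextToSpeechStream(context.Background(), request)
if err != nil {
    panic(err)
}
defer reader.Close()

buf := make([]byte, 4096)
headerLeft := 44 // WAV header bytes still to skip; a short first read may not cover all of them
for {
    n, err := reader.Read(buf)
    data := buf[:n]
    if headerLeft > 0 {
        skip := min(headerLeft, len(data))
        data = data[skip:]
        headerLeft -= skip
    }
    if len(data) > 0 {
        // data is raw 16-bit mono PCM at 32000 Hz
        // Feed to your audio output (e.g. oto, portaudio)
        _ = data
    }
    if err != nil {
        break // io.EOF on a normal end of stream; inspect other errors as needed
    }
}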

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. For MP3: 320 kbps, 44100 Hz, each chunk is independently decodable. The streaming endpoint does not support Volume or TargetLUFS.

Timestamp TTS

TextToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word and per-character alignment data — useful for karaoke highlights, subtitle generation, and lip-sync applications.

Basic Usage

package main

import (
    "context"
    "fmt"
    "os"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
    client := typecast.NewClient(&typecast.ClientConfig{APIKey: "YOUR_API_KEY"})
    ctx := context.Background()

    result, err := client.TextToSpeechWithTimestamps(ctx, &typecast.TTSRequestWithTimestamps{
        VoiceID: "tc_60e5426de8b95f1d3000d7b5",
        Text:    "Hello. How are you?",
        Model:   typecast.ModelSSFMV30,
    })
    if err != nil {
        panic(err)
    }

    if err := os.WriteFile("output.wav", result.AudioBytes(), 0644); err != nil {
        panic(err)
    }
    fmt.Printf("Duration: %.3fs\n", result.AudioDuration)

    for _, w := range result.Words {
        fmt.Printf("  [%.3fs – %.3fs] %s\n", w.StartTime, w.EndTime, w.Text)
    }
}

Granularity

Pass Granularity: typecast.GranularityWord (default) or Granularity: typecast.GranularityChar to control the alignment unit.

// Character-level alignment — required for Japanese / Chinese
result, err := client.TextToSpeechWithTimestamps(ctx, &typecast.TTSRequestWithTimestamps{
    VoiceID:     "tc_60e5426de8b95f1d3000d7b5",
    Text:        "Hello. How are you?",
    Model:       typecast.ModelSSFMV30,
    Granularity: typecast.GranularityChar,
})

Subtitle Export

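// The second return value of ToSrt and ToVtt, and the result of
// os.WriteFile, are discarded here for brevity; check them in real code.
srt, _ := result.ToSrt()
os.WriteFile("output.srt", []byte(srt), 0644)

vtt, _ := result.ToVtt()
os.WriteFile("output.vtt", []byte(vtt), 0644)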

Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use GranularityChar for these languages to get character-level alignment.
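
To illustrate how alignment data maps onto a subtitle file, here is a minimal sketch that turns start and end times in seconds into SRT cues, one cue per word. The `word` type, `srtTimestamp`, and `buildSRT` are hypothetical and only mirror the `Text`, `StartTime`, and `EndTime` fields used in Basic Usage; the output of the SDK's own `ToSrt()` may group or format cues differently.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// word is a hypothetical stand-in for one alignment entry.
type word struct {
	Text      string
	StartTime float64 // seconds
	EndTime   float64 // seconds
}

// srtTimestamp formats seconds as HH:MM:SS,mmm, the SRT time format.
func srtTimestamp(sec float64) string {
	ms := int64(math.Round(sec * 1000))
	return fmt.Sprintf("%02d:%02d:%02d,%03d",
		ms/3600000, ms/60000%60, ms/1000%60, ms%1000)
}

// buildSRT emits one numbered cue per word.
func buildSRT(words []word) string {
	var b strings.Builder
	for i, w := range words {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			i+1, srtTimestamp(w.StartTime), srtTimestamp(w.EndTime), w.Text)
	}
	return b.String()
}

func main() {
	fmt.Print(buildSRT([]word{
		{Text: "Hello.", StartTime: 0, EndTime: 0.5},
		{Text: "How", StartTime: 0.8, EndTime: 1.0},
	}))
}
```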

Supported Languages

The SDK supports 37 languages with automatic language detection:

Code	Language	Code	Language	Code	Language
`eng`	English	`jpn`	Japanese	`ukr`	Ukrainian
`kor`	Korean	`ell`	Greek	`ind`	Indonesian
`spa`	Spanish	`tam`	Tamil	`dan`	Danish
`deu`	German	`tgl`	Tagalog	`swe`	Swedish
`fra`	French	`fin`	Finnish	`msa`	Malay
`ita`	Italian	`zho`	Chinese	`ces`	Czech
`pol`	Polish	`slk`	Slovak	`por`	Portuguese
`nld`	Dutch	`ara`	Arabic	`bul`	Bulgarian
`rus`	Russian	`hrv`	Croatian	`ron`	Romanian
`ben`	Bengali	`hin`	Hindi	`hun`	Hungarian
`nan`	Hokkien	`nor`	Norwegian	`pan`	Punjabi
`tha`	Thai	`tur`	Turkish	`vie`	Vietnamese
`yue`	Cantonese

If not specified, the language will be automatically detected from the input text.

Error Handling

The SDK provides an APIError type with helper methods for handling specific errors:

import (
    "errors"
    "fmt"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

response, err := client.TextToSpeech(ctx, request)
if err != nil {
    // errors.As also matches an APIError that has been wrapped
    var apiErr *typecast.APIError
    if errors.As(err, &apiErr) {
        fmt.Printf("Error %d: %s\n", apiErr.StatusCode, apiErr.Message)
        
        // Handle specific errors
        switch {
        case apiErr.IsUnauthorized():
            // 401: Invalid API key
        case apiErr.IsForbidden():
            // 403: Access denied
        case apiErr.IsPaymentRequired():
            // 402: Insufficient credits
        case apiErr.IsNotFound():
            // 404: Resource not found
        case apiErr.IsValidationError():
            // 422: Validation error
        case apiErr.IsRateLimited():
            // 429: Rate limit exceeded
        case apiErr.IsServerError():
            // 5xx: Server error
        case apiErr.IsBadRequest():
            // 400: Bad request
        }
    }
}

Error Types

Method	Status Code	Description
`IsBadRequest()`	400	Invalid request parameters
`IsUnauthorized()`	401	Invalid or missing API key
`IsPaymentRequired()`	402	Insufficient credits
`IsForbidden()`	403	Access denied
`IsNotFound()`	404	Resource not found
`IsValidationError()`	422	Validation error
`IsRateLimited()`	429	Rate limit exceeded
`IsServerError()`	5xx	Server error

Context and Timeouts

The SDK fully supports Go’s context.Context for cancellation and timeouts:

// With timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

response, err := client.TextToSpeech(ctx, request)
if err != nil {
    if ctx.Err() == context.DeadlineExceeded {
        fmt.Println("Request timed out")
    }
}

// With cancellation
ctx, cancel := context.WithCancel(context.Background())
go func() {
    time.Sleep(5 * time.Second)
    cancel()  // Cancel after 5 seconds
}()

response, err := client.TextToSpeech(ctx, request)

API Reference

Client Methods

Method	Description
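`TextToSpeech(ctx, request)`	Convert text to speech audio
`TextToSpeechStream(ctx, request)`	Stream speech audio in chunks
`TextToSpeechWithTimestamps(ctx, request)`	Convert text to speech and return word- or character-level timestamps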
`GetVoicesV2(ctx, filter)`	Get available voices with filtering
`GetVoiceV2(ctx, voiceID)`	Get a specific voice by ID
`GetVoices(ctx, model)`	Get voices (V1 API, deprecated)
`GetVoice(ctx, voiceID, model)`	Get voice (V1 API, deprecated)

TTSRequest Fields

Field	Type	Required	Description
`VoiceID`	`string`	✓	Voice ID (format: `tc_` or `uc_`)
`Text`	`string`	✓	Text to synthesize (max 2000 chars)
`Model`	`TTSModel`	✓	TTS model (`ModelSSFMV21` or `ModelSSFMV30`)
`Language`	`string`		ISO 639-3 code (auto-detected if omitted)
`Prompt`	`Prompt` / `PresetPrompt` / `*SmartPrompt`		Emotion settings
`Output`	`*Output`		Audio output settings
`Seed`	`*uint32`		Unsigned integer seed for reproducibility (≥ 0)
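
Because `Text` is limited to 2000 characters, longer input has to be split across several requests. Below is a minimal sketch that cuts at sentence-ending punctuation where it can and falls back to a hard cut. `splitText` is a hypothetical helper, not part of the SDK, and it assumes the limit counts Unicode code points (runes) rather than bytes.

```go
package main

import (
	"fmt"
	"strings"
)

// splitText breaks text into pieces of at most limit runes. It prefers to
// cut just after the last '.', '!', '?' or '。' inside the window and cuts
// at the limit when no such character is found.
func splitText(text string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	var chunks []string
	runes := []rune(strings.TrimSpace(text))
	for len(runes) > limit {
		cut := limit
		for i := limit - 1; i > 0; i-- {
			if r := runes[i]; r == '.' || r == '!' || r == '?' || r == '。' {
				cut = i + 1
				break
			}
		}
		chunks = append(chunks, strings.TrimSpace(string(runes[:cut])))
		runes = []rune(strings.TrimSpace(string(runes[cut:])))
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

func main() {
	// A small limit keeps the demonstration readable; use 2000 for real input.
	for i, c := range splitText("One. Two. Three.", 10) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```

Each chunk can then be sent as its own `TTSRequest` and the resulting audio concatenated by the caller.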

TTSResponse Fields

Field	Type	Description
`AudioData`	`[]byte`	Generated audio data
`Duration`	`float64`	Audio duration in seconds
`Format`	`AudioFormat`	Audio format (`wav` or `mp3`)

Constants

Models

Constant	Value	Description
`ModelSSFMV30`	`ssfm-v30`	Latest model with improved prosody
`ModelSSFMV21`	`ssfm-v21`	Stable production model

Emotion Presets

Constant	ssfm-v21	ssfm-v30
`EmotionNormal`	✓	✓
`EmotionHappy`	✓	✓
`EmotionSad`	✓	✓
`EmotionAngry`	✓	✓
`EmotionWhisper`	✗	✓
`EmotionToneUp`	✗	✓
`EmotionToneDown`	✗	✓
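
The support matrix above can be encoded as a lookup so that an unsupported preset is caught before a request is made. This is a sketch only: `presetsByModel` and `supportsPreset` are hypothetical names, and the string values are taken from the model and preset names listed in this document.

```go
package main

import "fmt"

// presetsByModel mirrors the Emotion Presets table.
var presetsByModel = map[string][]string{
	"ssfm-v21": {"normal", "happy", "sad", "angry"},
	"ssfm-v30": {"normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"},
}

// supportsPreset reports whether the given model lists the given preset.
// Unknown models support nothing.
func supportsPreset(model, preset string) bool {
	for _, p := range presetsByModel[model] {
		if p == preset {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsPreset("ssfm-v21", "whisper")) // false
	fmt.Println(supportsPreset("ssfm-v30", "whisper")) // true
}
```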

Audio Formats

Constant	Value	Description
`AudioFormatWAV`	`wav`	Uncompressed PCM audio
`AudioFormatMP3`	`mp3`	Compressed MP3 audio
