The official Zig library for the Typecast API. Convert text to lifelike speech using AI-powered voices. Pure Zig implementation — no C dependencies. Uses only std.http.Client and std.json from the Zig standard library.

Source Code

Typecast Zig SDK Source Code

Package

Zig Package (via zig fetch)

Installation

Add the dependency using zig fetch:
zig fetch --save "https://github.com/neosapience/typecast-sdk/archive/refs/tags/typecast-zig/v0.1.0.tar.gz"
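This records the package in your build.zig.zon. The entry it writes looks roughly like the sketch below — the dependency name mirrors the `b.dependency("typecast_zig", ...)` call later in this guide, and the hash shown is a placeholder, since `zig fetch` computes the real value for you:

```zig
// build.zig.zon (excerpt) -- written automatically by `zig fetch --save`
.dependencies = .{
    .typecast_zig = .{
        .url = "https://github.com/neosapience/typecast-sdk/archive/refs/tags/typecast-zig/v0.1.0.tar.gz",
        .hash = "1220...", // placeholder: zig fetch fills in the real hash
    },
},
```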
Then add the import in your build.zig:
const typecast_dep = b.dependency("typecast_zig", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("typecast", typecast_dep.module("typecast"));

Quick Start

const std = @import("std");
const typecast = @import("typecast");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Initialize client (reads TYPECAST_API_KEY from environment)
    var client = typecast.Client.init(allocator, .{
        .api_key = std.posix.getenv("TYPECAST_API_KEY") orelse return error.MissingApiKey,
    });
    defer client.deinit();

    // Convert text to speech
    const response = try client.textToSpeech(.{
        .voice_id = "tc_672c5f5ce59fac2a48faeaee",
        .text = "Hello there! I'm your friendly text-to-speech agent.",
        .model = .ssfm_v30,
    });
    defer allocator.free(response.audio_data);

    // Save audio file
    const file = try std.fs.cwd().createFile("output.wav", .{});
    defer file.close();
    try file.writeAll(response.audio_data);

    std.debug.print("Saved {d} bytes, duration: {d:.1}s\n", .{
        response.audio_data.len, response.duration,
    });
}

Features

The Typecast Zig SDK provides the following features for text-to-speech conversion:
  • Multiple Voice Models: Support for ssfm-v30 (latest) and ssfm-v21 AI voice models
  • Multi-language Support: 37 languages including English, Korean, Spanish, Japanese, Chinese, and more
  • Emotion Control: Preset emotions (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
  • Audio Customization: Control loudness (LUFS -70 to 0), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
  • Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
  • Pure Zig: Zero external dependencies, uses only the standard library
  • Streaming: Real-time chunked audio delivery for low-latency playback via callback
  • Explicit Memory Management: Caller-supplied allocator with clear ownership semantics

Configuration

Set your API key via environment variable or pass directly:
const typecast = @import("typecast");

// Using environment variable (recommended)
// export TYPECAST_API_KEY="your-api-key-here"
var client = typecast.Client.init(allocator, .{
    .api_key = std.posix.getenv("TYPECAST_API_KEY") orelse return error.MissingApiKey,
});
defer client.deinit();
// Or pass directly
var client = typecast.Client.init(allocator, .{
    .api_key = "your-api-key-here",
});
defer client.deinit();
// Custom base URL
var client = typecast.Client.init(allocator, .{
    .api_key = "your-api-key-here",
    .base_url = "https://custom-api.example.com",
});
defer client.deinit();

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart mode lets the model infer the emotion from surrounding context:
const response = try client.textToSpeech(.{
    .voice_id = "tc_672c5f5ce59fac2a48faeaee",
    .text = "Everything is going to be okay.",
    .model = .ssfm_v30,
    .prompt = .{ .smart = .{
        .previous_text = "I just got the best news!",
        .next_text = "I can't wait to celebrate!",
    } },
});
defer allocator.free(response.audio_data);
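Preset mode pins a fixed emotion instead. The field names in this sketch (`preset`, `emotion_preset`, `emotion_intensity`) are assumptions extrapolated from the smart-mode example above and the preset emotions listed under Features (normal, happy, sad, angry, whisper, toneup, tonedown); check the SDK source for the exact request shape:

```zig
// Sketch: force a preset emotion (field names assumed, see note above).
const response = try client.textToSpeech(.{
    .voice_id = "tc_672c5f5ce59fac2a48faeaee",
    .text = "I can't believe we won!",
    .model = .ssfm_v30,
    .prompt = .{ .preset = .{
        .emotion_preset = .happy,
        .emotion_intensity = 1.5, // assumed knob; omit if unsupported
    } },
});
defer allocator.free(response.audio_data);
```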

Audio Customization

Control loudness, pitch, tempo, and output format:
const response = try client.textToSpeech(.{
    .voice_id = "tc_672c5f5ce59fac2a48faeaee",
    .text = "Customized audio output!",
    .model = .ssfm_v30,
    .output = .{
        .target_lufs = -14.0,
        .audio_pitch = 2,
        .audio_tempo = 1.2,
        .audio_format = .mp3,
    },
    .seed = 42,
});
defer allocator.free(response.audio_data);

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:
// Get all voices
const voices = try client.getVoicesV2(null);
defer allocator.free(voices);

// Filter by model
const filtered = try client.getVoicesV2(.{ .model = .ssfm_v30 });
defer allocator.free(filtered);

for (voices) |voice| {
    std.debug.print("ID: {s}, Name: {s}\n", .{ voice.voice_id, voice.voice_name });
}

// Get a specific voice
const voice = try client.getVoiceV2("tc_672c5f5ce59fac2a48faeaee", null);
std.debug.print("Voice: {s}\n", .{voice.voice_name});

Streaming

Stream audio chunks in real time for low-latency playback via a callback:
try client.textToSpeechStream(.{
    .voice_id = "tc_672c5f5ce59fac2a48faeaee",
    .text = "Stream this text as audio in real time.",
    .model = .ssfm_v30,
}, struct {
    var first = true;
    fn onChunk(chunk: []const u8) anyerror!void {
        var data = chunk;
        if (first) {
            data = chunk[44..]; // skip the 44-byte WAV header in the first chunk
            first = false;
        }
        // `data` is raw 16-bit mono PCM at 32000 Hz; feed it to your audio output.
        _ = data;
    }
}.onChunk);
Streaming format notes:
  • WAV: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size field = 0xFFFFFFFF, since the total length is unknown up front); subsequent chunks are raw PCM only.
  • MP3: 320 kbps, 44100 Hz; each chunk is independently decodable.
  • The streaming endpoint does not support volume or target_lufs.
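Since the first chunk carries a canonical 44-byte RIFF/WAVE header, you can sanity-check the advertised format before skipping it. This helper relies only on the standard WAV header layout (sample rate is a little-endian u32 at byte offset 24), not on the SDK:

```zig
const std = @import("std");

/// Returns the sample rate from a canonical 44-byte WAV header,
/// or an error if the chunk does not start with RIFF/WAVE.
fn wavSampleRate(chunk: []const u8) !u32 {
    if (chunk.len < 44) return error.ShortChunk;
    if (!std.mem.eql(u8, chunk[0..4], "RIFF")) return error.NotWav;
    if (!std.mem.eql(u8, chunk[8..12], "WAVE")) return error.NotWav;
    return std.mem.readInt(u32, chunk[24..28], .little);
}
```

For the streaming endpoint described above, this should return 32000.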

Supported Languages

The SDK supports 37 languages with automatic language detection:
Code  Language     Code  Language     Code  Language
eng   English      jpn   Japanese     ukr   Ukrainian
kor   Korean       ell   Greek        ind   Indonesian
spa   Spanish      tam   Tamil        dan   Danish
deu   German       tgl   Tagalog      swe   Swedish
fra   French       fin   Finnish      msa   Malay
ita   Italian      zho   Chinese      ces   Czech
pol   Polish       slk   Slovak       por   Portuguese
nld   Dutch        ara   Arabic       bul   Bulgarian
rus   Russian      hrv   Croatian     ron   Romanian
ben   Bengali      hin   Hindi        hun   Hungarian
nan   Hokkien      nor   Norwegian    pan   Punjabi
tha   Thai         tur   Turkish      vie   Vietnamese
yue   Cantonese
If not specified, the language will be automatically detected from the input text.
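To pin the language rather than rely on detection, pass the ISO 639-3 code from the table above. The `language` field name in this sketch is an assumption; verify it against the SDK's request struct:

```zig
// Sketch: explicitly request Korean synthesis (`language` field assumed).
const response = try client.textToSpeech(.{
    .voice_id = "tc_672c5f5ce59fac2a48faeaee",
    .text = "안녕하세요!",
    .model = .ssfm_v30,
    .language = "kor",
});
defer allocator.free(response.audio_data);
```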

Error Handling

The SDK uses Zig’s error union for handling API errors:
const response = client.textToSpeech(.{
    .voice_id = "tc_672c5f5ce59fac2a48faeaee",
    .text = "Hello",
    .model = .ssfm_v30,
}) catch |err| switch (err) {
    error.Unauthorized => {
        std.debug.print("Invalid API key\n", .{});
        return err;
    },
    error.PaymentRequired => {
        std.debug.print("Insufficient credits\n", .{});
        return err;
    },
    error.RateLimited => {
        std.debug.print("Rate limit exceeded - please retry later\n", .{});
        return err;
    },
    error.NotFound => {
        std.debug.print("Voice not found\n", .{});
        return err;
    },
    else => return err,
};
defer allocator.free(response.audio_data);

Error Types

Error                      Status Code  Description
error.BadRequest           400          Invalid request parameters
error.Unauthorized         401          Invalid or missing API key
error.PaymentRequired      402          Insufficient credits
error.NotFound             404          Resource not found
error.UnprocessableEntity  422          Validation error
error.RateLimited          429          Rate limit exceeded
error.InternalServerError  500          Server error
error.JsonParseError       -            JSON parsing error

API Reference

Client Methods

Method                                 Description
init(allocator, config)                Create a client with the given configuration
deinit()                               Clean up client resources
textToSpeech(request)                  Convert text to speech audio
textToSpeechStream(request, callback)  Stream audio chunks via a callback
getMySubscription()                    Get subscription info
getVoices(model)                       Get available voices (V1)
getVoicesV2(filter)                    Get voices with enhanced metadata (V2)
getVoiceV2(voice_id, model)            Get a specific voice
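For example, checking your subscription before a large batch job might look like the sketch below. The `getMySubscription` method is listed above, but the shape of its return value is an assumption here; consult the SDK source for the actual field names:

```zig
// Sketch: inspect subscription state (return-value fields assumed).
const sub = try client.getMySubscription();
std.debug.print("credits remaining: {d}\n", .{sub.remaining});
```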