PHP - Typecast Documentation

The official PHP library for the Typecast API. Convert text to lifelike speech using AI-powered voices. Built with Guzzle 7 for reliable HTTP communication. Requires PHP 8.1+ and Composer.

Installation

Install via Composer:

composer require neosapience/typecast-php

The latest version registered in the SDK's Git tags is typecast-php/v0.1.1. The SDK requires PHP 8.1 or higher and Composer; check your PHP version with php -v.

Quick Start

<?php
use Neosapience\Typecast\TypecastClient;
use Neosapience\Typecast\Models\TTSRequest;

// Initialize client
$client = new TypecastClient(apiKey: 'YOUR_API_KEY');

// Convert text to speech
$response = $client->textToSpeech(new TTSRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: "Hello there! I'm your friendly text-to-speech agent.",
    model: 'ssfm-v30',
));

// Save audio file
file_put_contents('output.wav', $response->audioData);

echo "Duration: {$response->duration}s, Format: {$response->format}\n";

Features

Multiple Voice Models: Support for ssfm-v30 (latest) and ssfm-v21 AI voice models
Multi-language Support: 37 languages including English, Korean, Spanish, Japanese, Chinese, and more
Emotion Control: Preset emotions (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
Audio Customization: Control loudness (LUFS -70 to 0), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
Instant Voice Cloning: Upload a WAV/MP3 sample and create a custom voice ID
Timestamp TTS: Word- and character-level alignment data for subtitles, karaoke, and lip-sync
Streaming: Real-time chunked audio delivery for low-latency playback via callback
Guzzle 7: Industry-standard HTTP client with automatic retries and connection pooling
Type Safety: Typed properties and named arguments (PHP 8.1+)

Configuration

Set your API key through the TYPECAST_API_KEY environment variable, or pass it directly to the TypecastClient constructor as shown in Quick Start:

export TYPECAST_API_KEY="your-api-key-here"
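
The two options can be combined in a small helper that prefers an explicitly passed key and falls back to the environment variable. This is a minimal sketch: resolveApiKey is a hypothetical helper, not part of the SDK, and it makes no assumption about whether the SDK itself reads TYPECAST_API_KEY when no key is passed.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): pick the API key to use.
 * An explicitly passed key wins; otherwise TYPECAST_API_KEY is read
 * from the environment.
 */
function resolveApiKey(?string $explicitKey = null): string
{
    if ($explicitKey !== null && $explicitKey !== '') {
        return $explicitKey;
    }

    $fromEnv = getenv('TYPECAST_API_KEY');
    if ($fromEnv === false || $fromEnv === '') {
        throw new RuntimeException('No API key: set TYPECAST_API_KEY or pass a key explicitly.');
    }

    return $fromEnv;
}
```

The result can then be handed to the constructor, for example new TypecastClient(apiKey: resolveApiKey()).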

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset and Smart.

Smart Mode

Let the AI infer emotion from context:

use Neosapience\Typecast\Models\{TTSRequest, SmartPrompt};

$response = $client->textToSpeech(new TTSRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Everything is going to be okay.',
    model: 'ssfm-v30',
    prompt: new SmartPrompt(
        previousText: 'I just got the best news!',
        nextText: "I can't wait to celebrate!",
    ),
));

Preset Mode

Explicitly set emotion with preset values:

use Neosapience\Typecast\Models\{TTSRequest, PresetPrompt};

$response = $client->textToSpeech(new TTSRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'I am so excited to show you these features!',
    model: 'ssfm-v30',
    prompt: new PresetPrompt(
        emotionPreset: 'happy',
        emotionIntensity: 1.5,
    ),
));

Audio Customization

Control loudness, pitch, tempo, and output format:

use Neosapience\Typecast\Models\{TTSRequest, Output};

$response = $client->textToSpeech(new TTSRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Customized audio output!',
    model: 'ssfm-v30',
    output: new Output(
        targetLufs: -14.0,
        audioPitch: 2,
        audioTempo: 1.2,
        audioFormat: 'mp3',
    ),
    seed: 42,
));

file_put_contents('output.mp3', $response->audioData);
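
Out-of-range values can be caught before a request is sent. The sketch below checks the ranges listed under Features (LUFS -70 to 0, pitch -12 to +12 semitones, tempo 0.5x to 2.0x, WAV or MP3). validateOutputSettings is a hypothetical helper, not part of the SDK, and it assumes that pitch is an integer number of semitones and that format names are lowercase, as in the example above.

```php
<?php

/**
 * Hypothetical pre-flight check (not part of the SDK). Returns a list of
 * human-readable problems; an empty list means every value is within the
 * ranges listed under Features.
 */
function validateOutputSettings(
    float $targetLufs,
    int $audioPitch,
    float $audioTempo,
    string $audioFormat
): array {
    $problems = [];

    if ($targetLufs < -70.0 || $targetLufs > 0.0) {
        $problems[] = 'targetLufs must be between -70 and 0';
    }
    if ($audioPitch < -12 || $audioPitch > 12) {
        $problems[] = 'audioPitch must be between -12 and +12 semitones';
    }
    if ($audioTempo < 0.5 || $audioTempo > 2.0) {
        $problems[] = 'audioTempo must be between 0.5 and 2.0';
    }
    if (!in_array($audioFormat, ['wav', 'mp3'], true)) {
        $problems[] = 'audioFormat must be wav or mp3';
    }

    return $problems;
}
```

Calling validateOutputSettings(-14.0, 2, 1.2, 'mp3') with the values from the example above returns an empty array.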

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:

use Neosapience\Typecast\Models\VoicesV2Filter;

// Get all voices
$voices = $client->getVoicesV2();

// Filter by criteria
$filtered = $client->getVoicesV2(new VoicesV2Filter(
    model: 'ssfm-v30',
    gender: 'female',
    age: 'young_adult',
));

foreach ($voices as $voice) {
    echo "ID: {$voice->voiceId}, Name: {$voice->voiceName}\n";
    echo "Gender: {$voice->gender}, Age: {$voice->age}\n";
}

// Get a specific voice by ID
$voice = $client->getVoiceV2('tc_672c5f5ce59fac2a48faeaee');

Streaming

Stream audio chunks in real time through a callback for low-latency playback:

use Neosapience\Typecast\Models\TTSRequestStream;

$first = true;
$client->textToSpeechStream(
    new TTSRequestStream(
        voiceId: 'tc_672c5f5ce59fac2a48faeaee',
        text: 'Stream this text as audio in real time.',
        model: 'ssfm-v30',
    ),
    function (string $chunk) use (&$first): void {
        if ($first) {
            $chunk = substr($chunk, 44); // Skip 44-byte WAV header
            $first = false;
        }
        // $chunk is raw 16-bit mono PCM at 32000 Hz
        // Feed to your audio output or pipe to ffplay
    },
);

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header whose size field is set to 0xFFFFFFFF; subsequent chunks are raw PCM only. MP3 streaming format: 320 kbps, 44100 Hz; each chunk is independently decodable.
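
Because the streamed WAV header carries a placeholder size, a file assembled from collected PCM chunks needs a header with the real data length. The sketch below builds a standard 44-byte PCM WAV header for the format described above and computes the playback length from the byte count. buildWavHeader and pcmDurationSeconds are hypothetical helpers, not part of the SDK.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): build a canonical 44-byte
 * PCM WAV header for $dataBytes bytes of raw audio.
 */
function buildWavHeader(
    int $dataBytes,
    int $sampleRate = 32000,
    int $channels = 1,
    int $bitsPerSample = 16
): string {
    $blockAlign = intdiv($channels * $bitsPerSample, 8); // bytes per sample frame
    $byteRate   = $sampleRate * $blockAlign;             // bytes per second

    // RIFF chunk, then a 16-byte "fmt " chunk (format 1 = PCM), then "data".
    return 'RIFF' . pack('V', 36 + $dataBytes) . 'WAVE'
        . 'fmt ' . pack('VvvVVvv', 16, 1, $channels, $sampleRate, $byteRate, $blockAlign, $bitsPerSample)
        . 'data' . pack('V', $dataBytes);
}

/** Playback length in seconds of raw PCM with the given format. */
function pcmDurationSeconds(
    int $dataBytes,
    int $sampleRate = 32000,
    int $channels = 1,
    int $bitsPerSample = 16
): float {
    return $dataBytes / ($sampleRate * intdiv($channels * $bitsPerSample, 8));
}
```

If $pcm holds the concatenated chunks with the streamed header already stripped, buildWavHeader(strlen($pcm)) . $pcm gives the contents of a complete WAV file.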

Timestamp TTS

textToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word or per-character alignment data, which is useful for karaoke highlights, subtitle generation, and lip-sync applications.

Basic Usage

use Neosapience\Typecast\TypecastClient;
use Neosapience\Typecast\Models\TTSRequestWithTimestamps;

$client = new TypecastClient(apiKey: 'YOUR_API_KEY');

$result = $client->textToSpeechWithTimestamps(new TTSRequestWithTimestamps(
    voiceId:  'tc_60e5426de8b95f1d3000d7b5',
    text:     'Hello. How are you?',
    model:    'ssfm-v30',
));

file_put_contents('output.wav', $result->audioData);
echo "Duration: {$result->audioDuration}s\n";

foreach ($result->words as $word) {
    echo "  [{$word->startTime}s – {$word->endTime}s] {$word->text}\n";
}
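
For karaoke-style highlighting, the timings can be searched for the word being spoken at a given playback position. The sketch below uses plain arrays whose keys mirror the word fields printed above (text, startTime, endTime). findActiveWord is a hypothetical helper, not part of the SDK, and the sample timings are illustrative rather than real API output.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): return the index of the word
 * being spoken at $position seconds, or null during silence.
 */
function findActiveWord(array $words, float $position): ?int
{
    foreach ($words as $index => $word) {
        // Treat each word as the half-open interval [startTime, endTime).
        if ($position >= $word['startTime'] && $position < $word['endTime']) {
            return $index;
        }
    }

    return null;
}

// Illustrative timings only; real values come from the API response.
$sampleWords = [
    ['text' => 'Hello.', 'startTime' => 0.00, 'endTime' => 0.45],
    ['text' => 'How',    'startTime' => 0.80, 'endTime' => 1.00],
];

$active = findActiveWord($sampleWords, 0.9); // index of "How"
```

A linear scan is enough for short utterances; for long transcripts a binary search over the sorted start times would be the natural replacement.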

Granularity

Pass granularity: 'word' (default) or granularity: 'char' to control the alignment unit.

$result = $client->textToSpeechWithTimestamps(new TTSRequestWithTimestamps(
    voiceId:     'tc_60e5426de8b95f1d3000d7b5',
    text:        'Hello. How are you?',
    model:       'ssfm-v30',
    granularity: 'char',  // required for Japanese / Chinese
));

Subtitle Export

file_put_contents('output.srt', $result->toSrt());
file_put_contents('output.vtt', $result->toVtt());
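
To show what SRT cues built from word timings look like, the sketch below formats one cue per word. It is an illustration of the SRT layout only, not the SDK's toSrt() implementation, whose cue grouping may differ. formatSrtTimestamp and wordsToSrt are hypothetical helpers, and the input uses plain arrays with text, startTime, and endTime keys.

```php
<?php

/** Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm). */
function formatSrtTimestamp(float $seconds): string
{
    $totalMs = (int) round($seconds * 1000);

    return sprintf(
        '%02d:%02d:%02d,%03d',
        intdiv($totalMs, 3600000),
        intdiv($totalMs, 60000) % 60,
        intdiv($totalMs, 1000) % 60,
        $totalMs % 1000
    );
}

/**
 * Hypothetical illustration (not the SDK's toSrt()): emit one numbered
 * SRT cue per word, with cues separated by a blank line.
 */
function wordsToSrt(array $words): string
{
    $cues = [];
    foreach (array_values($words) as $i => $word) {
        $cues[] = ($i + 1) . "\n"
            . formatSrtTimestamp($word['startTime']) . ' --> '
            . formatSrtTimestamp($word['endTime']) . "\n"
            . $word['text'] . "\n";
    }

    return implode("\n", $cues);
}
```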

Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use granularity: 'char' for these languages to get character-level alignment.
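
When the language is known in advance, the rule in this note can be applied programmatically. The sketch below maps a language code from the Supported Languages table to a granularity value. granularityFor is a hypothetical helper, not part of the SDK, and it treats only the two languages named in the note as character-aligned.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): choose the alignment unit
 * for a three-letter language code.
 */
function granularityFor(string $languageCode): string
{
    // Only jpn and zho are named in the note; extend this list if other
    // languages without whitespace delimiters need the same treatment.
    $charLanguages = ['jpn', 'zho'];

    return in_array(strtolower($languageCode), $charLanguages, true) ? 'char' : 'word';
}
```

The return value can be passed as the granularity argument of TTSRequestWithTimestamps.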

Instant Voice Cloning

Clone a custom voice from a short audio sample, then pass the returned uc_ voice ID directly to TTS.

<?php
use Neosapience\Typecast\TypecastClient;
use Neosapience\Typecast\Models\TTSRequest;

$client = new TypecastClient(apiKey: 'YOUR_API_KEY');

$voice = $client->cloneVoice(
    audio: file_get_contents('sample.wav'),
    filename: 'sample.wav',
    name: 'My Voice',
    model: 'ssfm-v30',
);

$response = $client->textToSpeech(new TTSRequest(
    voiceId: $voice->voiceId,
    text: 'Hello from my cloned voice!',
    model: 'ssfm-v30',
));

file_put_contents('output.wav', $response->audioData);
$client->deleteVoice($voice->voiceId);

Voice cloning audio must be 25 MB or smaller, the audio duration must be 5-150 seconds, and the custom voice name must be 1-30 characters.
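
These limits can be checked locally before uploading. The sketch below is a hypothetical pre-upload check, not part of the SDK. It assumes that 25 MB means 25 * 1024 * 1024 bytes and that the caller already knows the sample's duration, since measuring it depends on the audio format.

```php
<?php

/**
 * Hypothetical pre-upload check (not part of the SDK). Returns a list of
 * problems; an empty list means the sample meets the documented limits.
 */
function validateCloneInput(int $sizeBytes, float $durationSeconds, string $name): array
{
    // Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
    $maxBytes = 25 * 1024 * 1024;

    // Count characters rather than bytes when mbstring is available.
    $nameLength = function_exists('mb_strlen') ? mb_strlen($name) : strlen($name);

    $problems = [];
    if ($sizeBytes > $maxBytes) {
        $problems[] = 'audio must be 25 MB or smaller';
    }
    if ($durationSeconds < 5.0 || $durationSeconds > 150.0) {
        $problems[] = 'audio duration must be 5-150 seconds';
    }
    if ($nameLength < 1 || $nameLength > 30) {
        $problems[] = 'voice name must be 1-30 characters';
    }

    return $problems;
}
```

For a file on disk, filesize('sample.wav') supplies the first argument.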

Supported Languages

The SDK supports 37 languages with automatic language detection:

Code	Language	Code	Language	Code	Language
`eng`	English	`jpn`	Japanese	`ukr`	Ukrainian
`kor`	Korean	`ell`	Greek	`ind`	Indonesian
`spa`	Spanish	`tam`	Tamil	`dan`	Danish
`deu`	German	`tgl`	Tagalog	`swe`	Swedish
`fra`	French	`fin`	Finnish	`msa`	Malay
`ita`	Italian	`zho`	Chinese	`ces`	Czech
`pol`	Polish	`slk`	Slovak	`por`	Portuguese
`nld`	Dutch	`ara`	Arabic	`bul`	Bulgarian
`rus`	Russian	`hrv`	Croatian	`ron`	Romanian
`ben`	Bengali	`hin`	Hindi	`hun`	Hungarian
`nan`	Hokkien	`nor`	Norwegian	`pan`	Punjabi
`tha`	Thai	`tur`	Turkish	`vie`	Vietnamese
`yue`	Cantonese

If not specified, the language will be automatically detected from the input text.

Error Handling

The SDK throws a specific exception class for each HTTP error status:

use Neosapience\Typecast\Exceptions\{
    TypecastException,
    UnauthorizedException,
    PaymentRequiredException,
    RateLimitException,
};

try {
    $response = $client->textToSpeech($request);
} catch (UnauthorizedException $e) {
    echo "Invalid API key: {$e->getMessage()}\n";
} catch (PaymentRequiredException $e) {
    echo "Insufficient credits\n";
} catch (RateLimitException $e) {
    echo "Rate limit exceeded - please retry later\n";
} catch (TypecastException $e) {
    echo "Error: {$e->getMessage()}\n";
}

Exception	Status Code	Description
`BadRequestException`	400	Invalid request parameters
`UnauthorizedException`	401	Invalid or missing API key
`PaymentRequiredException`	402	Insufficient credits
`NotFoundException`	404	Resource not found
`UnprocessableEntityException`	422	Validation error
`RateLimitException`	429	Rate limit exceeded
`InternalServerException`	500	Server error
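
A 429 response is usually worth retrying after a pause. The sketch below is a generic exponential-backoff wrapper. retryWithBackoff is a hypothetical helper, not part of the SDK, and the attempt count and delays are illustrative defaults. Whether the SDK already retries some failures internally is not confirmed here, so treat this as an application-level safeguard.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): call $operation, retrying
 * with exponential backoff while $isRetryable returns true for the error.
 * $sleep receives the delay in milliseconds and can be replaced in tests.
 */
function retryWithBackoff(
    callable $operation,
    callable $isRetryable,
    int $maxAttempts = 4,
    int $baseDelayMs = 500,
    ?callable $sleep = null
): mixed {
    $sleep ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            // Give up on the last attempt or on errors that should not be retried.
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }
            // The delay doubles each time: base, 2x base, 4x base, ...
            $sleep($baseDelayMs * (2 ** ($attempt - 1)));
        }
    }
}
```

With the SDK, the operation would be fn () => $client->textToSpeech($request) and the predicate fn (Throwable $e) => $e instanceof RateLimitException.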

API Reference

TypecastClient Methods

Method	Description
`textToSpeech(TTSRequest)`	Convert text to speech audio
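`textToSpeechStream(TTSRequestStream, callable)`	Stream audio chunks via callback
`textToSpeechWithTimestamps(TTSRequestWithTimestamps)`	Convert text to speech and return word- or character-level timestamps
`cloneVoice($audio, $filename, $name, $model)`	Create a custom voice via instant cloning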
`deleteVoice(string $voiceId)`	Delete a custom cloned voice
`getMySubscription()`	Get subscription info
`getVoices(?string $model)`	Get available voices (V1)
`getVoicesV2(?VoicesV2Filter)`	Get voices with metadata (V2)
`getVoiceV2(string $voiceId)`	Get a specific voice
