Access the Typecast API with our official PHP SDK.
The official PHP library for the Typecast API. Convert text to lifelike speech using AI-powered voices. Built with Guzzle 7 for reliable HTTP communication. Requires PHP 8.1+ and Composer.
ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart Mode: let the AI infer emotion from context:
    use Neosapience\Typecast\Models\{TTSRequest, SmartPrompt};

    $response = $client->textToSpeech(new TTSRequest(
        voiceId: 'tc_672c5f5ce59fac2a48faeaee',
        text: 'Everything is going to be okay.',
        model: 'ssfm-v30',
        prompt: new SmartPrompt(
            previousText: 'I just got the best news!',
            nextText: "I can't wait to celebrate!",
        ),
    ));
Preset Mode: explicitly set the emotion with preset values:
    use Neosapience\Typecast\Models\{TTSRequest, PresetPrompt};

    $response = $client->textToSpeech(new TTSRequest(
        voiceId: 'tc_672c5f5ce59fac2a48faeaee',
        text: 'I am so excited to show you these features!',
        model: 'ssfm-v30',
        prompt: new PresetPrompt(
            emotionPreset: 'happy',
            emotionIntensity: 1.5,
        ),
    ));
List and filter available voices with enhanced metadata:
    use Neosapience\Typecast\Models\VoicesV2Filter;

    // Get all voices
    $voices = $client->getVoicesV2();

    // Filter by criteria
    $filtered = $client->getVoicesV2(new VoicesV2Filter(
        model: 'ssfm-v30',
        gender: 'female',
        age: 'young_adult',
    ));

    foreach ($voices as $voice) {
        echo "ID: {$voice->voiceId}, Name: {$voice->voiceName}\n";
        echo "Gender: {$voice->gender}, Age: {$voice->age}\n";
    }

    // Get a specific voice by ID
    $voice = $client->getVoiceV2('tc_672c5f5ce59fac2a48faeaee');
Stream audio chunks in real time through a callback for low-latency playback:
    use Neosapience\Typecast\Models\TTSRequestStream;

    $first = true;

    $client->textToSpeechStream(
        new TTSRequestStream(
            voiceId: 'tc_672c5f5ce59fac2a48faeaee',
            text: 'Stream this text as audio in real time.',
            model: 'ssfm-v30',
        ),
        function (string $chunk) use (&$first): void {
            if ($first) {
                $chunk = substr($chunk, 44); // Skip 44-byte WAV header
                $first = false;
            }
            // $chunk is raw 16-bit mono PCM at 32000 Hz
            // Feed to your audio output or pipe to ffplay
        },
    );
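The streaming callback drops the WAV header with a single `substr()` call, which relies on the first chunk being at least 44 bytes long. The following is a minimal sketch of a more defensive wrapper that keeps counting until the whole header has been discarded, even if it arrives split across chunks. `makeHeaderSkipper()` and the `$sink` callable are hypothetical names for illustration and are not part of the SDK; the chunks are simulated so the snippet runs without an API key.

```php
<?php
declare(strict_types=1);

/**
 * Wrap a PCM consumer so that the first $headerBytes bytes of the stream
 * are dropped, even when the header is split across several chunks.
 * Hypothetical helper, not part of the Typecast SDK.
 */
function makeHeaderSkipper(callable $sink, int $headerBytes = 44): callable
{
    $remaining = $headerBytes; // header bytes still to discard

    return function (string $chunk) use (&$remaining, $sink): void {
        if ($remaining > 0) {
            $skip = min($remaining, strlen($chunk));
            $chunk = substr($chunk, $skip);
            $remaining -= $skip;
        }
        if ($chunk !== '') {
            $sink($chunk); // forward raw PCM only
        }
    };
}

// Simulated stream: a 44-byte header split over two chunks, then PCM data.
$pcm = '';
$onChunk = makeHeaderSkipper(function (string $data) use (&$pcm): void {
    $pcm .= $data;
});

$onChunk(str_repeat('H', 30));        // first part of the header
$onChunk(str_repeat('H', 14) . 'AB'); // rest of the header plus PCM
$onChunk('CD');                       // PCM only

echo $pcm, PHP_EOL;
```

The returned closure has the same `function (string $chunk): void` shape as the callback passed to `textToSpeechStream()`, so it could be supplied in its place.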
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. For MP3: 320 kbps, 44100 Hz, each chunk is independently decodable.
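Because the streamed header carries a placeholder size (0xFFFFFFFF), PCM collected from a stream needs a header with real sizes before it can be saved as an ordinary WAV file. The sketch below builds the standard 44-byte RIFF/WAVE PCM header for the format stated above (32000 Hz, 16-bit, mono) and derives the clip duration from the byte count. `buildWavHeader()` and `pcmDurationSeconds()` are hypothetical helpers, not SDK functions, and the silent buffer stands in for real streamed audio.

```php
<?php
declare(strict_types=1);

/** Build a canonical 44-byte WAV header for uncompressed PCM data. */
function buildWavHeader(
    int $pcmBytes,
    int $sampleRate = 32000,
    int $channels = 1,
    int $bitsPerSample = 16
): string {
    $blockAlign = intdiv($channels * $bitsPerSample, 8); // bytes per sample frame
    $byteRate = $sampleRate * $blockAlign;               // bytes per second

    return 'RIFF'
        . pack('V', 36 + $pcmBytes) // RIFF chunk size = file size - 8
        . 'WAVE'
        . 'fmt '
        . pack('V', 16)             // fmt chunk size for PCM
        . pack('v', 1)              // audio format 1 = PCM
        . pack('v', $channels)
        . pack('V', $sampleRate)
        . pack('V', $byteRate)
        . pack('v', $blockAlign)
        . pack('v', $bitsPerSample)
        . 'data'
        . pack('V', $pcmBytes);     // size of the PCM payload
}

/** Duration in seconds of a raw PCM buffer of the given size. */
function pcmDurationSeconds(
    int $pcmBytes,
    int $sampleRate = 32000,
    int $channels = 1,
    int $bitsPerSample = 16
): float {
    return $pcmBytes / ($sampleRate * $channels * intdiv($bitsPerSample, 8));
}

// Stand-in for collected stream data: one second of 16-bit mono silence.
$pcm = str_repeat("\x00\x00", 32000);
$header = buildWavHeader(strlen($pcm));
$wav = $header . $pcm; // ready for file_put_contents('out.wav', $wav)

echo strlen($header), ' header bytes, ', pcmDurationSeconds(strlen($pcm)), " s of audio\n";
```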
textToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word and per-character alignment data, which is useful for karaoke highlights, subtitle generation, and lip-sync applications.
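To illustrate the subtitle use case, here is a rough sketch that turns word-level timings into SRT cues. The shape of the alignment data is an assumption: each word is modeled as an array with `text`, `start`, and `end` keys in seconds, which may not match the fields the SDK actually returns, so the mapping would need adapting. `formatSrtTime()` and `wordsToSrt()` are hypothetical helpers.

```php
<?php
declare(strict_types=1);

/** Format seconds as an SRT timestamp (HH:MM:SS,mmm). */
function formatSrtTime(float $seconds): string
{
    $ms = (int) round($seconds * 1000);
    $hours = intdiv($ms, 3600000);
    $ms -= $hours * 3600000;
    $minutes = intdiv($ms, 60000);
    $ms -= $minutes * 60000;
    $secs = intdiv($ms, 1000);
    $ms -= $secs * 1000;

    return sprintf('%02d:%02d:%02d,%03d', $hours, $minutes, $secs, $ms);
}

/**
 * Group word timings into SRT cues of at most $wordsPerCue words.
 * Assumed input shape: [['text' => string, 'start' => float, 'end' => float], ...]
 */
function wordsToSrt(array $words, int $wordsPerCue = 6): string
{
    $cues = [];
    foreach (array_chunk($words, $wordsPerCue) as $index => $group) {
        $start = formatSrtTime($group[0]['start']);
        $end = formatSrtTime($group[count($group) - 1]['end']);
        $text = implode(' ', array_column($group, 'text'));
        $cues[] = ($index + 1) . "\n{$start} --> {$end}\n{$text}\n";
    }

    return implode("\n", $cues);
}

// Hand-written sample timings standing in for real alignment data.
$words = [
    ['text' => 'Hello', 'start' => 0.0, 'end' => 0.4],
    ['text' => 'world', 'start' => 0.45, 'end' => 0.9],
];

echo wordsToSrt($words, 2);
```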
Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use granularity: 'char' for these languages to get character-level alignment.
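The language rule above can be captured in a small helper that picks the granularity before a request is built. This is only a sketch: `pickGranularity()` is a hypothetical function, and `'word'` as the value for other languages is an assumption, since only `'char'` is named above.

```php
<?php
declare(strict_types=1);

/**
 * Choose an alignment granularity for a language code.
 * Hypothetical helper; 'word' as the non-CJK value is an assumption.
 */
function pickGranularity(string $language): string
{
    // Languages written without whitespace between words.
    $charLevelLanguages = ['jpn', 'zho'];

    return in_array(strtolower($language), $charLevelLanguages, true)
        ? 'char'
        : 'word';
}

foreach (['jpn', 'zho', 'eng'] as $language) {
    echo $language, ' => ', pickGranularity($language), PHP_EOL;
}
```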