Access the Typecast API with our official Dart and Flutter SDK.
Use the SDK to convert text to lifelike speech, stream audio, generate timestamps, discover voices, and create custom voices from Dart or Flutter applications.
Use typecast_dart 0.1.4 or higher. For production Flutter apps, avoid embedding a long-lived API key in a distributed client. Route requests through your backend when the API key must remain private.
The Dart SDK returns generated audio as bytes. In Flutter, pass those bytes to an audio playback package such as audioplayers. Use one shared AudioPlayer instance and play each response directly from memory.
For production Flutter apps, keep long-lived API keys on your backend. The Flutter app can request generated audio from your backend and still play the returned bytes with BytesSource.
Set your API key via environment variable or constructor:
export TYPECAST_API_KEY="your-api-key-here"
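One way to combine the two configuration paths in application code is sketched below. `resolveApiKey` is a hypothetical helper, not part of typecast_dart, and the precedence shown (explicit value first, then the `TYPECAST_API_KEY` environment variable) is an assumption made for illustration. `Platform.environment` comes from `dart:io`, so this sketch does not apply to Flutter web.

```dart
import 'dart:io' show Platform;

/// Hypothetical helper (not part of typecast_dart): returns the explicit key
/// if one is given, otherwise the TYPECAST_API_KEY environment variable,
/// otherwise null. The precedence order is an assumption.
String? resolveApiKey({String? explicit, Map<String, String>? environment}) {
  if (explicit != null && explicit.isNotEmpty) return explicit;
  final env = environment ?? Platform.environment;
  final fromEnv = env['TYPECAST_API_KEY'];
  return (fromEnv != null && fromEnv.isNotEmpty) ? fromEnv : null;
}

void main() {
  // An injected map keeps the example independent of the real environment.
  final fakeEnv = {'TYPECAST_API_KEY': 'from-env'};
  print(resolveApiKey(environment: fakeEnv)); // from-env
  print(resolveApiKey(explicit: 'from-code', environment: fakeEnv)); // from-code
  print(resolveApiKey(environment: const {})); // null
}
```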
When requests go through your own proxy, set baseUrl to the proxy endpoint and omit apiKey. The SDK will not send the X-API-KEY header for empty or missing keys. Requests to the default Typecast host still require an API key.
Proxy without API key
final client = TypecastClient(
  baseUrl: 'https://your-proxy.example.com',
);
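The header rule described above (no X-API-KEY header for empty or missing keys) can be pictured with the following sketch. It illustrates the stated behavior only and is not the SDK's actual implementation; `buildHeaders` and the `Content-Type` entry are assumptions introduced for the example.

```dart
/// Hypothetical helper (not part of typecast_dart): builds request headers
/// and attaches X-API-KEY only when a non-empty key is supplied.
Map<String, String> buildHeaders(String? apiKey) {
  // The Content-Type entry is an assumption, included for illustration.
  final headers = <String, String>{'Content-Type': 'application/json'};
  if (apiKey != null && apiKey.isNotEmpty) {
    headers['X-API-KEY'] = apiKey;
  }
  return headers;
}

void main() {
  print(buildHeaders(null).containsKey('X-API-KEY')); // false
  print(buildHeaders('').containsKey('X-API-KEY')); // false
  print(buildHeaders('secret').containsKey('X-API-KEY')); // true
}
```

With this shape, a proxy can add the real key server-side before forwarding the request, so the key never ships inside the client.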
ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart Mode
Let the AI infer emotion from context:
final response = await client.textToSpeech(
  const TtsRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Everything is going to be okay.',
    model: TtsModel.ssfmV30,
    prompt: SmartPrompt(
      previousText: 'I just got the best news!',
      nextText: "I can't wait to celebrate!",
    ),
  ),
);
await player.play(BytesSource(response.audioData));
Preset Mode
Explicitly set emotion with preset values:
final response = await client.textToSpeech(
  const TtsRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'I am so excited to show you these features!',
    model: TtsModel.ssfmV30,
    prompt: PresetPrompt(
      emotionPreset: EmotionPreset.happy,
      emotionIntensity: 1.5,
    ),
  ),
);
await player.play(BytesSource(response.audioData));
Use generateToFile when you want the SDK to synthesize speech and write the audio bytes directly to a local file. The model defaults to ssfm-v30, and .mp3 / .wav extensions infer the output format when no output format is set. Browse available voice IDs on the Voices page.
await client.generateToFile(
  'output.mp3',
  GenerateToFileRequest(
    text: 'Hello from Typecast.',
    voiceId: 'tc_672c5f5ce59fac2a48faeaee', // Find voice IDs at https://typecast.ai/developers/api/voices
  ),
);
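The extension-based format inference can be pictured with a small sketch. `inferAudioFormat` is a hypothetical helper, not the SDK's code. The case-insensitive match and the `null` result for unknown extensions are assumptions, since the text above only says that .mp3 and .wav infer the format.

```dart
/// Hypothetical helper (not part of typecast_dart): maps a file path to an
/// audio format name based on its extension.
String? inferAudioFormat(String path) {
  final lower = path.toLowerCase(); // Case-insensitivity is an assumption.
  if (lower.endsWith('.mp3')) return 'mp3';
  if (lower.endsWith('.wav')) return 'wav';
  return null; // Unknown extension: the caller would set the format explicitly.
}

void main() {
  print(inferAudioFormat('output.mp3')); // mp3
  print(inferAudioFormat('clips/intro.WAV')); // wav
  print(inferAudioFormat('notes.txt')); // null
}
```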
Use text pause markup when you only need silent gaps inside one composed text segment. Put <|5s|>, <|1s|>, <|0.3s|>, or <|0.34413s|> directly in the text. The value is interpreted as seconds and must end with s. This keeps the pause expression visible in plain text without adding separate pause calls.
final audio = await client
    .composeSpeech()
    .defaults(ComposerSettings(voiceId: 'tc_672c5f5ce59fac2a48faeaee', model: TtsModel.ssfmV30))
    .say('Hello<|5s|>Nice to meet you<|1s|>Today<|2s|>how does the weather feel?')
    .generate();
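When pause lengths come from variables, it can help to build and check the markers in code. The sketch below uses two hypothetical helpers that are not part of the SDK: `pauseMarkup` formats a marker, and `totalPauseSeconds` sums the pauses found in a string. The regular expression covers only plain decimal values such as those shown above; the service's exact grammar is not specified here, so treat the pattern as an assumption.

```dart
/// Hypothetical helper: formats a pause marker such as <|0.3s|>.
String pauseMarkup(num seconds) {
  if (seconds < 0) {
    throw ArgumentError.value(seconds, 'seconds', 'must not be negative');
  }
  return '<|${seconds}s|>';
}

// Matches markers like <|5s|> or <|0.34413s|>; the pattern is an assumption.
final _pausePattern = RegExp(r'<\|(\d+(?:\.\d+)?)s\|>');

/// Hypothetical helper: sums the pause durations (in seconds) in [text].
double totalPauseSeconds(String text) {
  var total = 0.0;
  for (final match in _pausePattern.allMatches(text)) {
    total += double.parse(match.group(1)!);
  }
  return total;
}

void main() {
  final text = 'Hello${pauseMarkup(5)}Nice to meet you${pauseMarkup(1)}Today';
  print(text); // Hello<|5s|>Nice to meet you<|1s|>Today
  print(totalPauseSeconds(text) == 6); // true
}
```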
Use the composer chaining API when one output file needs different voices or per-segment options such as pitch, tempo, prompt, or seed. The composer generates each segment as WAV, trims leading/trailing silent PCM samples, and concatenates the result. If you need MP3, generate WAV first and convert it in your app or server pipeline.
import 'dart:io';

final audio = await client
    .composeSpeech()
    .defaults(const ComposerSettings(voiceId: 'tc_672c5f5ce59fac2a48faeaee', model: TtsModel.ssfmV30))
    .say('Hello there')
    .pause(5)
    .say(
      'Nice to meet you',
      overrides: const ComposerSettings(
        voiceId: 'tc_60e5426de8b95f1d3000d7b5',
        output: Output(audioPitch: 2),
      ),
    )
    .say('Today')
    .pause(2)
    .say('How does the weather feel?')
    .generate();

// Write the concatenated WAV bytes to disk.
await File('conversation.wav').writeAsBytes(audio.audioData);
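To make the trim-and-concatenate step concrete, here is a minimal sketch over 16-bit PCM samples. It is not the composer's actual implementation. `trimSilence` and `concatSegments` are hypothetical names, and the `threshold` parameter (how close to zero a sample must be to count as silent) is an assumption, since the text above does not define it.

```dart
import 'dart:typed_data';

/// Hypothetical helper: drops leading and trailing samples whose absolute
/// value is at most [threshold]. Interior silence is left untouched.
Int16List trimSilence(Int16List samples, {int threshold = 0}) {
  var start = 0;
  var end = samples.length;
  while (start < end && samples[start].abs() <= threshold) {
    start++;
  }
  while (end > start && samples[end - 1].abs() <= threshold) {
    end--;
  }
  return samples.sublist(start, end);
}

/// Hypothetical helper: joins PCM segments back to back.
Int16List concatSegments(List<Int16List> segments) {
  final total = segments.fold<int>(0, (sum, s) => sum + s.length);
  final out = Int16List(total);
  var offset = 0;
  for (final segment in segments) {
    out.setAll(offset, segment);
    offset += segment.length;
  }
  return out;
}

void main() {
  final a = trimSilence(Int16List.fromList([0, 0, 5, -3, 0, 7, 0]));
  final b = trimSilence(Int16List.fromList([0, 9, 0]));
  print(a); // [5, -3, 0, 7]
  print(concatSegments([a, b])); // [5, -3, 0, 7, 9]
}
```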
Consume streaming audio as a Dart stream and play it without writing a file:
import 'dart:typed_data';

final stream = await client.textToSpeechStream(
  const TtsRequestStream(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Stream this text as audio in real time.',
    model: TtsModel.ssfmV30,
    output: OutputStream(audioFormat: AudioFormat.wav),
  ),
);

// Collect every chunk in memory, then play the complete audio.
final audioBytes = <int>[];
await for (final chunk in stream) {
  audioBytes.addAll(chunk);
}
await player.play(BytesSource(Uint8List.fromList(audioBytes)));
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. The example above avoids file storage and plays the complete stream from memory. For true low-latency chunk-by-chunk playback, feed the PCM chunks into a streaming audio engine instead of audioplayers.
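The format numbers above are enough to work with the stream directly. The sketch below shows how to compute the duration from the PCM byte count and, in case a player rejects the 0xFFFFFFFF placeholder, how to write real sizes into a fully buffered WAV. The helper names are hypothetical, and the byte offsets (4 for the RIFF size, 40 for the data size) assume the canonical 44-byte PCM header layout.

```dart
import 'dart:typed_data';

const sampleRate = 32000; // Hz, from the format description above
const bytesPerSample = 2; // 16-bit
const channels = 1; // mono
const wavHeaderBytes = 44;

/// Duration in seconds of [pcmByteCount] bytes of raw PCM in this format.
double pcmDurationSeconds(int pcmByteCount) =>
    pcmByteCount / (sampleRate * bytesPerSample * channels);

/// Hypothetical helper: returns a copy of a fully buffered WAV with the RIFF
/// and data sizes filled in. Assumes the canonical 44-byte header layout.
Uint8List patchWavSizes(Uint8List wav) {
  if (wav.length < wavHeaderBytes) {
    throw ArgumentError('Buffer is shorter than a 44-byte WAV header');
  }
  final out = Uint8List.fromList(wav);
  final view = ByteData.sublistView(out);
  view.setUint32(4, out.length - 8, Endian.little); // RIFF chunk size
  view.setUint32(40, out.length - wavHeaderBytes, Endian.little); // data size
  return out;
}

void main() {
  // Stand-in for buffered stream output: header plus one second of PCM.
  final wav = Uint8List(wavHeaderBytes + 64000);
  final patched = patchWavSizes(wav);
  final header = ByteData.sublistView(patched);
  print(header.getUint32(40, Endian.little)); // 64000
  print(pcmDurationSeconds(patched.length - wavHeaderBytes) == 1); // true
}
```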
textToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns audio together with word- or character-level alignment data.
final result = await client.textToSpeechWithTimestamps(
  const TtsRequest(
    voiceId: 'tc_60e5426de8b95f1d3000d7b5',
    text: 'Hello. How are you?',
    model: TtsModel.ssfmV30,
  ),
);
await player.play(BytesSource(result.audioBytes()));

// Print each word with its start and end time in seconds.
print('Duration: ${result.audioDuration}s');
for (final word in result.words) {
  print('[${word.startTime}s - ${word.endTime}s] ${word.word}');
}
Pass granularity: 'word' (default) or granularity: 'char' to control the alignment unit.
final result = await client.textToSpeechWithTimestamps(
  const TtsRequest(
    voiceId: 'tc_60e5426de8b95f1d3000d7b5',
    text: 'Hello. How are you?',
    model: TtsModel.ssfmV30,
  ),
  granularity: 'char',
);
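A common use of alignment data is highlighting the word being spoken at the current playback position. The sketch below shows one way to do that lookup. `WordTiming` is a hypothetical stand-in that mirrors the `word`, `startTime`, and `endTime` fields used above; it is not the SDK's own type, and the timings in `main` are made up for illustration.

```dart
/// Hypothetical stand-in for one alignment entry (not the SDK's type).
class WordTiming {
  const WordTiming(this.word, this.startTime, this.endTime);
  final String word;
  final double startTime; // seconds
  final double endTime; // seconds
}

/// Returns the entry whose [startTime, endTime) range contains [position],
/// or null when the position falls in a gap between words.
WordTiming? wordAt(List<WordTiming> words, double position) {
  for (final w in words) {
    if (position >= w.startTime && position < w.endTime) return w;
  }
  return null;
}

void main() {
  // Made-up timings, for illustration only.
  const words = [
    WordTiming('Hello.', 0.0, 0.4),
    WordTiming('How', 0.6, 0.8),
    WordTiming('are', 0.8, 0.95),
    WordTiming('you?', 0.95, 1.3),
  ];
  print(wordAt(words, 0.7)?.word); // How
  print(wordAt(words, 0.5)?.word); // null
}
```

A linear scan is enough for short utterances. For long transcripts, a binary search over `startTime` would be a natural replacement.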