The official Dart and Flutter SDK for the Typecast API. Convert text to lifelike speech, stream audio, generate timestamps, discover voices, and create custom voices from Dart or Flutter applications.

Installation

Install from pub.dev:
dart pub add typecast_dart
Latest registered version: 0.1.4 on pub.dev.
For Flutter projects:
flutter pub add typecast_dart
flutter pub add audioplayers
Use typecast_dart 0.1.4 or higher. For production Flutter apps, avoid embedding a long-lived API key in a distributed client. Route requests through your backend when the API key must remain private.

Quick Start

import 'package:audioplayers/audioplayers.dart';
import 'package:typecast_dart/typecast_dart.dart';

final client = TypecastClient(apiKey: 'YOUR_API_KEY');
final player = AudioPlayer();

Future<void> speakAndPlay() async {
  final response = await client.textToSpeech(
    const TtsRequest(
      voiceId: 'tc_672c5f5ce59fac2a48faeaee', // Find voice IDs at https://typecast.ai/developers/api/voices
      text: "Hello there! I'm your friendly text-to-speech agent.",
      model: TtsModel.ssfmV30,
      language: LanguageCode.eng,
      output: Output(audioFormat: AudioFormat.wav),
    ),
  );

  await player.play(BytesSource(response.audioData));
  print('Duration: ${response.duration}s, Format: ${response.format.value}');
}

Playback in Flutter

The Dart SDK returns generated audio as bytes. In Flutter, pass those bytes to an audio playback package such as audioplayers. Use one shared AudioPlayer instance and play each response directly from memory:
import 'package:audioplayers/audioplayers.dart';
import 'package:typecast_dart/typecast_dart.dart';

final client = TypecastClient(apiKey: 'YOUR_API_KEY');
final player = AudioPlayer();

Future<void> playTts(String text) async {
  final response = await client.textToSpeech(
    TtsRequest(
      voiceId: 'tc_672c5f5ce59fac2a48faeaee',
      text: text,
      model: TtsModel.ssfmV30,
      language: LanguageCode.eng,
      output: const Output(audioFormat: AudioFormat.wav),
    ),
  );

  await player.play(BytesSource(response.audioData));
}
For production Flutter apps, keep long-lived API keys on your backend. The Flutter app can request generated audio from your backend and still play the returned bytes with BytesSource.

Features

  • Multiple Voice Models: Support for ssfm-v30 and ssfm-v21 AI voice models
  • Multi-language Support: 37 languages including English, Korean, Japanese, Chinese, Spanish, and more
  • Emotion Control: Preset emotions or smart context-aware inference
  • Audio Customization: Control loudness, pitch, tempo, and output format
  • Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
  • Streaming: Access the streaming TTS endpoint as a Dart Stream<List<int>>
  • Timestamp TTS: Word- and character-level alignment data with SRT/VTT helpers
  • Instant Voice Cloning: Upload a WAV sample and create a custom voice ID
  • Dart and Flutter: Use the same package in Dart CLI, server, and Flutter projects

Configuration

Set your API key via environment variable or constructor:
export TYPECAST_API_KEY="your-api-key-here"
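If you would rather resolve the key yourself and pass it to the constructor, for example to fail fast when it is missing, a minimal sketch might look like the following. It assumes a Dart CLI or server process where dart:io is available; readTypecastApiKey is a hypothetical helper, not part of the SDK.

```dart
import 'dart:io';

/// Hypothetical helper (not part of the SDK): returns the value of
/// TYPECAST_API_KEY from [environment], or null when it is unset or blank.
/// Falls back to the process environment when no map is supplied.
String? readTypecastApiKey([Map<String, String>? environment]) {
  final env = environment ?? Platform.environment;
  final key = env['TYPECAST_API_KEY']?.trim();
  return (key == null || key.isEmpty) ? null : key;
}
```

The result can then be checked for null before it is handed to TypecastClient(apiKey: ...).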
When requests go through your own proxy, set baseUrl to the proxy endpoint and omit apiKey. The SDK will not send the X-API-KEY header for empty or missing keys. Requests to the default Typecast host still require an API key.
Proxy without API key
final client = TypecastClient(
  baseUrl: 'https://your-proxy.example.com',
);

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart mode lets the model infer emotion from the surrounding text:
final response = await client.textToSpeech(
  const TtsRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Everything is going to be okay.',
    model: TtsModel.ssfmV30,
    prompt: SmartPrompt(
      previousText: 'I just got the best news!',
      nextText: "I can't wait to celebrate!",
    ),
  ),
);

await player.play(BytesSource(response.audioData));

Audio Customization

Control loudness, pitch, tempo, and output format:
final response = await client.textToSpeech(
  const TtsRequest(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Customized audio output!',
    model: TtsModel.ssfmV30,
    output: Output(
      targetLufs: -14.0,
      audioPitch: 2,
      audioTempo: 1.2,
      audioFormat: AudioFormat.mp3,
    ),
    seed: 42,
  ),
);

await player.play(BytesSource(response.audioData));

Generate audio to a file

Use generateToFile when you want the SDK to synthesize speech and write the audio bytes directly to a local file. The model defaults to ssfm-v30. When no output format is set, the format is inferred from a .mp3 or .wav file extension. Browse available voice IDs on the Voices page.
await client.generateToFile(
  'output.mp3',
  GenerateToFileRequest(
    text: 'Hello from Typecast.',
    voiceId: 'tc_672c5f5ce59fac2a48faeaee', // Find voice IDs at https://typecast.ai/developers/api/voices
  ),
);
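The extension rule described above can be pictured with a small sketch. This is only an illustration of the idea, not the SDK's own code: inferAudioFormat is a hypothetical helper, and both the case-insensitive match and the null result for other extensions are assumptions.

```dart
/// Hypothetical helper: maps a file path to an audio format name based on
/// its extension. Returns null for any other extension so the caller can
/// decide what to do. Case-insensitive matching is an assumption.
String? inferAudioFormat(String path) {
  final lower = path.toLowerCase();
  if (lower.endsWith('.mp3')) return 'mp3';
  if (lower.endsWith('.wav')) return 'wav';
  return null;
}
```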

Text pauses

Use text pause markup when you only need silent gaps inside one composed text segment. Put <|5s|>, <|1s|>, <|0.3s|>, or <|0.34413s|> directly in the text. The value is interpreted as seconds and must end with s. This keeps the pause expression visible in plain text without adding separate pause calls.
final audio = await client
    .composeSpeech()
    .defaults(const ComposerSettings(voiceId: 'tc_672c5f5ce59fac2a48faeaee', model: TtsModel.ssfmV30))
    .say('Hello<|5s|>Nice to meet you<|1s|>Today<|2s|>how does the weather feel?')
    .generate();
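To check pause markers before sending text, for example to estimate how much silence a script adds, a client-side parser along these lines may help. The regular expression is an assumption based only on the examples above (a non-negative decimal followed by s); the API may accept or reject forms not covered here. pauseSeconds and stripPauses are hypothetical helper names.

```dart
/// Matches pause markers such as <|5s|> or <|0.34413s|>.
/// The grammar is inferred from the documented examples only.
final RegExp pauseMarker = RegExp(r'<\|(\d+(?:\.\d+)?)s\|>');

/// Returns the pause lengths, in seconds, in the order they appear.
List<double> pauseSeconds(String text) => pauseMarker
    .allMatches(text)
    .map((match) => double.parse(match.group(1)!))
    .toList();

/// Returns the text with every pause marker removed.
String stripPauses(String text) => text.replaceAll(pauseMarker, '');
```

Summing the list returned by pauseSeconds gives the total silence that the markers request.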

Multi-speaker composition

Use the composer chaining API when one output file needs different voices or per-segment options such as pitch, tempo, prompt, or seed. The composer generates each segment as WAV, trims leading/trailing silent PCM samples, and concatenates the result. If you need MP3, generate WAV first and convert it in your app or server pipeline.
import 'dart:io';

final audio = await client
    .composeSpeech()
    .defaults(const ComposerSettings(voiceId: 'tc_672c5f5ce59fac2a48faeaee', model: TtsModel.ssfmV30))
    .say('Hello there')
    .pause(5)
    .say(
      'Nice to meet you',
      overrides: const ComposerSettings(
        voiceId: 'tc_60e5426de8b95f1d3000d7b5',
        output: Output(audioPitch: 2),
      ),
    )
    .say('Today')
    .pause(2)
    .say('How does the weather feel?')
    .generate();

await File('conversation.wav').writeAsBytes(audio.audioData);
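The trim-and-concatenate step described above can be sketched in plain Dart. This is not the composer's actual implementation. It assumes 16-bit little-endian mono PCM with the WAV header already removed, and the threshold parameter is a hypothetical knob, since the source does not say what counts as silent.

```dart
import 'dart:typed_data';

/// Trims leading and trailing samples whose absolute value is at or below
/// [threshold] from 16-bit little-endian mono PCM data (no WAV header).
/// Returns a view into [pcm]; a trailing odd byte, if any, is ignored.
Uint8List trimSilence(Uint8List pcm, {int threshold = 0}) {
  final data = ByteData.sublistView(pcm);
  final sampleCount = pcm.lengthInBytes ~/ 2;
  bool isSilent(int i) =>
      data.getInt16(i * 2, Endian.little).abs() <= threshold;

  var start = 0;
  while (start < sampleCount && isSilent(start)) {
    start++;
  }
  var end = sampleCount;
  while (end > start && isSilent(end - 1)) {
    end--;
  }
  return Uint8List.sublistView(pcm, start * 2, end * 2);
}

/// Joins already-trimmed PCM segments into one buffer.
Uint8List concatPcm(List<Uint8List> segments) {
  final builder = BytesBuilder(copy: false);
  for (final segment in segments) {
    builder.add(segment);
  }
  return builder.toBytes();
}
```

A small non-zero threshold tolerates low-level noise at segment edges, at the cost of possibly clipping very quiet speech onsets.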

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:
final voices = await client.getVoicesV2();

final filtered = await client.getVoicesV2(
  const VoicesV2Filter(
    model: TtsModel.ssfmV30,
    gender: 'female',
    age: 'young_adult',
  ),
);

for (final voice in voices) {
  print('ID: ${voice.voiceId}, Name: ${voice.voiceName}');
  print('Gender: ${voice.gender}, Age: ${voice.age}');
  print('Models: ${voice.models.map((model) => model.version).join(', ')}');
}

final voice = await client.getVoiceV2('tc_672c5f5ce59fac2a48faeaee');
print(voice.voiceName);

Streaming

Consume streaming audio as a Dart stream and play it without writing a file:
import 'dart:typed_data';

final stream = await client.textToSpeechStream(
  const TtsRequestStream(
    voiceId: 'tc_672c5f5ce59fac2a48faeaee',
    text: 'Stream this text as audio in real time.',
    model: TtsModel.ssfmV30,
    output: OutputStream(audioFormat: AudioFormat.wav),
  ),
);

final audioBytes = <int>[];
await for (final chunk in stream) {
  audioBytes.addAll(chunk);
}

await player.play(BytesSource(Uint8List.fromList(audioBytes)));
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. The example above avoids file storage and plays the complete stream from memory. For true low-latency chunk-by-chunk playback, feed the PCM chunks into a streaming audio engine instead of audioplayers.
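Because the streamed header carries a placeholder size, some players or tools may not report the clip length correctly. The sketch below derives the duration from the byte count and rewrites the size fields once the whole stream has been collected. It assumes the canonical 44-byte PCM header layout (RIFF size at byte offset 4, data size at offset 40); the helper names are hypothetical and not part of the SDK.

```dart
import 'dart:typed_data';

const int wavHeaderBytes = 44;
const int sampleRate = 32000; // Hz, per the streaming format above
const int bytesPerSample = 2; // 16-bit mono

/// Duration in seconds of the PCM payload that follows the 44-byte header.
double streamedWavDurationSeconds(int totalBytes) {
  final pcmBytes = totalBytes - wavHeaderBytes;
  if (pcmBytes <= 0) return 0.0;
  return pcmBytes / (sampleRate * bytesPerSample);
}

/// Returns a copy of [wav] with the placeholder sizes replaced by real ones.
/// Assumes the canonical header layout: RIFF chunk size at offset 4 and
/// data chunk size at offset 40, both little-endian 32-bit values.
Uint8List patchWavSizes(List<int> wav) {
  if (wav.length < wavHeaderBytes) {
    throw ArgumentError('Expected at least $wavHeaderBytes bytes of WAV data.');
  }
  final patched = Uint8List.fromList(wav);
  final view = ByteData.sublistView(patched);
  view.setUint32(4, patched.length - 8, Endian.little);
  view.setUint32(40, patched.length - wavHeaderBytes, Endian.little);
  return patched;
}
```

In the example above, patchWavSizes(audioBytes) could take the place of Uint8List.fromList(audioBytes) before playback if the player rejects the placeholder size.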

Timestamp TTS

textToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns audio together with word- or character-level alignment data.
final result = await client.textToSpeechWithTimestamps(
  const TtsRequest(
    voiceId: 'tc_60e5426de8b95f1d3000d7b5',
    text: 'Hello. How are you?',
    model: TtsModel.ssfmV30,
  ),
);

await player.play(BytesSource(result.audioBytes()));
print('Duration: ${result.audioDuration}s');

for (final word in result.words) {
  print('[${word.startTime}s - ${word.endTime}s] ${word.word}');
}

Granularity

Pass granularity: 'word' (default) or granularity: 'char' to control the alignment unit.
final result = await client.textToSpeechWithTimestamps(
  const TtsRequest(
    voiceId: 'tc_60e5426de8b95f1d3000d7b5',
    text: 'Hello. How are you?',
    model: TtsModel.ssfmV30,
  ),
  granularity: 'char',
);

Subtitle Export

import 'dart:io';

await File('output.srt').writeAsString(result.toSrt());
await File('output.vtt').writeAsString(result.toVtt());
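For readers who need a custom cue layout, the sketch below shows how start and end times in seconds map onto SRT's HH:MM:SS,mmm timestamps. It is an illustration of the file format, not the SDK's toSrt() implementation, and how the SDK groups words into cues is not stated here. srtTimestamp and srtCue are hypothetical helpers that assume non-negative times.

```dart
/// Formats a non-negative time in seconds as an SRT timestamp (HH:MM:SS,mmm).
String srtTimestamp(double seconds) {
  final totalMs = (seconds * 1000).round();
  String two(int n) => n.toString().padLeft(2, '0');
  final hours = totalMs ~/ 3600000;
  final minutes = (totalMs ~/ 60000) % 60;
  final secs = (totalMs ~/ 1000) % 60;
  final millis = (totalMs % 1000).toString().padLeft(3, '0');
  return '${two(hours)}:${two(minutes)}:${two(secs)},$millis';
}

/// Builds one SRT cue; [index] starts at 1. Cues are separated by a blank
/// line when written to a file.
String srtCue(int index, double start, double end, String text) =>
    '$index\n${srtTimestamp(start)} --> ${srtTimestamp(end)}\n$text\n';
```

WebVTT uses the same structure, with a period instead of a comma before the milliseconds and a WEBVTT line at the top of the file.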

Instant Voice Cloning

Upload a short WAV sample to create a custom voice:
import 'dart:io';

final voice = await client.cloneVoice(
  audio: await File('sample.wav').readAsBytes(),
  filename: 'sample.wav',
  name: 'My Voice',
  model: TtsModel.ssfmV30,
);

print('Custom voice ID: ${voice.voiceId}');
Voice cloning audio must be 25 MB or smaller, and the custom voice name must be 1-30 characters.
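These limits can be checked locally before uploading, which avoids a round trip for inputs that would be rejected. The sketch below is a hypothetical pre-flight check, not an SDK function. It assumes 1 MB means 1024 * 1024 bytes and counts name length in UTF-16 code units; the API may count either differently.

```dart
/// Assumed byte limit for the sample: 25 MB with 1 MB = 1024 * 1024 bytes.
const int maxCloneAudioBytes = 25 * 1024 * 1024;

/// Hypothetical pre-flight check. Returns a list of problems, which is
/// empty when the input satisfies both documented limits.
List<String> validateCloneInput({
  required int audioBytes,
  required String name,
}) {
  final problems = <String>[];
  if (audioBytes > maxCloneAudioBytes) {
    problems.add('Audio sample exceeds 25 MB.');
  }
  // String.length counts UTF-16 code units, which is an approximation of
  // "characters" for names containing emoji or combining marks.
  if (name.isEmpty || name.length > 30) {
    problems.add('Voice name must be 1-30 characters.');
  }
  return problems;
}
```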