Typecast Kotlin SDK

The official Kotlin library for the Typecast API. Convert text to lifelike speech using AI-powered voices. Compatible with Kotlin 1.9+ and JDK 17 or later. Works with Gradle (Kotlin DSL or Groovy) and Maven.

Installation

Add the following dependency to your build.gradle.kts:
dependencies {
    implementation("com.neosapience:typecast-kotlin:1.0.1")
}
Version 1.0.1 or later is required; if you are pinned to an older release, update the version in the dependency above.
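The intro also lists Maven and Groovy Gradle support. Assuming the same Maven Central coordinates as the Kotlin DSL snippet above, the pom.xml equivalent would be (for a Groovy build.gradle, the line is implementation 'com.neosapience:typecast-kotlin:1.0.1'):

```xml
<!-- pom.xml: same coordinates as the Gradle snippet above -->
<dependency>
    <groupId>com.neosapience</groupId>
    <artifactId>typecast-kotlin</artifactId>
    <version>1.0.1</version>
</dependency>
```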

Quick Start

import com.neosapience.TypecastClient
import com.neosapience.models.*
import java.io.File

fun main() {
    // Initialize client
    val client = TypecastClient.create("YOUR_API_KEY")

    // Convert text to speech
    val request = TTSRequest.builder()
        .voiceId("tc_672c5f5ce59fac2a48faeaee")
        .text("Hello there! I'm your friendly text-to-speech agent.")
        .model(TTSModel.SSFM_V30)
        .build()

    val response = client.textToSpeech(request)

    // Save audio file
    File("output.${response.format}").writeBytes(response.audioData)

    println("Audio saved! Duration: ${response.duration}s, Format: ${response.format}")

    // Clean up
    client.close()
}

Features

The Typecast Kotlin SDK provides powerful features for text-to-speech conversion:
  • Multiple Voice Models: Support for ssfm-v30 (latest) and ssfm-v21 AI voice models
  • Multi-language Support: 37 languages including English, Korean, Spanish, Japanese, Chinese, and more
  • Emotion Control: Preset emotions (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
  • Audio Customization: Control loudness (LUFS -70 to 0), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
  • Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
  • Idiomatic Kotlin: Builder pattern with Kotlin-friendly syntax using data classes
  • Comprehensive Error Handling: Specific exception classes for each error type

Configuration

Set your API key via environment variable, .env file, or builder:
// Using environment variable
// export TYPECAST_API_KEY="your-api-key-here"
val client = TypecastClient.create()

// Or pass directly
val client = TypecastClient.create("your-api-key-here")

// Or use builder for custom configuration
val client = TypecastClient.builder()
    .apiKey("your-api-key-here")
    .baseUrl("https://custom-api.example.com")
    .build()

Environment File

Create a .env file in your project root:
TYPECAST_API_KEY=your-api-key-here

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset (pick one of the fixed emotions listed under Features) and Smart (context-aware inference). In Smart mode, the model infers the emotion from the surrounding context:
val request = TTSRequest.builder()
    .voiceId("tc_672c5f5ce59fac2a48faeaee")
    .text("Everything is going to be okay.")
    .model(TTSModel.SSFM_V30)
    .prompt(SmartPrompt.builder()
        .previousText("I just got the best news!")  // Optional context
        .nextText("I can't wait to celebrate!")     // Optional context
        .build())
    .build()

val response = client.textToSpeech(request)
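The other mode, Preset, selects one of the fixed emotions listed under Features. A hedged sketch, assuming PresetPrompt follows the same builder pattern as SmartPrompt; the setter name emotionPreset is an assumption, so check the PresetPrompt API for the actual method:

```kotlin
// Preset mode sketch -- emotionPreset() is a hypothetical setter name;
// valid presets per the Features list: normal, happy, sad, angry, whisper, toneup, tonedown
val presetRequest = TTSRequest.builder()
    .voiceId("tc_672c5f5ce59fac2a48faeaee")
    .text("I just got the best news!")
    .model(TTSModel.SSFM_V30)
    .prompt(PresetPrompt.builder()
        .emotionPreset("happy")
        .build())
    .build()
```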

Audio Customization

Control loudness, pitch, tempo, and output format:
val request = TTSRequest.builder()
    .voiceId("tc_672c5f5ce59fac2a48faeaee")
    .text("Customized audio output!")
    .model(TTSModel.SSFM_V30)
    .output(Output.builder()
        .targetLufs(-14.0)              // Range: -70 to 0 (LUFS)
        .audioPitch(2)                  // Range: -12 to +12 semitones
        .audioTempo(1.2)                // Range: 0.5x to 2.0x
        .audioFormat(AudioFormat.MP3)   // Options: WAV, MP3
        .build())
    .seed(42)  // For reproducible results
    .build()

val response = client.textToSpeech(request)

File("output.${response.format}").writeBytes(response.audioData)
println("Duration: ${response.duration}s, Format: ${response.format}")
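The numeric ranges above are easy to get wrong at call sites. A small client-side guard can fail fast before spending an API call; this helper is a sketch and not part of the SDK:

```kotlin
// Client-side sanity checks mirroring the documented Output ranges above.
// Not part of the SDK -- just fails fast before an API call is made.
fun requireValidOutput(targetLufs: Double, audioPitch: Int, audioTempo: Double) {
    require(targetLufs in -70.0..0.0) { "targetLufs must be within -70..0 LUFS, got $targetLufs" }
    require(audioPitch in -12..12) { "audioPitch must be within -12..+12 semitones, got $audioPitch" }
    require(audioTempo in 0.5..2.0) { "audioTempo must be within 0.5x..2.0x, got $audioTempo" }
}
```

Out-of-range values raise IllegalArgumentException locally instead of a 422 from the server.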

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:
// Get all voices
val voices = client.getVoicesV2()

// Filter by criteria
val filter = VoicesV2Filter.builder()
    .model(TTSModel.SSFM_V30)
    .gender(GenderEnum.FEMALE)
    .age(AgeEnum.YOUNG_ADULT)
    .build()

val filtered = client.getVoicesV2(filter)

// Display voice info
voices.forEach { voice ->
    println("ID: ${voice.voiceId}, Name: ${voice.voiceName}")
    println("Gender: ${voice.gender}, Age: ${voice.age}")
    
    voice.models.forEach { model ->
        println("Model: ${model.version}, Emotions: ${model.emotions}")
    }
    
    voice.useCases?.let { useCases ->
        println("Use cases: ${useCases.joinToString(", ")}")
    }
}

Multilingual Content

The SDK supports 37 languages with automatic language detection:
// Auto-detect language (recommended)
val request = TTSRequest.builder()
    .voiceId("tc_672c5f5ce59fac2a48faeaee")
    .text("こんにちは。お元気ですか。")
    .model(TTSModel.SSFM_V30)
    .build()

val response = client.textToSpeech(request)

// Or specify language explicitly
val koreanRequest = TTSRequest.builder()
    .voiceId("tc_672c5f5ce59fac2a48faeaee")
    .text("안녕하세요. 반갑습니다.")
    .model(TTSModel.SSFM_V30)
    .language(LanguageCode.KOR)  // ISO 639-3 language code
    .build()

val koreanResponse = client.textToSpeech(koreanRequest)

File("auto.${response.format}").writeBytes(response.audioData)
File("korean.${koreanResponse.format}").writeBytes(koreanResponse.audioData)

Supported Languages

The 37 supported languages and their ISO 639-3 codes:

Code  Language    Code  Language    Code  Language
ENG   English     JPN   Japanese    UKR   Ukrainian
KOR   Korean      ELL   Greek       IND   Indonesian
SPA   Spanish     TAM   Tamil       DAN   Danish
DEU   German      TGL   Tagalog     SWE   Swedish
FRA   French      FIN   Finnish     MSA   Malay
ITA   Italian     ZHO   Chinese     CES   Czech
POL   Polish      SLK   Slovak      POR   Portuguese
NLD   Dutch       ARA   Arabic      BUL   Bulgarian
RUS   Russian     HRV   Croatian    RON   Romanian
BEN   Bengali     HIN   Hindi       HUN   Hungarian
NAN   Hokkien     NOR   Norwegian   PAN   Punjabi
THA   Thai        TUR   Turkish     VIE   Vietnamese
YUE   Cantonese

If not specified, the language will be automatically detected from the input text.

Error Handling

The SDK provides specific exception classes for handling API errors:
import com.neosapience.TypecastClient
import com.neosapience.exceptions.*

try {
    val response = client.textToSpeech(request)
} catch (e: UnauthorizedException) {
    // 401: Invalid API key
    println("Invalid API key: ${e.message}")
} catch (e: PaymentRequiredException) {
    // 402: Insufficient credits
    println("Insufficient credits: ${e.message}")
} catch (e: ForbiddenException) {
    // 403: Access denied
    println("Access denied: ${e.message}")
} catch (e: NotFoundException) {
    // 404: Resource not found
    println("Voice not found: ${e.message}")
} catch (e: UnprocessableEntityException) {
    // 422: Validation error
    println("Validation error: ${e.message}")
} catch (e: RateLimitException) {
    // 429: Rate limit exceeded
    println("Rate limit exceeded - please retry later")
} catch (e: InternalServerException) {
    // 500: Server error
    println("Server error: ${e.message}")
} catch (e: TypecastException) {
    // Generic error
    println("API error (${e.statusCode}): ${e.message}")
}
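For the 429 case in particular, retrying with exponential backoff is a common client-side pattern. Below is a generic sketch with no SDK dependency; the shouldRetry predicate decides which exceptions are retryable (e.g. e is RateLimitException):

```kotlin
// Generic retry-with-exponential-backoff helper. The caller supplies a
// predicate for retryable exceptions; everything else is rethrown immediately.
fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 500,
    shouldRetry: (Exception) -> Boolean,
    block: () -> T
): T {
    var delayMs = initialDelayMs
    repeat(maxAttempts - 1) {
        try {
            return block()
        } catch (e: Exception) {
            if (!shouldRetry(e)) throw e
            Thread.sleep(delayMs)
            delayMs *= 2  // double the wait before the next attempt
        }
    }
    return block()  // final attempt; let any exception propagate
}
```

A TTS call could then be wrapped as retryWithBackoff(shouldRetry = { it is RateLimitException }) { client.textToSpeech(request) }.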

Exception Hierarchy

Exception                     Status Code  Description
BadRequestException           400          Invalid request parameters
UnauthorizedException         401          Invalid or missing API key
PaymentRequiredException      402          Insufficient credits
ForbiddenException            403          Access denied
NotFoundException             404          Resource not found
UnprocessableEntityException  422          Validation error
RateLimitException            429          Rate limit exceeded
InternalServerException       500          Server error
TypecastException             *            Base exception class

IntelliJ IDEA Setup

Step 1: Create New Project

  1. Open IntelliJ IDEA
  2. Go to File → New → Project...
  3. Select “Kotlin” and “Gradle (Kotlin)”
  4. Set JDK to 17 or higher

Step 2: Add Dependency

Add to your build.gradle.kts:
dependencies {
    implementation("com.neosapience:typecast-kotlin:1.0.1")
}

Step 3: Sync Project

Click the Gradle sync button or right-click build.gradle.kts → Reload Gradle Project

Android Setup

Step 1: Add Dependency

Add to your app’s build.gradle.kts:
dependencies {
    implementation("com.neosapience:typecast-kotlin:1.0.1")
}
Step 2: Add Internet Permission

Add to your AndroidManifest.xml:
<uses-permission android:name="android.permission.INTERNET" />
Step 3: Use in Background Thread

Make API calls from a coroutine or background thread:
lifecycleScope.launch(Dispatchers.IO) {
    val client = TypecastClient.create("YOUR_API_KEY")
    val response = client.textToSpeech(request)
    // Handle response
}

API Reference

TypecastClient Methods

Method                       Description
textToSpeech(TTSRequest)     Convert text to speech audio
getVoicesV2()                Get all available voices
getVoicesV2(VoicesV2Filter)  Get filtered voices
getVoiceV2(voiceId: String)  Get a specific voice by ID
close()                      Release resources

TTSRequest Fields

Field     Type                                 Description
voiceId   String                               Voice ID (format: tc_* or uc_*)
text      String                               Text to synthesize (max 2000 chars)
model     TTSModel                             TTS model (SSFM_V21 or SSFM_V30)
language  LanguageCode                         ISO 639-3 code (auto-detected if omitted)
prompt    Prompt / PresetPrompt / SmartPrompt  Emotion settings
output    Output                               Audio output settings
seed      Int                                  Random seed for reproducibility
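Since text is capped at 2000 characters, longer scripts must be split client-side before synthesis. A hypothetical helper (not part of the SDK) that chunks at sentence boundaries where possible:

```kotlin
// Splits long text into chunks no longer than maxLen, preferring to cut
// after a sentence-ending character, then at a space, then hard-cutting.
// Purely a client-side sketch; the SDK does not provide this.
fun chunkText(text: String, maxLen: Int = 2000): List<String> {
    if (text.length <= maxLen) return listOf(text)
    val chunks = mutableListOf<String>()
    var rest = text
    while (rest.length > maxLen) {
        val window = rest.substring(0, maxLen)
        val sentenceEnd = window.lastIndexOfAny(charArrayOf('.', '!', '?'))
        val spaceIdx = window.lastIndexOf(' ')
        val cut = when {
            sentenceEnd > 0 -> sentenceEnd + 1  // include the punctuation
            spaceIdx > 0 -> spaceIdx
            else -> maxLen                       // no natural break; hard cut
        }
        chunks += rest.substring(0, cut).trim()
        rest = rest.substring(cut).trimStart()
    }
    if (rest.isNotEmpty()) chunks += rest
    return chunks
}
```

Each chunk can then be sent as its own TTSRequest and the resulting audio segments concatenated.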

TTSResponse Fields

Field      Type       Description
audioData  ByteArray  Generated audio data
duration   Double     Audio duration in seconds
format     String     Audio format (wav or mp3)