The official Java library for the Typecast API. Convert text to lifelike speech using AI-powered voices.

Compatible with Java 8 and later versions. Works with Maven, Gradle, and manual installation.
Set your API key via environment variable, .env file, or constructor:

    // Using environment variable
    // export TYPECAST_API_KEY="your-api-key-here"
    TypecastClient client = new TypecastClient();

    // Or pass directly
    TypecastClient client = new TypecastClient("your-api-key-here");

    // Or with custom base URL
    TypecastClient client = new TypecastClient("your-api-key-here", "https://custom-api.example.com");

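The page does not state which source wins when more than one is set. The sketch below shows one plausible precedence, assuming an explicitly passed key overrides the `TYPECAST_API_KEY` environment variable. `ApiKeyResolver` is a hypothetical helper written for illustration, not part of the SDK, and the SDK's actual lookup order may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Typecast SDK.
final class ApiKeyResolver {
    static final String ENV_VAR = "TYPECAST_API_KEY";

    private ApiKeyResolver() {}

    // Assumed precedence: explicit key first, then the environment variable.
    // Throws if neither source provides a non-blank value.
    static String resolve(String explicitKey, Map<String, String> env) {
        if (explicitKey != null && !explicitKey.trim().isEmpty()) {
            return explicitKey.trim();
        }
        String fromEnv = env.get(ENV_VAR);
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return fromEnv.trim();
        }
        throw new IllegalStateException(
            "No API key found: set " + ENV_VAR + " or pass a key explicitly");
    }

    public static void main(String[] args) {
        // Simulated environment so the example runs without real credentials.
        Map<String, String> env = new HashMap<>();
        env.put(ENV_VAR, "key-from-env");

        System.out.println(resolve(null, env));           // falls back to the environment
        System.out.println(resolve("key-from-arg", env)); // explicit key takes priority
    }
}
```

In an application, `System.getenv()` can be passed as the `env` argument, and the resolved key handed to the `TypecastClient` constructor.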
ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart Mode: let the AI infer emotion from context:

    TTSRequest request = TTSRequest.builder()
        .voiceId("tc_672c5f5ce59fac2a48faeaee")
        .text("Everything is going to be okay.")
        .model(TTSModel.SSFM_V30)
        .prompt(SmartPrompt.builder()
            .previousText("I just got the best news!")  // Optional context
            .nextText("I can't wait to celebrate!")     // Optional context
            .build())
        .build();

    TTSResponse response = client.textToSpeech(request);

Preset Mode: explicitly set emotion with preset values:

    TTSRequest request = TTSRequest.builder()
        .voiceId("tc_672c5f5ce59fac2a48faeaee")
        .text("I am so excited to show you these features!")
        .model(TTSModel.SSFM_V30)
        .prompt(PresetPrompt.builder()
            .emotionPreset(EmotionPreset.HAPPY)  // normal, happy, sad, angry, whisper, toneup, tonedown
            .emotionIntensity(1.5)               // Range: 0.0 to 2.0
            .build())
        .build();

    TTSResponse response = client.textToSpeech(request);

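The intensity comment above gives a range of 0.0 to 2.0, but this page does not say whether out-of-range values are rejected or clamped by the API. When the value comes from user input, one defensive option is to clamp it before building the request. The `EmotionIntensity` class below is a hypothetical helper written for illustration, not an SDK type.

```java
// Hypothetical helper for illustration; not part of the Typecast SDK.
final class EmotionIntensity {
    static final double MIN = 0.0; // lower bound documented above
    static final double MAX = 2.0; // upper bound documented above

    private EmotionIntensity() {}

    // Clamps a caller-supplied intensity into [MIN, MAX]; NaN is rejected outright.
    static double clamp(double value) {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("intensity must be a number");
        }
        return Math.max(MIN, Math.min(MAX, value));
    }

    public static void main(String[] args) {
        System.out.println(clamp(1.5));  // within range, returned unchanged
        System.out.println(clamp(2.7));  // above range, clamped to MAX
        System.out.println(clamp(-0.3)); // below range, clamped to MIN
    }
}
```

The clamped value can then be passed to `.emotionIntensity(...)` in place of a raw user-supplied number.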
Stream audio chunks in real time for low-latency playback:

    import java.io.InputStream;
    import javax.sound.sampled.*;

    // Set up audio playback: 32000 Hz, 16-bit, mono, little-endian
    AudioFormat format = new AudioFormat(32000, 16, 1, true, false);
    SourceDataLine line = AudioSystem.getSourceDataLine(format);
    line.open(format, 8192);
    line.start();

    try (InputStream stream = client.textToSpeechStream(request)) {
        byte[] buf = new byte[4096];
        int headerRemaining = 44; // Bytes of the 44-byte WAV header still to skip
        int bytesRead;
        while ((bytesRead = stream.read(buf)) != -1) {
            // A read may return fewer than 44 bytes, so skip the header incrementally
            int offset = Math.min(headerRemaining, bytesRead);
            headerRemaining -= offset;
            if (bytesRead > offset) {
                line.write(buf, offset, bytesRead - offset);
            }
        }
    }

    line.drain();
    line.close();
    client.close();

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. MP3 streaming format: 320 kbps, 44100 Hz; each chunk is independently decodable. Refer to com.neosapience.models.OutputStream by its fully qualified name to avoid a collision with java.io.OutputStream. The streaming endpoint does not support volume or targetLufs.
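To make the header description concrete, here is a minimal sketch that reads the fields of a canonical 44-byte PCM WAV header and checks for the 0xFFFFFFFF placeholder. `WavHeader` is a hypothetical helper written for illustration, not an SDK class. It assumes the standard RIFF layout with fixed offsets, and because the page does not say which size field carries the placeholder, it checks both the RIFF chunk size (offset 4) and the data chunk size (offset 40).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Typecast SDK.
final class WavHeader {
    static final int SIZE = 44;                   // canonical PCM WAV header length
    static final long UNKNOWN_SIZE = 0xFFFFFFFFL; // placeholder size for streamed audio

    final int channels;
    final int sampleRate;
    final int bitsPerSample;
    final long riffSize;
    final long dataSize;

    private WavHeader(int channels, int sampleRate, int bitsPerSample,
                      long riffSize, long dataSize) {
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
        this.riffSize = riffSize;
        this.dataSize = dataSize;
    }

    // Parses the first 44 bytes, assuming the canonical RIFF/WAVE field offsets.
    static WavHeader parse(byte[] bytes) {
        if (bytes.length < SIZE) {
            throw new IllegalArgumentException("need at least " + SIZE + " bytes");
        }
        String riff = new String(bytes, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(bytes, 8, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE header");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        return new WavHeader(
            buf.getShort(22),             // number of channels
            buf.getInt(24),               // sample rate in Hz
            buf.getShort(34),             // bits per sample
            buf.getInt(4) & 0xFFFFFFFFL,  // RIFF chunk size, read as unsigned
            buf.getInt(40) & 0xFFFFFFFFL  // data chunk size, read as unsigned
        );
    }

    // True when either size field holds the streaming placeholder.
    boolean hasUnknownLength() {
        return riffSize == UNKNOWN_SIZE || dataSize == UNKNOWN_SIZE;
    }

    // Builds a sample streaming-style header so the parser can be tried offline.
    static byte[] streamingHeader(int sampleRate, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8;
        ByteBuffer b = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        b.put(ascii("RIFF")).putInt(0xFFFFFFFF).put(ascii("WAVE"));
        b.put(ascii("fmt ")).putInt(16).putShort((short) 1).putShort((short) channels);
        b.putInt(sampleRate).putInt(sampleRate * blockAlign);
        b.putShort((short) blockAlign).putShort((short) bitsPerSample);
        b.put(ascii("data")).putInt(0xFFFFFFFF);
        return b.array();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        WavHeader h = parse(streamingHeader(32000, 1, 16));
        System.out.println(h.sampleRate + " Hz, " + h.channels + " channel(s), "
            + h.bitsPerSample + "-bit, unknown length: " + h.hasUnknownLength());
    }
}
```

Parsing the header rather than discarding it lets a player build its `AudioFormat` from the stream itself, at the cost of buffering the first 44 bytes before playback starts.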
textToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns the audio together with per-word and per-character alignment data — useful for karaoke highlights, subtitle generation, and lip-sync applications.
Pass .granularity(Granularity.WORD) (default) or .granularity(Granularity.CHAR) to control the alignment unit.

    TTSRequestWithTimestamps request = TTSRequestWithTimestamps.builder()
        .voiceId("tc_60e5426de8b95f1d3000d7b5")
        .text("Hello. How are you?")
        .model(TTSModel.SSFM_V30)
        .granularity(Granularity.CHAR)  // required for Japanese / Chinese
        .build();

Japanese / Chinese: Word-level segmentation is not meaningful for languages without whitespace delimiters (jpn, zho). Use Granularity.CHAR for these languages to get character-level alignment.
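As an illustration of the subtitle use case, the sketch below groups alignment units into SubRip (SRT) cues. The shape of the SDK's alignment objects is not shown on this page, so the `Timing` class, its field names, and the use of seconds for timestamps are all assumptions made for the example. Map the real response fields onto it as needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; Timing is an assumed shape, not the SDK's alignment type.
final class SubtitleSketch {
    static final class Timing {
        final String text;
        final double start; // assumed unit: seconds
        final double end;   // assumed unit: seconds

        Timing(String text, double start, double end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    private SubtitleSketch() {}

    // Formats seconds as an SRT timestamp: HH:MM:SS,mmm
    static String srtTime(double seconds) {
        long ms = Math.round(seconds * 1000);
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d",
            ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
    }

    // Groups units into cues of unitsPerCue items each.
    // Use separator " " for word-level units and "" for character-level units.
    static String toSrt(List<Timing> units, int unitsPerCue, String separator) {
        StringBuilder srt = new StringBuilder();
        int cueNumber = 1;
        for (int i = 0; i < units.size(); i += unitsPerCue) {
            List<Timing> group = units.subList(i, Math.min(i + unitsPerCue, units.size()));
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < group.size(); j++) {
                if (j > 0) {
                    text.append(separator);
                }
                text.append(group.get(j).text);
            }
            srt.append(cueNumber++).append('\n')
               .append(srtTime(group.get(0).start)).append(" --> ")
               .append(srtTime(group.get(group.size() - 1).end)).append('\n')
               .append(text).append("\n\n");
        }
        return srt.toString();
    }

    public static void main(String[] args) {
        // Made-up timings for the sample sentence; real values come from the API.
        List<Timing> words = Arrays.asList(
            new Timing("Hello.", 0.0, 0.4),
            new Timing("How", 0.6, 0.8),
            new Timing("are", 0.8, 0.95),
            new Timing("you?", 0.95, 1.3));
        System.out.print(toSrt(words, 2, " "));
    }
}
```

The separator parameter reflects the note above: word-level units are joined with spaces, while character-level units for Japanese or Chinese are joined with an empty string.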