Access the Typecast API with our official Ruby SDK.
Convert text to lifelike speech using AI-powered voices, generate timestamps, list voices, and create custom voices. The Ruby SDK uses only the Ruby standard library at runtime and supports Ruby 2.6+.
ssfm-v30 offers two emotion control modes: Preset and Smart.
Smart Mode

Let the AI infer emotion from context:

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Everything is going to be okay.",
    model: Typecast::Models::TTS_MODEL_V30,
    prompt: Typecast::Models::SmartPrompt.new(
      previous_text: "I just got the best news!",
      next_text: "I can't wait to celebrate!"
    )
  )
)

Preset Mode

Explicitly set emotion with preset values:

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "I am so excited to show you these features!",
    model: Typecast::Models::TTS_MODEL_V30,
    prompt: Typecast::Models::PresetPrompt.new(
      emotion_preset: "happy",
      emotion_intensity: 1.5
    )
  )
)
Use text_to_speech_stream() to call the streaming endpoint:
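File.open("stream.wav", "wb") do |file|
  client.text_to_speech_stream(
    Typecast::Models::TTSRequestStream.new(
      voice_id: "tc_672c5f5ce59fac2a48faeaee",
      text: "Stream this text as audio.",
      model: Typecast::Models::TTS_MODEL_V30,
      output: Typecast::Models::OutputStream.new(audio_format: "wav")
    )
  ) do |chunk|
    # Append each chunk as it arrives. Calling File.binwrite here would
    # truncate the file every time the block runs and keep only the last chunk.
    file.write(chunk)
  end
end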
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. The streaming endpoint does not support volume or target_lufs.
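Because the streamed header carries a placeholder size, some players and audio libraries may refuse the saved file or report the wrong length. The sketch below shows one way to patch the sizes once the stream has finished. It assumes the canonical 44-byte PCM header layout (RIFF chunk size at byte offset 4, data chunk size at byte offset 40), which the note above does not spell out. `finalize_streamed_wav` and `pcm_duration_seconds` are hypothetical helper names, not part of the SDK.

```ruby
HEADER_SIZE = 44                    # header length stated in the note above
BYTES_PER_SECOND = 32_000 * 2 * 1   # sample rate * bytes per sample * channels

# Returns a copy of a fully received WAV stream with real size fields.
# Assumption: canonical layout, RIFF size at offset 4, data size at offset 40.
def finalize_streamed_wav(bytes)
  raise ArgumentError, "shorter than a WAV header" if bytes.bytesize < HEADER_SIZE

  data_size = bytes.bytesize - HEADER_SIZE
  patched = bytes.dup.force_encoding(Encoding::BINARY)
  patched[4, 4]  = [data_size + HEADER_SIZE - 8].pack("V")  # RIFF chunk size (little-endian)
  patched[40, 4] = [data_size].pack("V")                    # data chunk size (little-endian)
  patched
end

# Duration of the PCM payload, derived from the fixed stream format.
def pcm_duration_seconds(bytes)
  (bytes.bytesize - HEADER_SIZE).to_f / BYTES_PER_SECOND
end
```

For a saved stream this could be applied as `File.binwrite("stream_fixed.wav", finalize_streamed_wav(File.binread("stream.wav")))`.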
text_to_speech_with_timestamps() wraps POST /v1/text-to-speech/with-timestamps and returns audio together with word- or character-level alignment data.
result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  )
)

result.save_audio("output.wav")
puts "Duration: #{result.audio_duration}s"

result.words.each do |word|
  puts "[#{word.start_time}s - #{word.end_time}s] #{word.word}"
end
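Word-level alignment is commonly used to build subtitles. As a rough sketch, the helpers below turn a list of word timings into SRT text. They rely only on the `word`, `start_time`, and `end_time` readers used in the example above, with times taken to be in seconds. The `Word` struct is a stand-in for the SDK's word objects, and `srt_time` and `words_to_srt` are hypothetical names rather than SDK methods.

```ruby
# Stand-in for the SDK's word objects; only these three readers are assumed.
Word = Struct.new(:word, :start_time, :end_time)

# Formats a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
def srt_time(seconds)
  total_ms = (seconds * 1000).round
  hours, rest = total_ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  secs, millis = rest.divmod(1000)
  format("%02d:%02d:%02d,%03d", hours, minutes, secs, millis)
end

# Groups words into cues of at most max_words each and renders SRT text.
def words_to_srt(words, max_words: 5)
  cues = words.each_slice(max_words).each_with_index.map do |group, index|
    text = group.map(&:word).join(" ")
    span = "#{srt_time(group.first.start_time)} --> #{srt_time(group.last.end_time)}"
    "#{index + 1}\n#{span}\n#{text}\n"
  end
  cues.join("\n")
end
```

With a real response, `File.write("output.srt", words_to_srt(result.words))` would write the subtitles next to the saved audio. Fixed-size word groups keep the sketch short; splitting cues at sentence punctuation or long pauses would likely read better.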