The official Ruby SDK for the Typecast API. Convert text to lifelike speech using AI-powered voices, generate timestamps, list voices, and create custom voices. The Ruby SDK uses only the Ruby standard library at runtime and supports Ruby 2.6+.

Installation

Install from RubyGems:
gem install typecast-ruby
Or add it to your Gemfile:
gem "typecast-ruby", "~> 0.1.0"
Requires Ruby 2.6 or higher. Check your version with ruby -v.

Quick Start

require "typecast"

client = Typecast::Client.new(api_key: ENV["TYPECAST_API_KEY"])

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Hello there! I'm your friendly text-to-speech agent.",
    model: Typecast::Models::TTS_MODEL_V30,
    language: "eng",
    output: Typecast::Models::Output.new(audio_format: "wav")
  )
)

File.binwrite("output.wav", response.audio_data)
puts "Duration: #{response.duration}s, Format: #{response.format}"

Features

  • Multiple Voice Models: Support for ssfm-v30 and ssfm-v21 AI voice models
  • Multi-language Support: 37 languages including English, Korean, Japanese, Chinese, Spanish, and more
  • Emotion Control: Preset emotions or smart context-aware inference
  • Audio Customization: Control loudness, pitch, tempo, and output format
  • Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
  • Streaming Endpoint: Access streaming TTS responses from Ruby
  • Timestamp TTS: Word- and character-level alignment data with SRT/VTT helpers
  • Instant Voice Cloning: Upload a WAV sample and create a custom voice ID
  • No Runtime Dependencies: Built on Ruby standard library net/http

Configuration

Set your API key via environment variable or constructor:
export TYPECAST_API_KEY="your-api-key-here"
You can also override the API host and HTTP timeouts:
client = Typecast::Client.new(
  api_key: ENV["TYPECAST_API_KEY"],
  base_url: "https://api.typecast.ai",
  open_timeout: 10,
  read_timeout: 30
)
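
A missing key otherwise tends to surface only when the first request fails. A minimal sketch of a fail-fast lookup, assuming a hypothetical helper named fetch_typecast_api_key (it is not part of the SDK):

```ruby
# Hypothetical helper (not part of the SDK): read the API key from an
# environment-like hash and fail early with a clear message if it is unset.
def fetch_typecast_api_key(env = ENV)
  key = env["TYPECAST_API_KEY"].to_s.strip
  raise ArgumentError, "TYPECAST_API_KEY is not set" if key.empty?

  key
end

# Usage with the client constructor shown above:
#   client = Typecast::Client.new(api_key: fetch_typecast_api_key)
```

Passing the environment as an argument keeps the helper easy to test with a plain Hash.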

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset and Smart.
The example below uses Smart mode, which lets the AI infer emotion from the surrounding context (previous_text and next_text):
response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Everything is going to be okay.",
    model: Typecast::Models::TTS_MODEL_V30,
    prompt: Typecast::Models::SmartPrompt.new(
      previous_text: "I just got the best news!",
      next_text: "I can't wait to celebrate!"
    )
  )
)

Audio Customization

Control loudness, pitch, tempo, and output format:
response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Customized audio output!",
    model: Typecast::Models::TTS_MODEL_V30,
    output: Typecast::Models::Output.new(
      target_lufs: -14.0,
      audio_pitch: 2,
      audio_tempo: 1.2,
      audio_format: Typecast::Models::AUDIO_MP3
    ),
    seed: 42
  )
)

File.binwrite("output.mp3", response.audio_data)

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:
# List every available voice
voices = client.get_voices_v2

# Narrow the list by model, gender, and age
filtered = client.get_voices_v2(
  Typecast::Models::VoicesV2Filter.new(
    model: Typecast::Models::TTS_MODEL_V30,
    gender: "female",
    age: "young_adult"
  )
)

voices.each do |voice|
  puts "ID: #{voice.voice_id}, Name: #{voice.voice_name}"
  puts "Gender: #{voice.gender}, Age: #{voice.age}"
end

# Fetch a single voice by ID
voice = client.get_voice_v2("tc_672c5f5ce59fac2a48faeaee")
puts voice.voice_name

Streaming

Use text_to_speech_stream() to call the streaming endpoint. Open the output file once and write inside the block, so nothing is overwritten if the block runs more than once:
File.open("stream.wav", "wb") do |file|
  client.text_to_speech_stream(
    Typecast::Models::TTSRequestStream.new(
      voice_id: "tc_672c5f5ce59fac2a48faeaee",
      text: "Stream this text as audio.",
      model: Typecast::Models::TTS_MODEL_V30,
      output: Typecast::Models::OutputStream.new(audio_format: "wav")
    )
  ) do |audio|
    file.write(audio)
  end
end
WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only. The streaming endpoint does not support volume or target_lufs.
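
Because the streamed header carries 0xFFFFFFFF as a size placeholder, some tools may misreport the length of a saved file. The sketch below shows one way to patch the sizes after the whole stream is on disk. It assumes the header follows the canonical 44-byte layout (RIFF size at byte offset 4, data size at byte offset 40). The names finalize_wav_header and pcm_duration_seconds are hypothetical helpers, not SDK methods.

```ruby
# Hypothetical helpers (not part of the SDK).
# Assumption: canonical 44-byte WAV header, RIFF size at offset 4,
# data chunk size at offset 40, both little-endian 32-bit integers.
WAV_HEADER_BYTES = 44

# Rewrite the two size fields once the full stream has been saved.
# Returns the number of PCM bytes that follow the header.
def finalize_wav_header(path)
  total = File.size(path)
  raise ArgumentError, "file is smaller than a WAV header" if total < WAV_HEADER_BYTES

  File.open(path, "r+b") do |file|
    file.seek(4)
    file.write([total - 8].pack("V"))                # RIFF chunk size
    file.seek(40)
    file.write([total - WAV_HEADER_BYTES].pack("V")) # data chunk size
  end
  total - WAV_HEADER_BYTES
end

# Duration of raw PCM at the documented format: 32000 Hz, 16-bit, mono.
def pcm_duration_seconds(pcm_bytes, sample_rate: 32_000, bytes_per_sample: 2, channels: 1)
  pcm_bytes.to_f / (sample_rate * bytes_per_sample * channels)
end

# Usage after streaming to "stream.wav":
#   pcm_bytes = finalize_wav_header("stream.wav")
#   puts "Duration: #{pcm_duration_seconds(pcm_bytes)}s"
```

At this format one second of audio is 64000 bytes of PCM, so the duration can be derived from the file size alone.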

Timestamp TTS

text_to_speech_with_timestamps() wraps POST /v1/text-to-speech/with-timestamps and returns audio together with word- or character-level alignment data.
result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  )
)

result.save_audio("output.wav")
puts "Duration: #{result.audio_duration}s"

result.words.each do |word|
  puts "[#{word.start_time}s - #{word.end_time}s] #{word.word}"
end

Granularity

Pass granularity: "word" (default) or granularity: "char" to control the alignment unit.
result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  ),
  granularity: "char"
)

Subtitle Export

File.write("output.srt", result.to_srt)
File.write("output.vtt", result.to_vtt)
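
For custom cue layouts built from the word timings, the time format is the main detail to get right: SRT writes HH:MM:SS,mmm and WebVTT writes HH:MM:SS.mmm. The sketch below shows that formatting. The helpers cue_timestamp, srt_timestamp, vtt_timestamp, and srt_cue are hypothetical names and are not how the SDK implements to_srt or to_vtt.

```ruby
# Hypothetical helpers (not SDK code): format a time given in seconds
# as "HH:MM:SS<separator>mmm".
def cue_timestamp(seconds, separator)
  total_ms = (seconds * 1000).round
  hours, rest = total_ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  secs, millis = rest.divmod(1000)
  format("%02d:%02d:%02d%s%03d", hours, minutes, secs, separator, millis)
end

# SRT uses a comma before the milliseconds.
def srt_timestamp(seconds)
  cue_timestamp(seconds, ",")
end

# WebVTT uses a period before the milliseconds.
def vtt_timestamp(seconds)
  cue_timestamp(seconds, ".")
end

# Build one SRT cue from an index, start/end times in seconds, and text.
def srt_cue(index, start_time, end_time, text)
  "#{index}\n#{srt_timestamp(start_time)} --> #{srt_timestamp(end_time)}\n#{text}\n"
end

# Usage with the word timings from the result object:
#   result.words.each_with_index do |word, i|
#     puts srt_cue(i + 1, word.start_time, word.end_time, word.word)
#   end
```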

Instant Voice Cloning

Upload a short WAV sample to create a custom voice:
voice = client.clone_voice(
  audio: File.binread("sample.wav"),
  filename: "sample.wav",
  name: "My Voice",
  model: Typecast::Models::TTS_MODEL_V30
)

puts "Custom voice ID: #{voice.voice_id}"
Voice cloning audio must be 25 MB or smaller, and the custom voice name must be 1-30 characters.
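
Checking these limits locally avoids uploading a sample that would be rejected. A minimal sketch, assuming a hypothetical helper named clone_input_errors (not part of the SDK) and assuming "25 MB" means 25 × 1024 × 1024 bytes:

```ruby
# Hypothetical pre-flight check (not part of the SDK) for the limits above.
MAX_CLONE_AUDIO_BYTES = 25 * 1024 * 1024 # assumption: "25 MB" read as 25 MiB
CLONE_NAME_LENGTH = (1..30).freeze

# Returns a list of problems; an empty list means the input looks acceptable.
def clone_input_errors(audio_bytes, name)
  errors = []
  errors << "audio exceeds 25 MB" if audio_bytes.bytesize > MAX_CLONE_AUDIO_BYTES
  errors << "name must be 1-30 characters" unless CLONE_NAME_LENGTH.cover?(name.to_s.length)
  errors
end

# Usage before calling clone_voice:
#   audio = File.binread("sample.wav")
#   problems = clone_input_errors(audio, "My Voice")
#   abort(problems.join("; ")) unless problems.empty?
```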