Ruby

The official Ruby SDK for the Typecast API. Convert text to lifelike speech using AI-powered voices, generate timestamps, list voices, and create custom voices. The Ruby SDK uses only the Ruby standard library at runtime and supports Ruby 2.6+.

RubyGems

Typecast Ruby SDK

Source Code

Typecast Ruby SDK Source Code

Installation

Install from RubyGems:

gem install typecast-ruby

Latest registered version: 0.1.6 on RubyGems.

Or add it to your Gemfile:

gem "typecast-ruby", "~> 0.1.6"

Requires Ruby 2.6 or higher. Check your version with ruby -v.
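If a script should fail fast on an unsupported interpreter, you can also run the version check from Ruby. This is a minimal sketch that uses RubyGems' Gem::Version. The helper name supported_ruby? is illustrative and is not part of the SDK.

```ruby
require "rubygems" # provides Gem::Version (already loaded in most Ruby setups)

# Documented minimum Ruby version for typecast-ruby.
MINIMUM_RUBY = Gem::Version.new("2.6")

# Compare versions segment by segment, so "2.10" correctly sorts after "2.6".
def supported_ruby?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

if supported_ruby?
  puts "Ruby #{RUBY_VERSION} meets the typecast-ruby requirement"
else
  warn "typecast-ruby requires Ruby 2.6+; found #{RUBY_VERSION}"
end
```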

Quick Start

require "typecast"

client = Typecast::Client.new(api_key: ENV["TYPECAST_API_KEY"])

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Hello there! I'm your friendly text-to-speech agent.",
    model: Typecast::Models::TTS_MODEL_V30,
    language: "eng",
    output: Typecast::Models::Output.new(audio_format: "wav")
  )
)

File.binwrite("output.wav", response.audio_data)
puts "Duration: #{response.duration}s, Format: #{response.format}"

Features

Multiple Voice Models: Support for ssfm-v30 and ssfm-v21 AI voice models
Multi-language Support: 37 languages including English, Korean, Japanese, Chinese, Spanish, and more
Emotion Control: Preset emotions or smart context-aware inference
Audio Customization: Control loudness, pitch, tempo, and output format
Voice Discovery: V2 Voices API with filtering by model, gender, age, and use cases
Streaming Endpoint: Access streaming TTS responses from Ruby
Timestamp TTS: Word- and character-level alignment data with SRT/VTT helpers
Instant Voice Cloning: Upload a WAV sample and create a custom voice ID
No Runtime Dependencies: Built on Ruby standard library net/http

Voice Recommendations

Use recommend_voices when you know the desired style but not the exact voice_id.

voices = client.recommend_voices(
  "warm female voice for a product tutorial",
  count: 3
)

voices.each do |voice|
  puts "#{voice.voice_id} #{voice.voice_name} #{voice.score}"
end

Recommendation results contain only voice_id, voice_name, and score. Use get_voice_v2 or get_voices_v2 when you need detailed metadata such as supported models, emotions, gender, age, or use cases.

Configuration

Set your API key via environment variable or constructor:

export TYPECAST_API_KEY="your-api-key-here"

require "typecast"

client = Typecast::Client.new(
  api_key: ENV["TYPECAST_API_KEY"]
)

Or pass the key directly to the constructor:

require "typecast"

client = Typecast::Client.new(
  api_key: "your-api-key-here"
)

When requests go through your own proxy, set base_url to the proxy endpoint and omit api_key. The SDK will not send the X-API-KEY header for empty or missing keys. Requests to the default Typecast host still require an API key.

Proxy without API key

client = Typecast::Client.new(
  base_url: "https://your-proxy.example.com"
)
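The proxy rule above sends X-API-KEY only when a non-empty key is present. Plain net/http can illustrate it. This is a hypothetical sketch, not the SDK's internal code. build_get and the /some/path endpoint are placeholders.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build a GET request and attach X-API-KEY only when
# a non-empty key is supplied, mirroring the documented proxy behavior.
def build_get(base_url, path, api_key: nil)
  request = Net::HTTP::Get.new(URI.join(base_url, path))
  request["X-API-KEY"] = api_key unless api_key.to_s.empty?
  request
end

proxied = build_get("https://your-proxy.example.com", "/some/path")
direct  = build_get("https://api.typecast.ai", "/some/path", api_key: "your-api-key-here")

p proxied["X-API-KEY"] # => nil (header omitted)
p direct["X-API-KEY"]  # => "your-api-key-here"
```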

You can also override the API host and HTTP timeouts:

client = Typecast::Client.new(
  api_key: ENV["TYPECAST_API_KEY"],
  base_url: "https://api.typecast.ai",
  open_timeout: 10,
  read_timeout: 30
)
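The SDK is built on net/http, so these options probably correspond to Net::HTTP settings. The sketch below assumes that mapping, which the SDK docs do not confirm, and shows what each timeout controls. build_http is a hypothetical helper. Creating the Net::HTTP object does not open a connection.

```ruby
require "net/http"
require "uri"

# Assumed mapping of base_url and timeout options onto a Net::HTTP object.
def build_http(base_url, open_timeout: 10, read_timeout: 30)
  uri = URI(base_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to establish the connection
  http.read_timeout = read_timeout # seconds allowed to wait on each read of the response
  http
end

http = build_http("https://api.typecast.ai", open_timeout: 10, read_timeout: 30)
puts "#{http.address}:#{http.port} ssl=#{http.use_ssl?}"
```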

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: Preset and Smart.

Smart mode lets the AI infer emotion from the surrounding text, passed as previous_text and next_text:

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Everything is going to be okay.",
    model: Typecast::Models::TTS_MODEL_V30,
    prompt: Typecast::Models::SmartPrompt.new(
      previous_text: "I just got the best news!",
      next_text: "I can't wait to celebrate!"
    )
  )
)

Preset mode sets the emotion explicitly with emotion_preset and emotion_intensity:

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "I am so excited to show you these features!",
    model: Typecast::Models::TTS_MODEL_V30,
    prompt: Typecast::Models::PresetPrompt.new(
      emotion_preset: "happy",
      emotion_intensity: 1.5
    )
  )
)

Audio Customization

Control loudness, pitch, tempo, and output format:

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Customized audio output!",
    model: Typecast::Models::TTS_MODEL_V30,
    output: Typecast::Models::Output.new(
      target_lufs: -14.0,
      audio_pitch: 2,
      audio_tempo: 1.2,
      audio_format: Typecast::Models::AUDIO_MP3
    ),
    seed: 42
  )
)

File.binwrite("output.mp3", response.audio_data)

Generate audio to a file

Use generate_to_file when you want the SDK to synthesize speech and write the audio bytes directly to a local file. The model defaults to ssfm-v30. When no output format is set, a .mp3 or .wav file extension determines the format. Browse available voice IDs on the Voices page.

client.generate_to_file(
  'output.mp3',
  text: 'Hello from Typecast.',
  voice_id: 'tc_672c5f5ce59fac2a48faeaee' # Find voice IDs at https://studio.typecast.ai/developers/api/voices
)
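The extension rule works like a small lookup. This sketch is hypothetical: infer_audio_format is not an SDK method. Case-insensitive matching and returning nil for other extensions are assumptions about behavior the docs do not specify.

```ruby
# Hypothetical illustration of the documented rule: an explicit format wins;
# otherwise a .mp3 or .wav extension picks the format.
def infer_audio_format(path, explicit_format = nil)
  return explicit_format if explicit_format

  case File.extname(path).downcase
  when ".mp3" then "mp3"
  when ".wav" then "wav"
  end # any other extension falls through to nil
end

puts infer_audio_format("output.mp3")        # mp3
puts infer_audio_format("speech.WAV")        # wav
puts infer_audio_format("output.mp3", "wav") # wav (explicit format wins)
```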

Text pauses

Use text pause markup when you only need silent gaps inside one composed text segment. Put <|5s|>, <|1s|>, <|0.3s|>, or <|0.34413s|> directly in the text. The value is interpreted as seconds and must end with s. This keeps the pause expression visible in plain text without adding separate pause calls.

audio = client
  .compose_speech
  .defaults(voice_id: "tc_672c5f5ce59fac2a48faeaee", model: "ssfm-v30")
  .say("Hello<|5s|>Nice to meet you<|1s|>Today<|2s|>how does the weather feel?")
  .generate
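To check how a string will be split before you send it, you can parse the markup yourself. This is a hypothetical helper, not an SDK method. It assumes the documented syntax: a number of seconds followed by s inside <|...|>. Anything else, such as <|5|>, stays ordinary text.

```ruby
# Pause markup as documented: <|5s|>, <|0.3s|>, <|0.34413s|>.
PAUSE_TOKEN = /(<\|\d+(?:\.\d+)?s\|>)/
PAUSE_VALUE = /\A<\|(\d+(?:\.\d+)?)s\|>\z/

# Split text into [:say, String] and [:pause, Float seconds] parts.
def split_pauses(text)
  # The capture group makes String#split keep the pause tokens.
  text.split(PAUSE_TOKEN).reject(&:empty?).map do |piece|
    if (match = piece.match(PAUSE_VALUE))
      [:pause, match[1].to_f]
    else
      [:say, piece]
    end
  end
end

p split_pauses("Hi<|0.3s|>there") # => [[:say, "Hi"], [:pause, 0.3], [:say, "there"]]
```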

Multi-speaker composition

Use the composer chaining API when one output file needs different voices or per-segment options such as pitch, tempo, prompt, or seed. The composer generates each segment as WAV, trims leading/trailing silent PCM samples, and concatenates the result. If you need MP3, generate WAV first and convert it in your app or server pipeline.

audio = client
  .compose_speech
  .defaults(voice_id: "tc_672c5f5ce59fac2a48faeaee", model: Typecast::Models::TTS_MODEL_V30)
  .say("Hello there")
  .pause(5)
  .say("Nice to meet you", voice_id: "tc_60e5426de8b95f1d3000d7b5", output: { audio_pitch: 2 })
  .say("Today")
  .pause(2)
  .say("How does the weather feel?")
  .generate

File.binwrite("conversation.wav", audio.audio_data)
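The trim-and-concatenate step can be sketched on arrays of 16-bit samples. Everything here is an illustrative assumption, not the composer's actual implementation. That includes the silence threshold of 0, representing pause(seconds) as zero-valued samples, and the 32000 Hz rate used in the example call.

```ruby
# Drop leading and trailing samples whose magnitude is at or below the threshold.
def trim_silence(samples, threshold: 0)
  first = samples.index { |s| s.abs > threshold }
  return [] unless first # the whole segment is silent

  last = samples.rindex { |s| s.abs > threshold }
  samples[first..last]
end

# Assumed pause representation: zero-valued samples for the requested duration.
def silence(seconds, sample_rate)
  Array.new((seconds * sample_rate).round, 0)
end

# Each segment is either { samples: [...] } or { pause: seconds }.
def concat_segments(segments, sample_rate:)
  segments.flat_map do |seg|
    seg[:pause] ? silence(seg[:pause], sample_rate) : trim_silence(seg[:samples])
  end
end

samples = concat_segments(
  [{ samples: [0, 0, 120, -340, 0] }, { pause: 0.001 }, { samples: [0, 55, 0] }],
  sample_rate: 32_000
)
pcm = samples.pack("s<*") # little-endian signed 16-bit, the PCM layout inside WAV
puts pcm.bytesize # => 70
```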

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:

voices = client.get_voices_v2

filtered = client.get_voices_v2(
  Typecast::Models::VoicesV2Filter.new(
    model: Typecast::Models::TTS_MODEL_V30,
    gender: "female",
    age: "young_adult"
  )
)

voices.each do |voice|
  puts "ID: #{voice.voice_id}, Name: #{voice.voice_name}"
  puts "Gender: #{voice.gender}, Age: #{voice.age}"
end

voice = client.get_voice_v2("tc_672c5f5ce59fac2a48faeaee")
puts voice.voice_name

Streaming

Use text_to_speech_stream() to call the streaming endpoint. Open the output file once and append each chunk the block receives:

File.open("stream.wav", "wb") do |file|
  client.text_to_speech_stream(
    Typecast::Models::TTSRequestStream.new(
      voice_id: "tc_672c5f5ce59fac2a48faeaee",
      text: "Stream this text as audio.",
      model: Typecast::Models::TTS_MODEL_V30,
      output: Typecast::Models::OutputStream.new(audio_format: "wav")
    )
  ) do |chunk|
    # Append each chunk; File.binwrite here would overwrite the file on every call
    file.write(chunk)
  end
end

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks are raw PCM only.
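These numbers tell you how to read the stream: 32,000 samples/s × 2 bytes × 1 channel = 64,000 bytes of PCM per second. The sketch below parses the 44-byte header of the first chunk and estimates duration from PCM byte counts. It assumes the header follows the canonical RIFF/WAVE field order, and the helper names are illustrative.

```ruby
SAMPLE_RATE = 32_000
BYTES_PER_SAMPLE = 2 # 16-bit
CHANNELS = 1
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS # 64_000
HEADER_SIZE = 44

# Canonical header fields (little-endian): RIFF id, RIFF size, WAVE id, "fmt " id,
# fmt size, audio format, channels, sample rate, byte rate, block align,
# bits per sample, "data" id, data size.
WAV_HEADER_LAYOUT = "a4Va4a4VvvVVvva4V"
HEADER_KEYS = %i[riff riff_size wave fmt fmt_size audio_format channels
                 sample_rate byte_rate block_align bits_per_sample data data_size].freeze

def parse_wav_header(first_chunk)
  HEADER_KEYS.zip(first_chunk.byteslice(0, HEADER_SIZE).unpack(WAV_HEADER_LAYOUT)).to_h
end

# Seconds of audio represented by a number of raw PCM bytes.
def pcm_seconds(pcm_byte_count)
  pcm_byte_count.to_f / BYTES_PER_SECOND
end

# Count PCM bytes across chunks, skipping the header in the first chunk only.
def count_pcm_bytes(chunks)
  chunks.each_with_index.sum do |chunk, index|
    index.zero? ? chunk.bytesize - HEADER_SIZE : chunk.bytesize
  end
end
```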

Timestamp TTS

text_to_speech_with_timestamps() wraps POST /v1/text-to-speech/with-timestamps and returns audio together with word- or character-level alignment data.

result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  )
)

result.save_audio("output.wav")
puts "Duration: #{result.audio_duration}s"

result.words.each do |word|
  puts "[#{word.start_time}s - #{word.end_time}s] #{word.word}"
end

Granularity

Pass granularity: "word" (default) or granularity: "char" to control the alignment unit.

result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  ),
  granularity: "char"
)

Subtitle Export

File.write("output.srt", result.to_srt)
File.write("output.vtt", result.to_vtt)
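to_srt and to_vtt already produce complete files. For reference, the two formats differ mainly in the cue timestamp: SRT writes HH:MM:SS,mmm and WebVTT writes HH:MM:SS.mmm. This hypothetical formatter shows that notation. The Word struct is a stand-in for entries of result.words, reusing the attribute names shown earlier.

```ruby
# Format seconds as a subtitle cue timestamp (SRT uses a comma, WebVTT a period).
def cue_timestamp(seconds, style: :srt)
  total_ms = (seconds * 1000).round
  hours, rem = total_ms.divmod(3_600_000)
  minutes, rem = rem.divmod(60_000)
  secs, ms = rem.divmod(1000)
  separator = style == :srt ? "," : "."
  format("%02d:%02d:%02d%s%03d", hours, minutes, secs, separator, ms)
end

# Stand-in for result.words entries (word, start_time, end_time in seconds).
Word = Struct.new(:word, :start_time, :end_time)
words = [Word.new("Hello.", 0.0, 0.42), Word.new("How", 0.6, 0.8)]

words.each_with_index do |w, i|
  puts i + 1
  puts "#{cue_timestamp(w.start_time)} --> #{cue_timestamp(w.end_time)}"
  puts w.word
  puts
end
```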

Instant Voice Cloning

Upload a short WAV sample to create a custom voice:

voice = client.clone_voice(
  audio: File.binread("sample.wav"),
  filename: "sample.wav",
  name: "My Voice",
  model: Typecast::Models::TTS_MODEL_V30
)

puts "Custom voice ID: #{voice.voice_id}"

Voice cloning audio must be 25 MB or smaller, and the custom voice name must be 1-30 characters.
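A pre-upload check can catch these limits before a request is made. This sketch is hypothetical. The docs do not say whether 25 MB means 25,000,000 or 26,214,400 bytes, so it uses the smaller decimal value to stay on the strict side. The RIFF/WAVE signature check is an extra assumption based on the WAV requirement.

```ruby
MAX_CLONE_AUDIO_BYTES = 25_000_000 # stricter reading of "25 MB"
NAME_LENGTH = (1..30)

# Return a list of problems; an empty list means the input passes these checks.
def clone_input_errors(audio, name)
  errors = []
  errors << "audio exceeds 25 MB" if audio.bytesize > MAX_CLONE_AUDIO_BYTES
  errors << "name must be 1-30 characters" unless NAME_LENGTH.cover?(name.to_s.length)
  # A WAV file starts with "RIFF", a 4-byte size field, then "WAVE".
  unless audio.byteslice(0, 4) == "RIFF" && audio.byteslice(8, 4) == "WAVE"
    errors << "audio does not look like a WAV file"
  end
  errors
end

if File.exist?("sample.wav")
  problems = clone_input_errors(File.binread("sample.wav"), "My Voice")
  puts problems.empty? ? "ready to upload" : problems.join("; ")
end
```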
