# Ruby

> Access the Typecast API with our official Ruby SDK.

This is the official Ruby SDK for the [Typecast API](https://typecast.ai/developers/api). Use it to convert text to lifelike speech with AI-powered voices, generate timestamps, list voices, and create custom voices.

The Ruby SDK uses only the Ruby standard library at runtime and supports Ruby 2.6+.

<CardGroup cols={2}>
  <Card title="RubyGems" icon="gem" href="https://rubygems.org/gems/typecast-ruby">
    Typecast Ruby SDK
  </Card>

  <Card title="Source Code" icon="github" href="https://github.com/neosapience/typecast-sdk/tree/main/typecast-ruby">
    Typecast Ruby SDK Source Code
  </Card>
</CardGroup>

## Installation

Install from RubyGems:

```bash theme={null}
gem install typecast-ruby
```

<Note>Latest registered version: **0.1.3** on RubyGems.</Note>

Or add it to your Gemfile:

```ruby theme={null}
gem "typecast-ruby", "~> 0.1.3"
```

<Warning>
  Requires **Ruby 2.6 or higher**. Check your version with `ruby -v`.
</Warning>

## Quick Start

```ruby theme={null}
require "typecast"

client = Typecast::Client.new(api_key: ENV["TYPECAST_API_KEY"])

response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Hello there! I'm your friendly text-to-speech agent.",
    model: Typecast::Models::TTS_MODEL_V30,
    language: "eng",
    output: Typecast::Models::Output.new(audio_format: "wav")
  )
)

File.binwrite("output.wav", response.audio_data)
puts "Duration: #{response.duration}s, Format: #{response.format}"
```

## Features

* **Multiple Voice Models**: Support for `ssfm-v30` and `ssfm-v21` AI voice models
* **Multi-language Support**: 37 languages including English, Korean, Japanese, Chinese, Spanish, and more
* **Emotion Control**: Preset emotions or smart context-aware inference
* **Audio Customization**: Control loudness, pitch, tempo, and output format
* **Voice Discovery**: V2 Voices API with filtering by model, gender, age, and use cases
* **Streaming Endpoint**: Access streaming TTS responses from Ruby
* **Timestamp TTS**: Word- and character-level alignment data with SRT/VTT helpers
* **Instant Voice Cloning**: Upload a WAV sample and create a custom voice ID
* **No Runtime Dependencies**: Built on Ruby standard library `net/http`

## Configuration

Set your API key through an environment variable, or pass it directly to the constructor:

<CodeGroup>
  ```bash Environment Variable theme={null}
  export TYPECAST_API_KEY="your-api-key-here"
  ```

  ```ruby From Environment theme={null}
  require "typecast"

  client = Typecast::Client.new(
    api_key: ENV["TYPECAST_API_KEY"]
  )
  ```

  ```ruby Direct Configuration theme={null}
  require "typecast"

  client = Typecast::Client.new(
    api_key: "your-api-key-here"
  )
  ```
</CodeGroup>

<Info>
  When requests go through your own proxy, set `base_url` to the proxy endpoint and omit `api_key`. The SDK will not send the `X-API-KEY` header for empty or missing keys. Requests to the default Typecast host still require an API key.
</Info>

```ruby Proxy without API key theme={null}
client = Typecast::Client.new(
  base_url: "https://your-proxy.example.com"
)
```
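The header rule in the note above can be pictured with a small sketch. `build_headers` is a hypothetical helper written for illustration, not a method of the SDK, and the `Content-Type` entry is an assumption:

```ruby
# Hypothetical helper, not part of the Typecast SDK: builds request headers
# and leaves out X-API-KEY when the key is missing (nil) or empty.
def build_headers(api_key)
  headers = { "Content-Type" => "application/json" } # assumed default header
  headers["X-API-KEY"] = api_key unless api_key.nil? || api_key.empty?
  headers
end

puts build_headers(nil).key?("X-API-KEY")
puts build_headers("your-api-key-here").key?("X-API-KEY")
```

With this shape, a proxy that injects its own credentials never receives a blank `X-API-KEY` value from the client.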

You can also override the API host and HTTP timeouts:

```ruby theme={null}
client = Typecast::Client.new(
  api_key: ENV["TYPECAST_API_KEY"],
  base_url: "https://api.typecast.ai",
  open_timeout: 10,
  read_timeout: 30
)
```
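Because the SDK is built on `net/http`, the two timeout options plausibly correspond to the `Net::HTTP` attributes of the same name. The sketch below shows what those attributes control using only the standard library. `build_http` is a hypothetical helper, and the mapping to the SDK's internals is an assumption that this page does not confirm:

```ruby
require "net/http"
require "uri"

# Hypothetical helper, not the SDK's internals: builds a Net::HTTP object
# from a base URL and timeout values. No connection is opened here.
def build_http(base_url, open_timeout:, read_timeout:)
  uri = URI.parse(base_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to open the connection
  http.read_timeout = read_timeout # seconds allowed for each read from the socket
  http
end

http = build_http("https://api.typecast.ai", open_timeout: 10, read_timeout: 30)
puts "#{http.address}:#{http.port} open=#{http.open_timeout}s read=#{http.read_timeout}s"
```

A longer `read_timeout` may be worth considering for long texts, since synthesis time tends to grow with input length.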

## Advanced Usage

### Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: **Preset** and **Smart**.

<Tabs>
  <Tab title="Smart Mode">
    Let the AI infer emotion from context:

    ```ruby theme={null}
    response = client.text_to_speech(
      Typecast::Models::TTSRequest.new(
        voice_id: "tc_672c5f5ce59fac2a48faeaee",
        text: "Everything is going to be okay.",
        model: Typecast::Models::TTS_MODEL_V30,
        prompt: Typecast::Models::SmartPrompt.new(
          previous_text: "I just got the best news!",
          next_text: "I can't wait to celebrate!"
        )
      )
    )
    ```
  </Tab>

  <Tab title="Preset Mode">
    Explicitly set emotion with preset values:

    ```ruby theme={null}
    response = client.text_to_speech(
      Typecast::Models::TTSRequest.new(
        voice_id: "tc_672c5f5ce59fac2a48faeaee",
        text: "I am so excited to show you these features!",
        model: Typecast::Models::TTS_MODEL_V30,
        prompt: Typecast::Models::PresetPrompt.new(
          emotion_preset: "happy",
          emotion_intensity: 1.5
        )
      )
    )
    ```
  </Tab>
</Tabs>

### Audio Customization

Control loudness, pitch, tempo, and output format:

```ruby theme={null}
response = client.text_to_speech(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_672c5f5ce59fac2a48faeaee",
    text: "Customized audio output!",
    model: Typecast::Models::TTS_MODEL_V30,
    output: Typecast::Models::Output.new(
      target_lufs: -14.0,
      audio_pitch: 2,
      audio_tempo: 1.2,
      audio_format: Typecast::Models::AUDIO_MP3
    ),
    seed: 42
  )
)

File.binwrite("output.mp3", response.audio_data)
```

### Generate audio to a file

Use `generate_to_file` when you want the SDK to synthesize speech and write the audio bytes directly to a local file. The model defaults to `ssfm-v30`. When no output format is set, the SDK infers it from a `.mp3` or `.wav` file extension. Browse available voice IDs on the [Voices](https://typecast.ai/developers/api/voices) page.

```ruby theme={null}
client.generate_to_file(
  "output.mp3",
  text: "Hello from Typecast.",
  voice_id: "tc_672c5f5ce59fac2a48faeaee" # Find voice IDs at https://typecast.ai/developers/api/voices
)
```
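The extension-based inference described above can be sketched as follows. `infer_audio_format` is a hypothetical helper, not the SDK's implementation, and the case-insensitive match is an assumption:

```ruby
# Hypothetical helper, not the SDK's implementation: maps a file extension
# to an audio format and returns nil for anything other than .mp3 or .wav.
def infer_audio_format(path)
  case File.extname(path).downcase # case-insensitive matching is an assumption
  when ".mp3" then "mp3"
  when ".wav" then "wav"
  end
end

puts infer_audio_format("output.mp3")
puts infer_audio_format("recordings/intro.wav")
```

Setting the output format explicitly avoids relying on the file name when the path comes from user input.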

### Text pauses

Use text pause markup when you only need silent gaps inside one composed text segment. Insert a marker such as `<|5s|>`, `<|1s|>`, `<|0.3s|>`, or `<|0.34413s|>` directly in the text. The value is a duration in seconds, decimals included, and must end with `s`. This keeps the pause visible in plain text without adding separate pause calls.

```ruby theme={null}
audio = client
  .compose_speech
  .defaults(voice_id: "tc_672c5f5ce59fac2a48faeaee", model: "ssfm-v30")
  .say("Hello<|5s|>Nice to meet you<|1s|>Today<|2s|>how does the weather feel?")
  .generate
```
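To check pause markers before sending text, a regular expression along these lines may help. This is a minimal sketch of the markup rules stated above, not the parser the SDK or API uses. `pause_pattern`, `pause_durations`, and `strip_pauses` are hypothetical names:

```ruby
# Hypothetical helpers, not part of the Typecast SDK.

# Matches markers such as <|5s|> or <|0.34413s|>: a non-negative decimal
# number followed by a literal "s".
def pause_pattern
  /<\|(\d+(?:\.\d+)?)s\|>/
end

# Returns every pause duration found in the text, in seconds.
def pause_durations(text)
  text.scan(pause_pattern).flatten.map(&:to_f)
end

# Returns the text with each pause marker replaced by a single space.
def strip_pauses(text)
  text.gsub(pause_pattern, " ")
end

text = "Hello<|5s|>Nice to meet you<|0.3s|>Today"
puts pause_durations(text).sum
puts strip_pauses(text)
```

A marker without the trailing `s`, such as `<|5|>`, does not match this pattern and would be left in the text unchanged.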

### Multi-speaker composition

Use the composer chaining API when one output file needs different voices or per-segment options such as pitch, tempo, prompt, or seed. The composer generates each segment as WAV, trims leading/trailing silent PCM samples, and concatenates the result. If you need MP3, generate WAV first and convert it in your app or server pipeline.

```ruby theme={null}
audio = client
  .compose_speech
  .defaults(voice_id: "tc_672c5f5ce59fac2a48faeaee", model: Typecast::Models::TTS_MODEL_V30)
  .say("Hello there")
  .pause(5)
  .say("Nice to meet you", voice_id: "tc_60e5426de8b95f1d3000d7b5", output: { audio_pitch: 2 })
  .say("Today")
  .pause(2)
  .say("How does the weather feel?")
  .generate

File.binwrite("conversation.wav", audio.audio_data)
```
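The trim-and-concatenate step can be illustrated on raw PCM with the standard library alone. This is a sketch of the general technique, not the composer's actual code. It assumes 16-bit little-endian mono samples without a WAV header and a 32000 Hz sample rate, and all helper names are hypothetical:

```ruby
# Hypothetical helper, not the SDK's implementation: removes leading and
# trailing samples whose absolute value is at or below `threshold`.
def trim_silence(pcm, threshold: 0)
  samples = pcm.unpack("s<*") # signed 16-bit little-endian integers
  first = samples.index { |s| s.abs > threshold }
  return "".b if first.nil? # the whole segment is silent

  last = samples.rindex { |s| s.abs > threshold }
  samples[first..last].pack("s<*")
end

# Returns zero-valued samples lasting `seconds` (assumed 32000 Hz, mono).
def silence(seconds, sample_rate: 32_000)
  "\x00\x00".b * (seconds * sample_rate).round
end

segment_a = [0, 0, 1200, -800, 0].pack("s<*")
segment_b = [0, 500, 0, 0].pack("s<*")

# Trim each segment, then join them with an explicit pause in between.
combined = trim_silence(segment_a) + silence(0.001) + trim_silence(segment_b)
puts combined.bytesize
```

Trimming before concatenation keeps the gaps between segments under the control of explicit `pause` calls rather than whatever silence each segment happens to carry.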

### Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:

```ruby theme={null}
voices = client.get_voices_v2

filtered = client.get_voices_v2(
  Typecast::Models::VoicesV2Filter.new(
    model: Typecast::Models::TTS_MODEL_V30,
    gender: "female",
    age: "young_adult"
  )
)

voices.each do |voice|
  puts "ID: #{voice.voice_id}, Name: #{voice.voice_name}"
  puts "Gender: #{voice.gender}, Age: #{voice.age}"
end

voice = client.get_voice_v2("tc_672c5f5ce59fac2a48faeaee")
puts voice.voice_name
```

### Streaming

Use `text_to_speech_stream()` to call the streaming endpoint:

```ruby theme={null}
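# Open the file once and append every yielded chunk. Calling File.binwrite
# inside the block would overwrite the file each time the block runs.
File.open("stream.wav", "wb") do |file|
  client.text_to_speech_stream(
    Typecast::Models::TTSRequestStream.new(
      voice_id: "tc_672c5f5ce59fac2a48faeaee",
      text: "Stream this text as audio.",
      model: Typecast::Models::TTS_MODEL_V30,
      output: Typecast::Models::OutputStream.new(audio_format: "wav")
    )
  ) do |chunk|
    file.write(chunk)
  end
end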
```

<Note>
  **WAV streaming format:** 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header (size = `0xFFFFFFFF`); subsequent chunks are raw PCM only.
</Note>
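Because the streamed header carries a placeholder size, some players may misreport the length of a saved file. The sketch below patches the size fields once the full stream has been collected and derives the duration from the byte count. It assumes the canonical 44-byte PCM header layout (RIFF size at byte 4, data size at byte 40), which the note does not spell out. `finalize_wav` and `pcm_duration` are hypothetical helpers:

```ruby
# Hypothetical helper: rewrites the two size fields of a fully collected
# streamed WAV. Assumes the canonical 44-byte PCM header layout.
def finalize_wav(streamed, header_bytes: 44)
  data = streamed.b # binary copy, so the input is left untouched
  raise ArgumentError, "shorter than a WAV header" if data.bytesize < header_bytes

  pcm_bytes = data.bytesize - header_bytes
  data[4, 4] = [pcm_bytes + 36].pack("V") # RIFF chunk size = file size - 8
  data[40, 4] = [pcm_bytes].pack("V")     # data chunk size
  data
end

# Duration in seconds of raw PCM at 32000 Hz, 16-bit (2 bytes), mono.
def pcm_duration(pcm_bytes, sample_rate: 32_000, bytes_per_sample: 2)
  pcm_bytes.to_f / (sample_rate * bytes_per_sample)
end

# Fake stream for illustration: placeholder header plus one second of silence.
header = "RIFF".b + [0xFFFFFFFF].pack("V") + "WAVEfmt ".b +
         [16, 1, 1, 32_000, 64_000, 2, 16].pack("VvvVVvv") +
         "data".b + [0xFFFFFFFF].pack("V")
fixed = finalize_wav(header + ("\x00\x00".b * 32_000))
puts fixed[40, 4].unpack1("V")
puts pcm_duration(fixed.bytesize - 44)
```

At this format each second of audio occupies 64000 bytes, so the duration can also be estimated while the stream is still in progress.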

## Timestamp TTS

`text_to_speech_with_timestamps()` wraps `POST /v1/text-to-speech/with-timestamps` and returns audio together with word- or character-level alignment data.

```ruby theme={null}
result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  )
)

result.save_audio("output.wav")
puts "Duration: #{result.audio_duration}s"

result.words.each do |word|
  puts "[#{word.start_time}s - #{word.end_time}s] #{word.word}"
end
```

### Granularity

Pass `granularity: "word"` (default) or `granularity: "char"` to control the alignment unit.

```ruby theme={null}
result = client.text_to_speech_with_timestamps(
  Typecast::Models::TTSRequest.new(
    voice_id: "tc_60e5426de8b95f1d3000d7b5",
    text: "Hello. How are you?",
    model: Typecast::Models::TTS_MODEL_V30
  ),
  granularity: "char"
)
```

### Subtitle Export

```ruby theme={null}
File.write("output.srt", result.to_srt)
File.write("output.vtt", result.to_vtt)
```
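For custom cue grouping, the SRT timestamp format is straightforward to produce by hand. The sketch below is a hypothetical illustration of that format and not the SDK's `to_srt` implementation. `srt_timestamp` and `srt_cue` are names invented for this example:

```ruby
# Hypothetical helper: formats seconds as an SRT timestamp (HH:MM:SS,mmm).
def srt_timestamp(seconds)
  total_ms = (seconds * 1000).round
  hours, rest = total_ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  secs, ms = rest.divmod(1000)
  format("%02d:%02d:%02d,%03d", hours, minutes, secs, ms)
end

# Hypothetical helper: builds one numbered SRT cue.
def srt_cue(index, start_time, end_time, text)
  "#{index}\n#{srt_timestamp(start_time)} --> #{srt_timestamp(end_time)}\n#{text}\n"
end

puts srt_cue(1, 0.0, 0.42, "Hello.")
```

WebVTT timestamps use the same layout with a period in place of the comma before the milliseconds.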

## Instant Voice Cloning

Upload a short WAV sample to create a custom voice:

```ruby theme={null}
voice = client.clone_voice(
  audio: File.binread("sample.wav"),
  filename: "sample.wav",
  name: "My Voice",
  model: Typecast::Models::TTS_MODEL_V30
)

puts "Custom voice ID: #{voice.voice_id}"
```

<Warning>
  Voice cloning audio must be **25 MB or smaller**, and the custom voice name must be **1-30 characters**.
</Warning>
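Checking these limits locally before uploading can save a round trip. The following is a minimal sketch. `clone_input_errors` is a hypothetical helper, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption:

```ruby
# Hypothetical helper, not part of the Typecast SDK: returns a list of
# problems with the clone input, or an empty list when it looks valid.
# The default byte limit assumes "25 MB" means 25 * 1024 * 1024 bytes.
def clone_input_errors(audio_bytes, name, max_bytes: 25 * 1024 * 1024)
  errors = []
  errors << "audio must be 25 MB or smaller" if audio_bytes.bytesize > max_bytes
  errors << "name must be 1-30 characters" unless (1..30).cover?(name.to_s.length)
  errors
end

puts clone_input_errors("fake-wav-bytes".b, "My Voice").empty?
puts clone_input_errors("fake-wav-bytes".b, "").inspect
```

In practice the first argument would be the result of `File.binread("sample.wav")`, the same bytes passed to `clone_voice`.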