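# Streaming Text-to-Speech (Streaming TTS)

> Generate speech from text with real-time streaming. Audio playback can begin before synthesis of the full text has finished.

This endpoint streams audio data in chunks, enabling low-latency playback in applications that need immediate feedback.

**Streaming formats:**
- **WAV**: The first chunk contains a WAV header (with size=0xFFFFFFFF to mark a stream) followed by raw PCM data. Subsequent chunks contain PCM data only.
- **MP3**: Each chunk contains post-processed MP3 data that can be decoded independently.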
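
The WAV case can be sketched in Python as follows. This is a minimal illustration, not the official client: it assumes the stream starts with a canonical 44-byte RIFF/WAVE PCM header (the same size the Python sample on this page skips), and the helper names `parse_wav_header` and `iter_pcm` are hypothetical. Because network chunk boundaries are arbitrary, the header is buffered until all 44 bytes have arrived.

```python
import struct
from typing import Iterable, Iterator, Tuple

# Assumption: canonical RIFF/WAVE PCM header with no extra chunks.
WAV_HEADER_SIZE = 44


def parse_wav_header(header: bytes) -> Tuple[int, int, int]:
    """Return (channels, sample_rate, bits_per_sample) from a 44-byte header."""
    if len(header) < WAV_HEADER_SIZE:
        raise ValueError("need at least 44 bytes of header")
    if header[0:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    # Offsets 22..27: channel count (uint16) and sample rate (uint32).
    channels, sample_rate = struct.unpack_from("<HI", header, 22)
    # Offset 34: bits per sample (uint16).
    (bits_per_sample,) = struct.unpack_from("<H", header, 34)
    return channels, sample_rate, bits_per_sample


def iter_pcm(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Strip the leading WAV header from a chunked stream and yield raw PCM."""
    pending = bytearray()
    header_done = False
    for chunk in chunks:
        if header_done:
            if chunk:
                yield bytes(chunk)
            continue
        # Keep buffering until the whole header is available.
        pending.extend(chunk)
        if len(pending) >= WAV_HEADER_SIZE:
            parse_wav_header(bytes(pending[:WAV_HEADER_SIZE]))  # validates magic
            header_done = True
            rest = bytes(pending[WAV_HEADER_SIZE:])
            if rest:
                yield rest
```

With `requests`, `iter_pcm(resp.iter_content(chunk_size=4096))` would yield PCM bytes ready for an audio sink. The size fields in the header are ignored on purpose, since a streamed header carries 0xFFFFFFFF instead of a real length.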

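**Use cases:**
- Conversational AI, chatbots, and real-time voice assistants
- Interactive applications that need immediate audio feedback
- Long-form content where waiting for the full synthesis is impractical

**Request parameters:**
Uses the same TTSRequest schema as the standard TTS endpoint. Set `output.audio_format` to "wav" or "mp3" to control the streaming format.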
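
As a rough sketch of how a request body might be assembled, the Python snippet below builds the JSON payload and selects the streaming format through `output.audio_format`. The helper `build_stream_request` is hypothetical, the 1 to 2000 character limit comes from the `text` field of the schema, and the mapping from format to response content type is an assumption inferred from the `audio/wav` and `audio/mpeg` responses listed for this endpoint.

```python
from typing import Any, Dict

# Assumption: the response content type follows the requested format.
CONTENT_TYPES = {"wav": "audio/wav", "mp3": "audio/mpeg"}


def build_stream_request(
    voice_id: str,
    text: str,
    model: str = "ssfm-v30",
    audio_format: str = "wav",
) -> Dict[str, Any]:
    """Build a JSON body for POST /v1/text-to-speech/stream (hypothetical helper)."""
    if audio_format not in CONTENT_TYPES:
        raise ValueError(f"audio_format must be one of {sorted(CONTENT_TYPES)}")
    if not 1 <= len(text) <= 2000:
        raise ValueError("text must be 1 to 2000 characters long")
    return {
        "voice_id": voice_id,
        "text": text,
        "model": model,
        "output": {"audio_format": audio_format},
    }


body = build_stream_request(
    "tc_60e5426de8b95f1d3000d7b5",
    "Thank you for contacting us.",
    audio_format="mp3",
)
expected_content_type = CONTENT_TYPES[body["output"]["audio_format"]]
```

The resulting dictionary can be passed as `json=body` to an HTTP client together with the `X-API-KEY` header; validating the format and text length locally avoids a round trip that would end in a 400 or 422 response.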


## OpenAPI

````yaml /ko/api-reference/openapi.json post /v1/text-to-speech/stream
openapi: 3.1.0
info:
  title: Typecast API
  version: 0.1.2
  x-logo:
    url: https://typecast.ai/_ipx/_/image/logo/tc_logo.webp
servers:
  - url: https://api.typecast.ai
    description: Production server
security:
  - ApiKeyAuth: []
paths:
  /v1/text-to-speech/stream:
    post:
      tags:
        - Text-to-Speech
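      summary: Streaming Text-to-Speech (Streaming TTS)
      description: >-
        Generate speech from text with real-time streaming. Audio playback can
        begin before synthesis of the full text has finished.


        This endpoint streams audio data in chunks, enabling low-latency
        playback in applications that need immediate feedback.


        **Streaming formats:**

        - **WAV**: The first chunk contains a WAV header (with size=0xFFFFFFFF
        to mark a stream) followed by raw PCM data. Subsequent chunks contain
        PCM data only.

        - **MP3**: Each chunk contains post-processed MP3 data that can be
        decoded independently.


        **Use cases:**

        - Conversational AI, chatbots, and real-time voice assistants

        - Interactive applications that need immediate audio feedback

        - Long-form content where waiting for the full synthesis is impractical


        **Request parameters:**

        Uses the same TTSRequest schema as the standard TTS endpoint. Set
        `output.audio_format` to "wav" or "mp3" to control the streaming format.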
      operationId: text_to_speech_stream_v1_text_to_speech_stream_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TTSRequestStream'
        required: true
      responses:
        '200':
          description: Success - Returns streaming audio data in chunks
          content:
            audio/wav:
              schema:
                type: string
                format: binary
                description: >-
                  Chunked WAV audio stream (16-bit, mono, 32000 Hz). The first
                  chunk contains a WAV header with size 0xFFFFFFFF (the
                  streaming marker) followed by raw PCM data. Subsequent chunks
                  contain PCM data only.
              example: '[Binary audio stream - WAV chunks]'
            audio/mpeg:
              schema:
                type: string
                format: binary
                description: >-
                  Chunked MP3 audio stream. Each chunk contains valid MP3
                  frames that can be decoded and played independently.
              example: '[Binary audio stream - MP3 chunks]'
        '400':
          description: Bad Request - Invalid parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Invalid voice_id
        '401':
          description: Unauthorized - Authentication failed
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Invalid API key
        '402':
          description: Payment Required - Insufficient credits
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Insufficient credit
        '404':
          description: Not Found - Voice model not available
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Voice not found
        '422':
          description: Validation Error - Request validation failed
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Invalid request format
        '429':
          description: Too Many Requests - Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Too many requests
        '500':
          description: Internal Server Error - Server processing failed
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: An unexpected error occurred
      x-codeSamples:
        - lang: cURL
          label: cURL (streaming + playback)
          source: |
            # Pipe the streaming audio into ffplay for real-time playback.
            # Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            curl -N -s --request POST \
              --url https://api.typecast.ai/v1/text-to-speech/stream \
              --header 'Content-Type: application/json' \
              --header 'X-API-KEY: <api-key>' \
              --data @- <<EOF | ffplay -autoexit -nodisp -loglevel error -i pipe:0
            {
              "voice_id": "tc_60e5426de8b95f1d3000d7b5",
              "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
              "model": "ssfm-v30"
            }
            EOF
        - lang: Python
          label: Python (requests + sounddevice)
          source: >
            # Real-time playback with sounddevice (pip install requests sounddevice).

            # Streaming WAV format: 32000 Hz, 16-bit, mono. Skip the 44-byte WAV

            # header and pass the raw PCM samples to the audio output.

            import requests

            import sounddevice as sd


            API_HOST = "https://api.typecast.ai"

            headers = {"X-API-KEY": "<api-key>", "Content-Type":
            "application/json"}

            payload = {
                "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                "model": "ssfm-v30",
            }


            resp = requests.post(
                f"{API_HOST}/v1/text-to-speech/stream",
                headers=headers, json=payload, stream=True, timeout=60,
            )

            resp.raise_for_status()


            with sd.RawOutputStream(samplerate=32000, channels=1, dtype="int16")
            as player:
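                buf, skip = bytearray(), 44  # skip: WAV header bytes left to drop
                for chunk in resp.iter_content(chunk_size=4096):
                    if skip:
                        # The header may span several chunks, so drop it incrementally.
                        drop = min(skip, len(chunk))
                        chunk, skip = chunk[drop:], skip - drop
                    if not chunk:
                        continue
                    buf.extend(chunk)
                    # Write only whole 2-byte units to keep int16 samples aligned.
                    n = len(buf) - (len(buf) % 2)
                    if n:
                        player.write(bytes(buf[:n]))
                        del buf[:n]

            print("Playback finished")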
        - lang: C#
          label: C# (HttpClient + ffplay)
          source: >
            // Pipe the stream into ffplay for real-time playback.

            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)

            using System;

            using System.Diagnostics;

            using System.Net.Http;

            using System.Text;

            using System.Threading.Tasks;


            var client = new HttpClient();

            client.DefaultRequestHeaders.Add("X-API-KEY", "<api-key>");


            var requestBody = @"{
              ""voice_id"": ""tc_60e5426de8b95f1d3000d7b5"",
              ""text"": ""문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다."",
              ""model"": ""ssfm-v30""
            }";


            var ffplay = new Process

            {
                StartInfo = new ProcessStartInfo
                {
                    FileName = "ffplay",
                    Arguments = "-autoexit -nodisp -loglevel error -i pipe:0",
                    RedirectStandardInput = true,
                    UseShellExecute = false,
                }
            };

            ffplay.Start();


            var request = new HttpRequestMessage(HttpMethod.Post,
            "https://api.typecast.ai/v1/text-to-speech/stream")

            {
                Content = new StringContent(requestBody, Encoding.UTF8, "application/json")
            };


            // ResponseHeadersRead enables true streaming (avoids buffering the whole response).

            using var response = await client.SendAsync(request,
            HttpCompletionOption.ResponseHeadersRead);

            response.EnsureSuccessStatusCode();

            using var stream = await response.Content.ReadAsStreamAsync();

            await stream.CopyToAsync(ffplay.StandardInput.BaseStream);

            ffplay.StandardInput.Close();

            await ffplay.WaitForExitAsync();
        - lang: Kotlin
          label: Kotlin (OkHttp + ffplay)
          source: |
            // Pipe the OkHttp response stream into ffplay for real-time playback.
            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            // On Android, use AudioTrack fed with raw PCM instead of an ffplay Process.
            import okhttp3.MediaType.Companion.toMediaType
            import okhttp3.OkHttpClient
            import okhttp3.Request
            import okhttp3.RequestBody.Companion.toRequestBody

            val ffplay = ProcessBuilder(
                "ffplay", "-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0"
            ).redirectError(ProcessBuilder.Redirect.DISCARD).start()

            val client = OkHttpClient()
            val body = """
            {
              "voice_id": "tc_60e5426de8b95f1d3000d7b5",
              "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
              "model": "ssfm-v30"
            }
            """.trimIndent().toRequestBody("application/json".toMediaType())

            val request = Request.Builder()
                .url("https://api.typecast.ai/v1/text-to-speech/stream")
                .addHeader("X-API-KEY", "<api-key>")
                .post(body)
                .build()

            client.newCall(request).execute().use { response ->
                response.body?.byteStream()?.use { input -> input.copyTo(ffplay.outputStream) }
            }
            ffplay.outputStream.close()
            ffplay.waitFor()
        - lang: C++
          label: C++ (libcurl + ffplay)
          source: |
            // Real-time playback: the libcurl write callback forwards each chunk to an
            // ffplay process opened with popen. Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            #include <curl/curl.h>
            #include <cstdio>
            #include <string>

            static FILE* player = nullptr;

            size_t cb(void* ptr, size_t size, size_t nmemb, void*) {
                return fwrite(ptr, size, nmemb, player);
            }

            int main() {
                player = popen("ffplay -autoexit -nodisp -loglevel error -i pipe:0", "w");

                CURL* curl = curl_easy_init();
                struct curl_slist* headers = nullptr;
                headers = curl_slist_append(headers, "Content-Type: application/json");
                headers = curl_slist_append(headers, "X-API-KEY: <api-key>");

                std::string body = R"({
                    "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                    "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                    "model": "ssfm-v30"
                })";

                curl_easy_setopt(curl, CURLOPT_URL, "https://api.typecast.ai/v1/text-to-speech/stream");
                curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
                curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
                curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, cb);

                curl_easy_perform(curl);

                curl_slist_free_all(headers);
                curl_easy_cleanup(curl);
                pclose(player);
                return 0;
            }
        - lang: C
          label: C (libcurl + ffplay)
          source: |
            /* Real-time playback: the libcurl write callback forwards each chunk to an
             * ffplay process opened with popen. Prerequisite: ffmpeg (brew/choco/apt install ffmpeg) */
            #include <stdio.h>
            #include <curl/curl.h>

            static FILE* player = NULL;

            size_t cb(void* ptr, size_t size, size_t nmemb, void* ud) {
                (void)ud;
                return fwrite(ptr, size, nmemb, player);
            }

            int main(void) {
                player = popen("ffplay -autoexit -nodisp -loglevel error -i pipe:0", "w");

                curl_global_init(CURL_GLOBAL_ALL);
                CURL* curl = curl_easy_init();

                struct curl_slist* headers = NULL;
                headers = curl_slist_append(headers, "Content-Type: application/json");
                headers = curl_slist_append(headers, "X-API-KEY: <api-key>");

                const char* body =
                    "{"
                    "\"voice_id\":\"tc_60e5426de8b95f1d3000d7b5\","
                    "\"text\":\"문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.\","
                    "\"model\":\"ssfm-v30\""
                    "}";

                curl_easy_setopt(curl, CURLOPT_URL, "https://api.typecast.ai/v1/text-to-speech/stream");
                curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
                curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
                curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, cb);

                curl_easy_perform(curl);

                curl_slist_free_all(headers);
                curl_easy_cleanup(curl);
                curl_global_cleanup();
                pclose(player);
                return 0;
            }
        - lang: Swift
          label: Swift (URLSession + ffplay)
          source: |
            // Real-time playback (macOS): pipe the URLSession byte stream into an
            // ffplay process launched with Process. Prerequisite: ffmpeg (brew install ffmpeg).
            // URLSession.bytes(for:) requires iOS 15 / macOS 12 or later.
            // Compile: swiftc -parse-as-library main.swift -o streaming_tts
            // On iOS, use AVAudioEngine with scheduled PCM buffers instead of Process/ffplay.
            import Foundation

            @main
            struct StreamingTTS {
                static func main() async throws {
                    let ffplay = Process()
                    ffplay.executableURL = URL(fileURLWithPath: "/usr/bin/env")
                    ffplay.arguments = ["ffplay", "-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0"]
                    let pipe = Pipe()
                    ffplay.standardInput = pipe
                    try ffplay.run()

                    var request = URLRequest(url: URL(string: "https://api.typecast.ai/v1/text-to-speech/stream")!)
                    request.httpMethod = "POST"
                    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
                    request.setValue("<api-key>", forHTTPHeaderField: "X-API-KEY")
                    request.httpBody = try JSONSerialization.data(withJSONObject: [
                        "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                        "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                        "model": "ssfm-v30",
                    ])

                    let (bytes, _) = try await URLSession.shared.bytes(for: request)
                    var buffer = Data()
                    buffer.reserveCapacity(4096)
                    for try await byte in bytes {
                        buffer.append(byte)
                        if buffer.count >= 4096 {
                            try pipe.fileHandleForWriting.write(contentsOf: buffer)
                            buffer.removeAll(keepingCapacity: true)
                        }
                    }
                    if !buffer.isEmpty {
                        try pipe.fileHandleForWriting.write(contentsOf: buffer)
                    }
                    try pipe.fileHandleForWriting.close()
                    ffplay.waitUntilExit()
                }
            }
        - lang: Rust
          label: Rust (reqwest + ffplay)
          source: |
            // Real-time playback: pipe the reqwest stream into ffplay launched via tokio Command.
            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            // Cargo.toml:
            //   reqwest = { version = "0.12", features = ["json", "stream"] }
            //   tokio   = { version = "1", features = ["full"] }
            //   serde_json = "1"
            use reqwest;
            use serde_json::json;
            use std::process::Stdio;
            use tokio::io::AsyncWriteExt;
            use tokio::process::Command;

            #[tokio::main]
            async fn main() -> Result<(), Box<dyn std::error::Error>> {
                let mut ffplay = Command::new("ffplay")
                    .args(["-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0"])
                    .stdin(Stdio::piped())
                    .spawn()?;
                let mut stdin = ffplay.stdin.take().expect("failed to open ffplay stdin");

                let client = reqwest::Client::new();
                let body = json!({
                    "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                    "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                    "model": "ssfm-v30"
                });

                let mut response = client
                    .post("https://api.typecast.ai/v1/text-to-speech/stream")
                    .header("X-API-KEY", "<api-key>")
                    .header("Content-Type", "application/json")
                    .json(&body)
                    .send()
                    .await?;

                while let Some(chunk) = response.chunk().await? {
                    stdin.write_all(&chunk).await?;
                }
                drop(stdin);
                ffplay.wait().await?;
                Ok(())
            }
        - lang: JavaScript
          label: JavaScript (Node.js + ffplay)
          source: >
            // Node 18+ (built-in fetch). Pipe the stream into ffplay for real-time playback.

            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)

            import { spawn } from "node:child_process";


            const ffplay = spawn(
                "ffplay",
                ["-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0"],
                { stdio: ["pipe", "ignore", "ignore"] },
            );


            const response = await
            fetch("https://api.typecast.ai/v1/text-to-speech/stream", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json",
                    "X-API-KEY": "<api-key>",
                },
                body: JSON.stringify({
                    voice_id: "tc_60e5426de8b95f1d3000d7b5",
                    text: "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                    model: "ssfm-v30",
                }),
            });

            if (!response.ok) throw new Error(`HTTP ${response.status}`);


            // fetch().body is a Web ReadableStream: read each chunk as soon as it arrives.

            const reader = response.body.getReader();

            while (true) {
                const { value, done } = await reader.read();
                if (done) break;
                ffplay.stdin.write(value);
            }

            ffplay.stdin.end();

            await new Promise((resolve) => ffplay.on("close", resolve));
        - lang: PHP
          label: PHP (curl + ffplay)
          source: >
            <?php

            // Pipe the libcurl write callback directly into ffplay's stdin.

            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)

            $ffplay = popen("ffplay -autoexit -nodisp -loglevel error -i
            pipe:0", "w");


            $payload = json_encode([
                "voice_id" => "tc_60e5426de8b95f1d3000d7b5",
                "text" => "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                "model" => "ssfm-v30",
            ]);


            $ch = curl_init("https://api.typecast.ai/v1/text-to-speech/stream");

            curl_setopt_array($ch, [
                CURLOPT_POST => true,
                CURLOPT_HTTPHEADER => [
                    "Content-Type: application/json",
                    "X-API-KEY: <api-key>",
                ],
                CURLOPT_POSTFIELDS => $payload,
                CURLOPT_WRITEFUNCTION => function ($ch, $data) use ($ffplay) {
                    fwrite($ffplay, $data);
                    return strlen($data);
                },
            ]);

            curl_exec($ch);

            pclose($ffplay);
        - lang: Go
          label: Go (net/http + ffplay)
          source: |
            // Pipe the streaming response body into ffplay's stdin.
            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            package main

            import (
                "bytes"
                "io"
                "net/http"
                "os/exec"
            )

            func main() {
                ffplay := exec.Command("ffplay", "-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0")
                stdin, _ := ffplay.StdinPipe()
                if err := ffplay.Start(); err != nil {
                    panic(err)
                }

                body := []byte(`{
                    "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                    "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                    "model": "ssfm-v30"
                }`)

                req, _ := http.NewRequest("POST", "https://api.typecast.ai/v1/text-to-speech/stream", bytes.NewReader(body))
                req.Header.Set("Content-Type", "application/json")
                req.Header.Set("X-API-KEY", "<api-key>")

                resp, err := http.DefaultClient.Do(req)
                if err != nil {
                    panic(err)
                }
                defer resp.Body.Close()

                io.Copy(stdin, resp.Body)
                stdin.Close()
                ffplay.Wait()
            }
        - lang: Java
          label: Java (HttpClient + ffplay)
          source: |
            // Java 11+ HttpClient + InputStream body handler.
            // Pipe the streaming response into ffplay's stdin.
            // Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            import java.net.URI;
            import java.net.http.HttpClient;
            import java.net.http.HttpRequest;
            import java.net.http.HttpResponse;
            import java.io.InputStream;
            import java.io.OutputStream;

            public class StreamingTTS {
                public static void main(String[] args) throws Exception {
                    Process ffplay = new ProcessBuilder(
                            "ffplay", "-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0")
                            .redirectError(ProcessBuilder.Redirect.DISCARD)
                            .start();

                    String body = """
                        {
                          "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                          "text": "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
                          "model": "ssfm-v30"
                        }
                        """;

                    HttpRequest request = HttpRequest.newBuilder()
                            .uri(URI.create("https://api.typecast.ai/v1/text-to-speech/stream"))
                            .header("Content-Type", "application/json")
                            .header("X-API-KEY", "<api-key>")
                            .POST(HttpRequest.BodyPublishers.ofString(body))
                            .build();

                    HttpResponse<InputStream> response = HttpClient.newHttpClient()
                            .send(request, HttpResponse.BodyHandlers.ofInputStream());

                    try (InputStream in = response.body();
                         OutputStream out = ffplay.getOutputStream()) {
                        in.transferTo(out);
                    }
                    ffplay.waitFor();
                }
            }
        - lang: Ruby
          label: Ruby (net/http + ffplay)
          source: |
            # Launch ffplay with IO.popen and pipe the streaming response into it.
            # Prerequisite: ffmpeg (brew/choco/apt install ffmpeg)
            require "net/http"
            require "uri"
            require "json"

            ffplay = IO.popen(
              ["ffplay", "-autoexit", "-nodisp", "-loglevel", "error", "-i", "pipe:0"],
              "wb",
            )

            uri = URI("https://api.typecast.ai/v1/text-to-speech/stream")
            http = Net::HTTP.new(uri.host, uri.port)
            http.use_ssl = true

            req = Net::HTTP::Post.new(uri)
            req["Content-Type"] = "application/json"
            req["X-API-KEY"] = "<api-key>"
            req.body = {
              voice_id: "tc_60e5426de8b95f1d3000d7b5",
              text: "문의해 주셔서 감사합니다. 금요일 오후 7시로 예약이 확정되었습니다.",
              model: "ssfm-v30",
            }.to_json

            http.request(req) do |response|
              response.read_body { |chunk| ffplay.write(chunk) }
            end

            ffplay.close
components:
  schemas:
    TTSRequestStream:
      type: object
      properties:
        voice_id:
          type: string
          title: Voice Id
          description: >-
            Character identifier. Two prefixes are supported.


            - `tc_`: a built-in Typecast character (e.g.
            `tc_60e5426de8b95f1d3000d7b5`). For available IDs, see
            [List voices](/docs/ko/api-reference/voices/list-voices).

            - `uc_`: a custom voice created with
            [Instant cloning](/docs/ko/api-reference/voices/instant-cloning)
            (e.g. `uc_64a1b2c3d4e5f6a7b8c9d0e1`). You can only use cloned voices
            that you own.


            Case sensitivity: prefixes must be lowercase.
          example: tc_60e5426de8b95f1d3000d7b5
        text:
          type: string
          title: Text
          description: >-
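            Text to convert to speech. Minimum 1 character, maximum 2000
            characters. Credits are consumed according to text length. Multiple
            languages are supported, including English, Korean, Japanese, and
            Chinese. Special characters and punctuation are handled
            automatically.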
          example: 모든 것이 너무나 완벽해서 마치 꿈을 꾸는 것 같습니다.
          minLength: 1
          maxLength: 2000
        model:
          $ref: '#/components/schemas/TTSModel'
          description: |
            Character model to use for speech synthesis.

            - **ssfm-v30**: Latest model with improved flow and additional emotion presets (recommended)
            - **ssfm-v21**: Fast, stable model that delivers reliable quality
          example: ssfm-v30
        language:
          type: string
          title: Language
          description: >
            Language code following the ISO 639-3 standard. Case-insensitive
            (both "KOR" and "kor" are accepted). If omitted, the language is
            detected automatically from the text content.


            <details>

            <summary><strong>Languages supported by ssfm-v30 (37)</strong></summary>


            | Code | Language | Code | Language | Code | Language |

            |------|----------|------|----------|------|----------|

            | ARA | Arabic | IND | Indonesian | POR | Portuguese |

            | BEN | Bengali | ITA | Italian | RON | Romanian |

            | BUL | Bulgarian | JPN | Japanese | RUS | Russian |

            | CES | Czech | KOR | Korean | SLK | Slovak |

            | DAN | Danish | MSA | Malay | SPA | Spanish |

            | DEU | German | NAN | Min Nan | SWE | Swedish |

            | ELL | Greek | NLD | Dutch | TAM | Tamil |

            | ENG | English | NOR | Norwegian | TGL | Tagalog |

            | FIN | Finnish | PAN | Punjabi | THA | Thai |

            | FRA | French | POL | Polish | TUR | Turkish |

            | HIN | Hindi | UKR | Ukrainian | VIE | Vietnamese |

            | HRV | Croatian | YUE | Cantonese | ZHO | Chinese |

            | HUN | Hungarian | | | | |


            </details>


            <details>

            <summary><strong>Languages supported by ssfm-v21 (27)</strong></summary>


            | Code | Language | Code | Language | Code | Language |

            |------|----------|------|----------|------|----------|

            | ARA | Arabic | IND | Indonesian | RON | Romanian |

            | BUL | Bulgarian | ITA | Italian | RUS | Russian |

            | CES | Czech | JPN | Japanese | SLK | Slovak |

            | DAN | Danish | KOR | Korean | SPA | Spanish |

            | DEU | German | MSA | Malay | SWE | Swedish |

            | ELL | Greek | NLD | Dutch | TAM | Tamil |

            | ENG | English | POL | Polish | TGL | Tagalog |

            | FIN | Finnish | POR | Portuguese | UKR | Ukrainian |

            | FRA | French | HRV | Croatian | ZHO | Chinese |


            </details>
          example: kor
        prompt:
          title: Prompt
          description: >-
            Emotion and style settings for the generated speech. Controls
            emotional expression, including the emotion type
            (happy/sad/angry/normal) and intensity (0.0 to 2.0)
          oneOf:
            - $ref: '#/components/schemas/SmartPrompt'
            - $ref: '#/components/schemas/PresetPrompt'
            - $ref: '#/components/schemas/Prompt'
          discriminator:
            propertyName: emotion_type
            mapping:
              preset:
                $ref: '#/components/schemas/PresetPrompt'
              smart:
                $ref: '#/components/schemas/SmartPrompt'
        output:
          $ref: '#/components/schemas/OutputStream'
          description: >-
            Streaming audio output settings such as pitch (-12 to +12
            semitones), tempo (0.5x to 2.0x), format (wav/mp3), and target_lufs
            (-70 to 0 LUFS). Note: volume is not available in streaming mode.
        seed:
          type: integer
          minimum: 0
          title: Seed
          description: >-
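            Unsigned integer seed for reproducible speech generation. The same
            seed with the same input parameters always produces the same audio.


            - Only integers 0 or greater are allowed. Negative values cannot be
            used.

            - If omitted, the server generates a random seed on each request,
            which introduces slight variation.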
          example: 42
          anyOf:
            - type: integer
              maximum: 4294967295
              minimum: 0
            - type: 'null'
          format: uint32
      required:
        - voice_id
        - text
        - model
      title: TTSRequestStream
      description: Request parameters for streaming text-to-speech
    ErrorResponse:
      type: object
      properties:
        detail:
          type: string
          description: Error message describing the problem
      required:
        - detail
      example:
        detail: An error occurred processing the request
    TTSModel:
      type: string
      enum:
        - ssfm-v30
        - ssfm-v21
      title: TTSModel
      description: |
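        TTS model version to use for speech synthesis. Different models offer different capabilities and quality levels.

        Available models:
        - **ssfm-v30**: Latest model with improved flow and additional emotion presets (recommended)
        - **ssfm-v21**: Stable model with proven reliability and consistent quality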
    SmartPrompt:
      type: object
      properties:
        emotion_type:
          type: string
          title: Emotion Type
          description: |
            Discriminator field identifying the prompt type. Must be set to "smart" for context-aware emotion inference.
          default: smart
          const: smart
        previous_text:
          type: string
          title: Previous Text
          description: |
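            Text that comes before the TTSRequest `text` field. Provides preceding context for emotion inference.

            The model analyzes the flow: `previous_text` → `text` (synthesized) → `next_text`

            - Maximum 2000 characters
            - Helps the model understand emotional build-up and context
            - Leave empty if there is no preceding context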
          default: ''
          example: I feel like I'm walking on air and I just want to scream with joy!
        next_text:
          type: string
          title: Next Text
          description: |
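            Text that comes after the TTSRequest `text` field. Provides following context for emotion inference.

            The model analyzes the flow: `previous_text` → `text` (synthesized) → `next_text`

            - Maximum 2000 characters
            - Helps the model anticipate emotional transitions
            - Leave empty if there is no following context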
          default: ''
          example: >-
            I am literally bursting with happiness and I never want this feeling
            to end!
      title: Smart Prompt (ssfm-v30)
      description: Emotion and style settings for the generated speech.
      example:
        emotion_type: smart
        previous_text: I feel like I'm walking on air and I just want to scream with joy!
        next_text: >-
          I am literally bursting with happiness and I never want this feeling
          to end!
      additionalProperties: false
    PresetPrompt:
      type: object
      properties:
        emotion_type:
          type: string
          title: Emotion Type
          description: |
            Discriminator field that identifies the prompt type. Must be set to "preset" for preset-based emotion control.
          default: preset
          const: preset
        emotion_preset:
          $ref: '#/components/schemas/EmotionEnum'
          description: |
            Emotion preset to apply to the generated speech.

            Supported emotions: normal, happy, sad, angry, whisper, toneup, tonedown

            Check the emotions available for each character via the /v2/voices API.
          default: normal
          example: normal
        emotion_intensity:
          type: number
          maximum: 2
          minimum: 0
          title: Emotion Intensity
          description: |
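            Controls the intensity of emotional expression in the generated speech.

            - 0.0: Fully neutral, no emotional coloring
            - 0.5: Subtle hint of emotion
            - 1.0: Standard emotional expression (default)
            - 1.5: Strong emotional emphasis
            - 2.0: Maximum intensity, highly expressive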
          default: 1
          example: 1
      title: Preset Prompt (ssfm-v30)
      description: Emotion and style settings for the generated speech.
      additionalProperties: false
    Prompt:
      properties:
        emotion_preset:
          description: |
            Emotion preset to apply.

            Emotions supported by ssfm-v21: normal, happy, sad, angry

            Check the emotions available for each character via the /v2/voices API.
          example: normal
        emotion_intensity:
          description: |
            Controls the intensity of emotional expression (0.0 to 2.0).

            - 0.0: Fully neutral
            - 1.0: Standard expression (default)
            - 2.0: Maximum intensity
          example: 1
      title: Prompt (ssfm-v21)
      description: Emotion and style settings for the generated speech.
    OutputStream:
      type: object
      properties:
        target_lufs:
          title: Target Lufs
          description: >
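            Target absolute loudness (LUFS) for the streamed output audio. Normalizes the output to a
            consistent loudness regardless of the level of the original audio. Cannot be used together
            with the `volume` parameter.


            Recommended values: -14 (common streaming standard), -23 (broadcast standard).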
          example: -14
          anyOf:
            - type: number
              maximum: 0
              minimum: -70
            - type: 'null'
        audio_pitch:
          type: integer
          maximum: 12
          minimum: -12
          title: Audio Pitch
          description: >-
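            Pitch adjustment in semitones, which affects perceived gender and age: -12 (one octave lower,
            deeper voice), -6 (half an octave lower), 0 (original pitch, default), +6 (half an octave
            higher), +12 (one octave higher, higher voice)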
          default: 0
          example: 0
        audio_tempo:
          type: number
          maximum: 2
          minimum: 0.5
          title: Audio Tempo
          description: >-
            Speech rate control: 0.5 (half speed, very slow and clear), 0.75 (slightly slower than normal),
            1.0 (normal speaking rate, default), 1.5 (50% faster than normal), 2.0 (double speed, very
            fast speech)
          default: 1
          example: 1
        audio_format:
          type: string
          enum:
            - wav
            - mp3
          title: Audio Format
          description: >
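            Output audio format for streaming.


            **WAV format:**

            - Uncompressed PCM audio

            - 16-bit depth, mono channel, **32000 Hz** sample rate

            - Chunked delivery: the first chunk contains the WAV header (size = 0xFFFFFFFF) followed by
            raw PCM data, and subsequent chunks contain raw PCM data only

            - Recommended when you want to play audio as soon as it arrives


            **MP3 format:**

            - Compressed MPEG Layer III audio

            - 320 kbps bitrate, 44100 Hz sample rate

            - Chunked delivery: each chunk contains independently decodable MPEG frames

            - Recommended for bandwidth-constrained clients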
          default: wav
          example: wav
      title: OutputStream
      description: >-
        Audio output settings for streaming. LUFS loudness normalization can be applied with
        `target_lufs`; `volume` is not available in streaming mode.
    EmotionEnum:
      type: string
      enum:
        - normal
        - sad
        - happy
        - angry
        - whisper
        - toneup
        - tonedown
      title: EmotionEnum
      description: |
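        Emotion presets available for speech synthesis. Each emotion affects the tone, pace, and expressiveness of the generated speech.

        **Emotions supported by ssfm-v21 (4):**
        - normal: Neutral, balanced tone
        - happy: Bright, cheerful expression
        - sad: Melancholic, subdued tone
        - angry: Strong, intense delivery

        **Emotions supported by ssfm-v30 (7):**
        - normal: Neutral, balanced tone
        - happy: Bright, cheerful expression
        - sad: Melancholic, subdued tone
        - angry: Strong, intense delivery
        - whisper: Soft, quiet speech
        - toneup: Emphasis with a higher tone
        - tonedown: Emphasis with a lower tone

        Check the emotions available for each voice in the /v2/voices API response.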
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-KEY
      description: API key for authentication. You can generate an API key in the Typecast API console.

````
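
The limits in the `SmartPrompt`, `PresetPrompt`, and `OutputStream` schemas can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: the helper names are hypothetical, and the top-level keys `voice_id`, `model`, `prompt`, and `output` in `build_request` are assumptions about the TTSRequest schema, which is not shown in this part of the spec. Only the `text` field and `output.audio_format` are confirmed by the text above.

```python
# Emotion presets listed for ssfm-v30 in EmotionEnum.
EMOTIONS_V30 = {"normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"}


def smart_prompt(previous_text="", next_text=""):
    """Build a SmartPrompt-style dict; both context fields are capped at 2000 characters."""
    for name, value in (("previous_text", previous_text), ("next_text", next_text)):
        if len(value) > 2000:
            raise ValueError(f"{name} must be at most 2000 characters")
    return {"emotion_type": "smart", "previous_text": previous_text, "next_text": next_text}


def preset_prompt(emotion="normal", intensity=1.0):
    """Build a PresetPrompt-style dict (ssfm-v30), checking the documented limits."""
    if emotion not in EMOTIONS_V30:
        raise ValueError(f"unsupported emotion preset: {emotion!r}")
    if not 0 <= intensity <= 2:
        raise ValueError("emotion_intensity must be between 0 and 2")
    return {"emotion_type": "preset", "emotion_preset": emotion, "emotion_intensity": intensity}


def output_stream(audio_format="wav", pitch=0, tempo=1.0, target_lufs=None):
    """Build an OutputStream-style dict; target_lufs is omitted when it is None."""
    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("audio_pitch must be an integer between -12 and 12")
    if not 0.5 <= tempo <= 2:
        raise ValueError("audio_tempo must be between 0.5 and 2")
    out = {"audio_format": audio_format, "audio_pitch": pitch, "audio_tempo": tempo}
    if target_lufs is not None:
        if not -70 <= target_lufs <= 0:
            raise ValueError("target_lufs must be between -70 and 0")
        out["target_lufs"] = target_lufs
    return out


def build_request(text, voice_id, prompt, output, model="ssfm-v30"):
    """Assemble a request body. Key names other than `text` are assumptions."""
    return {"text": text, "voice_id": voice_id, "model": model,
            "prompt": prompt, "output": output}
```

For example, `build_request("Hello", "voice-123", smart_prompt("Before.", "After."), output_stream("mp3", target_lufs=-14))` returns a dict that can be serialized with `json.dumps`; `"voice-123"` is a placeholder, not a real voice ID.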
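
Because the streamed WAV header carries a placeholder size (0xFFFFFFFF), a client that wants to save the audio has to write the real sizes itself once the stream ends. The following Python sketch shows one way to do that with the standard library, using the documented WAV parameters (16-bit, mono, 32000 Hz). It assumes the header in the first chunk ends with a standard `data` sub-chunk ID followed by a 4-byte size field; the exact header layout is not specified above, and the function names are hypothetical.

```python
import io
import wave

SAMPLE_RATE = 32000  # Hz, from the WAV format description
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def strip_streaming_header(first_chunk: bytes) -> bytes:
    """Return the PCM bytes that follow the WAV header in the first chunk.

    Assumption: the header ends with b"data" plus a 4-byte size field.
    """
    marker = first_chunk.find(b"data")
    if not first_chunk.startswith(b"RIFF") or marker == -1:
        raise ValueError("first chunk does not look like a WAV header")
    return first_chunk[marker + 8:]


def pcm_from_chunks(chunks) -> bytes:
    """Concatenate the raw PCM carried by an iterable of streamed chunks."""
    it = iter(chunks)
    first = next(it)
    return strip_streaming_header(first) + b"".join(it)


def duration_seconds(pcm: bytes) -> float:
    """Duration of raw PCM at the documented rate, width, and channel count."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def to_finalized_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container whose header has the real sizes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()
```

For live playback the PCM bytes can be handed to an audio device as they arrive; `to_finalized_wav` is only needed when the result is stored as a file that other tools must be able to open.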