> ## Documentation Index
> Fetch the complete documentation index at: https://typecast.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Text To Speech

> Generate speech from text using the specified voice model. Supports emotion, volume, pitch, and tempo customization.

First, list the available voice models with the GET /v2/voices endpoint, then use a voice_id from the response to generate speech with this endpoint. Each voice model has its own unique characteristics. See [Listing all voices](/docs/api-reference/voices/list-voices) for available voices.
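
The request step described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `build_tts_payload` and `synthesize` are hypothetical, the range checks simply mirror the limits stated in the request schema on this page, and the voice-listing call is left out because its response format is not described here. It assumes you already have a `voice_id`.

```python
import json
import urllib.request

API_HOST = "https://api.typecast.ai"
MODELS = ("ssfm-v30", "ssfm-v21")  # enum values from the TTSModel schema


def build_tts_payload(voice_id, text, model="ssfm-v30", language=None,
                      volume=100, audio_pitch=0, audio_tempo=1,
                      audio_format="wav", seed=None):
    """Hypothetical helper: check inputs against the documented limits."""
    if not voice_id.startswith(("tc_", "uc_")):
        raise ValueError("voice_id must start with 'tc_' or 'uc_'")
    if not 1 <= len(text) <= 2000:
        raise ValueError("text must be 1 to 2000 characters")
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}")
    if not 0 <= volume <= 200:
        raise ValueError("volume must be between 0 and 200")
    if not -12 <= audio_pitch <= 12:
        raise ValueError("audio_pitch must be between -12 and 12 semitones")
    if not 0.5 <= audio_tempo <= 2.0:
        raise ValueError("audio_tempo must be between 0.5 and 2.0")
    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")
    if seed is not None and not 0 <= seed <= 4294967295:
        raise ValueError("seed must fit in an unsigned 32-bit integer")

    payload = {
        "voice_id": voice_id,
        "text": text,
        "model": model,
        "output": {
            "volume": volume,
            "audio_pitch": audio_pitch,
            "audio_tempo": audio_tempo,
            "audio_format": audio_format,
        },
    }
    # Optional fields are only sent when the caller sets them.
    if language is not None:
        payload["language"] = language
    if seed is not None:
        payload["seed"] = seed
    return payload


def synthesize(api_key, payload, out_path):
    """POST the payload and write the returned audio bytes to out_path."""
    request = urllib.request.Request(
        f"{API_HOST}/v1/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # so a JSON error body is never written to the audio file.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call might look like `synthesize("<api-key>", build_tts_payload("tc_60e5426de8b95f1d3000d7b5", "Hello there.", seed=42), "output.wav")`. Validating locally is optional; the server remains the authority on what it accepts.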



## OpenAPI

````yaml /api-reference/openapi.json post /v1/text-to-speech
openapi: 3.1.0
info:
  title: Typecast API
  version: 0.1.2
  x-logo:
    url: https://typecast.ai/_ipx/_/image/logo/tc_logo.webp
servers:
  - url: https://api.typecast.ai
    description: Production server
security:
  - ApiKeyAuth: []
paths:
  /v1/text-to-speech:
    post:
      tags:
        - Text-to-Speech
      summary: Text To Speech
      description: >-
        Generate speech from text using the specified voice model. Supports
        emotion, volume, pitch, and tempo customization.


        First, list the available voice models with the GET /v2/voices
        endpoint, then use a voice_id from the response to generate speech
        with this endpoint. Each voice model has its own unique characteristics.
        See [Listing all voices](/docs/api-reference/voices/list-voices) for
        available voices.
      operationId: text_to_speech_v1_text_to_speech_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TTSRequest'
        required: true
      responses:
        '200':
          description: Success - Returns audio file
          content:
            audio/wav:
              schema:
                type: string
                format: binary
                description: >-
                  WAV audio file binary data. Uncompressed PCM audio with 16-bit
                  depth, mono channel, 44100 Hz sample rate.
              example: '[Binary audio data - WAV file content]'
            audio/mpeg:
              schema:
                type: string
                format: binary
                description: >-
                  MP3 audio file binary data. Compressed MPEG Layer III audio
                  with 320 kbps bitrate, 44100 Hz sample rate.
              example: '[Binary audio data - MP3 file content]'
        '400':
          description: Bad Request - Invalid parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Invalid voice_id
        '401':
          description: Unauthorized - Authentication failed
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Invalid API key
        '402':
          description: Payment Required - Insufficient credits
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Insufficient credit
        '404':
          description: Not Found - Voice model not available
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Voice not found
        '422':
          description: Validation Error - Request validation failed
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Invalid request format
        '429':
          description: Too Many Requests - Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: Too many requests
        '500':
          description: Internal Server Error - Server processing failed
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                detail: An unexpected error occurred
      x-codeSamples:
        - lang: cURL
          label: cURL (save to file)
          source: |
            curl --request POST \
              --url https://api.typecast.ai/v1/text-to-speech \
              --header 'Content-Type: application/json' \
              --header 'X-API-KEY: <api-key>' \
              --output output.wav \
              --data @- <<EOF
            {
              "voice_id": "tc_60e5426de8b95f1d3000d7b5",
              "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
              "model": "ssfm-v30",
              "language": "eng",
              "prompt": {
                "emotion_type": "smart",
                "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
              },
              "output": {
                "volume": 100,
                "audio_pitch": 0,
                "audio_tempo": 1,
                "audio_format": "wav"
              },
              "seed": 42
            }
            EOF
        - lang: Python
          label: Python (requests)
          source: >
            import requests


            API_HOST = "https://api.typecast.ai"

            headers = {
                "X-API-KEY": "<api-key>",
                "Content-Type": "application/json",
            }

            payload = {
                "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
                "model": "ssfm-v30",
                "language": "eng",
                "prompt": {
                    "emotion_type": "smart",
                    "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                    "next_text": "I am literally bursting with happiness and I never want this feeling to end!",
                },
                "output": {
                    "volume": 100,
                    "audio_pitch": 0,
                    "audio_tempo": 1,
                    "audio_format": "wav",
                },
                "seed": 42,
            }


            response = requests.post(f"{API_HOST}/v1/text-to-speech",
            headers=headers, json=payload, timeout=60)

            response.raise_for_status()


            with open("output.wav", "wb") as f:
                f.write(response.content)
            print(f"Saved {len(response.content)} bytes to output.wav")
        - lang: C#
          label: C# (HttpClient)
          source: >
            using System;

            using System.IO;

            using System.Net.Http;

            using System.Text;

            using System.Threading.Tasks;


            var client = new HttpClient();

            client.DefaultRequestHeaders.Add("X-API-KEY", "<api-key>");


            var requestBody = @"{
              ""voice_id"": ""tc_60e5426de8b95f1d3000d7b5"",
              ""text"": ""Everything is so incredibly perfect that I feel like I'm dreaming."",
              ""model"": ""ssfm-v30"",
              ""language"": ""eng"",
              ""prompt"": {
                ""emotion_type"": ""smart"",
                ""previous_text"": ""I feel like I'm walking on air and I just want to scream with joy!"",
                ""next_text"": ""I am literally bursting with happiness and I never want this feeling to end!""
              },
              ""output"": {
                ""volume"": 100,
                ""audio_pitch"": 0,
                ""audio_tempo"": 1,
                ""audio_format"": ""wav""
              },
              ""seed"": 42
            }";


            var content = new StringContent(requestBody, Encoding.UTF8,
            "application/json");

            var response = await
            client.PostAsync("https://api.typecast.ai/v1/text-to-speech",
            content);


            if (response.IsSuccessStatusCode)

            {
                var audioBytes = await response.Content.ReadAsByteArrayAsync();
                await File.WriteAllBytesAsync("output.wav", audioBytes);
                Console.WriteLine("Audio saved to output.wav");
            }
        - lang: Kotlin
          label: Kotlin (OkHttp)
          source: |
            import okhttp3.MediaType.Companion.toMediaType
            import okhttp3.OkHttpClient
            import okhttp3.Request
            import okhttp3.RequestBody.Companion.toRequestBody
            import java.io.File

            val client = OkHttpClient()
            val mediaType = "application/json".toMediaType()

            val requestBody = """
            {
              "voice_id": "tc_60e5426de8b95f1d3000d7b5",
              "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
              "model": "ssfm-v30",
              "language": "eng",
              "prompt": {
                "emotion_type": "smart",
                "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
              },
              "output": {
                "volume": 100,
                "audio_pitch": 0,
                "audio_tempo": 1,
                "audio_format": "wav"
              },
              "seed": 42
            }
            """.trimIndent()

            val request = Request.Builder()
                .url("https://api.typecast.ai/v1/text-to-speech")
                .addHeader("X-API-KEY", "<api-key>")
                .addHeader("Content-Type", "application/json")
                .post(requestBody.toRequestBody(mediaType))
                .build()

            client.newCall(request).execute().use { response ->
                if (response.isSuccessful) {
                    response.body?.bytes()?.let {
                        File("output.wav").writeBytes(it)
                        println("Audio saved to output.wav")
                    }
                }
            }
        - lang: C++
          label: C++ (libcurl)
          source: >
            #include <curl/curl.h>

            #include <fstream>

            #include <string>


            size_t WriteCallback(void* contents, size_t size, size_t nmemb,
            void* userp) {
                ((std::string*)userp)->append((char*)contents, size * nmemb);
                return size * nmemb;
            }


            int main() {
                CURL* curl = curl_easy_init();
                if(curl) {
                    std::string readBuffer;
                    struct curl_slist* headers = NULL;

                    headers = curl_slist_append(headers, "Content-Type: application/json");
                    headers = curl_slist_append(headers, "X-API-KEY: <api-key>");

                    std::string jsonData = R"({
                      "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                      "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
                      "model": "ssfm-v30",
                      "language": "eng",
                      "prompt": {
                        "emotion_type": "smart",
                        "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                        "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
                      },
                      "output": {
                        "volume": 100,
                        "audio_pitch": 0,
                        "audio_tempo": 1,
                        "audio_format": "wav"
                      },
                      "seed": 42
                    })";

                    curl_easy_setopt(curl, CURLOPT_URL, "https://api.typecast.ai/v1/text-to-speech");
                    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
                    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, jsonData.c_str());
                    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
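                    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
                    // Treat HTTP 4xx/5xx as a failure so a JSON error body is not saved as audio.
                    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);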

                    CURLcode res = curl_easy_perform(curl);
                    if(res == CURLE_OK) {
                        std::ofstream outFile("output.wav", std::ios::binary);
                        outFile.write(readBuffer.c_str(), readBuffer.size());
                        outFile.close();
                    }

                    curl_slist_free_all(headers);
                    curl_easy_cleanup(curl);
                }
                return 0;
            }
        - lang: C
          label: C (libcurl)
          source: >
            #include <stdio.h>

            #include <stdlib.h>

            #include <string.h>

            #include <curl/curl.h>


            typedef struct {
                char* data;
                size_t size;
            } MemoryStruct;


            size_t WriteMemoryCallback(void* contents, size_t size, size_t
            nmemb, void* userp) {
                size_t realsize = size * nmemb;
                MemoryStruct* mem = (MemoryStruct*)userp;

                char* ptr = realloc(mem->data, mem->size + realsize + 1);
                if(!ptr) return 0;

                mem->data = ptr;
                memcpy(&(mem->data[mem->size]), contents, realsize);
                mem->size += realsize;
                mem->data[mem->size] = 0;

                return realsize;
            }


            int main(void) {
                CURL* curl;
                CURLcode res;
                MemoryStruct chunk = {NULL, 0};

                curl_global_init(CURL_GLOBAL_ALL);
                curl = curl_easy_init();

                if(curl) {
                    struct curl_slist* headers = NULL;
                    headers = curl_slist_append(headers, "Content-Type: application/json");
                    headers = curl_slist_append(headers, "X-API-KEY: <api-key>");

                    const char* jsonData = "{"
                        "\"voice_id\":\"tc_60e5426de8b95f1d3000d7b5\","
                        "\"text\":\"Everything is so incredibly perfect that I feel like I'm dreaming.\","
                        "\"model\":\"ssfm-v30\","
                        "\"language\":\"eng\","
                        "\"output\":{\"volume\":100,\"audio_pitch\":0,\"audio_tempo\":1,\"audio_format\":\"wav\"},"
                        "\"seed\":42"
                        "}";

                    curl_easy_setopt(curl, CURLOPT_URL, "https://api.typecast.ai/v1/text-to-speech");
                    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
                    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, jsonData);
                    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
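                    curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void*)&chunk);
                    /* Treat HTTP 4xx/5xx as a failure so a JSON error body is not saved as audio. */
                    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);

                    res = curl_easy_perform(curl);

                    if(res == CURLE_OK) {
                        FILE* fp = fopen("output.wav", "wb");
                        if(fp) {
                            fwrite(chunk.data, 1, chunk.size, fp);
                            fclose(fp);
                        }
                    }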

                    curl_slist_free_all(headers);
                    curl_easy_cleanup(curl);
                    free(chunk.data);
                }

                curl_global_cleanup();
                return 0;
            }
        - lang: Swift
          label: Swift (URLSession)
          source: >
            import Foundation


            let url = URL(string: "https://api.typecast.ai/v1/text-to-speech")!

            var request = URLRequest(url: url)

            request.httpMethod = "POST"

            request.setValue("application/json", forHTTPHeaderField:
            "Content-Type")

            request.setValue("<api-key>", forHTTPHeaderField: "X-API-KEY")


            let requestBody: [String: Any] = [
                "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
                "model": "ssfm-v30",
                "language": "eng",
                "prompt": [
                    "emotion_type": "smart",
                    "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                    "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
                ],
                "output": [
                    "volume": 100,
                    "audio_pitch": 0,
                    "audio_tempo": 1,
                    "audio_format": "wav"
                ],
                "seed": 42
            ]


            request.httpBody = try? JSONSerialization.data(withJSONObject:
            requestBody)


            let task = URLSession.shared.dataTask(with: request) { data,
            response, error in
                if let data = data {
                    try? data.write(to: URL(fileURLWithPath: "output.wav"))
                    print("Audio saved to output.wav")
                }
            }

            task.resume()
        - lang: Rust
          label: Rust (reqwest)
          source: |
            use serde_json::json;
            use std::fs::File;
            use std::io::Write;

            #[tokio::main]
            async fn main() -> Result<(), Box<dyn std::error::Error>> {
                let client = reqwest::Client::new();

                let request_body = json!({
                    "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                    "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
                    "model": "ssfm-v30",
                    "language": "eng",
                    "prompt": {
                        "emotion_type": "smart",
                        "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                        "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
                    },
                    "output": {
                        "volume": 100,
                        "audio_pitch": 0,
                        "audio_tempo": 1,
                        "audio_format": "wav"
                    },
                    "seed": 42
                });

                let response = client
                    .post("https://api.typecast.ai/v1/text-to-speech")
                    .header("X-API-KEY", "<api-key>")
                    .header("Content-Type", "application/json")
                    .json(&request_body)
                    .send()
                    .await?;

                if response.status().is_success() {
                    let bytes = response.bytes().await?;
                    let mut file = File::create("output.wav")?;
                    file.write_all(&bytes)?;
                    println!("Audio saved to output.wav");
                }

                Ok(())
            }
        - lang: JavaScript
          label: JavaScript (Node.js)
          source: >
            // Node 18+ (built-in fetch).

            import { writeFile } from "node:fs/promises";


            const response = await
            fetch("https://api.typecast.ai/v1/text-to-speech", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json",
                    "X-API-KEY": "<api-key>",
                },
                body: JSON.stringify({
                    voice_id: "tc_60e5426de8b95f1d3000d7b5",
                    text: "Everything is so incredibly perfect that I feel like I'm dreaming.",
                    model: "ssfm-v30",
                    language: "eng",
                    prompt: {
                        emotion_type: "smart",
                        previous_text: "I feel like I'm walking on air and I just want to scream with joy!",
                        next_text: "I am literally bursting with happiness and I never want this feeling to end!",
                    },
                    output: { volume: 100, audio_pitch: 0, audio_tempo: 1, audio_format: "wav" },
                    seed: 42,
                }),
            });

            if (!response.ok) throw new Error(`HTTP ${response.status}`);


            const buffer = Buffer.from(await response.arrayBuffer());

            await writeFile("output.wav", buffer);

            console.log(`Saved ${buffer.length} bytes to output.wav`);
        - lang: PHP
          label: PHP (curl)
          source: |
            <?php
            $payload = json_encode([
                "voice_id" => "tc_60e5426de8b95f1d3000d7b5",
                "text" => "Everything is so incredibly perfect that I feel like I'm dreaming.",
                "model" => "ssfm-v30",
                "language" => "eng",
                "prompt" => [
                    "emotion_type" => "smart",
                    "previous_text" => "I feel like I'm walking on air and I just want to scream with joy!",
                    "next_text" => "I am literally bursting with happiness and I never want this feeling to end!",
                ],
                "output" => ["volume" => 100, "audio_pitch" => 0, "audio_tempo" => 1, "audio_format" => "wav"],
                "seed" => 42,
            ]);

            $ch = curl_init("https://api.typecast.ai/v1/text-to-speech");
            curl_setopt_array($ch, [
                CURLOPT_POST => true,
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_HTTPHEADER => [
                    "Content-Type: application/json",
                    "X-API-KEY: <api-key>",
                ],
                CURLOPT_POSTFIELDS => $payload,
            ]);
            $audio = curl_exec($ch);
            $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            if ($status !== 200) {
                fwrite(STDERR, "HTTP $status\n");
                exit(1);
            }
            file_put_contents("output.wav", $audio);
            echo "Saved " . strlen($audio) . " bytes to output.wav\n";
        - lang: Go
          label: Go (net/http)
          source: |
            package main

            import (
                "bytes"
                "fmt"
                "io"
                "net/http"
                "os"
            )

            func main() {
                body := []byte(`{
                    "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                    "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
                    "model": "ssfm-v30",
                    "language": "eng",
                    "prompt": {
                        "emotion_type": "smart",
                        "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                        "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
                    },
                    "output": {"volume": 100, "audio_pitch": 0, "audio_tempo": 1, "audio_format": "wav"},
                    "seed": 42
                }`)

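                req, err := http.NewRequest("POST", "https://api.typecast.ai/v1/text-to-speech", bytes.NewReader(body))
                if err != nil {
                    panic(err)
                }
                req.Header.Set("Content-Type", "application/json")
                req.Header.Set("X-API-KEY", "<api-key>")

                resp, err := http.DefaultClient.Do(req)
                if err != nil {
                    panic(err)
                }
                defer resp.Body.Close()

                // On failure the body is a JSON error, not audio, so do not save it.
                if resp.StatusCode != http.StatusOK {
                    msg, _ := io.ReadAll(resp.Body)
                    panic(fmt.Sprintf("HTTP %d: %s", resp.StatusCode, msg))
                }

                out, err := os.Create("output.wav")
                if err != nil {
                    panic(err)
                }
                defer out.Close()
                n, err := io.Copy(out, resp.Body)
                if err != nil {
                    panic(err)
                }
                fmt.Printf("Saved %d bytes to output.wav\n", n)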
            }
        - lang: Java
          label: Java (HttpClient)
          source: |
            // Java 15+ (text blocks). Uses HttpClient (Java 11+) with a file body handler.
            import java.net.URI;
            import java.net.http.HttpClient;
            import java.net.http.HttpRequest;
            import java.net.http.HttpResponse;
            import java.nio.file.Path;

            public class TextToSpeech {
                public static void main(String[] args) throws Exception {
                    String body = """
                        {
                          "voice_id": "tc_60e5426de8b95f1d3000d7b5",
                          "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
                          "model": "ssfm-v30",
                          "language": "eng",
                          "prompt": {
                            "emotion_type": "smart",
                            "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
                            "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
                          },
                          "output": {"volume": 100, "audio_pitch": 0, "audio_tempo": 1, "audio_format": "wav"},
                          "seed": 42
                        }
                        """;

                    HttpRequest request = HttpRequest.newBuilder()
                            .uri(URI.create("https://api.typecast.ai/v1/text-to-speech"))
                            .header("Content-Type", "application/json")
                            .header("X-API-KEY", "<api-key>")
                            .POST(HttpRequest.BodyPublishers.ofString(body))
                            .build();

                    HttpResponse<Path> response = HttpClient.newHttpClient()
                            .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("output.wav")));

                    System.out.println("Audio saved to " + response.body());
                }
            }
        - lang: Ruby
          label: Ruby (net/http)
          source: |
            require "net/http"
            require "uri"
            require "json"

            uri = URI("https://api.typecast.ai/v1/text-to-speech")
            http = Net::HTTP.new(uri.host, uri.port)
            http.use_ssl = true

            req = Net::HTTP::Post.new(uri)
            req["Content-Type"] = "application/json"
            req["X-API-KEY"] = "<api-key>"
            req.body = {
              voice_id: "tc_60e5426de8b95f1d3000d7b5",
              text: "Everything is so incredibly perfect that I feel like I'm dreaming.",
              model: "ssfm-v30",
              language: "eng",
              prompt: {
                emotion_type: "smart",
                previous_text: "I feel like I'm walking on air and I just want to scream with joy!",
                next_text: "I am literally bursting with happiness and I never want this feeling to end!",
              },
              output: { volume: 100, audio_pitch: 0, audio_tempo: 1, audio_format: "wav" },
              seed: 42,
            }.to_json

            resp = http.request(req)
            raise "HTTP #{resp.code}" unless resp.code == "200"

            File.binwrite("output.wav", resp.body)
            puts "Saved #{resp.body.bytesize} bytes to output.wav"
components:
  schemas:
    TTSRequest:
      type: object
      properties:
        voice_id:
          type: string
          title: Voice Id
          description: >-
            Voice identifier. Two prefixes are supported:


            - `tc_` — Built-in Typecast voices (e.g.,
            `tc_60e5426de8b95f1d3000d7b5`). See [Listing all
            voices](/docs/api-reference/voices/list-voices) for available IDs.

            - `uc_` — Custom voices created via [Instant
            cloning](/docs/api-reference/voices/instant-cloning) (e.g.,
            `uc_64a1b2c3d4e5f6a7b8c9d0e1`). Only the owner of a cloned voice can
            use it.


            Case-sensitive: must use lowercase prefix.
          example: tc_60e5426de8b95f1d3000d7b5
        text:
          type: string
          title: Text
          description: >-
            Text to convert to speech. Minimum 1 character, maximum 2000
            characters. Credits are consumed based on text length. Supports
            multiple languages including English, Korean, Japanese, and Chinese.
            Special characters and punctuation are handled automatically.
          example: Everything is so incredibly perfect that I feel like I'm dreaming.
          minLength: 1
          maxLength: 2000
        model:
          $ref: '#/components/schemas/TTSModel'
          description: >
            Voice model to use for speech synthesis.


            - **ssfm-v30**: Latest model with improved prosody and additional
            emotion presets (recommended)

            - **ssfm-v21**: Stable production model with reliable quality
          example: ssfm-v30
        language:
          type: string
          title: Language
          description: >
            Language code following the ISO 639-3 standard. Case-insensitive
            (both "ENG" and "eng" are accepted). If omitted, the language is
            auto-detected from the text content.


            <details>

            <summary><strong>ssfm-v30 Supported Languages
            (37)</strong></summary>


            | Code | Language | Code | Language | Code | Language |

            |------|----------|------|----------|------|----------|

            | ARA | Arabic | IND | Indonesian | POR | Portuguese |

            | BEN | Bengali | ITA | Italian | RON | Romanian |

            | BUL | Bulgarian | JPN | Japanese | RUS | Russian |

            | CES | Czech | KOR | Korean | SLK | Slovak |

            | DAN | Danish | MSA | Malay | SPA | Spanish |

            | DEU | German | NAN | Min Nan | SWE | Swedish |

            | ELL | Greek | NLD | Dutch | TAM | Tamil |

            | ENG | English | NOR | Norwegian | TGL | Tagalog |

            | FIN | Finnish | PAN | Punjabi | THA | Thai |

            | FRA | French | POL | Polish | TUR | Turkish |

            | HIN | Hindi | UKR | Ukrainian | VIE | Vietnamese |

            | HRV | Croatian | YUE | Cantonese | ZHO | Chinese |

            | HUN | Hungarian | | | | |


            </details>


            <details>

            <summary><strong>ssfm-v21 Supported Languages
            (27)</strong></summary>


            | Code | Language | Code | Language | Code | Language |

            |------|----------|------|----------|------|----------|

            | ARA | Arabic | IND | Indonesian | RON | Romanian |

            | BUL | Bulgarian | ITA | Italian | RUS | Russian |

            | CES | Czech | JPN | Japanese | SLK | Slovak |

            | DAN | Danish | KOR | Korean | SPA | Spanish |

            | DEU | German | MSA | Malay | SWE | Swedish |

            | ELL | Greek | NLD | Dutch | TAM | Tamil |

            | ENG | English | POL | Polish | TGL | Tagalog |

            | FIN | Finnish | POR | Portuguese | UKR | Ukrainian |

            | FRA | French | HRV | Croatian | ZHO | Chinese |


            </details>
          example: eng
        prompt:
          title: Prompt
          description: >-
            Emotion and style settings for the generated speech, including
            emotion type (happy/sad/angry/normal) and intensity (0.0 to 2.0) to
            control the emotional expression
          oneOf:
            - $ref: '#/components/schemas/SmartPrompt'
            - $ref: '#/components/schemas/PresetPrompt'
            - $ref: '#/components/schemas/Prompt'
          discriminator:
            propertyName: emotion_type
            mapping:
              preset:
                $ref: '#/components/schemas/PresetPrompt'
              smart:
                $ref: '#/components/schemas/SmartPrompt'
        output:
          $ref: '#/components/schemas/Output'
          description: >-
            Audio output settings including volume (0-200), pitch (-12 to +12
            semitones), tempo (0.5x to 2.0x), and format (wav/mp3) for
            controlling the final audio characteristics
        seed:
          type: integer
          minimum: 0
          title: Seed
          description: >-
            Unsigned integer seed for reproducible speech generation. The same
            seed with the same input parameters will produce identical audio
            output.


            - Must be an integer from 0 to 4294967295 (unsigned 32-bit).
            Negative values are not accepted.

            - If omitted, the server generates a random seed each time,
            producing slight variations.
          example: 42
          anyOf:
            - type: integer
              maximum: 4294967295
              minimum: 0
            - type: 'null'
          format: uint32
      required:
        - voice_id
        - text
        - model
      title: TTSRequest
      description: Text-to-speech request parameters
    ErrorResponse:
      type: object
      properties:
        detail:
          type: string
          description: Error message describing the issue
      required:
        - detail
      example:
        detail: An error occurred processing the request
    TTSModel:
      type: string
      enum:
        - ssfm-v30
        - ssfm-v21
      title: TTSModel
      description: >
        TTS model version to use for speech synthesis. Different models offer
        varying capabilities and quality levels.


        Available models:

        - **ssfm-v30**: Latest model with improved prosody and additional
        emotion presets (recommended)

        - **ssfm-v21**: Stable production model with proven reliability and
        consistent quality
    SmartPrompt:
      type: object
      properties:
        emotion_type:
          type: string
          title: Emotion Type
          description: >
            Discriminator field to identify the prompt type. Must be set to
            "smart" for context-aware emotion inference.
          default: smart
          const: smart
        previous_text:
          type: string
          title: Previous Text
          description: >
            Text that comes BEFORE the `text` field in TTSRequest. Provides
            backward context for emotion inference.


            The model analyzes the flow: `previous_text` → `text` (synthesized)
            → `next_text`


            - Maximum 2000 characters

            - Helps the model understand emotional build-up and context

            - Leave empty if no preceding context is available
          default: ''
          example: I feel like I'm walking on air and I just want to scream with joy!
        next_text:
          type: string
          title: Next Text
          description: >
            Text that comes AFTER the `text` field in TTSRequest. Provides
            forward context for emotion inference.


            The model analyzes the flow: `previous_text` → `text` (synthesized)
            → `next_text`


            - Maximum 2000 characters

            - Helps the model anticipate emotional transitions

            - Leave empty if no following context is available
          default: ''
          example: >-
            I am literally bursting with happiness and I never want this feeling
            to end!
      title: SmartPrompt (ssfm-v30)
      description: >-
        Emotion settings for ssfm-v30 smart mode, where the model infers the
        emotion from the text before and after the synthesized text.
      example:
        emotion_type: smart
        previous_text: I feel like I'm walking on air and I just want to scream with joy!
        next_text: >-
          I am literally bursting with happiness and I never want this feeling
          to end!
      additionalProperties: false
    PresetPrompt:
      type: object
      properties:
        emotion_type:
          type: string
          title: Emotion Type
          description: >
            Discriminator field to identify the prompt type. Must be set to
            "preset" for preset-based emotion control.
          default: preset
          const: preset
        emotion_preset:
          $ref: '#/components/schemas/EmotionEnum'
          description: >
            Emotion preset to apply to the generated speech.


            Supported emotions: normal, happy, sad, angry, whisper, toneup,
            tonedown


            Check available emotions for each voice through the /v2/voices API.
          default: normal
          example: normal
        emotion_intensity:
          type: number
          maximum: 2
          minimum: 0
          title: Emotion Intensity
          description: >
            Controls the strength of emotional expression in the generated
            speech.


            - 0.0: Completely neutral, no emotional coloring

            - 0.5: Subtle emotional hints

            - 1.0: Standard emotional expression (default)

            - 1.5: Strong emotional emphasis

            - 2.0: Maximum intensity, highly expressive
          default: 1
          example: 1
      title: PresetPrompt (ssfm-v30)
      description: >-
        Emotion settings for ssfm-v30 preset mode, using a named emotion preset
        and an intensity value.
      additionalProperties: false
    Prompt:
      properties:
        emotion_preset:
          description: |
            Emotion preset to apply.

            Supported emotions for ssfm-v21: normal, happy, sad, angry

            Check available emotions for each voice through the /v2/voices API.
          example: normal
        emotion_intensity:
          description: |
            Controls the strength of emotional expression (0.0 to 2.0).

            - 0.0: Completely neutral
            - 1.0: Standard expression (default)
            - 2.0: Maximum intensity
          example: 1
      title: Prompt (ssfm-v21)
      description: >-
        Emotion settings for ssfm-v21, using a named emotion preset and an
        intensity value.
    Output:
      type: object
      properties:
        target_lufs:
          type: number
          title: Target Lufs
          description: >
            Sets the target absolute loudness (LUFS) for the output audio. This
            normalizes all generated voices to a consistent volume level,
            regardless of the original source's loudness. Values closer to 0 are
            louder, while values closer to -70 are quieter.


            - Required range: -70 <= x <= 0

            - Recommended values: -14 (common streaming standard), -23
            (broadcast standard)

            - **Note:** This parameter cannot be used simultaneously with the
            `volume` parameter. Use `target_lufs` for consistent absolute
            loudness across different clips, or use `volume` for traditional
            relative scaling.
          example: -14
          anyOf:
            - type: number
              maximum: 0
              minimum: -70
            - type: 'null'
        volume:
          title: Volume
          description: >
            Adjusts the relative volume of the output audio: 0 (completely
            silent), 50 (half volume), 100 (standard volume, default), 150 (50%
            louder than standard), 200 (maximum volume, twice as loud as
            standard).


            Since this only scales the existing volume, using `volume` can
            amplify the loudness differences between voices if they have
            different baseline levels. For consistent output across all clips,
            use `target_lufs` instead.


            - Required range: 0 <= x <= 200

            - **Note:** This parameter cannot be used simultaneously with the
            `target_lufs` parameter.
          example: 100
          anyOf:
            - type: integer
              maximum: 200
              minimum: 0
            - type: 'null'
        audio_pitch:
          type: integer
          maximum: 12
          minimum: -12
          title: Audio Pitch
          description: >-
            Adjusts the pitch in semitones to affect perceived gender and age:
            -12 (one octave lower, deeper voice), -6 (half octave lower), 0
            (original pitch, default), +6 (half octave higher), +12 (one octave
            higher, higher voice)
          default: 0
          example: 0
        audio_tempo:
          type: number
          maximum: 2
          minimum: 0.5
          title: Audio Tempo
          description: >-
            Controls speech speed: 0.5 (half speed, very slow and clear), 0.75
            (slightly slower than normal), 1.0 (normal speaking speed, default),
            1.5 (50% faster than normal), 2.0 (double speed, very fast speech)
          default: 1
          example: 1
        audio_format:
          type: string
          enum:
            - wav
            - mp3
          title: Audio Format
          description: |
            Output audio format.

            **WAV format:**
            - Uncompressed PCM audio
            - 16-bit depth, mono channel, 44100 Hz sample rate
            - Higher quality, larger file size
            - Recommended for professional audio production

            **MP3 format:**
            - Compressed MPEG Layer III audio
            - 320 kbps bitrate, 44100 Hz sample rate
            - Smaller file size
            - Recommended for web streaming and distribution
          default: wav
          example: wav
      title: Output
      description: Audio output settings for controlling the final audio characteristics
    EmotionEnum:
      type: string
      enum:
        - normal
        - sad
        - happy
        - angry
        - whisper
        - toneup
        - tonedown
      title: EmotionEnum
      description: >
        Available emotion presets for speech synthesis. Each emotion affects the
        tone, pace, and expressiveness of the generated speech.


        **ssfm-v21 Supported Emotions (4 types):**

        - normal: Neutral, balanced tone

        - happy: Bright, cheerful expression

        - sad: Melancholic, subdued tone

        - angry: Strong, intense delivery


        **ssfm-v30 Supported Emotions (7 types):**

        - normal: Neutral, balanced tone

        - happy: Bright, cheerful expression

        - sad: Melancholic, subdued tone

        - angry: Strong, intense delivery

        - whisper: Soft, quiet speech

        - toneup: Higher tonal emphasis

        - tonedown: Lower tonal emphasis


        Check available emotions for each voice through the /v2/voices API
        response.
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-KEY
      description: >-
        API key for authentication. You can obtain an API key from the Typecast
        API Console.

````
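
The constraints in the schema above (value ranges, the per-model emotion lists, and the rule that `volume` and `target_lufs` cannot be combined) can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library, not an official client. The helper names `build_tts_payload`, `build_smart_prompt`, and `synthesize` are hypothetical, the voice ID in the usage note is a placeholder, and the sketch assumes that an ssfm-v21 prompt omits `emotion_type` because the `Prompt (ssfm-v21)` schema does not define it.

```python
import json
import urllib.request

# Server URL plus path, as given in the OpenAPI document above.
API_URL = "https://api.typecast.ai/v1/text-to-speech"

# Emotion presets per model, as listed in the EmotionEnum description.
EMOTIONS = {
    "ssfm-v21": {"normal", "happy", "sad", "angry"},
    "ssfm-v30": {"normal", "happy", "sad", "angry",
                 "whisper", "toneup", "tonedown"},
}


def build_smart_prompt(previous_text="", next_text=""):
    """Return a smart-mode prompt (ssfm-v30) after checking the length limit."""
    for name, value in (("previous_text", previous_text),
                        ("next_text", next_text)):
        if len(value) > 2000:
            raise ValueError(f"{name} exceeds 2000 characters")
    return {"emotion_type": "smart",
            "previous_text": previous_text,
            "next_text": next_text}


def build_tts_payload(voice_id, text, model="ssfm-v30", emotion="normal",
                      intensity=1.0, volume=None, target_lufs=None,
                      pitch=0, tempo=1.0, audio_format="wav", seed=None):
    """Validate arguments against the documented ranges; return a request body."""
    if model not in EMOTIONS:
        raise ValueError(f"unknown model: {model}")
    if emotion not in EMOTIONS[model]:
        raise ValueError(f"{emotion!r} is not listed for {model}")
    if not 0.0 <= intensity <= 2.0:
        raise ValueError("emotion_intensity must be within 0.0 to 2.0")
    if volume is not None and target_lufs is not None:
        raise ValueError("volume and target_lufs cannot be used together")
    if volume is not None and not 0 <= volume <= 200:
        raise ValueError("volume must be within 0 to 200")
    if target_lufs is not None and not -70 <= target_lufs <= 0:
        raise ValueError("target_lufs must be within -70 to 0")
    if not -12 <= pitch <= 12:
        raise ValueError("audio_pitch must be within -12 to 12")
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("audio_tempo must be within 0.5 to 2.0")
    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")
    if seed is not None and not 0 <= seed <= 4294967295:
        raise ValueError("seed must be within 0 to 4294967295")

    prompt = {"emotion_preset": emotion, "emotion_intensity": intensity}
    if model == "ssfm-v30":
        # Assumption: only ssfm-v30 prompts carry the discriminator field.
        prompt["emotion_type"] = "preset"

    output = {"audio_pitch": pitch, "audio_tempo": tempo,
              "audio_format": audio_format}
    if volume is not None:
        output["volume"] = volume
    if target_lufs is not None:
        output["target_lufs"] = target_lufs

    payload = {"voice_id": voice_id, "text": text, "model": model,
               "prompt": prompt, "output": output}
    if seed is not None:
        payload["seed"] = seed
    return payload


def synthesize(api_key, payload, timeout=60):
    """POST the payload and return the raw audio bytes from the response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call such as `build_tts_payload("your-voice-id", "Hello there", emotion="whisper", target_lufs=-14, seed=42)` returns a dictionary that can be passed to `synthesize` together with an API key. For smart mode on ssfm-v30, the `prompt` entry of the returned dictionary can be replaced with the result of `build_smart_prompt(...)`. Validating locally is a convenience only; the server remains the authority on which values and emotions a given voice accepts.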