POST /v1/text-to-speech
cURL (save to file)
curl --request POST \
  --url https://api.typecast.ai/v1/text-to-speech \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <api-key>' \
  --output output.wav \
  --data @- <<EOF
{
  "voice_id": "tc_60e5426de8b95f1d3000d7b5",
  "text": "Everything is so incredibly perfect that I feel like I'm dreaming.",
  "model": "ssfm-v30",
  "language": "eng",
  "prompt": {
    "emotion_type": "smart",
    "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
    "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
  },
  "output": {
    "volume": 100,
    "audio_pitch": 0,
    "audio_tempo": 1,
    "audio_format": "wav"
  },
  "seed": 42
}
EOF

Response
"[Binary audio data - WAV file content]"

Authorizations

X-API-KEY
string
header
required

API key for authentication. You can obtain an API key from the Typecast dashboard.
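
As a rough illustration of how the key travels with a request, here is a minimal Python sketch using only the standard library. The URL and header names are taken from the cURL example above; `build_request` and the `TYPECAST_API_KEY` environment variable are hypothetical names chosen for this sketch, not part of the API.

```python
import json
import os
import urllib.request

API_URL = "https://api.typecast.ai/v1/text-to-speech"  # from the cURL example


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the X-API-KEY header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
    )


if __name__ == "__main__":
    # TYPECAST_API_KEY is an assumed variable name, not something the API defines.
    key = os.environ.get("TYPECAST_API_KEY", "<api-key>")
    req = build_request(key, {
        "voice_id": "tc_60e5426de8b95f1d3000d7b5",
        "text": "Hello.",
        "model": "ssfm-v30",
    })
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req).read() would return the audio bytes.
```

Reading the key from the environment keeps it out of source control; any other secret store would serve the same purpose.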

Body

application/json
voice_id
string
required

Voice ID in the format 'tc_' followed by a unique identifier (e.g., 'tc_60e5426de8b95f1d3000d7b5'). The ID is case-sensitive and must use lowercase (tc_xxx). See "Listing all voices" for available voices.

Example:

"tc_60e5426de8b95f1d3000d7b5"
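
A cheap client-side check can catch an uppercase prefix before a request is sent. The sketch below is hypothetical (`looks_like_voice_id` is not an API function) and only tests the documented 'tc_' prefix; it makes no assumption about the characters of the identifier and cannot tell whether the voice actually exists.

```python
def looks_like_voice_id(voice_id: str) -> bool:
    """Return True if the value has a lowercase 'tc_' prefix and a non-empty identifier."""
    prefix = "tc_"
    # startswith is case-sensitive, so "TC_..." is rejected.
    return voice_id.startswith(prefix) and len(voice_id) > len(prefix)


if __name__ == "__main__":
    print(looks_like_voice_id("tc_60e5426de8b95f1d3000d7b5"))
```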

text
string
required

Text to convert to speech. Minimum 1 character, maximum 2000 characters. Credits are consumed based on text length. Supports multiple languages, including English, Korean, Japanese, and Chinese. Special characters and punctuation are handled automatically.

Required string length: 1 - 2000
Example:

"Everything is so incredibly perfect that I feel like I'm dreaming."
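
Longer scripts have to be split into several requests to stay within the 2000-character limit. The following is one possible sketch, not a recommended algorithm: `split_text` is a hypothetical helper that prefers sentence boundaries and falls back to a hard cut. It assumes that "characters" means Unicode code points as counted by Python's `len`, which this page does not confirm.

```python
import re

MAX_CHARS = 2000  # documented maximum for the `text` field


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after ., ! or ? followed by whitespace (a deliberately simple splitter).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `text` of its own request. Empty input yields an empty list, so no request with a zero-length `text` is produced.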

model
enum<string>
required

Voice model to use for speech synthesis.

  • ssfm-v30: Latest model with improved prosody and additional emotion presets (recommended)
  • ssfm-v21: Stable production model with reliable quality
Available options: ssfm-v30, ssfm-v21
Example:

"ssfm-v30"

language
string

Language code following the ISO 639-3 standard. Case-insensitive (both "ENG" and "eng" are accepted). If omitted, the language is auto-detected from the text content.

ssfm-v30 Supported Languages (37)

Code  Language     Code  Language     Code  Language
ARA   Arabic       IND   Indonesian   POR   Portuguese
BEN   Bengali      ITA   Italian      RON   Romanian
BUL   Bulgarian    JPN   Japanese     RUS   Russian
CES   Czech        KOR   Korean       SLK   Slovak
DAN   Danish       MSA   Malay        SPA   Spanish
DEU   German       NAN   Min Nan      SWE   Swedish
ELL   Greek        NLD   Dutch        TAM   Tamil
ENG   English      NOR   Norwegian    TGL   Tagalog
FIN   Finnish      PAN   Punjabi      THA   Thai
FRA   French       POL   Polish       TUR   Turkish
HIN   Hindi        UKR   Ukrainian    VIE   Vietnamese
HRV   Croatian     YUE   Cantonese    ZHO   Chinese
HUN   Hungarian

ssfm-v21 Supported Languages (27)

Code  Language     Code  Language     Code  Language
ARA   Arabic       IND   Indonesian   RON   Romanian
BUL   Bulgarian    ITA   Italian      RUS   Russian
CES   Czech        JPN   Japanese     SLK   Slovak
DAN   Danish       KOR   Korean       SPA   Spanish
DEU   German       MSA   Malay        SWE   Swedish
ELL   Greek        NLD   Dutch        TAM   Tamil
ENG   English      POL   Polish       TGL   Tagalog
FIN   Finnish      POR   Portuguese   UKR   Ukrainian
FRA   French       HRV   Croatian     ZHO   Chinese

Example:

"eng"
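
Because the two models support different language sets, a client may want to check the code against the chosen model before sending. The sketch below copies the codes from the language tables on this page; `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names, and returning `None` to mean "omit the field and let the API auto-detect" is a convention of this sketch only.

```python
from typing import Optional

# Codes copied from the supported-language tables on this page.
SUPPORTED_LANGUAGES = {
    "ssfm-v30": frozenset(
        "ARA BEN BUL CES DAN DEU ELL ENG FIN FRA HIN HRV HUN "
        "IND ITA JPN KOR MSA NAN NLD NOR PAN POL UKR YUE "
        "POR RON RUS SLK SPA SWE TAM TGL THA TUR VIE ZHO".split()
    ),
    "ssfm-v21": frozenset(
        "ARA BUL CES DAN DEU ELL ENG FIN FRA "
        "IND ITA JPN KOR MSA NLD POL POR HRV "
        "RON RUS SLK SPA SWE TAM TGL UKR ZHO".split()
    ),
}


def normalize_language(code: Optional[str], model: str) -> Optional[str]:
    """Return a lowercase ISO 639-3 code the model supports, or None for auto-detect."""
    if model not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unknown model: {model!r}")
    if code is None:
        return None  # caller should leave `language` out of the request body
    upper = code.strip().upper()  # the API accepts either case
    if upper not in SUPPORTED_LANGUAGES[model]:
        raise ValueError(f"{upper} is not listed for {model}")
    return upper.lower()
```

For example, `normalize_language("HIN", "ssfm-v30")` returns `"hin"`, while the same code with `"ssfm-v21"` raises `ValueError` because Hindi appears only in the ssfm-v30 table.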

prompt
SmartPrompt (ssfm-v30) · object

Emotion and style settings for the generated speech.

Example:

{
  "emotion_type": "smart",
  "previous_text": "I feel like I'm walking on air and I just want to scream with joy!",
  "next_text": "I am literally bursting with happiness and I never want this feeling to end!"
}

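
When a script is synthesized sentence by sentence, the neighbouring sentences are a natural source for `previous_text` and `next_text`. The sketch below shows one way to derive them; `smart_prompts` is a hypothetical helper, and omitting a context key at the first and last sentence assumes those fields are optional, which this page does not state.

```python
def smart_prompts(sentences: list[str]) -> list[dict]:
    """Build one 'smart' prompt per sentence, using its neighbours as context."""
    prompts = []
    for i in range(len(sentences)):
        prompt = {"emotion_type": "smart"}
        if i > 0:
            prompt["previous_text"] = sentences[i - 1]
        if i + 1 < len(sentences):
            prompt["next_text"] = sentences[i + 1]
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    script = [
        "I feel like I'm walking on air and I just want to scream with joy!",
        "Everything is so incredibly perfect that I feel like I'm dreaming.",
        "I am literally bursting with happiness and I never want this feeling to end!",
    ]
    # The middle entry reproduces the prompt used in the example request.
    print(smart_prompts(script)[1])
```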
output
Output · object

Audio output settings that control the final audio: volume (0-200), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (wav or mp3).
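
The ranges above can be enforced on the client before a request is sent. In this sketch the field names come from the cURL example; `make_output` is a hypothetical helper, and its default values simply mirror the example request rather than any documented API defaults.

```python
def make_output(
    volume: int = 100,
    audio_pitch: int = 0,
    audio_tempo: float = 1.0,
    audio_format: str = "wav",
) -> dict:
    """Return an `output` object after checking each value against the documented range."""
    if not 0 <= volume <= 200:
        raise ValueError("volume must be between 0 and 200")
    if not -12 <= audio_pitch <= 12:
        raise ValueError("audio_pitch must be between -12 and +12 semitones")
    if not 0.5 <= audio_tempo <= 2.0:
        raise ValueError("audio_tempo must be between 0.5 and 2.0")
    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")
    return {
        "volume": volume,
        "audio_pitch": audio_pitch,
        "audio_tempo": audio_tempo,
        "audio_format": audio_format,
    }
```

Raising on an out-of-range value, rather than silently clamping it, makes a mistyped setting visible before any credits are spent.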

seed
integer

Random seed for controlling speech generation variations. Use any integer value to influence the output.

Example:

42

Response

Success - Returns audio file

WAV audio file binary data. Uncompressed PCM audio with 16-bit depth, mono channel, 44100 Hz sample rate.
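
To confirm that a downloaded response matches this description, the bytes can be inspected with Python's standard `wave` module. This is a sketch under the assumption that the body is a plain PCM WAV file, as described above; `wav_info` and `matches_documented_format` are hypothetical helper names, and the check does not apply when `audio_format` is mp3.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Read basic header fields from WAV bytes held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_documented_format(info: dict) -> bool:
    """Check for 16-bit, mono, 44100 Hz audio as described for this endpoint."""
    return (
        info["channels"] == 1
        and info["bit_depth"] == 16
        and info["sample_rate"] == 44100
    )


if __name__ == "__main__":
    # output.wav is the file written by the cURL example.
    with open("output.wav", "rb") as f:
        print(wav_info(f.read()))
```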