Quickstart - Typecast Documentation

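Getting started with authentication

To use the Typecast API, you must authenticate your requests with an API key. Follow these steps:

Step 1

Visit the Typecast API console and create a new API key.

Step 2

Store your API key securely. We recommend saving it as an environment variable.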
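As a minimal sketch of the recommendation above, the key can be read from the environment instead of being hard-coded. The variable name `TYPECAST_API_KEY` matches the one used later on this page; the helper `load_api_key` is hypothetical and is not part of any Typecast SDK.

```python
import os


def load_api_key(env=None, var_name="TYPECAST_API_KEY"):
    """Return the API key from an environment mapping, or fail loudly.

    Illustrative helper only; it is not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running.")
    return key
```

Calling `load_api_key()` with no arguments reads from `os.environ`. Failing early on a missing key tends to be easier to debug than sending a placeholder value and receiving an authentication error from the API.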

Making your first request

SDK
Direct API

Install the SDK

pip install --upgrade typecast-python

All SDKs require the latest version.

Python: If you have an older version, upgrade with pip install --upgrade typecast-python
JavaScript: If you have an older version, upgrade with npm update @neosapience/typecast-js
C#: Update with dotnet add package typecast-csharp
Java: Update the version in pom.xml or build.gradle
Kotlin: Update the version in build.gradle.kts
Rust: Update the version in Cargo.toml

Import and initialize

from typecast import Typecast
from typecast.models import TTSRequest, SmartPrompt

# Initialize the client
client = Typecast(api_key="YOUR_API_KEY")

# Convert text to speech
response = client.text_to_speech(TTSRequest(
    text="Everything is going to be okay.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    prompt=SmartPrompt(
        emotion_type="smart",
        previous_text="I just got the best news!",
        next_text="I can't wait to celebrate!"
    )
))

# Save the audio file
with open('typecast.wav', 'wb') as f:
    f.write(response.audio_data)

You can set your API key in two ways:

Configure it directly in your application code
Set it as a shell environment variable

# Set it for the current session
export TYPECAST_API_KEY='YOUR_API_KEY'

import requests
import os

api_key = os.environ.get("TYPECAST_API_KEY", "YOUR_API_KEY")

url = "https://api.typecast.ai/v1/text-to-speech"
headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}
payload = {
    "text": "Everything is going to be okay.",
    "model": "ssfm-v30",
    "voice_id": "tc_672c5f5ce59fac2a48faeaee",
    "prompt": {
        "emotion_type": "smart",
        "previous_text": "I just got the best news!",
        "next_text": "I can't wait to celebrate!"
    }
}

response = requests.post(url, headers=headers, json=payload)

if response.status_code == 200:
    with open('typecast.wav', 'wb') as f:
        f.write(response.content)
    print("Audio file saved as typecast.wav")
else:
    print(f"Error: {response.status_code} - {response.text}")

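Audio output settings

You can customize the audio output by adding an output object to the request:

Parameter	Type	Range	Default	Description
`volume`	integer	0–200	100	Relative volume scaling. Cannot be used together with `target_lufs`.
`target_lufs`	number	-70–0	none	Absolute loudness normalization based on LUFS. Cannot be used together with `volume`.
`audio_pitch`	integer	-12–12	0	Pitch adjustment, in semitones.
`audio_tempo`	number	0.5–2.0	1.0	Playback speed multiplier.
`audio_format`	string	wav, mp3	wav	Output audio format.

Use target_lufs when you need consistent loudness across multiple clips, and volume when you only need a simple relative volume adjustment.

Example: output settings using target_lufs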

{
    "text": "This is a consistent loudness example.",
    "model": "ssfm-v30",
    "voice_id": "tc_672c5f5ce59fac2a48faeaee",
    "output": {
        "target_lufs": -14.0,
        "audio_format": "mp3"
    }
}
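The constraints in the table above can also be checked on the client before a request is sent. The following is a minimal sketch, assuming the documented ranges and the rule that `volume` and `target_lufs` cannot be used together; `build_output` is a hypothetical helper, not an SDK function, and the server may validate requests differently.

```python
def build_output(volume=None, target_lufs=None, audio_pitch=0,
                 audio_tempo=1.0, audio_format="wav"):
    """Build an `output` dict, enforcing the documented ranges."""
    # The two loudness controls are mutually exclusive.
    if volume is not None and target_lufs is not None:
        raise ValueError("volume and target_lufs cannot be used together")

    checks = [
        ("volume", volume, 0, 200),
        ("target_lufs", target_lufs, -70, 0),
        ("audio_pitch", audio_pitch, -12, 12),
        ("audio_tempo", audio_tempo, 0.5, 2.0),
    ]
    for name, value, low, high in checks:
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    if audio_format not in ("wav", "mp3"):
        raise ValueError(f"audio_format must be 'wav' or 'mp3', got {audio_format!r}")

    output = {
        "audio_pitch": audio_pitch,
        "audio_tempo": audio_tempo,
        "audio_format": audio_format,
    }
    # Only include a loudness control if the caller asked for one.
    if volume is not None:
        output["volume"] = volume
    if target_lufs is not None:
        output["target_lufs"] = target_lufs
    return output
```

For example, `build_output(target_lufs=-14.0, audio_format="mp3")` produces a dict that can be placed under the `"output"` key of the request payload.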

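To find Voice IDs you can use in requests, see the character list endpoint in the API reference.

Listing all characters

To use Typecast effectively, you need access to Voice IDs. The /v2/voices endpoint returns the full list of available characters, including each one's unique identifier, name, supported models, and emotions. You can filter characters with optional query parameters such as model, gender, age group, and use case.

For details on each character's traits, sample audio clips, and recommended use cases, browse the full character catalog on the Characters page.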

SDK
Direct API

from typecast import Typecast
from typecast.models import VoicesV2Filter, TTSModel

# Initialize the client
client = Typecast(api_key="YOUR_API_KEY")

# Fetch all voices (optionally filter by model, gender, age, or use case)
voices = client.voices_v2(VoicesV2Filter(model=TTSModel.SSFM_V30))

print(f"Found {len(voices)} voices:")
for voice in voices:
    for model in voice.models:
        print(f"ID: {voice.voice_id}, Name: {voice.voice_name}, Model: {model.version.value}, Emotions: {', '.join(model.emotions)}")

import requests
import os

api_key = os.environ.get("TYPECAST_API_KEY", "YOUR_API_KEY")

url = "https://api.typecast.ai/v2/voices"
headers = {"X-API-KEY": api_key}
params = {"model": "ssfm-v30"}  # Optional: model, gender, age, use_cases

response = requests.get(url, headers=headers, params=params)

if response.status_code == 200:
    voices = response.json()
    print(f"Found {len(voices)} voices:")
    for voice in voices:
        for model in voice['models']:
            print(f"ID: {voice['voice_id']}, Name: {voice['voice_name']}, Model: {model['version']}, Emotions: {', '.join(model['emotions'])}")
else:
    print(f"Error: {response.status_code} - {response.text}")

The response is a JSON array of voice objects, each of which looks like this:

{
  "voice_id": "tc_672c5f5ce59fac2a48faeaee",
  "voice_name": "Dylan",
  "models": [
    {
      "version": "ssfm-v30",
      "emotions": ["normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"]
    }
  ],
  "gender": "male",
  "age": "young_adult",
  "use_cases": ["Conversational", "TikTok/Reels/Shorts", "Audiobook/Storytelling"]
}

You need a valid Voice ID to make a text-to-speech request. With ssfm-v30, all seven emotion presets are available for every character.
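Once the voice list has been fetched, it can also be narrowed down locally. The sketch below assumes voice objects shaped like the JSON example above; `find_voices` is a hypothetical helper, and the sample list simply reuses that example.

```python
def find_voices(voices, model=None, emotion=None, use_case=None):
    """Return the voice_id of every voice matching all given criteria."""
    matches = []
    for voice in voices:
        if use_case is not None and use_case not in voice.get("use_cases", []):
            continue
        for entry in voice.get("models", []):
            if model is not None and entry.get("version") != model:
                continue
            if emotion is not None and emotion not in entry.get("emotions", []):
                continue
            matches.append(voice["voice_id"])
            break  # one matching model entry is enough for this voice
    return matches


# Sample data copied from the response example on this page.
sample = [{
    "voice_id": "tc_672c5f5ce59fac2a48faeaee",
    "voice_name": "Dylan",
    "models": [{
        "version": "ssfm-v30",
        "emotions": ["normal", "happy", "sad", "angry",
                     "whisper", "toneup", "tonedown"],
    }],
    "gender": "male",
    "age": "young_adult",
    "use_cases": ["Conversational", "TikTok/Reels/Shorts",
                  "Audiobook/Storytelling"],
}]

print(find_voices(sample, model="ssfm-v30", emotion="whisper"))
# prints ['tc_672c5f5ce59fac2a48faeaee']
```

Filtering on the server through query parameters transfers less data; local filtering is mainly useful for criteria such as emotions, which are not among the query parameters listed above.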

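Real-time audio streaming

For low-latency applications, use the streaming endpoint to play audio chunks as they arrive instead of waiting for the full synthesis to finish. WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk includes a 44-byte WAV header; subsequent chunks contain raw PCM data only.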

SDK
Direct API

# pip install typecast-python sounddevice
import sounddevice as sd
from typecast import Typecast
from typecast.models import TTSRequestStream, OutputStream

client = Typecast(api_key="YOUR_API_KEY")

request = TTSRequestStream(
    text="This text is streamed as audio in real time.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    output=OutputStream(audio_format="wav")
)

with sd.RawOutputStream(samplerate=32000, channels=1, dtype="int16") as player:
    buf, first = bytearray(), True
    for chunk in client.text_to_speech_stream(request):
        if first:
            chunk = chunk[44:]  # skip the 44-byte WAV header
            first = False
        buf.extend(chunk)
        n = len(buf) - (len(buf) % 2)  # align to int16 samples
        if n:
            player.write(bytes(buf[:n]))
            del buf[:n]

For real-time playback examples in more languages (Go, Rust, Swift, C#, Kotlin, C), see each SDK's documentation.

# pip install requests sounddevice
import requests
import sounddevice as sd
import os

api_key = os.environ.get("TYPECAST_API_KEY", "YOUR_API_KEY")

response = requests.post(
    "https://api.typecast.ai/v1/text-to-speech/stream",
    headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    json={
        "text": "This text is streamed as audio in real time.",
        "model": "ssfm-v30",
        "voice_id": "tc_672c5f5ce59fac2a48faeaee",
    },
    stream=True
)
response.raise_for_status()

with sd.RawOutputStream(samplerate=32000, channels=1, dtype="int16") as player:
    buf, first = bytearray(), True
    for chunk in response.iter_content(chunk_size=4096):
        if not chunk:
            continue
        if first:
            chunk = chunk[44:]  # skip the WAV header
            first = False
        buf.extend(chunk)
        n = len(buf) - (len(buf) % 2)
        if n:
            player.write(bytes(buf[:n]))
            del buf[:n]

Parameter	Type	Range	Default	Description
`audio_pitch`	integer	-12–12	0	Pitch adjustment, in semitones
`audio_tempo`	number	0.5–2.0	1.0	Speed multiplier
`audio_format`	string	wav, mp3	wav	Output audio format

volume and target_lufs are not supported in streaming mode.
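The playback examples above slice the first chunk with `chunk[44:]`, which works when the first chunk holds the whole header. As a hedged sketch of a slightly more defensive variant, the generator below drops the first 44 bytes even if they are split across chunks and yields only whole 16-bit samples. `pcm_frames` is a hypothetical helper; the header size and sample width are taken from the streaming format described in this section.

```python
WAV_HEADER_BYTES = 44  # header size stated for the WAV stream
SAMPLE_WIDTH = 2       # 16-bit mono PCM: 2 bytes per sample


def pcm_frames(chunks, header_bytes=WAV_HEADER_BYTES, width=SAMPLE_WIDTH):
    """Yield header-free, sample-aligned PCM from an iterable of byte chunks."""
    to_skip = header_bytes
    buf = bytearray()
    for chunk in chunks:
        if to_skip:
            # Drop header bytes, which may span more than one chunk.
            dropped = min(to_skip, len(chunk))
            chunk = chunk[dropped:]
            to_skip -= dropped
        buf.extend(chunk)
        usable = len(buf) - (len(buf) % width)
        if usable:
            yield bytes(buf[:usable])
            del buf[:usable]
    # A trailing partial sample, if any, is discarded.
```

With this helper, the playback loop reduces to `for pcm in pcm_frames(response.iter_content(chunk_size=4096)): player.write(pcm)`, using the same `response` and `player` objects as in the Direct API example.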

Generating subtitles with timestamp TTS

POST /v1/text-to-speech/with-timestamps returns word-level alignment data along with the audio, so you can build subtitle, karaoke, and lip-sync applications.

from typecast import Typecast
from typecast.models import TTSRequestWithTimestamps

client = Typecast(api_key="YOUR_API_KEY")
response = client.text_to_speech_with_timestamps(TTSRequestWithTimestamps(
    text="Hello. How are you?",
    model="ssfm-v30",
    voice_id="tc_60e5426de8b95f1d3000d7b5",
))

# Print per-word timestamps
for word in response.words:
    print(f"[{word.start_time:.3f}s – {word.end_time:.3f}s] {word.text}")

# Export SRT subtitles
srt = response.to_srt()
with open("output.srt", "w", encoding="utf-8") as f:
    f.write(srt)

For timestamp TTS usage in each language, see the SDK documentation. Japanese (jpn) and Chinese (zho) require granularity: "char" (character-level).
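When the SDK's `to_srt()` is not available, for example when calling the endpoint directly, SRT text can be assembled by hand from the timing data. The sketch below assumes each word is a dict with `text`, `start_time`, and `end_time` in seconds, mirroring the attribute names in the SDK example above; the actual JSON field names of the Direct API response are an assumption and should be checked against the API reference. `srt_time` and `words_to_srt` are hypothetical helpers.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=5, sep=" "):
    """Group timed words into numbered SRT cues of up to `max_words` items."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0]["start_time"], group[-1]["end_time"]
        text = sep.join(w["text"] for w in group)
        cues.append(
            f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
        )
    return "\n".join(cues)
```

For character-level granularity (Japanese or Chinese), passing `sep=""` joins the characters without spaces.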

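Next steps

Congratulations! You have created your first AI voice. To learn more, see the following resources:

API Reference

Learn how to use the Typecast API

Models

Learn about the ssfm-v30 and ssfm-v21 models

Changelog

Check the latest API changes and updates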
