Python - Typecast Documentation

Package

Typecast Python SDK

Source Code

Typecast Python SDK source code

Installation

Install the Typecast Python SDK with pip:

pip install --upgrade typecast-python

The package is installed as typecast-python but imported as typecast.

Make sure version 0.3.0 or later is installed. You can check the installed version with pip show typecast-python. If you have an older version, run pip install --upgrade typecast-python to update it.
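The version requirement above can also be checked from Python. The following is a minimal sketch that uses only the standard library. The helpers `parse_version`, `meets_minimum`, and `installed_version` are hypothetical names introduced here, not part of the SDK, and the comparison looks only at the leading numeric components, so pre-release suffixes are ignored.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '0.3.1' into (0, 3, 1); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '1' or '0.3' so tuple comparison is fair.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "0.3.0") -> bool:
    """Return True when `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)


def installed_version(package: str = "typecast-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("typecast-python is not installed")
    elif meets_minimum(found):
        print(f"typecast-python {found} is recent enough")
    else:
        print(f"typecast-python {found} is too old; upgrade with pip")
```

A check like this can run at application start-up so that an outdated installation fails early with a clear message.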

Quick Start

A simple example that converts text to speech:

from typecast import Typecast
from typecast.models import TTSRequest

# Initialize the client
client = Typecast(api_key="YOUR_API_KEY")

# Convert text to speech
response = client.text_to_speech(TTSRequest(
    text="Hello there! I'm your friendly text-to-speech agent.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee"
))

# Save the audio file
with open('output.wav', 'wb') as f:
    f.write(response.audio_data)

print(f"Duration: {response.duration}s, Format: {response.format}")

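Features

The Typecast Python SDK provides the following text-to-speech features:

Multiple voice models: supports the ssfm-v30 (latest) and ssfm-v21 AI voice models
Multilingual support: 37 languages, including English, Korean, Spanish, Japanese, and Chinese
Emotion control: emotion presets (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
Audio customization: control loudness (-70 to 0 LUFS), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
Async support: built-in async client for high-performance applications
Voice discovery: V2 Voices API with filtering by model, gender, age, and use case
Timestamped TTS: word- and character-level alignment data for subtitles, karaoke, and lip sync
Streaming: real-time chunked audio delivery for low-latency playback
Type hints: full type annotations using Pydantic models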

Configuration

You can configure the API key through an environment variable or by passing it directly to the client:

export TYPECAST_API_KEY="your-api-key-here"
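The two configuration paths described above can be combined in application code. The sketch below is one possible approach: `resolve_api_key` is a hypothetical helper, not an SDK function, that prefers an explicitly supplied key and otherwise reads `TYPECAST_API_KEY` from the environment. The SDK call in the trailing comment mirrors the `Typecast(api_key=...)` form used in the quick start.

```python
import os


def resolve_api_key(explicit=None, env_var="TYPECAST_API_KEY"):
    """Prefer an explicitly passed key, then fall back to the environment."""
    key = explicit or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}"
        )
    return key


# Usage with the SDK (requires typecast-python to be installed):
#   from typecast import Typecast
#   client = Typecast(api_key=resolve_api_key())
```

Failing with a clear error before the first request makes a missing key easier to diagnose than an HTTP 401 later on.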

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: preset and smart.

Smart mode

Let the AI infer the emotion from context:

from typecast import Typecast
from typecast.models import TTSRequest, SmartPrompt

client = Typecast()

response = client.text_to_speech(TTSRequest(
    text="Everything is going to be okay.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    prompt=SmartPrompt(
        emotion_type="smart",
        previous_text="I just got the best news!",  # optional context
        next_text="I can't wait to celebrate!"      # optional context
    )
))

Preset mode

Set the emotion explicitly with a preset value:

from typecast import Typecast
from typecast.models import TTSRequest, PresetPrompt

client = Typecast()

response = client.text_to_speech(TTSRequest(
    text="I am so excited to show you these features!",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    prompt=PresetPrompt(
        emotion_type="preset",
        emotion_preset="happy",    # normal, happy, sad, angry, whisper, toneup, tonedown
        emotion_intensity=1.5      # range: 0.0 to 2.0
    )
))

Audio Customization

Control loudness, pitch, tempo, and output format:

from typecast import Typecast
from typecast.models import TTSRequest, Output

client = Typecast()

response = client.text_to_speech(TTSRequest(
    text="Customized audio output!",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    output=Output(
        target_lufs=-14.0,   # range: -70 to 0 (LUFS)
        audio_pitch=2,       # range: -12 to +12 semitones
        audio_tempo=1.2,     # range: 0.5x to 2.0x
        audio_format="mp3"   # options: wav, mp3
    ),
    seed=42                  # unsigned integer seed (for reproducible results)
))
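When these settings come from user input, it can help to check them against the documented ranges before building a request. The sketch below is a hypothetical pre-check, not SDK code; the SDK's Pydantic models may already reject out-of-range values, which this page does not state. The limits are the ones listed in the comments above.

```python
def check_output_settings(target_lufs=None, audio_pitch=None,
                          audio_tempo=None, audio_format=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if target_lufs is not None and not -70 <= target_lufs <= 0:
        problems.append(f"target_lufs {target_lufs} is outside -70 to 0 LUFS")
    if audio_pitch is not None and not -12 <= audio_pitch <= 12:
        problems.append(f"audio_pitch {audio_pitch} is outside -12 to +12 semitones")
    if audio_tempo is not None and not 0.5 <= audio_tempo <= 2.0:
        problems.append(f"audio_tempo {audio_tempo} is outside 0.5x to 2.0x")
    if audio_format is not None and audio_format not in ("wav", "mp3"):
        problems.append(f"audio_format {audio_format!r} is not wav or mp3")
    return problems


# The values from the example above pass every check.
print(check_output_settings(target_lufs=-14.0, audio_pitch=2,
                            audio_tempo=1.2, audio_format="mp3"))
```

Returning a list rather than raising on the first problem lets a form or CLI report every invalid field at once.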

Voice Discovery (V2 API)

List and filter the available voices (characters) with enhanced metadata:

from typecast import Typecast
from typecast.models import VoicesV2Filter, TTSModel, GenderEnum, AgeEnum

client = Typecast()

# Fetch all voices
voices = client.voices_v2()

# Filter by criteria
filtered = client.voices_v2(VoicesV2Filter(
    model=TTSModel.SSFM_V30,
    gender=GenderEnum.FEMALE,
    age=AgeEnum.YOUNG_ADULT
))

# Print voice information
for voice in voices:
    print(f"ID: {voice.voice_id}, Name: {voice.voice_name}")
    print(f"Gender: {voice.gender}, Age: {voice.age}")
    print(f"Models: {', '.join(m.version.value for m in voice.models)}")
    print(f"Use cases: {voice.use_cases}")

Async Client

For high-performance applications, use the async client:

import asyncio
from typecast import AsyncTypecast
from typecast.models import TTSRequest

async def main():
    async with AsyncTypecast() as client:
        response = await client.text_to_speech(TTSRequest(
            text="Hello from async!",
            model="ssfm-v30",
            voice_id="tc_672c5f5ce59fac2a48faeaee"
        ))

        with open('async_output.wav', 'wb') as f:
            f.write(response.audio_data)

asyncio.run(main())
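An async client is most useful when several requests run concurrently. The sketch below shows one common pattern, bounding concurrency with `asyncio.Semaphore`. `run_limited` is a hypothetical helper and `fake_tts` is a stand-in for `client.text_to_speech(...)` so the example runs without the SDK or an API key. Whether a single `AsyncTypecast` client can be shared across concurrent calls, and which concurrency level stays within the rate limit, are assumptions to verify against the service.

```python
import asyncio


async def run_limited(jobs, limit=3):
    """Await the coroutine functions in `jobs`, at most `limit` at a time.

    Results come back in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_tts(text):
    """Stand-in for an async text-to-speech call; returns dummy bytes."""
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def demo():
    texts = ["one", "two", "three", "four"]
    # Bind each text as a default argument so every job keeps its own value.
    jobs = [lambda t=t: fake_tts(t) for t in texts]
    return await run_limited(jobs, limit=2)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing coroutine functions rather than coroutine objects means no request starts until the semaphore admits it.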

Streaming

Stream audio chunks in real time for low-latency playback:

# pip install requests sounddevice
import sounddevice as sd
from typecast import Typecast
from typecast.models import TTSRequestStream, OutputStream

client = Typecast()

request = TTSRequestStream(
    text="Stream this text as audio in real time.",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    output=OutputStream(audio_format="wav")
)

with sd.RawOutputStream(samplerate=32000, channels=1, dtype="int16") as player:
    buf, first = bytearray(), True
    for chunk in client.text_to_speech_stream(request):
        if first:
            chunk = chunk[44:]  # skip the 44-byte WAV header
            first = False
        buf.extend(chunk)
        n = len(buf) - (len(buf) % 2)  # keep int16 samples aligned
        if n:
            player.write(bytes(buf[:n]))
            del buf[:n]

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk contains a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks contain raw PCM data only. MP3 format: 320 kbps, 44100 Hz; each chunk can be decoded independently. The streaming endpoint does not support volume or target_lufs.
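Because the streamed header carries a placeholder size, one way to save a stream to disk is to drop the 44 header bytes and let Python's standard `wave` module write a header with the real length. The sketch below assumes the format stated above (32000 Hz, 16-bit, mono). `write_stream_to_wav` is a hypothetical helper, and `chunks` can be any iterable of bytes, such as the iterator returned by `client.text_to_speech_stream(request)`.

```python
import io
import wave

WAV_HEADER_BYTES = 44      # header size stated above
SAMPLE_RATE = 32000        # Hz
SAMPLE_WIDTH = 2           # bytes per sample (16-bit)
CHANNELS = 1               # mono


def write_stream_to_wav(chunks, target):
    """Write streamed WAV chunks to `target` (a path or binary file object).

    Returns the duration, in seconds, of the audio written.
    """
    to_skip = WAV_HEADER_BYTES
    pending = bytearray()
    total = 0
    with wave.open(target, "wb") as out:
        out.setnchannels(CHANNELS)
        out.setsampwidth(SAMPLE_WIDTH)
        out.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            if to_skip:
                # Drop header bytes, even if they span more than one chunk.
                dropped = min(to_skip, len(chunk))
                chunk = chunk[dropped:]
                to_skip -= dropped
            pending.extend(chunk)
            # Only write whole samples; keep a trailing odd byte for later.
            usable = len(pending) - (len(pending) % SAMPLE_WIDTH)
            if usable:
                out.writeframesraw(bytes(pending[:usable]))
                total += usable
                del pending[:usable]
    return total / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


if __name__ == "__main__":
    # Simulated stream: a dummy header followed by one second of PCM data.
    pcm = b"\x01\x02" * SAMPLE_RATE
    fake_chunks = [b"\x00" * 44 + pcm[:101], pcm[101:5000], pcm[5000:]]
    print(write_stream_to_wav(fake_chunks, io.BytesIO()))
```

The returned duration follows directly from the format: 32000 samples per second at 2 bytes each means 64000 bytes of PCM per second of audio.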

Timestamped TTS

text_to_speech_with_timestamps() wraps POST /v1/text-to-speech/with-timestamps and returns word- and character-level alignment data along with the audio. Use it for subtitle generation, karaoke highlighting, lip sync, and similar tasks.

Basic Usage

from typecast import Typecast
from typecast.models import TTSRequestWithTimestamps

client = Typecast(api_key="YOUR_API_KEY")

result = client.text_to_speech_with_timestamps(TTSRequestWithTimestamps(
    text="Hello. How are you?",
    model="ssfm-v30",
    voice_id="tc_60e5426de8b95f1d3000d7b5",
))

# Save the audio
with open("output.wav", "wb") as f:
    f.write(result.audio_bytes())

print(f"Duration: {result.audio_duration}s")
for word in result.words:
    print(f"  [{word.start_time:.3f}s – {word.end_time:.3f}s] {word.text}")
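For karaoke-style highlighting, a player needs to find the word being spoken at a given playback time. The sketch below does this with a binary search over start times. `Word` is a local stand-in that carries the same three fields used above (`text`, `start_time`, `end_time`), `word_at` is a hypothetical helper, and the timings are invented for illustration rather than taken from a real response.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in for the SDK's word objects (text, start_time, end_time)."""
    text: str
    start_time: float
    end_time: float


def word_at(words, t):
    """Return the word being spoken at time `t` (seconds), or None in a gap."""
    starts = [w.start_time for w in words]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i].end_time:
        return words[i]
    return None


# Invented timings for the sentence used in the example above.
words = [
    Word("Hello.", 0.00, 0.42),
    Word("How", 0.80, 0.95),
    Word("are", 0.95, 1.10),
    Word("you?", 1.10, 1.45),
]

for t in (0.2, 0.6, 1.0):
    current = word_at(words, t)
    print(t, current.text if current else "(silence)")
```

The lookup assumes the words are sorted by start time and do not overlap. For frequent lookups, build the `starts` list once instead of on every call.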

Granularity (alignment unit)

Set the alignment unit with granularity="word" (the default) or granularity="char".

# Character-level alignment; required for Japanese/Chinese
result = client.text_to_speech_with_timestamps(TTSRequestWithTimestamps(
    text="Hello. How are you?",
    model="ssfm-v30",
    voice_id="tc_60e5426de8b95f1d3000d7b5",
    granularity="char",
))

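Subtitle Export

Export subtitles in SRT and WebVTT formats. Cues are split at sentence-ending punctuation (. ? ! 。？！), with a cap of 7 seconds and 42 characters per cue (following the BBC/Netflix subtitle guidelines).

# Export SRT subtitles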
with open("output.srt", "w", encoding="utf-8") as f:
    f.write(result.to_srt())

# Export WebVTT subtitles
with open("output.vtt", "w", encoding="utf-8") as f:
    f.write(result.to_vtt())
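To make the cue-splitting rule concrete, here is a small illustrative sketch of how word timings could be grouped into SRT cues under the stated limits. It is not the SDK's implementation: `srt_time`, `build_cues`, and `render_srt` are hypothetical helpers, words are plain `(text, start, end)` tuples, and the exact way the real exporter applies the 7-second and 42-character caps is an assumption.

```python
import re

MAX_CUE_SECONDS = 7.0
MAX_CUE_CHARS = 42
SENTENCE_END = re.compile(r"[.?!。？！]$")


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_cues(words):
    """Group (text, start, end) tuples into cues.

    A cue closes after sentence-ending punctuation, and a new cue starts
    early if adding the next word would exceed the length or duration cap.
    """
    cues, current = [], []
    for text, start, end in words:
        if current:
            candidate = " ".join([w[0] for w in current] + [text])
            too_long = len(candidate) > MAX_CUE_CHARS
            too_slow = end - current[0][1] > MAX_CUE_SECONDS
            if too_long or too_slow:
                cues.append(current)
                current = []
        current.append((text, start, end))
        if SENTENCE_END.search(text):
            cues.append(current)
            current = []
    if current:
        cues.append(current)
    return cues


def render_srt(words):
    """Render the cues as SRT text: index, time span, then the cue line."""
    blocks = []
    for index, cue in enumerate(build_cues(words), start=1):
        line = " ".join(w[0] for w in cue)
        span = f"{srt_time(cue[0][1])} --> {srt_time(cue[-1][2])}"
        blocks.append(f"{index}\n{span}\n{line}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [("Hello.", 0.0, 0.42), ("How", 0.8, 0.95),
              ("are", 0.95, 1.1), ("you?", 1.1, 1.45)]
    print(render_srt(sample))
```

WebVTT uses the same cue structure, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.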

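Japanese/Chinese: in languages written without spaces (jpn, zho), word-level segments span the entire sentence. Use granularity="char" for these languages.

Supported Languages

Recommended: use the LanguageCode enum for type-safe language selection. You can also pass an ISO 639-3 code as a string (e.g. "eng"). The SDK supports 37 languages, identified by ISO 639-3 codes: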

Language	Code	Language	Code	Language	Code
English	`eng`	Japanese	`jpn`	Ukrainian	`ukr`
Korean	`kor`	Greek	`ell`	Indonesian	`ind`
Spanish	`spa`	Tamil	`tam`	Danish	`dan`
German	`deu`	Tagalog	`tgl`	Swedish	`swe`
French	`fra`	Finnish	`fin`	Malay	`msa`
Italian	`ita`	Chinese	`zho`	Czech	`ces`
Polish	`pol`	Slovak	`slk`	Portuguese	`por`
Dutch	`nld`	Arabic	`ara`	Bulgarian	`bul`
Russian	`rus`	Croatian	`hrv`	Romanian	`ron`
Bengali	`ben`	Hindi	`hin`	Hungarian	`hun`
Min Nan	`nan`	Norwegian	`nor`	Punjabi	`pan`
Thai	`tha`	Turkish	`tur`	Vietnamese	`vie`
Cantonese	`yue`

The following example selects English through the LanguageCode enum:

from typecast import Typecast
from typecast.models import TTSRequest, LanguageCode

client = Typecast()

response = client.text_to_speech(TTSRequest(
    text="Hello",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    language=LanguageCode.ENG
))
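When language codes arrive as plain strings, for example from a config file, a small lookup can catch typos before a request is sent. The sketch below copies the 37 codes from the table above into a set. `normalize_language` and `recommended_granularity` are hypothetical helpers, not SDK functions; the second one encodes this page's advice to use character-level alignment for jpn and zho.

```python
# The 37 ISO 639-3 codes listed in the table above.
SUPPORTED_LANGUAGE_CODES = frozenset({
    "eng", "kor", "spa", "deu", "fra", "ita", "pol", "nld", "rus", "ben",
    "nan", "tha", "yue", "jpn", "ell", "tam", "tgl", "fin", "zho", "slk",
    "ara", "hrv", "hin", "nor", "tur", "ukr", "ind", "dan", "swe", "msa",
    "ces", "por", "bul", "ron", "hun", "pan", "vie",
})

# Languages written without spaces, per the timestamp section of this page.
CHAR_GRANULARITY_LANGUAGES = frozenset({"jpn", "zho"})


def normalize_language(code):
    """Lower-case and validate an ISO 639-3 code from the table above."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized


def recommended_granularity(code):
    """Return 'char' for jpn and zho, and 'word' for every other language."""
    if normalize_language(code) in CHAR_GRANULARITY_LANGUAGES:
        return "char"
    return "word"


print(normalize_language(" ENG "), recommended_granularity("jpn"))
```

If the SDK adds languages later, this hard-coded set would need updating, which is one reason to prefer the LanguageCode enum where possible.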

Error Handling

The SDK provides specific exceptions for different HTTP status codes:

from typecast import (
    Typecast,
    TypecastError,
    BadRequestError,
    UnauthorizedError,
    PaymentRequiredError,
    NotFoundError,
    UnprocessableEntityError,
    RateLimitError,
    InternalServerError,
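)
from typecast.models import TTSRequest

client = Typecast()
request = TTSRequest(
    text="Hello there!",
    model="ssfm-v30",
    voice_id="tc_672c5f5ce59fac2a48faeaee",
)

try: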
    response = client.text_to_speech(request)
except UnauthorizedError:
    print("Invalid API key")
except PaymentRequiredError:
    print("Insufficient credits")
except RateLimitError:
    print("Rate limit exceeded - please try again later")
except TypecastError as e:
    print(f"Error {e.status_code}: {e.message}")

Exception	Status Code	Description
`BadRequestError`	400	Invalid request parameters
`UnauthorizedError`	401	Invalid or missing API key
`PaymentRequiredError`	402	Insufficient credits
`NotFoundError`	404	Resource not found
`UnprocessableEntityError`	422	Validation error
`RateLimitError`	429	Rate limit exceeded
`InternalServerError`	500	Server error
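A 429 response is usually worth retrying after a pause. The sketch below shows a generic exponential backoff wrapper. `call_with_backoff` is a hypothetical helper, and the `RateLimitError` class defined here is only a local stand-in so the example runs without the SDK; in real code, import `RateLimitError` from `typecast` instead. The attempt count and delays are illustrative defaults, not values recommended by the service.

```python
import time


class RateLimitError(Exception):
    """Local stand-in so the sketch runs without the SDK installed.

    With the SDK, use `from typecast import RateLimitError` instead.
    """


def call_with_backoff(call, retry_on=(RateLimitError,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Run `call()`; after a retryable error wait base_delay * 1, 2, 4, ...

    The last failure is re-raised once `max_attempts` is reached.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage with the SDK (assumes `client` and `request` as defined above):
#   response = call_with_backoff(lambda: client.text_to_speech(request))
```

Only errors listed in `retry_on` are retried. Errors such as 401 or 402 propagate immediately, since repeating the request would not change the outcome.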
