Go - Typecast Documentation

The official Go library for the Typecast API. Convert text into lifelike speech using AI-powered voices. Compatible with Go 1.21 and later. It uses only the Go standard library, with no external dependencies.

Go Reference

Typecast Go SDK

Source Code

Typecast Go SDK source code

Installation

go get github.com/neosapience/typecast-sdk/typecast-go

Make sure Go 1.21 or later is installed. You can check your version with the go version command.

Quick Start

package main

import (
    "context"
    "fmt"
    "os"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
    // Initialize the client
    client := typecast.NewClient(&typecast.ClientConfig{
        APIKey: "YOUR_API_KEY",
    })

    ctx := context.Background()

    // Convert text to speech
    response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
        VoiceID: "tc_672c5f5ce59fac2a48faeaee",
        Text:    "Hello! I am a text-to-speech agent.",
        Model:   typecast.ModelSSFMV30,
    })
    if err != nil {
        panic(err)
    }

    // Save the audio file
    if err := os.WriteFile("output.wav", response.AudioData, 0644); err != nil {
        panic(err)
    }

    fmt.Println("Audio saved! Format:", response.Format)
}

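Features

The Typecast Go SDK provides the following text-to-speech features:

Multiple voice models: supports the ssfm-v30 (latest) and ssfm-v21 AI voice models
Multilingual support: 37 languages, including English, Korean, Spanish, Japanese, and Chinese
Emotion control: emotion presets (normal, happy, sad, angry, whisper, toneup, tonedown) or smart context-aware inference
Audio customization: control loudness (-70 to 0 LUFS), pitch (-12 to +12 semitones), tempo (0.5x to 2.0x), and format (WAV/MP3)
Voice discovery: V2 Voices API with filtering by model, gender, age, and use case
Context support: full context.Context support for cancellation and timeouts
Timestamped TTS: word- and character-level alignment data for subtitles, karaoke, and lip sync
Streaming: real-time chunked audio delivery for low-latency playback
No dependencies: uses only the Go standard library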

Configuration

Set your API key through an environment variable or pass it directly:

import (
    "time"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

// Use the environment variable (recommended)
// export TYPECAST_API_KEY="your-api-key-here"
client := typecast.NewClient(nil)

// Or pass the key directly
client = typecast.NewClient(&typecast.ClientConfig{
    APIKey: "your-api-key-here",
})

// Custom configuration
client = typecast.NewClient(&typecast.ClientConfig{
    APIKey:  "your-api-key-here",
    BaseURL: "https://api.typecast.ai",  // optional
    Timeout: 60 * time.Second,           // optional
})

Environment Variables

Variable	Description
`TYPECAST_API_KEY`	Your Typecast API key
`TYPECAST_API_HOST`	Custom API base URL (optional)
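
As a rough illustration of how the two variables in the table might be consumed, the sketch below resolves an API key and base URL from the environment, with explicitly passed values taking precedence. The `settings` type, the `resolve` helper, and the precedence order are assumptions made for this example, not the SDK's actual implementation. The default URL is the one shown in the custom configuration example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// settings is a hypothetical stand-in for the SDK's client configuration.
type settings struct {
	APIKey  string
	BaseURL string
}

// resolve fills empty fields from TYPECAST_API_KEY and TYPECAST_API_HOST.
// Explicit values win over the environment (an assumed precedence).
func resolve(explicit settings, getenv func(string) string) (settings, error) {
	out := explicit
	if out.APIKey == "" {
		out.APIKey = getenv("TYPECAST_API_KEY")
	}
	if out.BaseURL == "" {
		out.BaseURL = getenv("TYPECAST_API_HOST")
	}
	if out.BaseURL == "" {
		out.BaseURL = "https://api.typecast.ai"
	}
	if out.APIKey == "" {
		return settings{}, errors.New("no API key: set TYPECAST_API_KEY or pass one explicitly")
	}
	return out, nil
}

func main() {
	cfg, err := resolve(settings{}, os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("using base URL:", cfg.BaseURL)
}
```

Passing the lookup function as a parameter keeps the helper easy to test without touching the real process environment.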

Advanced Usage

Emotion Control (ssfm-v30)

ssfm-v30 offers two emotion control modes: smart and preset.

Smart mode

Let the AI infer the emotion from context:

response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "Everything is going to be okay.",
    Model:   typecast.ModelSSFMV30,
    Prompt: &typecast.SmartPrompt{
        EmotionType:  "smart",
        PreviousText: "I just heard the best news!",  // optional context
        NextText:     "I want to celebrate!",         // optional context
    },
})

Preset mode

Set the emotion explicitly with a preset value:

intensity := 1.5

response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "I'm so excited to show you these features!",
    Model:   typecast.ModelSSFMV30,
    Prompt: &typecast.PresetPrompt{
        EmotionType:      "preset",
        EmotionPreset:    typecast.EmotionHappy,  // normal, happy, sad, angry, whisper, toneup, tonedown
        EmotionIntensity: &intensity,             // range: 0.0 to 2.0
    },
})

Audio Customization

Control loudness, pitch, tempo, and output format:

lufs := -14.0
pitch := 2
tempo := 1.2
seed := uint32(42)

response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "Customized audio output!",
    Model:   typecast.ModelSSFMV30,
    Output: &typecast.Output{
        TargetLUFS:  &lufs,                    // range: -70 to 0 (LUFS)
        AudioPitch:  &pitch,                   // range: -12 to +12 semitones
        AudioTempo:  &tempo,                   // range: 0.5x to 2.0x
        AudioFormat: typecast.AudioFormatMP3,  // options: WAV, MP3
    },
    Seed: &seed,  // for reproducible results
})
if err != nil {
    log.Fatal(err)
}

if err := os.WriteFile("output.mp3", response.AudioData, 0644); err != nil {
    log.Fatal(err)
}
fmt.Printf("Format: %s\n", response.Format)
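
Because each setting has a documented range, it can be useful to reject bad values locally before spending a request. The following is a minimal sketch of such a check; `checkOutput` is a hypothetical helper written for this example, and the document does not say whether the SDK performs equivalent validation itself.

```go
package main

import "fmt"

// checkOutput validates the documented ranges before a request is built.
// It is an illustrative helper, not part of the SDK.
func checkOutput(lufs float64, pitch int, tempo float64) error {
	if lufs < -70 || lufs > 0 {
		return fmt.Errorf("target LUFS %.1f outside [-70, 0]", lufs)
	}
	if pitch < -12 || pitch > 12 {
		return fmt.Errorf("pitch %d outside [-12, 12] semitones", pitch)
	}
	if tempo < 0.5 || tempo > 2.0 {
		return fmt.Errorf("tempo %.2f outside [0.5, 2.0]", tempo)
	}
	return nil
}

func main() {
	// The values used in the customization example pass the check.
	fmt.Println("valid settings:", checkOutput(-14.0, 2, 1.2))
	// A pitch shift of 13 semitones is out of range.
	fmt.Println("invalid pitch:", checkOutput(-14.0, 13, 1.2))
}
```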

Voice Discovery (V2 API)

List and filter available voices with enhanced metadata:

// Get all voices
voices, err := client.GetVoicesV2(ctx, nil)

// Filter by criteria
voices, err = client.GetVoicesV2(ctx, &typecast.VoicesV2Filter{
    Model:  typecast.ModelSSFMV30,
    Gender: typecast.GenderFemale,
    Age:    typecast.AgeYoungAdult,
})

// Print voice information
for _, voice := range voices {
    fmt.Printf("ID: %s, Name: %s\n", voice.VoiceID, voice.VoiceName)

    if voice.Gender != nil {
        fmt.Printf("Gender: %s\n", *voice.Gender)
    }
    if voice.Age != nil {
        fmt.Printf("Age: %s\n", *voice.Age)
    }

    for _, model := range voice.Models {
        fmt.Printf("Model: %s, Emotions: %v\n", model.Version, model.Emotions)
    }
}

// Get details for a specific voice
voice, err := client.GetVoiceV2(ctx, "tc_672c5f5ce59fac2a48faeaee")

Multilingual Content

The SDK supports 37 languages with automatic language detection:

// Detect the language automatically (recommended)
response, err := client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID: "tc_672c5f5ce59fac2a48faeaee",
    Text:    "こんにちは。お元気ですか。",
    Model:   typecast.ModelSSFMV30,
})

// Or specify the language explicitly
response, err = client.TextToSpeech(ctx, &typecast.TTSRequest{
    VoiceID:  "tc_672c5f5ce59fac2a48faeaee",
    Text:     "안녕하세요. 반갑습니다.",
    Model:    typecast.ModelSSFMV30,
    Language: "kor",  // ISO 639-3 language code
})

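Streaming

Stream audio chunks in real time for low-latency playback:

// Extract raw PCM (skip the 44-byte WAV header)
reader, err := client.TextToSpeechStream(context.Background(), request)
if err != nil {
    log.Fatal(err)
}
defer reader.Close()

// Discard the header up front: a single Read may return fewer than 44 bytes,
// so slicing the first chunk at [44:] is not safe.
if _, err := io.CopyN(io.Discard, reader, 44); err != nil {
    log.Fatal(err)
}

buf := make([]byte, 4096)
for {
    n, err := reader.Read(buf)
    if n > 0 {
        data := buf[:n]
        // data is raw 32000 Hz, 16-bit, mono PCM
        // Pass it to your audio output (e.g. oto, portaudio)
        _ = data
    }
    if err != nil {
        break  // io.EOF marks the end of the stream
    }
}

WAV streaming format: 32000 Hz, 16-bit, mono PCM. The first chunk contains a 44-byte WAV header (size = 0xFFFFFFFF); subsequent chunks contain only raw PCM data. MP3 format: 320 kbps, 44100 Hz; each chunk is independently decodable. The streaming endpoint does not support Volume or TargetLUFS.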
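
Since the streamed header carries a placeholder size, the length of the audio has to be derived from the PCM byte count. The sketch below works through that arithmetic for the stated WAV format (32000 samples per second, 2 bytes per sample, one channel). It runs against an in-memory reader; `pcmSeconds` and `fakeStream` are hypothetical helpers for this example and do not call the SDK.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const (
	wavHeaderSize  = 44    // header length of the streamed WAV format
	sampleRate     = 32000 // Hz
	bytesPerSample = 2     // 16-bit mono
)

// pcmSeconds discards the WAV header, counts the PCM bytes that follow,
// and converts that count to seconds of audio.
func pcmSeconds(r io.Reader) (float64, error) {
	if _, err := io.CopyN(io.Discard, r, wavHeaderSize); err != nil {
		return 0, fmt.Errorf("reading WAV header: %w", err)
	}
	n, err := io.Copy(io.Discard, r)
	if err != nil {
		return 0, err
	}
	return float64(n) / float64(sampleRate*bytesPerSample), nil
}

// fakeStream returns a reader of totalBytes zero bytes, standing in for
// the stream returned by the API (header plus silence).
func fakeStream(totalBytes int) io.Reader {
	return bytes.NewReader(make([]byte, totalBytes))
}

func main() {
	// A 44-byte header followed by 64000 bytes of PCM.
	secs, err := pcmSeconds(fakeStream(wavHeaderSize + sampleRate*bytesPerSample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.2f seconds of PCM\n", secs)
}
```

In a real player the PCM bytes would go to the audio device instead of io.Discard; the byte counting stays the same.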

Timestamped TTS

TextToSpeechWithTimestamps() wraps POST /v1/text-to-speech/with-timestamps and returns word- and character-level alignment data along with the audio. Use it for karaoke highlighting, subtitle generation, and lip-sync applications.

Basic Usage

package main

import (
    "context"
    "fmt"
    "os"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

func main() {
    client := typecast.NewClient(&typecast.ClientConfig{APIKey: "YOUR_API_KEY"})
    ctx := context.Background()

    result, err := client.TextToSpeechWithTimestamps(ctx, &typecast.TTSRequestWithTimestamps{
        VoiceID: "tc_60e5426de8b95f1d3000d7b5",
        Text:    "Hello. How are you?",
        Model:   typecast.ModelSSFMV30,
    })
    if err != nil {
        panic(err)
    }

    if err := os.WriteFile("output.wav", result.AudioBytes(), 0644); err != nil {
        panic(err)
    }
    fmt.Printf("Duration: %.3fs\n", result.AudioDuration)

    for _, w := range result.Words {
        fmt.Printf("  [%.3fs – %.3fs] %s\n", w.StartTime, w.EndTime, w.Text)
    }
}
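
One way the word timings might drive karaoke-style highlighting is to look up which word is active at a given playback time. The sketch below uses a hypothetical `word` struct that mirrors the StartTime, EndTime, and Text fields printed in the example, and it assumes the words are sorted by start time and do not overlap. The timing values are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// word is a hypothetical stand-in for one entry of result.Words.
type word struct {
	Start, End float64 // seconds
	Text       string
}

// activeWord returns the index of the word being spoken at time t,
// or -1 if t falls in a gap or outside the audio.
// It assumes words are sorted by start time and do not overlap.
func activeWord(words []word, t float64) int {
	// Binary search for the first word that ends after t.
	i := sort.Search(len(words), func(i int) bool { return words[i].End > t })
	if i < len(words) && words[i].Start <= t {
		return i
	}
	return -1
}

func main() {
	words := []word{
		{0.00, 0.40, "Hello."},
		{0.60, 0.80, "How"},
		{0.80, 1.00, "are"},
		{1.00, 1.30, "you?"},
	}
	for _, t := range []float64{0.2, 0.5, 0.9, 2.0} {
		if i := activeWord(words, t); i >= 0 {
			fmt.Printf("t=%.1fs: %q\n", t, words[i].Text)
		} else {
			fmt.Printf("t=%.1fs: (no word)\n", t)
		}
	}
}
```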

Granularity

Pass Granularity: typecast.GranularityWord (the default) or Granularity: typecast.GranularityChar to control the alignment unit.

// Character-level alignment (required for Japanese and Chinese)
result, err := client.TextToSpeechWithTimestamps(ctx, &typecast.TTSRequestWithTimestamps{
    VoiceID:     "tc_60e5426de8b95f1d3000d7b5",
    Text:        "Hello. How are you?",
    Model:       typecast.ModelSSFMV30,
    Granularity: typecast.GranularityChar,
})

Exporting Subtitles

srt, _ := result.ToSrt()
os.WriteFile("output.srt", []byte(srt), 0644)

vtt, _ := result.ToVtt()
os.WriteFile("output.vtt", []byte(vtt), 0644)

Japanese and Chinese: in languages written without spaces between words (jpn, zho), word-level segments are not meaningful. Use GranularityChar for these languages to get character-level alignment data.
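
To make the relationship between alignment data and subtitle files concrete, the sketch below turns start and end times in seconds into SRT cues, one cue per segment. It is not the SDK's ToSrt implementation, whose cue grouping this document does not describe; the `word` struct and both helpers are hypothetical, and the timings are invented.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// word is a hypothetical stand-in for one aligned segment.
type word struct {
	Start, End float64 // seconds
	Text       string
}

// srtTime formats seconds as an SRT timestamp, HH:MM:SS,mmm.
func srtTime(sec float64) string {
	ms := int64(math.Round(sec * 1000))
	return fmt.Sprintf("%02d:%02d:%02d,%03d",
		ms/3600000, ms/60000%60, ms/1000%60, ms%1000)
}

// toSRT writes one numbered cue per segment, each followed by a blank line.
func toSRT(words []word) string {
	var b strings.Builder
	for i, w := range words {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			i+1, srtTime(w.Start), srtTime(w.End), w.Text)
	}
	return b.String()
}

func main() {
	fmt.Print(toSRT([]word{
		{0.00, 0.40, "Hello."},
		{0.60, 1.30, "How are you?"},
	}))
}
```

WebVTT timestamps follow the same layout but separate the milliseconds with a period instead of a comma.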

Supported Languages

The SDK supports 37 languages with automatic language detection:

Code	Language	Code	Language	Code	Language
`eng`	English	`jpn`	Japanese	`ukr`	Ukrainian
`kor`	Korean	`ell`	Greek	`ind`	Indonesian
`spa`	Spanish	`tam`	Tamil	`dan`	Danish
`deu`	German	`tgl`	Tagalog	`swe`	Swedish
`fra`	French	`fin`	Finnish	`msa`	Malay
`ita`	Italian	`zho`	Chinese	`ces`	Czech
`pol`	Polish	`slk`	Slovak	`por`	Portuguese
`nld`	Dutch	`ara`	Arabic	`bul`	Bulgarian
`rus`	Russian	`hrv`	Croatian	`ron`	Romanian
`ben`	Bengali	`hin`	Hindi	`hun`	Hungarian
`nan`	Min Nan	`nor`	Norwegian	`pan`	Punjabi
`tha`	Thai	`tur`	Turkish	`vie`	Vietnamese
`yue`	Cantonese

If no language is specified, it is detected automatically from the input text.

Error Handling

The SDK provides an APIError type with helper methods for handling specific errors:

import (
    "errors"
    "fmt"

    typecast "github.com/neosapience/typecast-sdk/typecast-go"
)

response, err := client.TextToSpeech(ctx, request)
if err != nil {
    // errors.As also matches an APIError that has been wrapped
    var apiErr *typecast.APIError
    if errors.As(err, &apiErr) {
        fmt.Printf("Error %d: %s\n", apiErr.StatusCode, apiErr.Message)

        // Handle specific errors
        switch {
        case apiErr.IsUnauthorized():
            // 401: invalid API key
        case apiErr.IsForbidden():
            // 403: access denied
        case apiErr.IsPaymentRequired():
            // 402: insufficient credits
        case apiErr.IsNotFound():
            // 404: resource not found
        case apiErr.IsValidationError():
            // 422: validation error
        case apiErr.IsRateLimited():
            // 429: rate limit exceeded
        case apiErr.IsServerError():
            // 5xx: server error
        case apiErr.IsBadRequest():
            // 400: bad request
        }
    }
}

Error Types

Method	Status Code	Description
`IsBadRequest()`	400	Invalid request parameters
`IsUnauthorized()`	401	Invalid or missing API key
`IsPaymentRequired()`	402	Insufficient credits
`IsForbidden()`	403	Access denied
`IsNotFound()`	404	Resource not found
`IsValidationError()`	422	Validation error
`IsRateLimited()`	429	Rate limit exceeded
`IsServerError()`	5xx	Server error
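
A common follow-up to this table is deciding which failures are worth retrying. The sketch below shows one possible policy: retry only 429 and 5xx responses, with exponential backoff between attempts. Both helpers are hypothetical and work on plain status codes; the document does not state whether the SDK retries on its own or which policy Typecast recommends.

```go
package main

import (
	"fmt"
	"time"
)

// retryable reports whether a status code is worth retrying:
// 429 (rate limited) and 5xx (server errors). The other 4xx codes point
// at the request itself, so resending it unchanged will not help.
func retryable(status int) bool {
	return status == 429 || (status >= 500 && status <= 599)
}

// backoff returns an exponential delay for the given 0-based attempt,
// doubling from base and capped at limit.
func backoff(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= limit {
			return limit
		}
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	for _, code := range []int{400, 401, 402, 403, 404, 422, 429, 500, 503} {
		fmt.Printf("status %d retryable: %v\n", code, retryable(code))
	}
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Printf("attempt %d waits %v\n", attempt,
			backoff(attempt, 500*time.Millisecond, 4*time.Second))
	}
}
```

In practice the status code would come from the StatusCode field of the APIError, and the wait would be combined with a context so that cancellation interrupts the retry loop.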

Context and Timeouts

The SDK fully supports Go's context.Context for cancellation and timeouts:

// With a timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

response, err := client.TextToSpeech(ctx, request)
if err != nil {
    if ctx.Err() == context.DeadlineExceeded {
        fmt.Println("Request timed out")
    }
}

// With cancellation
ctx, cancel = context.WithCancel(context.Background())
defer cancel()
go func() {
    time.Sleep(5 * time.Second)
    cancel()  // cancel after 5 seconds
}()

response, err = client.TextToSpeech(ctx, request)
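
The timeout behavior can be tried out without network access. In the sketch below, `slowCall` is a hypothetical stand-in for a request: it finishes after a fixed delay unless the context ends first. Checking the returned error with errors.Is works here because `slowCall` returns ctx.Err() directly; whether the SDK wraps context errors in a way that errors.Is can see through is not stated in this document.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall stands in for a network request: it finishes after d unless
// the context is cancelled or its deadline passes first.
func slowCall(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut runs slowCall under a timeout and reports whether the
// deadline was hit before the work finished.
func timedOut(work, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // always release the timer, even on success
	return errors.Is(slowCall(ctx, work), context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow call timed out:", timedOut(200*time.Millisecond, 20*time.Millisecond))
	fmt.Println("fast call timed out:", timedOut(5*time.Millisecond, 200*time.Millisecond))
}
```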

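API Reference

Client Methods

Method	Description
`TextToSpeech(ctx, request)`	Convert text to speech audio
`TextToSpeechStream(ctx, request)`	Stream speech audio in chunks
`TextToSpeechWithTimestamps(ctx, request)`	Convert text to speech and return alignment data
`GetVoicesV2(ctx, filter)`	Get available voices, with optional filtering
`GetVoiceV2(ctx, voiceID)`	Get a specific voice by ID
`GetVoices(ctx, model)`	Get voices (V1 API, not recommended)
`GetVoice(ctx, voiceID, model)`	Get a voice (V1 API, not recommended)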

TTSRequest Fields

Field	Type	Required	Description
`VoiceID`	`string`	✓	Voice ID (prefixed with `tc_` or `uc_`)
`Text`	`string`	✓	Text to synthesize (up to 2000 characters)
`Model`	`TTSModel`	✓	TTS model (`ModelSSFMV21` or `ModelSSFMV30`)
`Language`	`string`		ISO 639-3 code (auto-detected if omitted)
`Prompt`	`Prompt` / `PresetPrompt` / `*SmartPrompt`		Emotion settings
`Output`	`*Output`		Audio output settings
`Seed`	`*uint32`		Unsigned integer seed for reproducibility (≥ 0)
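
The 2000-character limit on Text means longer input has to be split across several requests. The sketch below shows one possible approach: cut at the last sentence-ending mark that fits, and fall back to a hard cut when there is none. `splitText` is a hypothetical helper, and counting the limit in Unicode code points (runes) rather than bytes is an assumption; the document does not say how the API counts characters.

```go
package main

import (
	"fmt"
	"strings"
)

// sentenceEnds lists the marks treated as sentence boundaries.
const sentenceEnds = ".!?。！？"

// splitText breaks text into chunks of at most limit runes, preferring
// to cut just after a sentence-ending mark.
func splitText(text string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	var chunks []string
	runes := []rune(strings.TrimSpace(text))
	for len(runes) > limit {
		cut := limit // hard cut if no sentence boundary fits
		for i := limit - 1; i > 0; i-- {
			if strings.ContainsRune(sentenceEnds, runes[i]) {
				cut = i + 1
				break
			}
		}
		chunks = append(chunks, strings.TrimSpace(string(runes[:cut])))
		runes = []rune(strings.TrimSpace(string(runes[cut:])))
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

func main() {
	long := strings.Repeat("This sentence is repeated many times. ", 120)
	chunks := splitText(long, 2000)
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d characters\n", i+1, len([]rune(c)))
	}
}
```

Each chunk would then be sent as its own request, and the resulting audio concatenated by the caller.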

TTSResponse Fields

Field	Type	Description
`AudioData`	`[]byte`	Generated audio data
`Duration`	`float64`	Audio length in seconds
`Format`	`AudioFormat`	Audio format (`wav` or `mp3`)

Constants

Models

Constant	Value	Description
`ModelSSFMV30`	`ssfm-v30`	Latest model with improved prosody
`ModelSSFMV21`	`ssfm-v21`	Stable production model

Emotion Presets

Constant	ssfm-v21	ssfm-v30
`EmotionNormal`	✓	✓
`EmotionHappy`	✓	✓
`EmotionSad`	✓	✓
`EmotionAngry`	✓	✓
`EmotionWhisper`	✗	✓
`EmotionToneUp`	✗	✓
`EmotionToneDown`	✗	✓

Audio Formats

Constant	Value	Description
`AudioFormatWAV`	`wav`	Uncompressed PCM audio
`AudioFormatMP3`	`mp3`	Compressed MP3 audio
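
The compatibility table can be encoded as a lookup so that an unsupported combination, such as whisper on ssfm-v21, is caught before a request is sent. This is a minimal sketch: `presetsByModel` and `supportsPreset` are hypothetical, and the lowercase preset strings are assumed to match the preset names listed in the Features section.

```go
package main

import "fmt"

// presetsByModel mirrors the compatibility table, keyed by model value.
// The preset strings are assumed to be the lowercase preset names.
var presetsByModel = map[string][]string{
	"ssfm-v21": {"normal", "happy", "sad", "angry"},
	"ssfm-v30": {"normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"},
}

// supportsPreset reports whether the model lists the given preset.
// Unknown models support nothing.
func supportsPreset(model, preset string) bool {
	for _, p := range presetsByModel[model] {
		if p == preset {
			return true
		}
	}
	return false
}

func main() {
	for _, model := range []string{"ssfm-v21", "ssfm-v30"} {
		for _, preset := range []string{"happy", "whisper"} {
			fmt.Printf("%s supports %s: %v\n", model, preset, supportsPreset(model, preset))
		}
	}
}
```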
