Pipecat is an open-source framework for building real-time, multimodal AI voice agents. With the Typecast TTS integration, you can add high-quality neural voices with emotion control to your voice AI pipelines.

What is Pipecat?

Pipecat is a Python framework that simplifies building voice AI applications. It connects various services (speech-to-text, LLMs, text-to-speech) into a unified pipeline, handling the complexity of real-time audio streaming, turn-taking, and transport protocols. A typical Pipecat pipeline looks like this:
User Audio → STT → LLM → TTS → Bot Audio
The Typecast TTS service (pipecat-ai-typecast) integrates seamlessly into this pipeline, converting LLM responses into expressive speech.

What You Can Do

With the Typecast Pipecat integration, you can:
  • Build voice AI agents with natural, expressive voices
  • Choose from 500+ voices with different genders, ages, and styles
  • Apply emotions (happy, sad, angry, whisper, and more)
  • Use Smart Emotion for context-aware voice synthesis
  • Deploy anywhere — Daily, Twilio, or native WebRTC

Prerequisites

Before you start, make sure you have:
  • Python 3.10+
  • Pipecat v0.0.94+
  • A Typecast API key (get yours from the Typecast Dashboard)

Installation

Install the Typecast TTS service for Pipecat:
pip install pipecat-ai-typecast
Using uv? Run uv add pipecat-ai-typecast instead.

Quick Start

Here’s a minimal example of integrating Typecast TTS into a Pipecat pipeline. Run it inside an async function; the stt, llm, transport, and context_aggregator objects are set up as shown in the Complete Example below:
import os
import aiohttp
from pipecat.pipeline.pipeline import Pipeline
from pipecat_typecast import TypecastTTSService

async with aiohttp.ClientSession() as session:
    # Initialize Typecast TTS
    tts = TypecastTTSService(
        aiohttp_session=session,
        api_key=os.getenv("TYPECAST_API_KEY"),
        voice_id=os.getenv("TYPECAST_VOICE_ID", "tc_672c5f5ce59fac2a48faeaee"),
    )

    # Build your pipeline
    pipeline = Pipeline([
        transport.input(),               # User audio input
        stt,                             # Speech-to-text
        context_aggregator.user(),       # Add user text to context
        llm,                             # LLM generates response
        tts,                             # Typecast TTS synthesis
        transport.output(),              # Stream audio to user
        context_aggregator.assistant(),  # Store assistant response
    ])
Set your environment variables:
  • TYPECAST_API_KEY — Your Typecast API key (required)
  • TYPECAST_VOICE_ID — Voice to use (optional, defaults to a preset voice)
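To fail fast when credentials are missing, you can read and validate these variables before constructing the service. A minimal sketch; the helper name is ours, not part of the package:

```python
import os

# Default voice ID from the Quick Start above; replace with your own.
DEFAULT_VOICE_ID = "tc_672c5f5ce59fac2a48faeaee"

def load_typecast_config() -> dict:
    """Read Typecast settings from the environment, failing early if the key is absent."""
    api_key = os.getenv("TYPECAST_API_KEY")
    if not api_key:
        raise RuntimeError("TYPECAST_API_KEY is not set")
    return {
        "api_key": api_key.strip(),  # guard against stray whitespace in the key
        "voice_id": os.getenv("TYPECAST_VOICE_ID", DEFAULT_VOICE_ID),
    }
```

The returned dict can be unpacked straight into the service constructor's api_key and voice_id arguments.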

Configuration

The TypecastTTSService supports both preset-based and context-aware emotion control.

Basic Configuration

from pipecat_typecast import TypecastTTSService

tts = TypecastTTSService(
    aiohttp_session=session,
    api_key=os.getenv("TYPECAST_API_KEY"),
    voice_id="tc_672c5f5ce59fac2a48faeaee",
    model="ssfm-v30",  # Latest model (default)
)

Preset Emotion Control

Choose from predefined emotions for consistent voice styling:
from pipecat_typecast import (
    TypecastTTSService,
    TypecastInputParams,
    PresetPromptOptions,
    OutputOptions,
)

params = TypecastInputParams(
    prompt_options=PresetPromptOptions(
        emotion_preset="happy",      # normal | happy | sad | angry | whisper | toneup | tonedown
        emotion_intensity=1.3,       # 0.0 - 2.0
    ),
    output_options=OutputOptions(
        volume=110,                  # 0 - 200 (percent)
        audio_pitch=2,               # -12 to 12 (semitones)
        audio_tempo=1.05,            # 0.5 - 2.0 (playback speed)
    ),
)

tts = TypecastTTSService(
    aiohttp_session=session,
    api_key=os.getenv("TYPECAST_API_KEY"),
    params=params,
)

Smart Emotion (Context-Aware)

Let the AI automatically infer emotion from surrounding text:
from pipecat_typecast import (
    TypecastTTSService,
    TypecastInputParams,
    SmartPromptOptions,
)

params = TypecastInputParams(
    prompt_options=SmartPromptOptions(
        previous_text="I just got the best news ever!",   # max 2000 chars
        next_text="I can't wait to share this with everyone!",
    ),
)

tts = TypecastTTSService(
    aiohttp_session=session,
    api_key=os.getenv("TYPECAST_API_KEY"),
    params=params,
)

Preset Emotion

Manually choose from 7 emotions: Normal, Happy, Sad, Angry, Whisper, Tone Up, Tone Down. Best for consistent voice styling.

Smart Emotion

AI automatically detects the best emotion from text context. Best for natural conversations.

Parameter Reference

Parameter            Range            Description
emotion_preset       varies by voice  ssfm-v30: normal, happy, sad, angry, whisper, toneup, tonedown
emotion_intensity    0.0 - 2.0        Values > 1.0 increase expressiveness
audio_pitch          -12 to 12        Semitone adjustment
audio_tempo          0.5 - 2.0        Recommended: 0.85 - 1.15
volume               0 - 200          Audio volume as percentage
seed                 integer          Deterministic synthesis for identical text
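If you build parameter values dynamically (for example, from user settings), it can help to clamp them to the documented ranges before constructing TypecastInputParams. The helper below is our own sketch, not part of pipecat-ai-typecast; the ranges restate the table above:

```python
def clamp(value: float, low: float, high: float) -> float:
    """Clamp a numeric parameter into its documented range."""
    return max(low, min(high, value))

# Documented ranges from the parameter reference above.
RANGES = {
    "emotion_intensity": (0.0, 2.0),
    "audio_pitch": (-12, 12),
    "audio_tempo": (0.5, 2.0),
    "volume": (0, 200),
}

def sanitize(params: dict) -> dict:
    """Return a copy of params with each known field clamped to its range."""
    out = dict(params)
    for name, (low, high) in RANGES.items():
        if name in out:
            out[name] = clamp(out[name], low, high)
    return out
```

For example, sanitize({"audio_tempo": 3.0}) pulls the tempo back down to the 2.0 maximum instead of sending an out-of-range value to the API.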

Supported Transports

Pipecat supports multiple transport protocols. Typecast works with all of them:
For example, Daily provides WebRTC-based video and audio infrastructure:
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.daily.transport import DailyParams

transport_params = DailyParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    vad_analyzer=SileroVADAnalyzer(),
)

Complete Example

Here’s a full working example that creates a voice AI agent:
import os
import aiohttp
from dotenv import load_dotenv

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.daily.transport import DailyParams, DailyTransport

from pipecat_typecast import TypecastTTSService

load_dotenv()

async def main():
    async with aiohttp.ClientSession() as session:
        # Initialize services
        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
        tts = TypecastTTSService(
            aiohttp_session=session,
            api_key=os.getenv("TYPECAST_API_KEY"),
        )

        # Set up conversation context
        messages = [
            {
                "role": "system",
                "content": "You are a helpful AI assistant. Keep responses concise.",
            },
        ]
        context = LLMContext(messages)
        context_aggregator = LLMContextAggregatorPair(context)

        # Configure transport
        transport = DailyTransport(
            room_url=os.getenv("DAILY_ROOM_URL"),
            token=os.getenv("DAILY_TOKEN"),
            params=DailyParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        # Build and run pipeline
        pipeline = Pipeline([
            transport.input(),
            stt,
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ])

        task = PipelineTask(pipeline, params=PipelineParams())
        runner = PipelineRunner()
        await runner.run(task)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
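The example above loads its credentials with load_dotenv(), so it expects a .env file alongside the script. A matching file would define the five variables the code reads (all values below are placeholders):

```shell
DEEPGRAM_API_KEY=your-deepgram-key
OPENAI_API_KEY=your-openai-key
TYPECAST_API_KEY=your-typecast-key
DAILY_ROOM_URL=https://your-domain.daily.co/your-room
DAILY_TOKEN=your-daily-token
```

Keep this file out of version control, since it contains secrets.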

Legacy Model (ssfm-v21)

If you need to use the legacy ssfm-v21 model:
from pipecat_typecast import (
    TypecastTTSService,
    TypecastInputParams,
    PromptOptions,
)

params = TypecastInputParams(
    prompt_options=PromptOptions(
        emotion_preset="happy",      # normal | happy | sad | angry
        emotion_intensity=1.3,
    ),
)

tts = TypecastTTSService(
    aiohttp_session=session,
    api_key=os.getenv("TYPECAST_API_KEY"),
    model="ssfm-v21",
    params=params,
)
Note: ssfm-v21 supports fewer emotion presets (no whisper, toneup, tonedown).
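Since the set of valid presets differs by model, a small guard can catch unsupported combinations before a request is made. The mapping below restates the presets listed in this guide; the helper itself is our own sketch, not part of the package:

```python
# Emotion presets documented for each model in this guide.
SUPPORTED_PRESETS = {
    "ssfm-v30": {"normal", "happy", "sad", "angry", "whisper", "toneup", "tonedown"},
    "ssfm-v21": {"normal", "happy", "sad", "angry"},
}

def check_preset(model: str, preset: str) -> None:
    """Raise ValueError if the emotion preset is not supported by the model."""
    supported = SUPPORTED_PRESETS.get(model)
    if supported is None:
        raise ValueError(f"unknown model: {model!r}")
    if preset not in supported:
        raise ValueError(
            f"{preset!r} is not supported by {model}; choose one of {sorted(supported)}"
        )
```

Calling check_preset("ssfm-v21", "whisper") raises immediately, which is easier to debug than an API-side error.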

Troubleshooting

Authentication errors
  • Ensure the TYPECAST_API_KEY environment variable is set
  • Verify your key at the Typecast Dashboard
  • Check for extra spaces in the key
No audio output
  • Confirm your transport is configured with audio_out_enabled=True
  • Check that the TTS service is included in your pipeline
  • Verify your API key has sufficient credits
Audio quality issues
  • Adjust audio_tempo within the recommended range (0.85 - 1.15)
  • Try different emotion_intensity values
  • Ensure the sample rate matches your transport configuration
Installation issues
  • Make sure you installed pipecat-ai-typecast, not pipecat-typecast
  • Verify your Python version is 3.10 or higher
  • Check that your Pipecat version is v0.0.94 or later
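The environment and version checks above can be scripted as a quick preflight. The variable name and minimum Python version come from this guide; the function itself is our own sketch:

```python
import os
import sys

# Environment variables this integration requires (from the guide above).
REQUIRED_ENV = ["TYPECAST_API_KEY"]

def preflight(env: dict, python_version: tuple) -> list:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    for name in REQUIRED_ENV:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            problems.append(f"{name} has leading/trailing whitespace")
    if python_version < (3, 10):
        problems.append("Python 3.10+ is required")
    return problems

if __name__ == "__main__":
    for problem in preflight(dict(os.environ), sys.version_info[:2]):
        print("WARN:", problem)
```

Run it before starting your agent to surface configuration mistakes early instead of mid-call.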

Resources

  • GitHub Repository: source code and examples
  • PyPI Package: install via pip
  • Pipecat Documentation: learn more about Pipecat
  • Voice Library: browse available voices