Overview

Smart Turn Detection uses an advanced machine learning model to determine when a user has finished speaking and your bot should respond. Unlike basic Voice Activity Detection (VAD), which only distinguishes speech from non-speech, Smart Turn Detection recognizes natural conversational cues like intonation patterns and linguistic signals, enabling more natural conversations.

On Pipecat Cloud, Smart Turn Detection is powered by Fal.ai’s hosted smart-turn model, providing scalable inference without any setup required.

Key Benefits

  • Natural conversations: More human-like turn-taking patterns
  • Zero setup: A Fal API key is automatically provisioned for you
  • Free to use: Included at no additional cost
  • Scalable: Powered by Fal.ai’s cloud infrastructure

Quick Start

To enable Smart Turn Detection in your Pipecat Cloud bot, add the FalSmartTurnAnalyzer to your transport configuration.

Read the API key from the FAL_API_KEY environment variable; when your bot is deployed to Pipecat Cloud, this variable is automatically populated with a Fal API key at runtime.

import os
import aiohttp
from pipecat.audio.turn.smart_turn.fal_smart_turn import FalSmartTurnAnalyzer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.services.daily import DailyParams, DailyTransport

async def main(room_url: str, token: str):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Voice AI Bot",
            DailyParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                # Set VAD to 0.2 seconds for optimal Smart Turn performance
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
                # Enable Smart Turn Detection using FAL_API_KEY, which is automatically provisioned
                turn_analyzer=FalSmartTurnAnalyzer(
                    api_key=os.getenv("FAL_API_KEY"), aiohttp_session=session
                ),
            ),
        )

        # Continue with your pipeline setup...

Smart Turn Detection requires VAD to be enabled with stop_secs=0.2. This value mimics the training data and allows Smart Turn to dynamically adjust timing based on the model’s predictions.
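To illustrate the dynamic-timing idea, here is a minimal sketch (not the Pipecat implementation; the constants and function name are hypothetical) of how a turn analyzer can keep the short 0.2-second VAD window when the model believes the turn is complete, but wait longer when the model predicts the user will continue speaking:

```python
# Hypothetical sketch of dynamic end-of-turn timing.
BASE_STOP_SECS = 0.2      # matches the VAD stop_secs used during training
EXTENDED_STOP_SECS = 0.8  # assumed longer wait when the turn seems unfinished

def silence_to_wait(turn_complete_prob: float, threshold: float = 0.5) -> float:
    """Return how long to keep listening after VAD first reports silence."""
    if turn_complete_prob >= threshold:
        # Model thinks the user is done: respond after the short base window.
        return BASE_STOP_SECS
    # Model expects more speech: extend the silence window before responding.
    return EXTENDED_STOP_SECS
```

The actual model output and thresholds differ, but this is the shape of the behavior: a fixed, short VAD window feeds the model, and the model's prediction decides whether the bot responds immediately or waits.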

How It Works

  1. Audio Analysis: The system continuously analyzes incoming audio for speech patterns
  2. VAD Processing: Voice Activity Detection segments audio into speech and silence
  3. Turn Classification: When VAD detects a pause, the ML model analyzes the speech segment for natural completion cues
  4. Smart Response: The model determines if the turn is complete or if the user is likely to continue speaking
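The four steps above can be sketched as a single decision function. This is an illustrative outline only; the types, names, and `predict_complete` callable are assumptions, not Pipecat's internal API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TurnDecision:
    complete: bool
    reason: str

def classify_turn(
    audio_segment: bytes,
    vad_detected_pause: bool,
    predict_complete: Callable[[bytes], bool],
) -> TurnDecision:
    """Hypothetical flow mirroring the four steps above."""
    # Steps 1-2: VAD segments the audio; no pause means the user is speaking.
    if not vad_detected_pause:
        return TurnDecision(False, "user still speaking")
    # Step 3: run the ML model on the buffered speech segment.
    if predict_complete(audio_segment):
        # Step 4: the turn is complete, so the bot should respond.
        return TurnDecision(True, "turn complete: respond")
    return TurnDecision(False, "likely to continue: keep listening")
```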

Training Data Collection

The smart-turn model is trained on real conversational data. You can help improve the model by contributing your own data or by classifying existing data.

Deployment Requirements

  • Automatic API key provisioning: When deployed to Pipecat Cloud, a Fal API key is automatically provided via the FAL_API_KEY environment variable
  • Manual setup for other deployments: You can use Smart Turn Detection locally or on other platforms by obtaining your own Fal API key from fal.ai
  • Internet connectivity: Requires connection to Fal.ai’s inference servers

On Pipecat Cloud, the FAL_API_KEY environment variable is automatically provided at no cost. For local development or other deployment platforms, you’ll need to sign up for your own Fal account and API key.
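A small sketch of a startup check that works in both environments: on Pipecat Cloud the variable is already set, while local runs fail fast with a clear message if you forgot to export your own key. The helper name is illustrative, not part of Pipecat:

```python
import os

def resolve_fal_api_key() -> str:
    """Return the Fal API key, failing loudly if it is missing."""
    # On Pipecat Cloud, FAL_API_KEY is injected automatically at runtime.
    key = os.getenv("FAL_API_KEY")
    if not key:
        raise RuntimeError(
            "FAL_API_KEY is not set. On Pipecat Cloud it is provisioned "
            "automatically; for local runs, export a key from fal.ai."
        )
    return key
```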

Performance Notes

  • The model is optimized for English conversations
  • Network latency may vary based on geographic location
  • Falls back to VAD-based turn detection if the service is unavailable
  • Best results with clear audio input and minimal background noise
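The fallback behavior in the list above can be sketched as follows; this is a simplified illustration with assumed names, not Pipecat's internal code:

```python
from typing import Callable

def end_of_turn(
    audio: bytes,
    infer_remote: Callable[[bytes], bool],
    vad_says_done: bool,
) -> bool:
    """Use the remote smart-turn service; fall back to VAD if it fails."""
    try:
        # Normal path: ask Fal.ai's hosted smart-turn model.
        return infer_remote(audio)
    except Exception:
        # Service unreachable: fall back to the plain VAD decision.
        return vad_says_done
```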