# TTS-MCP-SERVER Configuration and Usage

This document covers configuration and usage patterns for integrating TTS-MCP-SERVER with ElevenLabs text-to-speech.

## MCP Server Configuration

### Claude Desktop / Claude Code Configuration

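Add the following entry to your MCP configuration file (`claude_desktop_config.json` for Claude Desktop, or a project-level `.mcp.json` for Claude Code):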

```json
{
  "mcpServers": {
    "tts-mcp-server": {
      "command": "uvx",
      "args": ["tts-mcp-server"],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
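Editing the file by hand works, but scripting the change avoids overwriting servers you have already configured. The sketch below assumes the config is a JSON object with a top-level `mcpServers` key, as in the example above. The helper name `add_mcp_server` is hypothetical, and you supply the file path yourself because its location depends on the client and platform.

```python
import json
from pathlib import Path

def add_mcp_server(config_path, name, entry):
    """Insert or replace one server under "mcpServers", keeping all other keys intact."""
    path = Path(config_path)
    # Start from the existing config if the file exists, otherwise from an empty object.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config

# Same entry as the JSON example above; replace the placeholder key before use.
TTS_ENTRY = {
    "command": "uvx",
    "args": ["tts-mcp-server"],
    "env": {"ELEVENLABS_API_KEY": "your-api-key-here"},
}
```

Usage: `add_mcp_server("/path/to/claude_desktop_config.json", "tts-mcp-server", TTS_ENTRY)`. Restart the client afterwards so it picks up the new server.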

### Alternative: ElevenLabs MCP Server

```json
{
  "mcpServers": {
    "elevenlabs": {
      "command": "uv",
      "args": [
        "--directory",
        "path/to/elevenlabs-mcp-server",
        "run",
        "elevenlabs-mcp-server"
      ],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key",
        "ELEVENLABS_VOICE_ID": "DpalF6dOkkUMR5KCm1VO",
        "ELEVENLABS_MODEL_ID": "eleven_v3",
        "ELEVENLABS_STABILITY": "0.5",
        "ELEVENLABS_SIMILARITY_BOOST": "0.75",
        "ELEVENLABS_STYLE": "0.1",
        "ELEVENLABS_OUTPUT_DIR": "output"
      }
    }
  }
}
```

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `ELEVENLABS_API_KEY` | Your ElevenLabs API key | Required |
| `ELEVENLABS_VOICE_ID` | Default voice ID | Library default |
| `ELEVENLABS_MODEL_ID` | Default model ID | `eleven_v3` |
| `ELEVENLABS_STABILITY` | Voice stability (0.0-1.0) | `0.5` |
| `ELEVENLABS_SIMILARITY_BOOST` | Similarity to the original voice (0.0-1.0) | `0.75` |
| `ELEVENLABS_STYLE` | Style exaggeration (0.0-1.0) | `0.1` |
| `ELEVENLABS_OUTPUT_DIR` | Output directory for generated audio files | `./output` |
| `ELEVENLABS_MCP_OUTPUT_MODE` | Output mode: `files`, `resources`, or `both` | `files` |
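Misconfigured values are easier to catch before the server starts. The following is a client-side sanity check that applies the defaults and ranges from the table above; it is a sketch, not the server's own parsing logic, which may differ. The `load_settings` name is hypothetical, and `None` for the voice ID stands for "let the library choose".

```python
import os

# Defaults mirror the table above; None means "leave it to the library default".
DEFAULTS = {
    "ELEVENLABS_VOICE_ID": None,
    "ELEVENLABS_MODEL_ID": "eleven_v3",
    "ELEVENLABS_STABILITY": "0.5",
    "ELEVENLABS_SIMILARITY_BOOST": "0.75",
    "ELEVENLABS_STYLE": "0.1",
    "ELEVENLABS_OUTPUT_DIR": "./output",
    "ELEVENLABS_MCP_OUTPUT_MODE": "files",
}
UNIT_INTERVAL_VARS = ("ELEVENLABS_STABILITY", "ELEVENLABS_SIMILARITY_BOOST", "ELEVENLABS_STYLE")
OUTPUT_MODES = {"files", "resources", "both"}

def load_settings(env=None):
    """Read ElevenLabs settings from an environment mapping, applying defaults and range checks."""
    env = os.environ if env is None else env
    if not env.get("ELEVENLABS_API_KEY"):
        raise ValueError("ELEVENLABS_API_KEY is required")
    settings = {"ELEVENLABS_API_KEY": env["ELEVENLABS_API_KEY"]}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    # Voice settings are floats in the closed interval [0.0, 1.0].
    for name in UNIT_INTERVAL_VARS:
        value = float(settings[name])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        settings[name] = value
    if settings["ELEVENLABS_MCP_OUTPUT_MODE"] not in OUTPUT_MODES:
        raise ValueError(f"ELEVENLABS_MCP_OUTPUT_MODE must be one of {sorted(OUTPUT_MODES)}")
    return settings
```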

---

## speak_text Tool Usage

`speak_text` is the primary tool for TTS operations. The examples below write its tool-call arguments in Python call syntax.

### Basic Usage

```python
# Simple text-to-speech
speak_text(
    text="Hello, world!",
    service="elevenlabs",
    model="eleven_v3"
)
```

### With Voice Selection

```python
speak_text(
    text="[whispers] Come here. Closer.",
    service="elevenlabs",
    model="eleven_v3",
    voice_id="DpalF6dOkkUMR5KCm1VO"  # Liza Bonnet
)
```

### With Audio Tags

```python
speak_text(
    text="[serious] Break time is over. [sighs] Back to work.",
    service="elevenlabs",
    model="eleven_v3",
    voice_id="I7JbV36JNTnseIKpKfyG"  # Charlise Therin
)
```

### Non-Blocking Mode

To keep working while the audio plays, set `wait_for_response=False`:

```python
speak_text(
    text="Task complete. Taking a quick break.",
    service="elevenlabs",
    model="eleven_v3",
    wait_for_response=False  # Continue without waiting
)
```

---

## Voice Settings Configuration

### Stability Settings

| Setting | Value | Effect |
|---------|-------|--------|
| Creative | 0.1-0.3 | Highly expressive, more tag-responsive, may hallucinate |
| Natural | 0.4-0.6 | Balanced expression and consistency |
| Robust | 0.7-1.0 | Very consistent, less responsive to emotional tags |

### Similarity Boost

| Setting | Value | Effect |
|---------|-------|--------|
| Low | 0.0-0.3 | More creative interpretation |
| Medium | 0.4-0.7 | Balanced voice matching |
| High | 0.8-1.0 | Closest to original voice sample |

### Style Exaggeration

| Setting | Value | Effect |
|---------|-------|--------|
| Minimal | 0.0-0.2 | Subtle style application |
| Moderate | 0.3-0.5 | Noticeable style emphasis |
| Strong | 0.6-1.0 | Dramatic style exaggeration |
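The three tables translate into the `voice_settings` object of an ElevenLabs text-to-speech request body (`stability`, `similarity_boost`, `style`). In this sketch the named presets are hypothetical and simply take the midpoint of each range above; tune them by ear for your voices.

```python
# Hypothetical presets: the midpoint of each range in the tables above.
STABILITY_PRESETS = {"creative": 0.2, "natural": 0.5, "robust": 0.85}
SIMILARITY_PRESETS = {"low": 0.15, "medium": 0.55, "high": 0.9}
STYLE_PRESETS = {"minimal": 0.1, "moderate": 0.4, "strong": 0.8}

def voice_settings(stability="natural", similarity="medium", style="minimal"):
    """Build a voice_settings dict for an ElevenLabs text-to-speech request body."""
    return {
        "stability": STABILITY_PRESETS[stability],
        "similarity_boost": SIMILARITY_PRESETS[similarity],
        "style": STYLE_PRESETS[style],
    }
```

For example, `voice_settings("creative", "high", "moderate")` favors expressive, tag-responsive delivery while staying close to the source voice.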

---

## Model Selection

### eleven_v3 (Recommended)
- Most advanced model
- Best audio tag support
- Highest emotional expressiveness
- Multi-speaker dialogue support
- Note: Works best with Instant Voice Clones (IVC); Professional Voice Clones (PVC) may give weaker results with v3

### eleven_multilingual_v2
- Multi-language support
- Good for international content
- Less tag expressiveness than v3

### eleven_flash_v2_5
- Fastest generation
- Lower latency
- Lower quality than v3
- Limited tag support
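When falling back from `eleven_v3` to a model with weaker tag support, bracketed tags such as `[whispers]` may be ignored or read out literally. A cautious option, sketched here, is to strip them before sending text to any other model. The `prepare_text` helper is hypothetical, and treating every non-v3 model the same way is an assumption.

```python
import re

# Matches bracketed audio tags such as "[whispers]" or "[sighs]" plus trailing whitespace.
AUDIO_TAG = re.compile(r"\[[^\[\]]*\]\s*")

def prepare_text(text, model):
    """Keep audio tags for eleven_v3; strip them for every other model."""
    if model == "eleven_v3":
        return text
    return AUDIO_TAG.sub("", text).strip()
```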

---

## Integration Patterns

### Priority Order for Voice Output

1. `tts-mcp-server:speak_text` with service="elevenlabs", model="eleven_v3"
2. `voice-mode-docker:converse` with wait_for_response=false
3. `voice-mode:converse` with wait_for_response=false
4. Text fallback (no voice)
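The priority order above is a fallback chain: try each backend in turn and drop to plain text only when every voice option fails. In this sketch each backend is a hypothetical `(name, callable)` pair that you would wire to the corresponding MCP tool call; `announce` is not part of any server's API.

```python
from typing import Callable, Sequence, Tuple

Backend = Tuple[str, Callable[[str], None]]

def announce(text: str, backends: Sequence[Backend]) -> str:
    """Try each voice backend in priority order; fall back to printing the text."""
    for name, speak in backends:
        try:
            speak(text)
            return name
        except Exception as exc:  # any backend failure moves on to the next one
            print(f"{name} failed ({exc}); trying next backend")
    print(text)  # Text fallback (no voice)
    return "text"
```

The return value reports which backend handled the message, which is useful for noticing when the preferred server keeps failing.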

### Announcement Pattern

For workflow updates and progress announcements:

```python
# Announce progress (non-blocking)
speak_text(
    text="[excited] Phase one complete! Moving to phase two.",
    service="elevenlabs",
    model="eleven_v3",
    voice_id="DpalF6dOkkUMR5KCm1VO",
    wait_for_response=False
)
# Continue with work...
```

### Error Handling Pattern

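```python
try:
    speak_text(
        text="Operation successful.",
        service="elevenlabs",
        model="eleven_v3",
    )
except Exception as e:
    # Fall back to text output instead of aborting the workflow
    print(f"Voice output failed ({e}); continuing in text mode")
    print("Operation successful.")
```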

---

## Voice Rotation Strategy

Rotating voices helps prevent habituation and keeps listeners engaged.

### Rotation Implementation

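```python
VOICES = {
    "liza": "DpalF6dOkkUMR5KCm1VO",      # Liza Bonnet
    "charlise": "I7JbV36JNTnseIKpKfyG",  # Charlise Therin
}

USES_PER_VOICE = 3  # Rotate to the next voice every 3 uses

voice_counter = 0

def get_next_voice():
    """Return a voice ID, switching to the next voice every USES_PER_VOICE calls."""
    global voice_counter
    voices = list(VOICES.values())
    # Integer division keeps the same voice for USES_PER_VOICE consecutive calls
    voice = voices[(voice_counter // USES_PER_VOICE) % len(voices)]
    voice_counter += 1
    return voice
```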

### Session-Based Rotation

```python
# Use each voice 3x before switching
SESSION_ROTATION = [
    ("DpalF6dOkkUMR5KCm1VO", 3),  # Liza Bonnet x3
    ("I7JbV36JNTnseIKpKfyG", 3),  # Charlise Therin x3
]
```
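`SESSION_ROTATION` can drive an endless voice iterator directly, which also allows a different repeat count per voice. This sketch uses only the standard `itertools` module; the `session_voices` helper is hypothetical.

```python
from itertools import chain, cycle, repeat

SESSION_ROTATION = [
    ("DpalF6dOkkUMR5KCm1VO", 3),  # Liza Bonnet x3
    ("I7JbV36JNTnseIKpKfyG", 3),  # Charlise Therin x3
]

def session_voices(rotation):
    """Yield voice IDs forever, repeating each one for its configured count."""
    return cycle(chain.from_iterable(repeat(voice_id, count) for voice_id, count in rotation))
```

Create the iterator once per session and call `next()` on it before each announcement.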

---

## When NOT to Use Voice Output

- User explicitly requests text-only
- Content is code or technical tables
- Response is a trivial acknowledgment
- Voice servers are failing
- High-frequency updates (batch to single announcement)
- Sensitive/private information
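The checklist can be enforced in code before each call. This sketch is a hypothetical `VoiceGate` class: the boolean flags stand in for whatever checks you actually run, and the 30-second batching window is an arbitrary assumption. Messages that arrive too soon are held and spoken together with the next allowed announcement.

```python
import time

class VoiceGate:
    """Decide whether (and what) to speak, following the checklist above."""

    def __init__(self, min_interval_s=30.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s  # hypothetical batching window
        self.clock = clock
        self.last_spoken = None
        self.pending = []

    def should_speak(self, text, *, text_only=False, is_code=False, trivial=False,
                     servers_failing=False, sensitive=False):
        """Return the text to speak now, or None to stay silent."""
        if text_only or is_code or trivial or servers_failing or sensitive:
            return None
        now = self.clock()
        self.pending.append(text)
        if self.last_spoken is not None and now - self.last_spoken < self.min_interval_s:
            return None  # too soon: batch into the next announcement
        self.last_spoken = now
        batch, self.pending = " ".join(self.pending), []
        return batch
```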

---

## Latency Optimization

### Pre-generated Audio
For frequently used messages (timers, alerts), pre-generate audio files and play locally for instant response.
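One way to pre-generate is a content-addressed cache: hash everything that affects the audio (text, voice, model, settings) and reuse the file when the hash matches. In this sketch `synthesize` is a hypothetical callable returning audio bytes, and the `.mp3` extension assumes an MP3 output format.

```python
import hashlib
import json
from pathlib import Path

def cache_path(cache_dir, text, voice_id, model_id, settings):
    """Map a (text, voice, model, settings) combination to a stable file path."""
    key = json.dumps(
        {"text": text, "voice_id": voice_id, "model_id": model_id, "settings": settings},
        sort_keys=True,
    )
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.mp3"

def get_or_generate(cache_dir, text, voice_id, model_id, settings, synthesize):
    """Return cached audio bytes, calling synthesize() only on a cache miss."""
    path = cache_path(cache_dir, text, voice_id, model_id, settings)
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice_id, model_id, settings)  # hypothetical TTS call
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Warming the cache at startup for fixed phrases ("Timer done", "Break over") makes later playback effectively instant.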

### Streaming Mode
Use streaming endpoints for longer content so playback can begin before the full clip has been generated, which reduces perceived latency.

### Chunking Strategy
For long text, split at natural boundaries such as sentences, so each segment can be played while the next one is still being generated.
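A simple sentence-based chunker is sketched below. It splits after `.`, `!` or `?` followed by whitespace and packs sentences into chunks of at most `max_chars`; a single sentence longer than the limit is kept whole. The 250-character default is an arbitrary assumption, not an ElevenLabs limit.

```python
import re

def chunk_text(text, max_chars=250):
    """Split text at sentence boundaries into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```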

---

## API Endpoints Reference

### Text-to-Speech
```
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
```
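Calling the endpoint directly is useful when the MCP server is unavailable. This sketch builds the request with the standard library only: the JSON body carries `text`, `model_id` and optional `voice_settings`, and the key goes in the `xi-api-key` header. `build_tts_request` is a hypothetical helper; check the ElevenLabs API reference for the full set of body fields.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(api_key, voice_id, text, model_id="eleven_v3", voice_settings=None):
    """Build (but do not send) a POST request to the text-to-speech endpoint."""
    body = {"text": text, "model_id": model_id}
    if voice_settings:
        body["voice_settings"] = voice_settings
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,  # ElevenLabs authenticates with this header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` and write `resp.read()` to an audio file.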

### Text-to-Speech with Streaming
```
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream
```
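With the streaming endpoint, audio arrives incrementally, so it can be written (or handed to a player) chunk by chunk instead of buffered in full. This sketch assumes `response` is any file-like object with `read(n)`, such as the result of `urllib.request.urlopen` on a request to the `/stream` URL; the 4096-byte chunk size is arbitrary.

```python
def save_stream(response, out_path, chunk_size=4096):
    """Copy an audio stream to disk chunk by chunk; return the number of bytes written."""
    total = 0
    with open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the stream has ended
                break
            out.write(chunk)
            total += len(chunk)
    return total
```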

### Voice Information
```
GET https://api.elevenlabs.io/v1/voices
GET https://api.elevenlabs.io/v1/voices/{voice_id}
```
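`GET /v1/voices` returns a JSON object whose `voices` list contains entries with `voice_id` and `name`, which makes it easy to look up IDs such as the ones used in this document. Both helpers below are hypothetical sketches; only the parsing step is exercised offline.

```python
import json
import urllib.request

def list_voices(api_key):
    """Fetch the voice list from the ElevenLabs API (requires network access)."""
    req = urllib.request.Request(
        "https://api.elevenlabs.io/v1/voices", headers={"xi-api-key": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def voice_ids_by_name(voices_response):
    """Map voice names to voice IDs from a parsed /v1/voices response."""
    return {v["name"]: v["voice_id"] for v in voices_response.get("voices", [])}
```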

### Zero Retention Mode
```
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?enable_logging=false
```
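`enable_logging=false` is a query parameter, so it can be appended to either the standard or the streaming URL. According to the ElevenLabs API reference at the time of writing, zero retention mode is restricted to enterprise accounts, so verify that your plan supports it. The `tts_url` helper is hypothetical.

```python
from urllib.parse import urlencode

def tts_url(voice_id, *, stream=False, zero_retention=False):
    """Build a text-to-speech URL, optionally streaming and with logging disabled."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    if stream:
        url += "/stream"
    if zero_retention:
        url += "?" + urlencode({"enable_logging": "false"})
    return url
```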
