Class ElevenLabsTextToSpeechProvider

Text-to-speech provider that uses the ElevenLabs TTS API.

API Contract

  • Endpoint: POST {baseUrl}/text-to-speech/{voiceId}
  • Authentication: xi-api-key: <apiKey> header
  • Content-Type: application/json
  • Accept: audio/mpeg (MP3 response)
  • Request body: { text, model_id, voice_settings: { stability, similarity_boost, style, use_speaker_boost } }
  • Response: Raw MP3 audio bytes
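The contract above can be sketched as a pure function that assembles the request without sending it. This is an illustration, not the provider's actual code: buildSynthesisRequest, SynthesisRequest, and the parameter names are hypothetical, and the model ID in the usage note is only an example value.

```typescript
// Hypothetical helper: assembles the documented request shape.
// None of these names come from the provider's source.
interface VoiceSettings {
  stability?: number;
  similarity_boost?: number;
  style?: number;
  use_speaker_boost?: boolean;
}

interface SynthesisRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSynthesisRequest(
  baseUrl: string,
  apiKey: string,
  voiceId: string,
  text: string,
  modelId: string,
  voiceSettings: VoiceSettings,
): SynthesisRequest {
  // Strip trailing slashes so the joined path never contains "//".
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({
      text,
      model_id: modelId,
      voice_settings: voiceSettings,
    }),
  };
}
```

The returned object could be passed to fetch as fetch(req.url, req), after which the response body would hold the raw MP3 bytes.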

Voice Settings

ElevenLabs exposes fine-grained voice control via voice_settings:

  • stability (0.0–1.0) — Lower values = more expressive/variable, higher = more consistent
  • similarity_boost (0.0–1.0) — Higher values make output more similar to the original voice
  • style (0.0–1.0) — Style exaggeration (optional, only for v2+ models)
  • use_speaker_boost (boolean) — Enhances speaker similarity (default: true)

These can be passed via options.providerSpecificOptions using camelCase keys (stability, similarityBoost, style, useSpeakerBoost).
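The options use camelCase keys while the request body uses snake_case, so some translation has to happen in between. A minimal sketch of that mapping follows. The function and type names are hypothetical, and two behaviours are assumptions rather than documented facts: clamping numeric values into the 0.0–1.0 range, and omitting any setting the caller did not supply. Only the use_speaker_boost default of true comes from the list above.

```typescript
// Hypothetical types: the camelCase side mirrors providerSpecificOptions,
// the snake_case side mirrors the voice_settings request field.
interface ProviderVoiceOptions {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  useSpeakerBoost?: boolean;
}

interface ElevenLabsVoiceSettings {
  stability?: number;
  similarity_boost?: number;
  style?: number;
  use_speaker_boost: boolean;
}

// Assumption: out-of-range values are clamped rather than rejected.
const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

function toVoiceSettings(opts: ProviderVoiceOptions = {}): ElevenLabsVoiceSettings {
  const settings: ElevenLabsVoiceSettings = {
    // Documented default: speaker boost is on unless disabled.
    use_speaker_boost: opts.useSpeakerBoost ?? true,
  };
  // Assumption: unset numeric settings are left out of the payload.
  if (opts.stability !== undefined) settings.stability = clamp01(opts.stability);
  if (opts.similarityBoost !== undefined) settings.similarity_boost = clamp01(opts.similarityBoost);
  if (opts.style !== undefined) settings.style = clamp01(opts.style);
  return settings;
}
```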

Voice ID Resolution

The voice ID is resolved with the following priority:

  1. options.voice (per-call override)
  2. config.voiceId (constructor default)
  3. options.providerSpecificOptions.voiceId (legacy override path)
  4. 'EXAVITQu4vr4xnSDxMaL' (hardcoded fallback — the "Sarah" voice)
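The priority order above maps naturally onto a chain of nullish-coalescing operators. The sketch below is illustrative only: resolveVoiceId and the two interfaces are hypothetical names, and treating only null or undefined as "not set" (so an empty string would still win) is an assumption about behaviour the documentation does not specify.

```typescript
// Hardcoded fallback from the documentation (the "Sarah" voice).
const FALLBACK_VOICE_ID = "EXAVITQu4vr4xnSDxMaL";

// Hypothetical minimal shapes, reduced to the fields that matter here.
interface VoiceCallOptions {
  voice?: string;
  providerSpecificOptions?: { voiceId?: string };
}

interface VoiceProviderConfig {
  voiceId?: string;
}

function resolveVoiceId(
  options: VoiceCallOptions = {},
  config: VoiceProviderConfig = {},
): string {
  return (
    options.voice ?? // 1. per-call override
    config.voiceId ?? // 2. constructor default
    options.providerSpecificOptions?.voiceId ?? // 3. legacy override path
    FALLBACK_VOICE_ID // 4. hardcoded fallback
  );
}
```

Note that under this order the legacy providerSpecificOptions.voiceId only takes effect when no constructor default was configured.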

Voice Listing

listAvailableVoices fetches the user's voice library from the /voices endpoint and maps each entry to the normalized SpeechVoice shape. Returns an empty array on API errors (graceful degradation).
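The graceful-degradation behaviour can be sketched as a try/catch around the request, with the mapping split into a pure function. Everything named here is hypothetical: SpeechVoiceSketch stands in for the real SpeechVoice type (whose fields are not shown in this page), and the payload shape assumes the /voices response carries a voices array whose entries have voice_id and name.

```typescript
// Stand-in for SpeechVoice; the real type may have more fields.
interface SpeechVoiceSketch {
  id: string;
  name: string;
}

// Assumed shape of the /voices response body.
interface VoicesPayload {
  voices?: Array<{ voice_id: string; name: string }>;
}

// A fetch-compatible function, injected so the sketch has no global dependencies.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

function mapVoices(payload: VoicesPayload): SpeechVoiceSketch[] {
  return (payload.voices ?? []).map((v) => ({ id: v.voice_id, name: v.name }));
}

async function listVoicesSafely(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
): Promise<SpeechVoiceSketch[]> {
  try {
    const res = await fetchImpl(`${baseUrl}/voices`, {
      headers: { "xi-api-key": apiKey },
    });
    // Non-2xx responses degrade to an empty list instead of throwing.
    if (!res.ok) return [];
    return mapVoices((await res.json()) as VoicesPayload);
  } catch {
    // Network failures and malformed JSON also degrade to an empty list.
    return [];
  }
}
```

One consequence of this design is that callers cannot tell "no voices" apart from "request failed", which is acceptable for a voice picker but worth logging elsewhere if diagnostics matter.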

See

ElevenLabsTextToSpeechProviderConfig for configuration options

Example

const provider = new ElevenLabsTextToSpeechProvider({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: 'pNInz6obpgDQGcFmaJgB', // "Adam"
});
const result = await provider.synthesize('Hello world', {
  providerSpecificOptions: { stability: 0.7, similarityBoost: 0.8 },
});

Methods

  • synthesize(text, options): Synthesizes speech from text using the ElevenLabs TTS API.

    Parameters

    • text: string

      The text to convert to audio.

    • options: SpeechSynthesisOptions = {}

      Optional synthesis settings. Use providerSpecificOptions to control ElevenLabs-specific voice settings (stability, similarityBoost, style, useSpeakerBoost).

    Returns Promise<SpeechSynthesisResult>

    A promise resolving to the MP3 audio buffer and metadata.

    Throws

    When the ElevenLabs API returns a non-2xx status code. Common causes: invalid API key (401), voice not found (404), character limit exceeded (400), or rate limit (429).

    Example

    const result = await provider.synthesize('Hello there!', {
      voice: 'pNInz6obpgDQGcFmaJgB',
      providerSpecificOptions: {
        stability: 0.3, // More expressive
        similarityBoost: 0.9, // Closer to original voice
        style: 0.5, // Moderate style exaggeration
      },
    });
  • listAvailableVoices(): Fetches the user's voice library from the ElevenLabs API.

    Returns available voices mapped to the normalized SpeechVoice shape. Gracefully returns an empty array on API errors (e.g. network failure, invalid key) to avoid breaking voice selection UIs.

    The voice library includes both ElevenLabs' pre-made voices and any custom/cloned voices in the user's account.

    Returns Promise<SpeechVoice[]>

    A promise resolving to an array of available voices, or an empty array if the API call fails.

    Example

    const voices = await provider.listAvailableVoices();
    const rachel = voices.find(v => v.name === 'Rachel');
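The two methods fail differently: synthesize throws on a non-2xx status, while listAvailableVoices returns an empty array. For the throwing path, a caller might translate the documented status codes into actionable messages. The sketch below assumes the caller can obtain the numeric HTTP status from the failure; how the thrown error exposes it is not documented here. Both function names are hypothetical, and treating 5xx responses as retryable is an assumption.

```typescript
// Maps the status codes listed under "Throws" to their documented causes.
function describeSynthesisFailure(status: number): string {
  switch (status) {
    case 400:
      return "Bad request: the character limit may have been exceeded";
    case 401:
      return "Unauthorized: check the API key";
    case 404:
      return "Not found: the voice ID does not exist";
    case 429:
      return "Rate limited: retry after a delay";
    default:
      return `Unexpected ElevenLabs status ${status}`;
  }
}

// Assumption: only rate limits and server-side errors are worth retrying;
// 400, 401, and 404 will fail again until the request itself changes.
function isRetryableStatus(status: number): boolean {
  return status === 429 || status >= 500;
}
```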

Properties

id: "elevenlabs" = 'elevenlabs'

Unique provider identifier used for registration and resolution.

displayName: "ElevenLabs" = 'ElevenLabs'

Human-readable display name for UI and logging.

supportsStreaming: true = true

Streaming is supported — ElevenLabs offers a WebSocket streaming endpoint, and even the REST endpoint can be consumed as a stream.
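As a rough illustration of consuming the REST response as a stream, the sketch below reads chunks from a reader as they arrive and optionally hands each one to a callback (for example, to feed an audio player) before joining them into a single buffer. It does not show how the provider itself streams. ChunkReader, concatChunks, and readAllChunks are hypothetical names; ChunkReader is a minimal structural type that the reader returned by response.body.getReader() on a fetch Response should satisfy.

```typescript
// Minimal structural view of a stream reader.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Joins chunks into one contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Reads until the stream ends, reporting each chunk as it arrives.
async function readAllChunks(
  reader: ChunkReader,
  onChunk?: (chunk: Uint8Array) => void,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) {
      chunks.push(value);
      onChunk?.(value);
    }
  }
  return concatChunks(chunks);
}
```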