Class AzureSpeechTTSProvider

Text-to-speech provider that uses the Azure Cognitive Services Speech REST API.

SSML Generation

Azure's TTS REST endpoint requires SSML (Speech Synthesis Markup Language) as the request body — it does not accept plain text. This provider generates minimal SSML via buildSsml() that wraps the input text in <speak> and <voice> elements. Special XML characters in the text are escaped via escapeXml() to prevent malformed XML.
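A minimal sketch of what these two helpers might look like (the exact signatures and the language-derivation logic are assumptions, not the provider's actual source):

```typescript
// Illustrative sketch of the SSML helpers described above.
// escapeXml replaces the five XML special characters; buildSsml wraps
// the escaped text in <speak>/<voice> elements.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSsml(text: string, voice: string): string {
  // Derive the xml:lang value from the voice short-name, e.g.
  // 'en-US-GuyNeural' -> 'en-US' (an assumption about the implementation).
  const lang = voice.split('-').slice(0, 2).join('-');
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voice}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

Escaping ampersands first matters: escaping `<` before `&` would double-escape the `&` inside `&lt;`.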

X-Microsoft-OutputFormat Options

The X-Microsoft-OutputFormat header controls the audio encoding. This provider uses 'audio-24khz-96kbitrate-mono-mp3', which provides:

  • 24 kHz sample rate (high quality for speech)
  • 96 kbps bitrate (good balance of quality and file size)
  • Mono channel (sufficient for speech synthesis)
  • MP3 format (universally supported)

Other available formats include:

  • 'audio-16khz-128kbitrate-mono-mp3' — Lower sample rate, higher bitrate
  • 'audio-24khz-160kbitrate-mono-mp3' — Higher bitrate for better quality
  • 'riff-24khz-16bit-mono-pcm' — Uncompressed WAV
  • 'ogg-24khz-16bit-mono-opus' — Opus codec in OGG container

Voice Listing

The listAvailableVoices method fetches the full list of voices available in the configured Azure region via GET /cognitiveservices/voices/list. Results are mapped to the normalized SpeechVoice shape.
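A sketch of what this normalization might look like. The RawAzureVoice fields mirror the response shape of Azure's voices/list endpoint; the SpeechVoice field names are assumptions based on the example further below (which uses `v.lang`):

```typescript
// Subset of the fields Azure's /cognitiveservices/voices/list returns.
interface RawAzureVoice {
  ShortName: string;   // e.g. 'en-US-GuyNeural'
  DisplayName: string; // e.g. 'Guy'
  Locale: string;      // e.g. 'en-US'
  Gender: string;      // e.g. 'Male'
}

// Assumed shape of the normalized SpeechVoice entries.
interface SpeechVoice {
  id: string;
  name: string;
  lang: string;
  gender: string;
}

// Illustrative mapping from Azure's PascalCase fields to the
// normalized shape.
function toSpeechVoice(raw: RawAzureVoice): SpeechVoice {
  return {
    id: raw.ShortName,
    name: raw.DisplayName,
    lang: raw.Locale,
    gender: raw.Gender.toLowerCase(),
  };
}
```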

Example

const provider = new AzureSpeechTTSProvider({
  key: process.env.AZURE_SPEECH_KEY!,
  region: 'eastus',
  defaultVoice: 'en-US-GuyNeural',
});
const result = await provider.synthesize('Hello world');
// result.audioBuffer contains MP3 bytes
// result.mimeType === 'audio/mpeg'

Implements

Constructors

Methods

  • Synthesizes speech from plain text using the Azure TTS REST endpoint.

    The text is wrapped in SSML, sent to Azure, and the response audio buffer (MP3 format) is returned along with metadata.

    Parameters

    • text: string

      The plain-text utterance to convert to audio. XML special characters are automatically escaped.

    • options: SpeechSynthesisOptions = {}

      Optional synthesis settings. Use options.voice to override the default voice with any valid Azure voice short-name.

    Returns Promise<SpeechSynthesisResult>

    A promise resolving to the MP3 audio buffer and metadata.

    Throws

    When the Azure API returns a non-2xx status code. Common causes: invalid subscription key (401), region mismatch (404), invalid SSML (400), or quota exceeded (429).

    Example

const result = await provider.synthesize('Guten Tag!', {
  voice: 'de-DE-ConradNeural',
});
    fs.writeFileSync('output.mp3', result.audioBuffer);
  • Retrieves the list of available neural voices from the Azure region.

    Fetches from GET /cognitiveservices/voices/list and maps each entry to the normalized SpeechVoice shape. The list includes all neural and standard voices available in the configured region.

    Returns Promise<SpeechVoice[]>

    A promise resolving to an array of normalized voice entries.

    Throws

When the Azure API returns a non-2xx status code (e.g. 401 for an invalid subscription key), or when the request itself fails.

    Example

    const voices = await provider.listAvailableVoices();
    const englishVoices = voices.filter(v => v.lang.startsWith('en-'));
    console.log(`Found ${englishVoices.length} English voices`);
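Both methods throw on non-2xx responses. The common causes listed under Throws could be mapped to diagnostic hints along these lines (statusHint is an illustrative helper, not part of the provider's public API):

```typescript
// Illustrative mapping from the status codes listed in the Throws
// sections above to human-readable hints for error messages.
function statusHint(status: number): string {
  switch (status) {
    case 400: return 'invalid SSML';
    case 401: return 'invalid subscription key';
    case 404: return 'region mismatch';
    case 429: return 'quota exceeded';
    default:  return `unexpected status ${status}`;
  }
}
```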

Properties

id: "azure-speech-tts" = 'azure-speech-tts'

Unique provider identifier used for registration and resolution.

displayName: "Azure Speech (TTS)" = 'Azure Speech (TTS)'

Human-readable display name for UI and logging.

supportsStreaming: true = true

Marked as streaming-capable because the provider can participate in a streaming pipeline, even though the underlying HTTP request is a single call that returns the complete audio buffer.