Creates a new AzureSpeechTTSProvider.
Provider configuration including the subscription key, region, and optional default voice.
const provider = new AzureSpeechTTSProvider({
  key: 'your-azure-subscription-key',
  region: 'westeurope',
  defaultVoice: 'de-DE-ConradNeural',
});
Synthesizes speech from plain text using the Azure TTS REST endpoint.
The text is wrapped in SSML, sent to Azure, and the response audio buffer (MP3 format) is returned along with metadata.
The plain-text utterance to convert to audio. XML special characters are automatically escaped.
Optional synthesis settings. Use options.voice to override the default voice with any valid Azure voice short name.
A promise resolving to the MP3 audio buffer and metadata.
When the Azure API returns a non-2xx status code. Common causes: invalid subscription key (401), region mismatch (404), invalid SSML (400), or quota exceeded (429).
import fs from 'node:fs';

const result = await provider.synthesize('Guten Tag!', {
  voice: 'de-DE-ConradNeural',
});
fs.writeFileSync('output.mp3', result.audioBuffer);
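The failure causes listed for synthesize can be turned into readable hints on the caller's side. The following is a minimal sketch. The helper name describeSynthesisFailure is hypothetical, and it assumes the caller can obtain the numeric HTTP status from the thrown error, which this page does not specify.

```typescript
// Hypothetical helper, not part of the provider: maps the HTTP status codes
// documented for synthesize() to a short diagnostic hint.
function describeSynthesisFailure(status: number): string {
  switch (status) {
    case 400:
      return 'Invalid SSML.';
    case 401:
      return 'Invalid subscription key.';
    case 404:
      return 'Region mismatch.';
    case 429:
      return 'Quota exceeded.';
    default:
      return `Unexpected HTTP status ${status}.`;
  }
}

console.log(describeSynthesisFailure(401)); // prints "Invalid subscription key."
```

Keeping the mapping in one function makes it easy to log a consistent message wherever synthesize is called.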
Retrieves the list of voices available in the configured Azure region.
Fetches from GET /cognitiveservices/voices/list and maps each entry
to the normalized SpeechVoice shape. The list includes all
neural and standard voices available in the configured region.
A promise resolving to an array of normalized voice entries.
When the Azure API returns a non-2xx status code (e.g. invalid key) or the request fails with a network error.
const voices = await provider.listAvailableVoices();
const englishVoices = voices.filter(v => v.lang.startsWith('en-'));
console.log(`Found ${englishVoices.length} English voices`);
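The mapping step described for listAvailableVoices might look roughly like the sketch below. The AzureVoiceEntry fields follow the shape of the Azure voices list response, but the SpeechVoice fields other than lang and the function name toSpeechVoice are assumptions made for illustration; the provider's real types may differ.

```typescript
// Subset of the fields Azure returns for each voice entry.
interface AzureVoiceEntry {
  ShortName: string;
  DisplayName: string;
  Locale: string;
  Gender: string;
}

// Assumed normalized shape; only `lang` is confirmed by the example above.
interface SpeechVoice {
  id: string;
  name: string;
  lang: string;
  gender: string;
}

// Hypothetical mapper from an Azure entry to the normalized shape.
function toSpeechVoice(entry: AzureVoiceEntry): SpeechVoice {
  return {
    id: entry.ShortName,
    name: entry.DisplayName,
    lang: entry.Locale,
    gender: entry.Gender.toLowerCase(),
  };
}

// Sample entry used only to demonstrate the mapping.
const sample: AzureVoiceEntry = {
  ShortName: 'de-DE-ConradNeural',
  DisplayName: 'Conrad',
  Locale: 'de-DE',
  Gender: 'Male',
};

console.log(toSpeechVoice(sample).lang); // prints "de-DE"
```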
Readonly id: Unique provider identifier used for registration and resolution.
Readonly display: Human-readable display name for UI and logging.
Readonly supports: Marked as streaming-capable because the provider can be used within a streaming pipeline, although the underlying HTTP request is a single call that returns the complete audio buffer rather than a stream of chunks.
Text-to-speech provider that uses the Azure Cognitive Services Speech REST API.
SSML Generation
Azure's TTS REST endpoint requires SSML (Speech Synthesis Markup Language) as the request body; it does not accept plain text. This provider generates minimal SSML via buildSsml(), which wraps the input text in <speak> and <voice> elements. Special XML characters in the text are escaped via escapeXml() to prevent malformed XML.
X-Microsoft-OutputFormat Options
The X-Microsoft-OutputFormat header controls the audio encoding. This provider uses 'audio-24khz-96kbitrate-mono-mp3', which yields 24 kHz mono MP3 audio at 96 kbit/s.
Other available formats include:
'audio-16khz-128kbitrate-mono-mp3': lower sample rate, higher bitrate
'audio-24khz-160kbitrate-mono-mp3': higher bitrate for better quality
'riff-24khz-16bit-mono-pcm': uncompressed WAV
'ogg-24khz-16bit-mono-opus': Opus codec in an OGG container
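To make the SSML wrapping and the output-format header concrete, here is a minimal sketch. It is an illustrative re-implementation, not the provider's source: the bodies of escapeXml and buildSsml, the derivation of xml:lang from the voice short name, and the buildHeaders helper are all assumptions.

```typescript
// Illustrative only; the provider's real escapeXml() may differ in detail.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Illustrative only; wraps escaped text in <speak> and <voice> elements.
function buildSsml(text: string, voice: string): string {
  // Assumption: take the locale from the short name, e.g. 'de-DE-ConradNeural' -> 'de-DE'.
  const lang = voice.split('-').slice(0, 2).join('-');
  return (
    `<speak version='1.0' xml:lang='${escapeXml(lang)}'>` +
    `<voice name='${escapeXml(voice)}'>${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

// Hypothetical helper: request headers for the synthesis call.
function buildHeaders(key: string): Record<string, string> {
  return {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': 'audio-24khz-96kbitrate-mono-mp3',
  };
}

console.log(buildSsml('Tom & Jerry <3', 'de-DE-ConradNeural'));
// prints: <speak version='1.0' xml:lang='de-DE'><voice name='de-DE-ConradNeural'>Tom &amp; Jerry &lt;3</voice></speak>
```

Escaping the ampersand before the other characters matters: doing it last would turn an already produced &lt; into &amp;lt;.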
Voice Listing
The listAvailableVoices method fetches the full list of voices available in the configured Azure region via GET /cognitiveservices/voices/list. Results are mapped to the normalized SpeechVoice shape.
Example