Creates a new AzureSpeechSTTProvider.
Provider configuration including the subscription key and region.
```typescript
const provider = new AzureSpeechSTTProvider({
  key: 'your-azure-subscription-key',
  region: 'eastus',
});
```
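The shape of this configuration can be sketched as a minimal interface. This is an illustrative sketch: the interface name `AzureSpeechConfig` and the `validateConfig` helper are hypothetical; only the `key` and `region` fields come from the example above.

```typescript
// Hypothetical shape of the provider configuration, inferred from the
// constructor example; the names here are illustrative only.
interface AzureSpeechConfig {
  /** Azure Cognitive Services subscription key (sent as Ocp-Apim-Subscription-Key). */
  key: string;
  /** Azure region hosting the Speech resource, e.g. 'eastus'. */
  region: string;
}

// A tiny guard showing how the two required fields might be checked
// before the provider issues any requests.
function validateConfig(config: AzureSpeechConfig): void {
  if (!config.key) throw new Error('Azure Speech: missing subscription key');
  if (!config.region) throw new Error('Azure Speech: missing region');
}
```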
Transcribes an audio buffer using the Azure Speech recognition REST endpoint.
Sends the raw audio as PCM WAV and returns a normalized result. Azure's
NoMatch status is treated as an empty transcript (not an error).
Parameters:

- `audio`: Raw audio data. Azure expects PCM WAV format; the `Content-Type` header is always set to `'audio/wav'` regardless of `audio.mimeType`.
- Transcription settings (optional): Only `language` is supported by the Azure REST endpoint.

Returns: A promise resolving to the normalized transcription result.

Throws: When the Azure API returns a non-2xx HTTP status code. The error message includes the status and response body text.
```typescript
const result = await provider.transcribe(
  { data: wavBuffer, durationSeconds: 5 },
  { language: 'fr-FR' },
);
if (result.text === '') {
  console.log('No speech detected in the audio');
}
```
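The empty-transcript handling for NoMatch described above can be sketched as a small response-mapping helper. The `RecognitionStatus` and `DisplayText` fields are part of Azure's simple-format REST response; the `TranscriptionResult` shape and the helper name `normalizeAzureResponse` are illustrative assumptions, not the provider's actual internals.

```typescript
// Minimal shape of the Azure Speech REST JSON response (simple result format).
interface AzureRestResponse {
  RecognitionStatus: string; // e.g. 'Success' or 'NoMatch'
  DisplayText?: string;      // present when status is 'Success'
}

interface TranscriptionResult {
  text: string;
  isFinal: boolean;
}

// Hypothetical helper: maps Azure's response to a normalized result,
// treating NoMatch as an empty transcript rather than an error.
function normalizeAzureResponse(body: AzureRestResponse): TranscriptionResult {
  if (body.RecognitionStatus === 'NoMatch') {
    return { text: '', isFinal: true };
  }
  if (body.RecognitionStatus !== 'Success') {
    throw new Error(`Azure recognition failed: ${body.RecognitionStatus}`);
  }
  return { text: body.DisplayText ?? '', isFinal: true };
}
```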
- `id` (readonly): Unique provider identifier used for registration and resolution.
- `displayName` (readonly): Human-readable display name for UI and logging.
- `supportsStreaming` (readonly): This provider uses synchronous HTTP requests, not WebSocket streaming.
Speech-to-text provider that uses the Azure Cognitive Services Speech REST API.
Azure REST Endpoint Format
The endpoint URL follows this pattern:

`https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language={lang}`

- `{region}`: The Azure region from config (e.g. `eastus`, `westeurope`).
- `{lang}`: BCP-47 language code from options, defaulting to `'en-US'`.
- The `/conversation/` path segment selects the conversation recognition mode (as opposed to `/interactive/` or `/dictation/`).

Authentication: Ocp-Apim-Subscription-Key

Azure Cognitive Services uses the `Ocp-Apim-Subscription-Key` HTTP header for authentication, which differs from the typical `Authorization: Bearer` pattern. The subscription key is sent as a plain-text header value, with no "Bearer" or "Token" prefix.

An alternative is to use a short-lived token from the token endpoint, but this provider uses the simpler key-based approach for reliability.
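A minimal sketch of this key-based request, assuming the standard global `fetch` API. The function names `buildEndpointUrl` and `recognizeOnce` are illustrative, and the URL follows the standard Azure Speech REST pattern for short-audio recognition:

```typescript
// Hypothetical helper: builds the conversation-mode recognition URL
// from the region and a BCP-47 language code.
function buildEndpointUrl(region: string, language = 'en-US'): string {
  return (
    `https://${region}.stt.speech.microsoft.com/speech/recognition/` +
    `conversation/cognitiveservices/v1?language=${language}`
  );
}

// Sketch: POST WAV audio with the Ocp-Apim-Subscription-Key header.
// Note the key is sent as-is, with no 'Bearer' prefix.
async function recognizeOnce(
  key: string,
  region: string,
  wav: ArrayBuffer,
  language = 'en-US',
): Promise<unknown> {
  const res = await fetch(buildEndpointUrl(region, language), {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': key,
      'Content-Type': 'audio/wav',
    },
    body: wav,
  });
  if (!res.ok) {
    throw new Error(`Azure Speech error ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```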
NoMatch Handling
When Azure's recognizer detects audio but cannot identify any speech, it returns
`RecognitionStatus: 'NoMatch'` instead of raising an HTTP error. This provider maps `NoMatch` to an empty-text result (`text: ''`) with `isFinal: true`, matching the Azure Speech SDK's behaviour. This prevents the fallback proxy from unnecessarily trying another provider when the audio genuinely contains no speech.

Limitations

- `Content-Type` is hardcoded to `audio/wav` regardless of the `audio.mimeType` value.

See
Example