Creates a new `OpenAIWhisperSpeechToTextProvider`.

Parameters

- Provider configuration including the API key and optional defaults.

Example

```ts
const provider = new OpenAIWhisperSpeechToTextProvider({
  apiKey: 'sk-xxxx',
  baseUrl: 'https://api.openai.com/v1', // default
  model: 'whisper-1', // default
});
```
Transcribes an audio buffer using the OpenAI Whisper API. The audio is sent as a multipart form upload with the file, model, and optional parameters (`language`, `prompt`, `temperature`, `response_format`).

Parameters

- Raw audio data and metadata. The data buffer is wrapped in a `Blob` and sent as a form file field. If `fileName` is not provided, a default name is generated from the `format` field.
- Optional transcription settings, including a language hint, context prompt, sampling temperature, and response format.

Returns

A promise resolving to the normalized transcription result.

Throws

When the OpenAI API returns a non-2xx status code.
Example

```ts
const result = await provider.transcribe(
  { data: mp3Buffer, mimeType: 'audio/mpeg', fileName: 'voice.mp3' },
  { language: 'fr', prompt: 'Discussion about AI' },
);
```
Properties

- `id` (readonly) — Unique provider identifier used for registration and resolution.
- `displayName` (readonly) — Human-readable display name for UI and logging.
- `supportsStreaming` (readonly) — The Whisper API is batch-only; streaming requires a WebSocket adapter.
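Taken together, these members suggest an identity/capability shape along the following lines. This is a sketch; the interface name and the literal values are assumptions, not the library's actual declarations:

```typescript
// Hypothetical shape of the provider's readonly identity members.
interface SpeechToTextProviderInfo {
  readonly id: string;                 // unique identifier for registration and resolution
  readonly displayName: string;        // human-readable name for UI and logging
  readonly supportsStreaming: boolean; // false here: Whisper's API is batch-only
}

// Illustrative values only; the real id and display name may differ.
const whisperInfo: SpeechToTextProviderInfo = {
  id: 'openai-whisper',
  displayName: 'OpenAI Whisper',
  supportsStreaming: false,
};
```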
Speech-to-text provider that uses the OpenAI Whisper transcription API.
API Contract

- Endpoint: `POST {baseUrl}/audio/transcriptions`
- Auth: `Authorization: Bearer <apiKey>`
- Body: `multipart/form-data` (`FormData` with the file blob)
- Output: controlled by the `response_format` field; defaults to `verbose_json`, which includes segments, language detection, and duration.

Supported Response Formats

- `verbose_json` — Full JSON with segments, duration, and language (default)
- `json` — Minimal JSON with just the text
- `text` — Plain text response (no JSON)
- `srt` — SubRip subtitle format
- `vtt` — WebVTT subtitle format

When the `text`, `srt`, or `vtt` format is used, the response is returned as plain text and segments are not available.
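Because the response shape depends on `response_format`, a dispatcher along these lines could normalize it. This is a sketch: the `text`, `language`, `duration`, and `segments` fields for the JSON formats follow the documented `verbose_json` contents, while the function name and the normalized result shape are assumptions:

```typescript
// Hypothetical normalizer: JSON formats are parsed; text-like formats
// (text, srt, vtt) pass through as plain text with no segments.
interface NormalizedTranscription {
  text: string;
  language?: string;
  durationSeconds?: number;
  segments?: Array<{ start: number; end: number; text: string }>;
}

function normalizeResponse(format: string, body: string): NormalizedTranscription {
  switch (format) {
    case 'verbose_json': {
      const parsed = JSON.parse(body);
      return {
        text: parsed.text,
        language: parsed.language,
        durationSeconds: parsed.duration,
        segments: (parsed.segments ?? []).map((s: any) => ({
          start: s.start,
          end: s.end,
          text: s.text,
        })),
      };
    }
    case 'json':
      // Minimal JSON: only the transcribed text is available.
      return { text: JSON.parse(body).text };
    default:
      // text, srt, vtt: plain text response; segments are not available.
      return { text: body };
  }
}
```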
See Also

- `OpenAIWhisperSpeechToTextProviderConfig` for configuration options.
- `normalizeSegments()` for the segment normalization logic.

Example