Class OpenAIWhisperSpeechToTextProvider

Speech-to-text provider that uses the OpenAI Whisper transcription API.

API Contract

  • Endpoint: POST {baseUrl}/audio/transcriptions
  • Authentication: Authorization: Bearer <apiKey>
  • Content-Type: multipart/form-data (FormData with file blob)
  • Response format: Controlled by the response_format field; defaults to verbose_json which includes segments, language detection, and duration.
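The contract above can be sketched as a request builder. This is an illustrative sketch, not the provider's internal code; buildTranscriptionRequest and its parameter shape are hypothetical names chosen for the example.

```typescript
// Hypothetical helper illustrating the API contract above; not part of the
// provider's public API.
interface TranscriptionRequestInit {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: FormData;
}

function buildTranscriptionRequest(
  baseUrl: string,
  apiKey: string,
  audio: { data: ArrayBuffer; mimeType: string; fileName: string },
  model: string,
): TranscriptionRequestInit {
  const form = new FormData();
  // The raw audio buffer is wrapped in a Blob and attached as the `file` field.
  form.append(
    "file",
    new Blob([audio.data], { type: audio.mimeType }),
    audio.fileName,
  );
  form.append("model", model);
  form.append("response_format", "verbose_json"); // default per the contract above
  return {
    url: `${baseUrl}/audio/transcriptions`,
    method: "POST",
    // Note: Content-Type (with the multipart boundary) is set automatically
    // by fetch when the body is a FormData instance.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  };
}
```

Passing the resulting object to fetch (spreading method, headers, and body into the init) issues the documented POST.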

Supported Response Formats

  • verbose_json — Full JSON with segments, duration, and language (default)
  • json — Minimal JSON with just the text
  • text — Plain text response (no JSON)
  • srt — SubRip subtitle format
  • vtt — WebVTT subtitle format

When the text, srt, or vtt format is used, the response is returned as plain text and segments are not available.
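A caller consuming the raw response might branch on the format as follows. This is a minimal sketch assuming the documented behavior; parseTranscription is a hypothetical helper, and the parsed JSON shape (a text field, plus segments on verbose_json) follows the format list above.

```typescript
type ResponseFormat = "verbose_json" | "json" | "text" | "srt" | "vtt";

// Hypothetical helper: branch on the response format described above.
function parseTranscription(
  raw: string,
  format: ResponseFormat,
): { text: string; segments?: unknown[] } {
  if (format === "verbose_json" || format === "json") {
    const parsed = JSON.parse(raw);
    // segments are only present on verbose_json responses
    return { text: parsed.text, segments: parsed.segments };
  }
  // text, srt, and vtt are returned verbatim; segments are unavailable
  return { text: raw };
}
```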

See

See OpenAIWhisperSpeechToTextProviderConfig for configuration options and normalizeSegments() for the segment normalization logic.

Example

const provider = new OpenAIWhisperSpeechToTextProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'whisper-1',
});
const result = await provider.transcribe(
  { data: audioBuffer, mimeType: 'audio/wav', fileName: 'recording.wav' },
  { language: 'en', responseFormat: 'verbose_json' },
);

Methods

  • Transcribes an audio buffer using the OpenAI Whisper API.

    The audio is sent as a multipart form upload with the file, model, and optional parameters (language, prompt, temperature, response_format).

    Parameters

    • audio: SpeechAudioInput

      Raw audio data and metadata. The data buffer is wrapped in a Blob and sent as a form file field. If fileName is not provided, a default name is generated from the format field.

    • options: SpeechTranscriptionOptions = {}

      Optional transcription settings including language hint, context prompt, temperature for sampling, and response format.

    Returns Promise<SpeechTranscriptionResult>

    A promise resolving to the normalized transcription result.

    Throws

    When the OpenAI API returns a non-2xx status code.

    Example

    const result = await provider.transcribe(
      { data: mp3Buffer, mimeType: 'audio/mpeg', fileName: 'voice.mp3' },
      { language: 'fr', prompt: 'Discussion about AI' },
    );

Properties

id: "openai-whisper" = 'openai-whisper'

Unique provider identifier used for registration and resolution.

displayName: "OpenAI Whisper" = 'OpenAI Whisper'

Human-readable display name for UI and logging.

supportsStreaming: false = false

The Whisper API is batch-only; streaming support would require a separate WebSocket adapter.