Class DeepgramBatchSTTProvider

Speech-to-text provider that uses the Deepgram batch (pre-recorded) REST API.

REST API Contract

  • Endpoint: POST https://api.deepgram.com/v1/listen
  • Authentication: Authorization: Token <apiKey> header
  • Content-Type: Set to the audio's MIME type (e.g. audio/wav)
  • Body: Raw audio bytes sent directly (no multipart form)
  • Query parameters: model, punctuate, diarize, language
  • Response: JSON containing results.channels[].alternatives[] with transcript text, confidence scores, and optional word-level timing
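As a rough illustration of this contract, the sketch below assembles the URL, headers, and body that a fetch() call to /v1/listen would send. The function name buildListenRequest, the RequestSettings shape, and the choice to always send punctuate=true are assumptions for illustration, not this provider's actual internals. The sketch also assumes per-call options have already been merged over the constructor config.

```typescript
// Hypothetical settings shape: per-call options already merged over the
// constructor config. The real config/options types may carry more fields.
interface RequestSettings {
  apiKey: string;
  model?: string;
  language?: string;
  enableSpeakerDiarization?: boolean;
}

interface DeepgramRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: Uint8Array; // a Node Buffer is a Uint8Array, so it fits here
  };
}

// Builds the URL and fetch() init for POST https://api.deepgram.com/v1/listen.
function buildListenRequest(
  audio: { data: Uint8Array; mimeType: string },
  settings: RequestSettings,
): DeepgramRequest {
  const params = new URLSearchParams();
  if (settings.model) params.set('model', settings.model);
  // Assumption: punctuation is always requested.
  params.set('punctuate', 'true');
  if (settings.enableSpeakerDiarization) params.set('diarize', 'true');
  if (settings.language) params.set('language', settings.language);

  return {
    url: `https://api.deepgram.com/v1/listen?${params.toString()}`,
    init: {
      method: 'POST',
      headers: {
        // Deepgram uses the "Token" scheme, not "Bearer".
        Authorization: `Token ${settings.apiKey}`,
        // The audio's own MIME type, not multipart/form-data.
        'Content-Type': audio.mimeType,
      },
      // Raw audio bytes go directly into the request body.
      body: audio.data,
    },
  };
}
```

A caller would then pass the result straight through, as in fetch(req.url, req.init).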

Word-Level Diarization Mapping

When enableSpeakerDiarization is true, the diarize=true query parameter is set. Deepgram then includes a speaker field (zero-based integer index) on each word in the response. These speaker indices are preserved through the wordsToSegments() mapping into the normalized result.
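One plausible way to carry those per-word speaker indices into segments is to merge consecutive words that share the same speaker. The sketch below does exactly that; groupWordsBySpeaker and the Segment shape are hypothetical, and the real wordsToSegments() may apply other rules (for example, splitting on long pauses). The word fields (word, punctuated_word, start, end, confidence, speaker) follow Deepgram's word objects.

```typescript
// Subset of a Deepgram word object; `speaker` is present only with diarize=true.
interface DeepgramWord {
  word: string;
  punctuated_word?: string; // present when punctuate=true
  start: number; // seconds
  end: number; // seconds
  confidence: number;
  speaker?: number; // zero-based speaker index
}

// Hypothetical normalized segment shape.
interface Segment {
  text: string;
  start: number;
  end: number;
  speaker?: number;
}

// Merges runs of consecutive words with the same speaker index into one segment.
// Without diarization every speaker is undefined, so all words form one segment.
function groupWordsBySpeaker(words: DeepgramWord[]): Segment[] {
  const segments: Segment[] = [];
  for (const w of words) {
    const token = w.punctuated_word ?? w.word;
    const last = segments[segments.length - 1];
    if (last && last.speaker === w.speaker) {
      // Same speaker: extend the current segment.
      last.text += ` ${token}`;
      last.end = w.end;
    } else {
      // Speaker changed (or first word): start a new segment, index kept as-is.
      segments.push({ text: token, start: w.start, end: w.end, speaker: w.speaker });
    }
  }
  return segments;
}
```

Keeping the index unchanged (rather than converting to 1-based) matches the statement that Deepgram's zero-based speaker values are preserved.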

Error Handling

Non-2xx responses from Deepgram cause an Error to be thrown whose message includes the HTTP status code and the response body text for debugging. Network-level errors (DNS failures, timeouts) propagate as-is from the fetch implementation.
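This rule amounts to a status check before the JSON is parsed. The minimal sketch below uses the hypothetical names readDeepgramJson and HttpResponseLike, and its exact message wording is an assumption. A DNS failure or timeout makes fetch() reject before any response exists, so such errors never reach this check and surface unchanged.

```typescript
// Minimal structural type; the global fetch() Response satisfies it.
interface HttpResponseLike {
  ok: boolean; // true for 2xx statuses
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}

// Throws on a non-2xx response, including status and body text in the
// message; otherwise parses and returns the JSON body.
async function readDeepgramJson(res: HttpResponseLike): Promise<unknown> {
  if (!res.ok) {
    const body = await res.text();
    throw new Error(`Deepgram request failed with HTTP ${res.status}: ${body}`);
  }
  return res.json();
}
```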

Streaming is NOT supported by this provider; use a Deepgram WebSocket adapter for real-time transcription.

See

  • DeepgramBatchSTTProviderConfig for configuration options
  • wordsToSegments() for the word-to-segment mapping logic

Example

const provider = new DeepgramBatchSTTProvider({
  apiKey: process.env.DEEPGRAM_API_KEY!,
  model: 'nova-2',
});
const result = await provider.transcribe(
  { data: audioBuffer, mimeType: 'audio/wav' },
  { enableSpeakerDiarization: true },
);
console.log(result.text);
console.log(result.segments?.map(s => `[Speaker ${s.speaker}] ${s.text}`));

Implements

Constructors

Methods

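  • transcribe(audio: SpeechAudioInput, options?: SpeechTranscriptionOptions): Promise<SpeechTranscriptionResult>

    Transcribes an audio buffer using the Deepgram pre-recorded API.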

    Sends the raw audio bytes as the request body (not multipart form) with the appropriate Content-Type header. The response is parsed and normalized into a SpeechTranscriptionResult.

    Parameters

    • audio: SpeechAudioInput

      Raw audio data and associated metadata (buffer, MIME type, duration). The data buffer is sent directly as the request body.

    • options: SpeechTranscriptionOptions = {}

      Optional transcription settings. Supports model, language, and enableSpeakerDiarization overrides.

    Returns Promise<SpeechTranscriptionResult>

    A promise resolving to the normalized transcription result with text, confidence, timing, and optional speaker-attributed segments.

    Throws

    When the Deepgram API returns a non-2xx status code. The error message includes the HTTP status and response body for debugging.

    Example

    const result = await provider.transcribe(
      { data: wavBuffer, mimeType: 'audio/wav', durationSeconds: 5.2 },
      { language: 'fr-FR', enableSpeakerDiarization: true },
    );
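The parse-and-normalize step of transcribe() can be pictured as picking the top-ranked alternative of the first channel and reading the overall duration from Deepgram's metadata.duration field. In the sketch below, DeepgramListenResponse, NormalizedResult, and normalizeListenResponse are hypothetical; the real SpeechTranscriptionResult fields are not spelled out here and may differ.

```typescript
interface DeepgramWordLite {
  word: string;
  start: number;
  end: number;
  confidence: number;
  speaker?: number;
}

interface DeepgramAlternative {
  transcript: string;
  confidence: number;
  words?: DeepgramWordLite[];
}

// Subset of the /v1/listen response; every level is optional so that
// partial or unexpected payloads do not crash the normalizer.
interface DeepgramListenResponse {
  metadata?: { duration?: number };
  results?: { channels?: Array<{ alternatives?: DeepgramAlternative[] }> };
}

// Hypothetical normalized shape.
interface NormalizedResult {
  text: string;
  confidence?: number;
  durationSeconds?: number;
  words: DeepgramWordLite[];
}

function normalizeListenResponse(json: DeepgramListenResponse): NormalizedResult {
  // Top-ranked alternative of the first channel, if present.
  const best = json.results?.channels?.[0]?.alternatives?.[0];
  return {
    text: best?.transcript ?? '',
    confidence: best?.confidence,
    durationSeconds: json.metadata?.duration,
    // Words (with speaker indices when diarized) feed the segment mapping.
    words: best?.words ?? [],
  };
}
```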

Properties

id: "deepgram-batch" = 'deepgram-batch'

Unique provider identifier used for registration and resolution.

displayName: "Deepgram (Batch)" = 'Deepgram (Batch)'

Human-readable display name for UI and logging.

supportsStreaming: false = false

This provider sends each file in a single HTTP request/response exchange (returned as a Promise), not over a WebSocket stream.