Creates a new AssemblyAISTTProvider.
Provider configuration including the API key.
const provider = new AssemblyAISTTProvider({
  apiKey: 'your-assemblyai-api-key',
});
Transcribes an audio buffer via the AssemblyAI three-step async pipeline: upload, submit, and poll.
Raw audio data and associated metadata. The data buffer
is uploaded to AssemblyAI's CDN in step 1.
Optional transcription settings. Pass
providerSpecificOptions.signal (an AbortSignal) to cancel
at any point in the pipeline.
A promise resolving to the normalized transcription result.
When the upload API returns a non-2xx status.
When the transcript submit API returns a non-2xx status.
When the polling API returns a non-2xx status.
When the transcript status becomes 'error' (includes
AssemblyAI's error message, e.g. "Audio file could not be decoded").
When the 120-second timeout is exceeded (includes the transcript ID for manual inspection via the AssemblyAI dashboard).
When the caller's AbortSignal is triggered.
const result = await provider.transcribe(
  { data: wavBuffer, mimeType: 'audio/wav' },
  { enableSpeakerDiarization: true, language: 'en' },
);
console.log(result.text);
console.log(result.segments?.map(s => `[${s.speaker}] ${s.text}`));
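Cancellation works by passing an AbortSignal in providerSpecificOptions.signal, as described for the options parameter above. The following is a minimal sketch of one way to build such an options object with a deadline. The helper withDeadline, its return shape, and the DeadlineOptions interface are hypothetical names introduced for illustration; they are not part of the provider.

```typescript
// Hypothetical helper, not part of AssemblyAISTTProvider.
interface DeadlineOptions {
  language?: string;
  enableSpeakerDiarization?: boolean;
  providerSpecificOptions: { signal: AbortSignal };
}

function withDeadline(
  ms: number,
  base: { language?: string; enableSpeakerDiarization?: boolean } = {},
): { options: DeadlineOptions; cancel: () => void; dispose: () => void } {
  const controller = new AbortController();
  // Abort automatically once the deadline passes.
  const timer = setTimeout(() => controller.abort(), ms);
  return {
    options: { ...base, providerSpecificOptions: { signal: controller.signal } },
    // Abort right away, e.g. when the user navigates away.
    cancel: () => {
      clearTimeout(timer);
      controller.abort();
    },
    // Clear the pending timer once transcribe() has settled.
    dispose: () => clearTimeout(timer),
  };
}
```

With a provider instance, the returned options would be passed as the second argument to provider.transcribe(audio, options), with dispose() called in a finally block so the timer does not outlive the request.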
Readonly id
Unique provider identifier used for registration and resolution.
Readonly display
Human-readable display name for UI and logging.
Readonly supportsStreaming
Streaming is not supported by this provider's async pipeline. AssemblyAI does offer a separate real-time streaming API via WebSocket, but that would be a different provider implementation.
Speech-to-text provider that uses the AssemblyAI async transcription API.
Three-Step Workflow
AssemblyAI uses an asynchronous transcription pipeline that requires three sequential HTTP requests:
1. Upload: POST /v2/upload sends the raw audio bytes to AssemblyAI's CDN and returns an upload_url. This step is necessary because the transcript endpoint accepts URLs, not raw audio.
2. Submit: POST /v2/transcript creates a transcription job referencing the upload URL. Returns a transcript id used for polling. Optional features like speaker_labels are enabled in this request's JSON body.
3. Poll: GET /v2/transcript/:id is called every POLL_INTERVAL_MS (1 second) until the transcript status transitions to 'completed' or 'error'. The polling loop is bounded by DEFAULT_TIMEOUT_MS (120 seconds) to prevent indefinite waiting.
AbortController Usage
An optional AbortSignal can be passed via options.providerSpecificOptions.signal to cancel the transcription at any point. The signal is forwarded to all three fetch calls and also checked at the top of each polling iteration. When aborted, an error is thrown immediately without waiting for the current fetch to complete.
Error Handling
A non-2xx response from the upload, submit, or polling API throws an Error with the HTTP status and body.
status === 'error' on the transcript throws with AssemblyAI's error message.
See AssemblyAISTTProviderConfig for configuration options.
See AssemblyAITranscript for the polling response shape.
Example
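As a rough illustration of the three-step workflow described above, the sketch below shows how upload, submit, and poll could be chained with fetch. It is not the provider's actual source. The function transcribeSketch, the PipelineOptions shape, the injectable fetchFn, and the error message wording are all assumptions made for this example; only the endpoint paths, the constant names, and the overall control flow come from the description.

```typescript
// Sketch under stated assumptions; not the provider's real implementation.
const BASE_URL = 'https://api.assemblyai.com';
const POLL_INTERVAL_MS = 1_000; // 1 second, as documented above
const DEFAULT_TIMEOUT_MS = 120_000; // 120 seconds, as documented above

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical options bag; fetchFn exists only so the sketch can be tested.
interface PipelineOptions {
  signal?: AbortSignal;
  speakerLabels?: boolean;
  pollIntervalMs?: number;
  timeoutMs?: number;
  fetchFn?: FetchLike;
}

// Throws on any non-2xx response, including the HTTP status and body.
async function expectOk(res: Response, step: string): Promise<Response> {
  if (!res.ok) {
    throw new Error(`AssemblyAI ${step} failed (${res.status}): ${await res.text()}`);
  }
  return res;
}

async function transcribeSketch(
  apiKey: string,
  audio: ArrayBuffer,
  opts: PipelineOptions = {},
): Promise<string> {
  const {
    signal,
    speakerLabels = false,
    pollIntervalMs = POLL_INTERVAL_MS,
    timeoutMs = DEFAULT_TIMEOUT_MS,
    fetchFn = fetch,
  } = opts;
  const headers = { authorization: apiKey };

  // Step 1: upload the raw bytes and receive an upload_url.
  const uploadRes = await expectOk(
    await fetchFn(`${BASE_URL}/v2/upload`, { method: 'POST', headers, body: audio, signal }),
    'upload',
  );
  const { upload_url } = (await uploadRes.json()) as { upload_url: string };

  // Step 2: submit a transcription job that references the uploaded audio.
  const submitRes = await expectOk(
    await fetchFn(`${BASE_URL}/v2/transcript`, {
      method: 'POST',
      headers: { ...headers, 'content-type': 'application/json' },
      body: JSON.stringify({ audio_url: upload_url, speaker_labels: speakerLabels }),
      signal,
    }),
    'submit',
  );
  const { id } = (await submitRes.json()) as { id: string };

  // Step 3: poll until 'completed' or 'error', bounded by the timeout.
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // The signal is checked at the top of every iteration.
    if (signal?.aborted) throw new Error('Transcription aborted');
    const pollRes = await expectOk(
      await fetchFn(`${BASE_URL}/v2/transcript/${id}`, { headers, signal }),
      'poll',
    );
    const t = (await pollRes.json()) as { status: string; text?: string; error?: string };
    if (t.status === 'completed') return t.text ?? '';
    if (t.status === 'error') throw new Error(`AssemblyAI transcription failed: ${t.error}`);
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error(`Transcription ${id} timed out after ${timeoutMs} ms`);
}
```

Injecting fetchFn is a design choice for this sketch only: it lets the pipeline be exercised against a fake HTTP layer without an API key or network access.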