Type alias MediaStreamIncoming

MediaStreamIncoming: {
    type: "audio";
    payload: Buffer;
    streamSid: string;
    sequenceNumber?: number;
} | {
    type: "dtmf";
    digit: string;
    streamSid: string;
    durationMs?: number;
} | {
    type: "start";
    streamSid: string;
    callSid: string;
    metadata?: Record<string, unknown>;
} | {
    type: "stop";
    streamSid: string;
} | {
    type: "mark";
    name: string;
    streamSid: string;
}

Discriminated union of all normalised events that can arrive on a media stream WebSocket connection, regardless of the underlying telephony provider.
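Because `type` is the discriminant, a `switch` on it narrows each branch to the matching variant. The sketch below restates the alias locally so it compiles on its own. It uses `Uint8Array` in place of Node's `Buffer`, which is a `Uint8Array` subclass. The `describe` helper is hypothetical. The `never` assignment in the `default` branch makes the compiler reject the function if a variant is later added but not handled.

```typescript
// Local restatement of the union; Uint8Array stands in for Node's Buffer
// (Buffer extends Uint8Array) so the sketch compiles without @types/node.
type MediaStreamIncoming =
  | { type: "audio"; payload: Uint8Array; streamSid: string; sequenceNumber?: number }
  | { type: "dtmf"; digit: string; streamSid: string; durationMs?: number }
  | { type: "start"; streamSid: string; callSid: string; metadata?: Record<string, unknown> }
  | { type: "stop"; streamSid: string }
  | { type: "mark"; name: string; streamSid: string };

// Hypothetical helper: switching on `type` narrows `event` in each branch,
// so variant-specific fields (payload, digit, callSid, name) are accessible.
function describe(event: MediaStreamIncoming): string {
  switch (event.type) {
    case "audio":
      return `audio: ${event.payload.length} bytes`;
    case "dtmf":
      return `dtmf: ${event.digit}`;
    case "start":
      return `start: call ${event.callSid}`;
    case "stop":
      return `stop: stream ${event.streamSid}`;
    case "mark":
      return `mark: ${event.name}`;
    default: {
      // Compile-time exhaustiveness check: errors if a variant is unhandled.
      const unhandled: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unhandled)}`);
    }
  }
}

console.log(describe({ type: "dtmf", digit: "5", streamSid: "MZ123" })); // dtmf: 5
```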

Variant summary

type  | When it fires                                | Key payload fields
------|----------------------------------------------|------------------------
audio | Each inbound audio chunk (~20 ms intervals)  | payload (mu-law Buffer)
dtmf  | Caller presses a phone keypad button         | digit, durationMs?
start | Stream session begins (metadata available)   | callSid, metadata?
stop  | Stream session ends / call disconnects       | (none beyond streamSid)
mark  | Named sync point injected into audio stream  | name

All variants carry a streamSid field to identify which stream the event belongs to (important when a single server handles multiple concurrent calls).
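When one server handles several calls, events can be demultiplexed by keying per-call state on `streamSid`. The following is a minimal sketch, not part of the documented API. The `CallSession` shape and the `StreamRouter` class are hypothetical, and the union is restated locally with `Uint8Array` standing in for `Buffer`.

```typescript
type MediaStreamIncoming =
  | { type: "audio"; payload: Uint8Array; streamSid: string; sequenceNumber?: number }
  | { type: "dtmf"; digit: string; streamSid: string; durationMs?: number }
  | { type: "start"; streamSid: string; callSid: string; metadata?: Record<string, unknown> }
  | { type: "stop"; streamSid: string }
  | { type: "mark"; name: string; streamSid: string };

// Hypothetical per-call state; a real application might hold STT/VAD handles here.
interface CallSession {
  callSid: string;
  audioBytes: number;
}

// Routes events from many concurrent calls to the right session via streamSid.
class StreamRouter {
  private readonly sessions = new Map<string, CallSession>();

  handle(event: MediaStreamIncoming): void {
    switch (event.type) {
      case "start":
        this.sessions.set(event.streamSid, { callSid: event.callSid, audioBytes: 0 });
        break;
      case "audio": {
        const session = this.sessions.get(event.streamSid);
        if (session) session.audioBytes += event.payload.length;
        break;
      }
      case "stop":
        this.sessions.delete(event.streamSid);
        break;
      default:
        // dtmf and mark events are ignored in this sketch.
        break;
    }
  }

  session(streamSid: string): CallSession | undefined {
    return this.sessions.get(streamSid);
  }
}

const router = new StreamRouter();
router.handle({ type: "start", streamSid: "A", callSid: "CA1" });
router.handle({ type: "start", streamSid: "B", callSid: "CA2" });
router.handle({ type: "audio", streamSid: "A", payload: new Uint8Array(160) });
console.log(router.session("A")?.audioBytes); // 160
console.log(router.session("B")?.audioBytes); // 0
```

Audio for stream `A` never touches stream `B`'s state, and a `stop` event releases only the session it names.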

Type declaration

  • type: "audio"

    Inbound audio chunk, encoded as 8-bit mu-law (G.711 PCMU) sampled at 8 kHz.

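    Audio arrives in small chunks at regular intervals for the duration of the call, typically 20 ms per chunk; at one byte per sample and 8000 samples/s, that is 160 bytes. Before the audio is fed to STT/VAD, the pipeline must decode mu-law to 16-bit linear PCM (Int16), resample it, and convert it to Float32.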

  • payload: Buffer

    Raw mu-law bytes, already decoded from whatever transport encoding the provider uses (for example, base64 text inside a JSON message).

  • streamSid: string

    Provider-assigned stream identifier.

  • Optional sequenceNumber?: number

    Monotonically increasing sequence number, when provided.
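The decode pipeline described for the audio variant can be sketched as three small functions. The mu-law step follows the classic G.711 reference decoder arithmetic. The linear-interpolation resampler is a deliberately naive stand-in, and production code would normally use a band-limited resampler. The 16 kHz target rate and all function names are assumptions for illustration. A `Buffer` payload can be passed directly because `Buffer` extends `Uint8Array`.

```typescript
// Decodes one G.711 mu-law byte to a 16-bit linear PCM sample.
function muLawToLinear(byte: number): number {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const BIAS = 0x84;                 // 132, the bias added during encoding
  let t = ((u & 0x0f) << 3) + BIAS;  // 4-bit mantissa
  t <<= (u & 0x70) >> 4;             // 3-bit exponent (segment)
  return u & 0x80 ? BIAS - t : t - BIAS; // sign bit selects polarity
}

// Step 1: mu-law bytes -> Int16 PCM.
function decodeMuLaw(payload: Uint8Array): Int16Array {
  const out = new Int16Array(payload.length);
  for (let i = 0; i < payload.length; i++) out[i] = muLawToLinear(payload[i]);
  return out;
}

// Step 2 (naive): linear-interpolation resampling from fromHz to toHz.
function resampleLinear(input: Int16Array, fromHz: number, toHz: number): Int16Array {
  const outLength = Math.round((input.length * toHz) / fromHz);
  const out = new Int16Array(outLength);
  const step = fromHz / toHz;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return out;
}

// Step 3: Int16 -> Float32 scaled to [-1, 1).
function toFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768;
  return out;
}

const chunk = new Uint8Array(160).fill(0xff); // 20 ms of mu-law silence at 8 kHz
const samples = toFloat32(resampleLinear(decodeMuLaw(chunk), 8000, 16000));
console.log(samples.length); // 320
```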

Type declaration

  • type: "dtmf"

    DTMF tone detected by the provider during the call.

    Not all providers relay DTMF over the media stream; Telnyx, for example, delivers DTMF only via HTTP webhooks. Check the documentation for the provider's parser to confirm availability.

  • digit: string

    The single key pressed by the caller: one of 0-9, *, #, or A-D.

  • streamSid: string

    Provider-assigned stream identifier.

  • Optional durationMs?: number

    Duration the key was held, in milliseconds, when reported.
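A common use of `dtmf` events is collecting a multi-key entry, such as an account number terminated by `#`. The following is a minimal sketch. `DtmfCollector` and its `#`-terminated behaviour are hypothetical application choices, and the validation pattern mirrors the key set listed for `digit`.

```typescript
// Valid keys per the `digit` field docs: 0-9, *, #, A-D.
const DTMF_KEY = /^[0-9*#A-D]$/;

// Hypothetical helper: buffers keys until "#" is pressed, then reports the entry.
class DtmfCollector {
  private digits = "";
  private readonly onComplete: (entry: string) => void;

  constructor(onComplete: (entry: string) => void) {
    this.onComplete = onComplete;
  }

  push(digit: string): void {
    if (!DTMF_KEY.test(digit)) return; // ignore malformed values
    if (digit === "#") {
      this.onComplete(this.digits);
      this.digits = "";
    } else {
      this.digits += digit;
    }
  }
}

const entries: string[] = [];
const collector = new DtmfCollector((entry) => entries.push(entry));
for (const d of ["1", "2", "3", "#"]) collector.push(d);
console.log(entries); // [ '123' ]
```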

Type declaration

  • type: "start"

    Stream successfully started; metadata about the call is available.

    This is always the first meaningful event on a new stream connection. On receiving it, the TelephonyStreamTransport transitions from connecting to open and sends the optional acknowledgment produced by MediaStreamParser.formatConnected.

  • streamSid: string

    Provider-assigned stream identifier.

  • callSid: string

    Provider call-leg identifier (e.g. Twilio CallSid, Telnyx call_control_id).

  • Optional metadata?: Record<string, unknown>

    Additional provider-specific metadata attached to the start event.
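Because `metadata` values are typed `unknown`, each value should be narrowed before use rather than cast. The helpers below are a hypothetical sketch. The metadata keys in the example (`from`, `sampleRate`) are illustrative only and are not fields any particular provider is known to send.

```typescript
type StartEvent = {
  type: "start";
  streamSid: string;
  callSid: string;
  metadata?: Record<string, unknown>;
};

// Hypothetical helpers: return the value only if it has the expected runtime type.
function metadataString(event: StartEvent, key: string): string | undefined {
  const value = event.metadata?.[key];
  return typeof value === "string" ? value : undefined;
}

function metadataNumber(event: StartEvent, key: string): number | undefined {
  const value = event.metadata?.[key];
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

const start: StartEvent = {
  type: "start",
  streamSid: "MZ1",
  callSid: "CA1",
  metadata: { from: "+15550100", sampleRate: 8000 }, // illustrative keys only
};
console.log(metadataString(start, "from")); // +15550100
console.log(metadataNumber(start, "from")); // undefined
```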

Type declaration

  • type: "stop"

    Call ended or stream was explicitly stopped.

    The TelephonyStreamTransport transitions to closed and emits a 'close' event upon receiving this.

  • streamSid: string

    Provider-assigned stream identifier.
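The `start` and `stop` transitions can be modelled as a small state machine. This is a simplified, hypothetical sketch and not the real TelephonyStreamTransport. In particular, the `formatConnected` callback signature `(streamSid) => string` is an assumption, and the JSON frame in the example is illustrative rather than any provider's format.

```typescript
type MediaStreamIncoming =
  | { type: "audio"; payload: Uint8Array; streamSid: string; sequenceNumber?: number }
  | { type: "dtmf"; digit: string; streamSid: string; durationMs?: number }
  | { type: "start"; streamSid: string; callSid: string; metadata?: Record<string, unknown> }
  | { type: "stop"; streamSid: string }
  | { type: "mark"; name: string; streamSid: string };

type TransportState = "connecting" | "open" | "closed";

// Hypothetical model: connecting -> open on "start", any -> closed on "stop".
class StreamStateMachine {
  state: TransportState = "connecting";
  private readonly closeListeners: Array<(streamSid: string) => void> = [];
  private readonly send: (frame: string) => void;
  private readonly formatConnected?: (streamSid: string) => string; // assumed signature

  constructor(send: (frame: string) => void, formatConnected?: (streamSid: string) => string) {
    this.send = send;
    this.formatConnected = formatConnected;
  }

  onClose(listener: (streamSid: string) => void): void {
    this.closeListeners.push(listener);
  }

  handle(event: MediaStreamIncoming): void {
    if (event.type === "start" && this.state === "connecting") {
      this.state = "open";
      // Send an acknowledgment only if a formatter was supplied.
      if (this.formatConnected) this.send(this.formatConnected(event.streamSid));
    } else if (event.type === "stop" && this.state !== "closed") {
      this.state = "closed";
      for (const listener of this.closeListeners) listener(event.streamSid);
    }
  }
}

const sent: string[] = [];
const closed: string[] = [];
const machine = new StreamStateMachine(
  (frame) => sent.push(frame),
  (sid) => JSON.stringify({ event: "connected", streamSid: sid }), // illustrative frame
);
machine.onClose((sid) => closed.push(sid));
machine.handle({ type: "start", streamSid: "MZ1", callSid: "CA1" });
machine.handle({ type: "stop", streamSid: "MZ1" });
console.log(machine.state, closed); // closed [ 'MZ1' ]
```

Guarding each transition on the current state keeps a duplicate `stop` from firing the close listeners twice.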

Type declaration

  • type: "mark"

    Named marker injected into the audio stream for synchronisation.

    Marks are used to correlate outbound audio playback completion with application logic (e.g., knowing when a TTS utterance finished playing so the agent can transition from speaking to listening).

  • name: string

    The label assigned to this mark point.

  • streamSid: string

    Provider-assigned stream identifier.
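To correlate playback completion with application logic, one option is to give each outbound utterance a unique mark name and wait for that name to come back in a `mark` event. The tracker below is a hypothetical sketch. Sending the outbound audio and the outbound mark is provider-specific and not shown.

```typescript
type MarkEvent = { type: "mark"; name: string; streamSid: string };

// Hypothetical tracker: pairs outbound mark names with promises that
// resolve when the matching inbound mark event arrives.
class MarkTracker {
  private readonly pending = new Map<string, () => void>();
  private counter = 0;

  // Allocates a unique mark name and a promise for its return.
  expect(prefix: string): { name: string; done: Promise<void> } {
    const name = `${prefix}-${++this.counter}`;
    const done = new Promise<void>((resolve) => {
      this.pending.set(name, () => resolve());
    });
    return { name, done };
  }

  // Feed inbound mark events here; returns false for unknown or repeated names.
  handle(event: MarkEvent): boolean {
    const resolve = this.pending.get(event.name);
    if (!resolve) return false;
    this.pending.delete(event.name);
    resolve();
    return true;
  }
}

const tracker = new MarkTracker();
const { name, done } = tracker.expect("utterance");
// ...send TTS audio, then an outbound mark named `name` (provider-specific, not shown)...
done.then(() => console.log("playback finished; switch to listening"));
tracker.handle({ type: "mark", name, streamSid: "MZ1" });
```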