Interface AdaptiveVADConfig

Configuration options for the AdaptiveVAD.

interface AdaptiveVADConfig {
    minSpeechDurationMs?: number;
    maxSilenceDurationMsInSpeech?: number;
    vadSensitivityFactor?: number;
    energySmoothingFrames?: number;
    thresholdRatio?: number;
}
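Every property in the interface is optional. The sketch below assumes that omitted fields fall back to the defaults documented under Properties and shows one way to resolve a partial config. `DEFAULT_VAD_CONFIG` and `resolveConfig` are hypothetical names used for illustration. They are not part of the documented API.

```typescript
// Mirrors the documented interface; every field is optional.
interface AdaptiveVADConfig {
  minSpeechDurationMs?: number;
  maxSilenceDurationMsInSpeech?: number;
  vadSensitivityFactor?: number;
  energySmoothingFrames?: number;
  thresholdRatio?: number;
}

// Hypothetical constant holding the documented default for each property.
const DEFAULT_VAD_CONFIG: Required<AdaptiveVADConfig> = {
  minSpeechDurationMs: 150,
  maxSilenceDurationMsInSpeech: 500,
  vadSensitivityFactor: 1.0,
  energySmoothingFrames: 5,
  thresholdRatio: 1.5,
};

// Hypothetical helper: copy each user-supplied value over the defaults,
// skipping fields that are missing or explicitly undefined.
function resolveConfig(config: AdaptiveVADConfig = {}): Required<AdaptiveVADConfig> {
  const resolved: Required<AdaptiveVADConfig> = { ...DEFAULT_VAD_CONFIG };
  for (const key of Object.keys(DEFAULT_VAD_CONFIG) as (keyof AdaptiveVADConfig)[]) {
    const value = config[key];
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}
```

The helper merges field by field rather than using a plain object spread. This prevents an explicit `undefined` from overwriting a default.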

Properties

minSpeechDurationMs?: number

Minimum duration, in milliseconds, that a sound segment must last to be considered speech. This helps filter out very short non-speech noises.

Default

150
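The sketch below illustrates how a minimum duration can filter out short noises. It assumes the VAD makes one speech/non-speech decision per fixed-length frame. `dropShortSpeech` and `frameDurationMs` are hypothetical, and AdaptiveVAD may apply this check differently.

```typescript
// Hypothetical helper: clear any run of speech frames shorter than
// minSpeechDurationMs. Assumes one boolean per frame of frameDurationMs.
function dropShortSpeech(
  isSpeech: boolean[],
  frameDurationMs: number,
  minSpeechDurationMs = 150,
): boolean[] {
  const minFrames = Math.ceil(minSpeechDurationMs / frameDurationMs);
  const out = [...isSpeech]; // leave the input untouched
  let runStart = -1; // start index of the current speech run, or -1
  // Iterate one step past the end so a run at the tail is also closed.
  for (let i = 0; i <= isSpeech.length; i++) {
    const speaking = i < isSpeech.length && isSpeech[i];
    if (speaking && runStart < 0) {
      runStart = i; // a speech run begins
    } else if (!speaking && runStart >= 0) {
      // The run [runStart, i) just ended; erase it if it is too short.
      if (i - runStart < minFrames) out.fill(false, runStart, i);
      runStart = -1;
    }
  }
  return out;
}
```

With 30 ms frames and the default of 150 ms, a run must span at least 5 frames to survive.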

maxSilenceDurationMsInSpeech?: number

Maximum duration of silence, in milliseconds, allowed within a speech segment (for example, a pause between words) before the segment is considered ended.

Default

500
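The sketch below shows how a silence allowance can keep a segment open across short pauses. It again assumes per-frame speech flags, with frames of a hypothetical fixed length `frameDurationMs`. `groupSegments` is a hypothetical helper that returns inclusive [start, end] frame indices.

```typescript
// Hypothetical helper: group per-frame speech flags into segments, keeping a
// segment open across pauses no longer than maxSilenceDurationMsInSpeech.
function groupSegments(
  isSpeech: boolean[],
  frameDurationMs: number,
  maxSilenceDurationMsInSpeech = 500,
): Array<[number, number]> {
  const maxSilentFrames = Math.floor(maxSilenceDurationMsInSpeech / frameDurationMs);
  const segments: Array<[number, number]> = [];
  let start = -1; // first frame of the open segment, or -1 if none is open
  let lastSpeech = -1; // most recent speech frame in the open segment
  for (let i = 0; i < isSpeech.length; i++) {
    if (isSpeech[i]) {
      if (start < 0) start = i;
      lastSpeech = i;
    } else if (start >= 0 && i - lastSpeech > maxSilentFrames) {
      // The silence has now lasted longer than allowed: close the segment.
      segments.push([start, lastSpeech]);
      start = -1;
    }
  }
  if (start >= 0) segments.push([start, lastSpeech]); // flush a still-open segment
  return segments;
}
```

With 100 ms frames and a 300 ms allowance, a 3-frame pause stays inside the segment, but a 4-frame pause ends it.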

vadSensitivityFactor?: number

Sensitivity adjustment factor that further fine-tunes the thresholds from EnvironmentalCalibrator. Values > 1.0 make the VAD less sensitive, so louder input is required to detect speech. Values < 1.0 make it more sensitive. This factor is applied on top of the sensitivity factor in EnvironmentalCalibrator.

Default

1.0
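The sketch below shows the factor's effect. It assumes the factor multiplies an energy threshold that EnvironmentalCalibrator has already produced. `applyVadSensitivity` and `exceedsThreshold` are hypothetical helpers, not library functions.

```typescript
// Hypothetical: scale a calibrated energy threshold by vadSensitivityFactor.
// Assumes a multiplicative factor; values > 1.0 raise the threshold.
function applyVadSensitivity(calibratedThreshold: number, vadSensitivityFactor = 1.0): number {
  return calibratedThreshold * vadSensitivityFactor;
}

// A frame counts as speech only if its energy exceeds the adjusted threshold.
function exceedsThreshold(frameEnergy: number, threshold: number): boolean {
  return frameEnergy > threshold;
}
```

Take a calibrated threshold of 0.04. A frame with energy 0.06 counts as speech at factors 1.0 and 0.5. At 2.0 it does not, because the threshold rises to 0.08.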

energySmoothingFrames?: number

Number of past frames used to smooth the energy calculation, if smoothing is applied.

Default

5
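The property does not specify the smoothing method. The sketch below assumes a simple moving average: `EnergySmoother`, a hypothetical class, averages the most recent `energySmoothingFrames` frame energies. The library could instead use another scheme, such as exponential smoothing.

```typescript
// Hypothetical moving-average smoother over the last N frame energies.
class EnergySmoother {
  private readonly window: number[] = [];
  private sum = 0;
  private readonly size: number;

  constructor(energySmoothingFrames = 5) {
    // Guard added for this sketch; a window of 0 frames would divide by zero.
    if (!Number.isInteger(energySmoothingFrames) || energySmoothingFrames < 1) {
      throw new RangeError("energySmoothingFrames must be a positive integer");
    }
    this.size = energySmoothingFrames;
  }

  // Add the newest frame energy and return the mean of the retained frames.
  push(energy: number): number {
    this.window.push(energy);
    this.sum += energy;
    if (this.window.length > this.size) {
      this.sum -= this.window.shift()!; // evict the oldest frame
    }
    return this.sum / this.window.length;
  }
}
```

Until the window fills, the mean covers only the frames seen so far.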

thresholdRatio?: number

Ratio of speech_threshold to silence_threshold, used to create a hysteresis effect: speech_threshold = silence_threshold * thresholdRatio. With the default of 1.5, the speech threshold is 50% higher than the silence threshold.

Default

1.5
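The sketch below illustrates the hysteresis that thresholdRatio creates with a minimal two-threshold detector. It assumes the silence threshold is supplied externally, for example by calibration. `HysteresisDetector` is hypothetical and is not the library's implementation.

```typescript
// Hypothetical two-threshold detector derived from thresholdRatio.
class HysteresisDetector {
  private speaking = false;
  readonly silenceThreshold: number;
  readonly speechThreshold: number;

  constructor(silenceThreshold: number, thresholdRatio = 1.5) {
    this.silenceThreshold = silenceThreshold;
    // speech_threshold = silence_threshold * thresholdRatio
    this.speechThreshold = silenceThreshold * thresholdRatio;
  }

  // Enter speech only above speechThreshold; leave only below silenceThreshold.
  // Energies between the two thresholds keep the previous state.
  update(energy: number): boolean {
    if (!this.speaking && energy > this.speechThreshold) this.speaking = true;
    else if (this.speaking && energy < this.silenceThreshold) this.speaking = false;
    return this.speaking;
  }
}
```

Energies between the two thresholds leave the state unchanged. As a result, brief dips during speech do not end a segment, and brief bumps during silence do not start one.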