Creates a new BuiltInAdaptiveVadProvider.
Initializes both the environmental calibrator and the adaptive VAD engine with the provided or default configuration.
Optional VAD configuration. All fields default to standard values suitable for 16kHz mono voice audio.
// Default configuration (16kHz, 20ms frames)
const vad = new BuiltInAdaptiveVadProvider();
// Custom configuration
const customVad = new BuiltInAdaptiveVadProvider({
  sampleRate: 48_000,
  frameDurationMs: 10,
  vad: { minSpeechDurationMs: 200 },
});
Process a single audio frame and return a speech/non-speech decision.
This method must be called sequentially with consecutive audio frames. The VAD maintains internal state (speech onset tracking, hangover counters) that depends on temporal continuity between frames.
A Float32Array of audio samples for one frame. The expected
length is sampleRate * frameDurationMs / 1000 (e.g. 320 for 16kHz/20ms).
Samples should be normalized to the range [-1.0, 1.0].
A decision object with isSpeech, confidence, the raw VAD result,
and the current environmental noise profile.
const frame = new Float32Array(320);
// ... fill with audio samples ...
const decision = vad.processFrame(frame);
console.log(decision.isSpeech, decision.confidence);
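The frame-length rule and the [-1.0, 1.0] normalization described above can be sketched with a few helpers. This is a minimal sketch, assuming the input arrives as 16-bit PCM; `samplesPerFrame`, `int16ToFloat32`, and `splitIntoFrames` are hypothetical names and not part of the provider's API.

```typescript
// Hypothetical helpers for preparing frames; not part of BuiltInAdaptiveVadProvider.

/** Samples per frame: sampleRate * frameDurationMs / 1000 (e.g. 16_000 * 20 / 1000 = 320). */
function samplesPerFrame(sampleRate: number, frameDurationMs: number): number {
  return Math.round((sampleRate * frameDurationMs) / 1000);
}

/** Convert 16-bit PCM samples to floats in the range [-1.0, 1.0). */
function int16ToFloat32(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, (sample) => sample / 32768);
}

/** Split a buffer into consecutive full frames; a trailing partial frame is dropped. */
function splitIntoFrames(samples: Float32Array, frameSize: number): Float32Array[] {
  const frames: Float32Array[] = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    // subarray() returns a view, so no samples are copied.
    frames.push(samples.subarray(start, start + frameSize));
  }
  return frames;
}
```

Frames produced this way are already in temporal order, so they can be passed to processFrame() one after another.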
Reset the VAD state for a new audio session.
Clears internal counters (speech onset tracking, hangover timers) so the VAD starts fresh. Call this when starting a new conversation turn or after a significant audio gap. It does NOT reset the environmental calibrator; the noise profile persists across resets.
// Start a new conversation turn
vad.reset();
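To illustrate why frames must arrive in order and what kind of state a reset clears, here is a minimal sketch of an onset/hangover gate. It is an illustration under assumptions, not the AdaptiveVAD implementation: the class `OnsetHangoverGate` and its parameters `onsetFrames` and `hangoverFrames` are hypothetical.

```typescript
// Illustrative only: a hypothetical smoothing gate, not AdaptiveVAD's actual logic.
class OnsetHangoverGate {
  private readonly onsetFrames: number;    // loud frames required before speech starts
  private readonly hangoverFrames: number; // quiet frames tolerated before speech ends
  private speechRun = 0;                   // consecutive loud frames seen so far
  private hangoverLeft = 0;                // remaining quiet frames still counted as speech
  private inSpeech = false;

  constructor(onsetFrames: number, hangoverFrames: number) {
    this.onsetFrames = onsetFrames;
    this.hangoverFrames = hangoverFrames;
  }

  /** Feed one raw per-frame decision, in order; returns the smoothed decision. */
  update(frameIsLoud: boolean): boolean {
    if (frameIsLoud) {
      this.speechRun += 1;
      if (this.speechRun >= this.onsetFrames) {
        this.inSpeech = true;
        this.hangoverLeft = this.hangoverFrames;
      }
    } else {
      this.speechRun = 0;
      if (this.inSpeech) {
        if (this.hangoverLeft > 0) {
          this.hangoverLeft -= 1;
        } else {
          this.inSpeech = false;
        }
      }
    }
    return this.inSpeech;
  }

  /** Clear the counters, analogous to starting a new conversation turn. */
  reset(): void {
    this.speechRun = 0;
    this.hangoverLeft = 0;
    this.inSpeech = false;
  }
}
```

Because the result depends on the counters carried over from earlier frames, skipping or reordering frames changes the outcome, and a reset discards any partially accumulated onset.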
Returns the current environmental noise profile estimated by the calibrator.
The noise profile includes the estimated noise floor RMS, spectral shape,
and confidence metrics. Returns null if insufficient audio has been
processed for a reliable estimate.
The current noise profile, or null if not yet calibrated.
const profile = vad.getNoiseProfile();
if (profile) {
  console.log(`Noise floor: ${profile.noiseFloorRms}`);
}
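The idea of a noise floor that is unavailable until enough audio has been seen, and that raises the speech threshold in louder environments, can be sketched as follows. This is a simplified sketch under assumptions, not the EnvironmentalCalibrator algorithm: `NoiseFloorTracker`, `alpha`, `minFrames`, and the threshold `ratio` are hypothetical, and the spectral shape and confidence metrics are omitted.

```typescript
// Illustrative only: a hypothetical tracker, not the EnvironmentalCalibrator itself.

/** Root-mean-square level of one frame of normalized samples. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  frame.forEach((sample) => {
    sumSquares += sample * sample;
  });
  return Math.sqrt(sumSquares / frame.length);
}

class NoiseFloorTracker {
  private readonly alpha: number;     // smoothing factor of the moving average
  private readonly minFrames: number; // frames needed before an estimate is reported
  private floor = 0;
  private frames = 0;

  constructor(alpha = 0.05, minFrames = 10) {
    this.alpha = alpha;
    this.minFrames = minFrames;
  }

  /** Fold one frame into an exponential moving average of the RMS level. */
  update(frame: Float32Array): void {
    const level = rms(frame);
    this.floor = this.frames === 0 ? level : this.floor + this.alpha * (level - this.floor);
    this.frames += 1;
  }

  /** Estimated noise floor, or null while too few frames have been processed. */
  getNoiseFloorRms(): number | null {
    return this.frames >= this.minFrames ? this.floor : null;
  }

  /** Dynamic speech threshold: a fixed multiple of the noise floor. */
  speechThreshold(ratio = 3): number | null {
    const noiseFloor = this.getNoiseFloorRms();
    return noiseFloor === null ? null : noiseFloor * ratio;
  }
}
```

This sketch folds every frame into the average. A practical calibrator would more likely update only on frames judged to be non-speech, so that speech does not inflate the estimate.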
Readonly id: Unique provider identifier used for registration and resolution.
Readonly display: Human-readable display name for UI and logging.
Built-in voice activity detection (VAD) provider backed by the AdaptiveVAD engine and EnvironmentalCalibrator.
This is the default VAD provider in AgentOS and requires no external dependencies or API keys. It operates entirely locally on raw audio frames.
How It Works
1. EnvironmentalCalibrator continuously estimates the ambient noise floor and spectral profile from incoming audio frames.
2. AdaptiveVAD uses the calibrator's noise profile to set dynamic thresholds for speech detection: louder environments get higher thresholds to avoid false positives.
3. Each processFrame() call returns a SpeechVadDecision with isSpeech, confidence, the raw VAD result, and the current noise profile.
Configuration Defaults
See BuiltInAdaptiveVadProviderConfig for configuration options.
See AdaptiveVAD for the underlying VAD algorithm.
See EnvironmentalCalibrator for the noise profiling engine.
Example