OCR options including the image source and strategy.
A promise resolving to an OCRResult with extracted text, confidence, tier info, and optional bounding-box regions.
// Basic usage: file path, auto-detect everything
const { text, confidence } = await performOCR({
  image: '/path/to/receipt.png',
});

// Privacy-sensitive: never call cloud APIs
const local = await performOCR({
  image: screenshotBuffer,
  strategy: 'local-only',
});

// Best quality: go straight to cloud
const cloud = await performOCR({
  image: 'https://example.com/document.jpg',
  strategy: 'cloud-only',
  provider: 'openai',
  model: 'gpt-4o',
});
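The strategies above can also be combined by hand when a caller wants explicit control over when cloud APIs are used. The following is a minimal sketch, not part of AgentOS: `ocrLocalThenCloud`, the `OCRFn` type, the simplified option and result shapes, and the `minConfidence` threshold are all hypothetical, and it assumes `confidence` is a number between 0 and 1. The default strategy may already escalate between tiers on its own; the source does not say how.

```typescript
// Simplified shapes for this sketch only; the real AgentOS types may differ.
type OCRStrategy = 'local-only' | 'cloud-only';

interface OCROptionsSketch {
  image: string | Uint8Array;
  strategy?: OCRStrategy;
}

interface OCRResultSketch {
  text: string;
  confidence: number; // assumed to lie in the range 0..1
}

// Any function with this shape can be injected, which keeps the helper testable.
type OCRFn = (opts: OCROptionsSketch) => Promise<OCRResultSketch>;

// Hypothetical helper: run the local tier first and call the cloud tier
// only when the local confidence falls below `minConfidence`.
async function ocrLocalThenCloud(
  ocr: OCRFn,
  image: string | Uint8Array,
  minConfidence = 0.8,
): Promise<OCRResultSketch> {
  const local = await ocr({ image, strategy: 'local-only' });
  if (local.confidence >= minConfidence) return local;

  const cloud = await ocr({ image, strategy: 'cloud-only' });
  // Keep whichever result reports the higher confidence.
  return cloud.confidence > local.confidence ? cloud : local;
}
```

If `performOCR` accepts these options, it could be passed in directly, for example `await ocrLocalThenCloud(performOCR, '/path/to/receipt.png')`. Injecting the OCR function rather than importing it keeps the sketch independent of the actual package layout.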
Extract text from an image using AgentOS's progressive vision pipeline.
This is the recommended high-level API for OCR. It handles input resolution (file, URL, base64, Buffer), pipeline lifecycle, and result mapping so callers don't need to interact with VisionPipeline directly.
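To make the input-resolution step concrete, here is a rough sketch of how the four documented input kinds could be told apart before the image bytes are loaded. It is not AgentOS's implementation: `classifyImageInput` and `ImageInputKind` are hypothetical names, and the sketch assumes base64 input arrives as a `data:image/...;base64,` URI, since a bare base64 string cannot be reliably distinguished from a file path.

```typescript
// Hypothetical helper, not part of AgentOS.
type ImageInputKind = 'binary' | 'url' | 'base64' | 'file';

function classifyImageInput(input: string | Uint8Array): ImageInputKind {
  // Node's Buffer is a Uint8Array subclass, so this branch covers Buffers too.
  if (input instanceof Uint8Array) return 'binary';

  // Remote images: only http(s) URLs are recognized in this sketch.
  if (/^https?:\/\//i.test(input)) return 'url';

  // Assumption: base64 input is passed as a data URI with an image MIME type.
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(input)) return 'base64';

  // Anything else is treated as a filesystem path.
  return 'file';
}
```

A resolver built on this would then read the file, fetch the URL, or decode the base64 payload, so that later stages only ever see raw bytes.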
When to use
performOCR() vs VisionPipeline
performOCR()
VisionPipeline (create once, reuse)
VisionPipeline (richer result)
performOCR()