Function performOCR

performOCR(opts): Promise<OCRResult>

Extract text from an image using AgentOS's progressive vision pipeline.

This is the recommended high-level API for OCR. It handles input resolution (file, URL, base64, Buffer), pipeline lifecycle, and result mapping so callers don't need to interact with VisionPipeline directly.

When to use `performOCR()` vs `VisionPipeline`

Use case	Recommendation
One-shot text extraction	`performOCR()`
Batch processing many images	`VisionPipeline` (create once, reuse)
Need embeddings or layout	`VisionPipeline` (richer result)
Simple scripts / quick integration	`performOCR()`

Parameters

opts: PerformOCROptions
OCR options including the image source and strategy.

Returns Promise<OCRResult>

A promise resolving to an OCRResult with extracted text, confidence, tier info, and optional bounding-box regions.

Example

// Basic usage — file path, auto-detect everything
const { text, confidence } = await performOCR({
  image: '/path/to/receipt.png',
});

// Privacy-sensitive — never call cloud APIs
const local = await performOCR({
  image: screenshotBuffer,
  strategy: 'local-only',
});

// Best quality — go straight to cloud
const cloud = await performOCR({
  image: 'https://example.com/document.jpg',
  strategy: 'cloud-only',
  provider: 'openai',
  model: 'gpt-4o',
});

Function performOCR

When to use `performOCR()` vs `VisionPipeline`

Parameters

Returns Promise<OCRResult>

Example

Settings

Member Visibility

Theme

On This Page

Function performOCR

When to use performOCR() vs VisionPipeline

Parameters

Returns Promise<OCRResult>

Example

Settings

Member Visibility

Theme

On This Page

When to use `performOCR()` vs `VisionPipeline`