Interface PerformOCROptions

Options accepted by performOCR.

interface PerformOCROptions {
    image: string | Buffer<ArrayBufferLike>;
    strategy?: "progressive" | "local-only" | "cloud-only";
    confidenceThreshold?: number;
    provider?: string;
    model?: string;
    apiKey?: string;
}

Properties

image: string | Buffer<ArrayBufferLike>

Image source. Accepts any of:

  • File path — absolute or relative filesystem path (e.g. /tmp/scan.png).
  • URL — HTTP(S) URL to fetch the image from.
  • Base64 string — raw base64-encoded image data (with or without a data:image/...;base64, prefix).
  • Buffer — in-memory image bytes.
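As a rough illustration of the four accepted forms, the sketch below builds one value of each kind. The names `ImageInput` and `stripDataUriPrefix` are hypothetical and not part of the library, and `Uint8Array` stands in for Node's `Buffer` (its subclass) so the example carries no Node dependency. The helper only shows that the prefixed and raw base64 forms carry the same payload; how the library itself tells the string forms apart is not documented here.

```typescript
// Hypothetical local type; the documented type is string | Buffer.
// Buffer extends Uint8Array, so a real Buffer also fits this signature.
type ImageInput = string | Uint8Array;

// Illustrative values only; the base64 payload is just the 8-byte PNG signature.
const rawBase64 = "iVBORw0KGgo=";
const dataUri = `data:image/png;base64,${rawBase64}`;

const imageExamples: Record<string, ImageInput> = {
  filePath: "/tmp/scan.png",
  url: "https://example.com/scan.png",
  base64WithPrefix: dataUri,
  base64Raw: rawBase64,
  bytes: new Uint8Array([0x89, 0x50, 0x4e, 0x47]),
};

// Hypothetical helper: removes a leading "data:image/...;base64," prefix if present
// and returns any other string unchanged.
function stripDataUriPrefix(value: string): string {
  return value.replace(/^data:image\/[^;]+;base64,/, "");
}
```

In Node, the in-memory form would typically come from something like `fs.readFileSync(path)`, which returns a `Buffer`.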

strategy?: "progressive" | "local-only" | "cloud-only"

Vision strategy controlling which tiers are used.

  • 'progressive' — start local, escalate to cloud only when confidence is below confidenceThreshold. Best cost/quality balance.
  • 'local-only' — never call cloud APIs. Suited to air-gapped or privacy-sensitive deployments.
  • 'cloud-only' — skip local processing, send straight to a cloud LLM. Highest quality but highest cost.
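The three strategies can be read as a small decision rule. The sketch below is one possible reading of the descriptions above, not the library's implementation; `usesCloud` is a hypothetical name, and treating a confidence exactly equal to the threshold as acceptable is an assumption.

```typescript
type Strategy = "progressive" | "local-only" | "cloud-only";

// Hypothetical helper: decides whether a cloud call would be made, given the
// confidence reported by local processing. The 0.7 default mirrors the
// documented default for confidenceThreshold.
function usesCloud(
  strategy: Strategy,
  localConfidence: number,
  confidenceThreshold = 0.7,
): boolean {
  switch (strategy) {
    case "local-only":
      return false; // never call cloud APIs
    case "cloud-only":
      return true; // local processing is skipped entirely
    case "progressive":
      // Assumption: escalate only when strictly below the threshold.
      return localConfidence < confidenceThreshold;
  }
}
```

Under this reading, `localConfidence` is ignored for the two non-progressive strategies.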

Default

'progressive'

confidenceThreshold?: number

Minimum confidence (0 to 1) that a local tier's OCR result must reach to be accepted without escalating to the next tier.

Only meaningful for the 'progressive' strategy.
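To make the escalation rule concrete, here is a minimal sketch of a tiered loop that stops at the first result meeting the threshold. `TierResult` and `runProgressive` are hypothetical names, the tiers are modelled as plain synchronous functions for brevity, and returning the last tier's result when nothing reaches the threshold is an assumption about the fallback behaviour.

```typescript
// Hypothetical result shape for one tier's OCR attempt.
interface TierResult {
  tier: string;
  text: string;
  confidence: number; // 0 to 1
}

// Runs tiers in order and returns the first result whose confidence is at
// least confidenceThreshold. Later tiers are not invoked once one is accepted.
function runProgressive(
  tiers: Array<() => TierResult>,
  confidenceThreshold = 0.7,
): TierResult {
  if (tiers.length === 0) {
    throw new Error("at least one tier is required");
  }
  let last: TierResult | undefined;
  for (const runTier of tiers) {
    last = runTier();
    if (last.confidence >= confidenceThreshold) {
      return last;
    }
  }
  // Assumption: with no tier left to escalate to, the final result is returned as-is.
  return last as TierResult;
}
```

Raising the threshold makes escalation more likely, which trades cost for quality.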

Default

0.7

provider?: string

Cloud LLM provider for tier-3 fallback (e.g. 'openai', 'anthropic', 'google'). When omitted, the provider is auto-detected from environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).

model?: string

Cloud LLM model override. When omitted, the provider's default vision model is used (e.g. gpt-4o for OpenAI).

apiKey?: string

API key for the cloud provider. When omitted, the key is read from the standard environment variable for the provider.
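The interplay of provider, model, and apiKey can be sketched as a small resolution step in which explicit options win over environment values. Everything named below (`CloudOptions`, `CloudConfig`, `KEY_VARS`, `DEFAULT_MODELS`, `resolveCloudConfig`) is hypothetical. Only the two environment variables and the one default model mentioned in this page are listed, and the detection order is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface CloudOptions {
  provider?: string;
  model?: string;
  apiKey?: string;
}

interface CloudConfig {
  provider: string;
  model: string | undefined;
  apiKey: string;
}

// Assumption: providers are probed in this order. Only variables named in the docs appear.
const KEY_VARS: Array<[string, string]> = [
  ["openai", "OPENAI_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
];

// Only the default named in the docs; other providers' defaults are left unknown.
const DEFAULT_MODELS: Record<string, string | undefined> = { openai: "gpt-4o" };

// Hypothetical resolver: returns undefined when no provider or key can be determined.
function resolveCloudConfig(options: CloudOptions, env: Env): CloudConfig | undefined {
  const detected = KEY_VARS.find(([, envVar]) => Boolean(env[envVar]));
  const provider = options.provider ?? detected?.[0];
  if (!provider) return undefined;

  const envVar = KEY_VARS.find(([name]) => name === provider)?.[1];
  const apiKey = options.apiKey ?? (envVar ? env[envVar] : undefined);
  if (!apiKey) return undefined;

  return { provider, model: options.model ?? DEFAULT_MODELS[provider], apiKey };
}
```

In a Node process the second argument would normally be `process.env`; passing it explicitly keeps the sketch easy to test.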