Interface VisionPipelineConfig

Configuration for the VisionPipeline.

Every field except strategy is marked optional in the interface, and strategy itself defaults to 'progressive'. The factory function createVisionPipeline auto-detects available providers and fills in sensible defaults.

Example

const config: VisionPipelineConfig = {
    strategy: 'progressive',
    ocr: 'paddle',
    handwriting: true,
    documentAI: true,
    embedding: true,
    cloudProvider: 'openai',
    cloudModel: 'gpt-4o',
    confidenceThreshold: 0.8,
    preprocessing: { grayscale: true, sharpen: true },
};

interface VisionPipelineConfig {
    strategy: VisionStrategy;
    ocr?: "paddle" | "tesseract" | "none";
    handwriting?: boolean;
    documentAI?: boolean;
    embedding?: boolean;
    cloudProvider?: string;
    cloudModel?: string;
    confidenceThreshold?: number;
    preprocessing?: VisionPreprocessingConfig;
}

Properties

strategy: VisionStrategy

How to combine tiers.

Default

'progressive'

ocr?: "paddle" | "tesseract" | "none"

OCR engine for Tier 1 text extraction.

  • 'paddle' — PaddleOCR (via ppu-paddle-ocr). Best accuracy for most scripts.
  • 'tesseract' — Tesseract.js. Wider language support, slightly lower accuracy.
  • 'none' — Skip OCR entirely (useful for photo-only pipelines).
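When ocr is left unset, the documented default falls back from 'paddle' to 'tesseract' to 'none' depending on what is installed. The following is a minimal sketch of that resolution order, not the library's actual code: resolveOcrEngine and the isInstalled callback are hypothetical names introduced here for illustration, and returning an explicit value without checking installation is an assumption.

```typescript
type OcrEngine = "paddle" | "tesseract" | "none";

// Hypothetical helper, not part of the documented API.
// `isInstalled` is injected so the fallback order can be exercised without
// the optional OCR packages actually being present.
function resolveOcrEngine(
  requested: OcrEngine | undefined,
  isInstalled: (engine: "paddle" | "tesseract") => boolean,
): OcrEngine {
  // Assumption: an explicit setting is returned as-is, including 'none'.
  if (requested !== undefined) return requested;

  // Documented default order: paddle, then tesseract, then none.
  if (isInstalled("paddle")) return "paddle";
  if (isInstalled("tesseract")) return "tesseract";
  return "none";
}
```

Injecting the installation check keeps the sketch independent of how the real pipeline probes for optional dependencies.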

Default

'paddle' (if installed), else 'tesseract' (if installed), else 'none'

handwriting?: boolean

Enable handwriting recognition via TrOCR (@huggingface/transformers). Only triggered when OCR confidence is low and content appears handwritten.

Default

false

documentAI?: boolean

Enable document understanding via Florence-2 (@huggingface/transformers). Produces structured DocumentLayout with semantic block detection.

Default

false

embedding?: boolean

Enable CLIP image embeddings (@huggingface/transformers). Runs in parallel with other tiers — does not affect text extraction.

Default

false

cloudProvider?: string

Cloud vision LLM provider name for Tier 3 fallback. Must match a provider known to generateText() (e.g. 'openai', 'anthropic', 'google'). When unset, cloud vision is disabled.

cloudModel?: string

Cloud model override. When unset, the provider's default vision model is used.
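The interaction between cloudProvider and cloudModel can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: resolveCloud, CloudSettings, ResolvedCloud, and the defaultModels lookup are hypothetical names, and treating a cloudModel without a cloudProvider as "cloud disabled" is an assumption.

```typescript
interface CloudSettings {
  cloudProvider?: string;
  cloudModel?: string;
}

interface ResolvedCloud {
  provider: string;
  model: string;
}

// Hypothetical helper. `defaultModels` stands in for whatever
// provider-to-default-vision-model lookup the library uses internally.
function resolveCloud(
  settings: CloudSettings,
  defaultModels: Record<string, string>,
): ResolvedCloud | undefined {
  // No provider: Tier 3 is disabled (assumed to hold even if cloudModel is set).
  if (!settings.cloudProvider) return undefined;

  // An explicit cloudModel overrides the provider's default.
  const model = settings.cloudModel ?? defaultModels[settings.cloudProvider];
  if (model === undefined) {
    throw new Error(`No default vision model known for '${settings.cloudProvider}'`);
  }
  return { provider: settings.cloudProvider, model };
}
```

Returning undefined for the disabled case lets a caller skip the cloud tier with a single check.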

Example

'gpt-4o', 'claude-sonnet-4-20250514', 'gemini-2.0-flash'

confidenceThreshold?: number

Minimum OCR confidence required to accept a result without escalating. Applies only to the 'progressive' strategy: if OCR confidence falls below this threshold, the pipeline escalates to the next tier.
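The threshold check can be illustrated with a simplified, synchronous sketch. It is not the pipeline's actual code: TierResult, Tier, and runProgressive are hypothetical names, real tiers would presumably be asynchronous, and applying the same threshold at every tier (rather than only to the OCR result) is an assumption.

```typescript
interface TierResult {
  text: string;
  confidence: number; // assumed to lie in the range 0..1
}

// Hypothetical: each tier is modeled as a function that produces a result.
type Tier = () => TierResult;

function runProgressive(tiers: Tier[], confidenceThreshold = 0.7): TierResult {
  if (tiers.length === 0) throw new Error("at least one tier is required");

  let best: TierResult | undefined;
  for (const tier of tiers) {
    const result = tier();
    if (best === undefined || result.confidence > best.confidence) best = result;

    // Accept as soon as a tier meets the threshold; later tiers never run.
    if (result.confidence >= confidenceThreshold) return result;
  }

  // No tier met the threshold: fall back to the most confident result seen.
  return best as TierResult;
}
```

Because tiers are tried in order and the loop returns early, a more expensive later tier (such as a cloud call) runs only when the cheaper ones fall short of the threshold.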

Default

0.7

preprocessing?: VisionPreprocessingConfig

Image preprocessing options applied before any tier runs. Uses sharp for resizing, grayscale conversion, sharpening, and normalization.