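How to combine tiers. With 'progressive', the pipeline tries tiers in order and escalates to the next tier only when OCR confidence falls below the confidence threshold.
Default: 'progressive'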
Optional ocr: OCR engine for Tier 1 text extraction.
'paddle': PaddleOCR (via ppu-paddle-ocr). Best accuracy for most scripts.
'tesseract': Tesseract.js. Wider language support, slightly lower accuracy.
'none': Skip OCR entirely (useful for photo-only pipelines).
Default: 'paddle' if installed, else 'tesseract' if installed, else 'none'
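The default resolution order for ocr can be sketched as a small function. This is a hypothetical illustration, not the library's code: the isInstalled predicate is an assumption injected so the logic runs without any OCR package present, and the package names come from the option descriptions (ppu-paddle-ocr, plus the tesseract.js npm package behind Tesseract.js).

```typescript
// Hypothetical sketch of the documented default; not the library's actual code.
type OcrEngine = "paddle" | "tesseract" | "none";

// `isInstalled` is an injected predicate (assumption, for testability).
function resolveDefaultOcr(isInstalled: (pkg: string) => boolean): OcrEngine {
  // Preference order follows the documented default: paddle, then tesseract, then none.
  if (isInstalled("ppu-paddle-ocr")) return "paddle";
  if (isInstalled("tesseract.js")) return "tesseract";
  return "none";
}

// Simulate an environment where only Tesseract.js is installed.
console.log(resolveDefaultOcr((pkg) => pkg === "tesseract.js")); // tesseract
```

In Node, isInstalled could be backed by a try/catch around require.resolve(); that wiring is left out so the sketch stays self-contained.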
Optional handwriting: Enable handwriting recognition via TrOCR (@huggingface/transformers).
Only triggered when OCR confidence is low and the content appears handwritten.
Default: false
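The trigger rule for the handwriting tier is a conjunction of conditions. A minimal sketch, assuming a looksHandwritten flag produced by some upstream heuristic and assuming "low confidence" means below a numeric threshold; neither the heuristic nor the threshold value for this check is specified in this reference, so both are hypothetical.

```typescript
// Hypothetical sketch of the documented trigger rule; not the library's code.
interface HandwritingGateInput {
  enabled: boolean; // the handwriting option (default false)
  ocrConfidence: number; // Tier 1 OCR confidence, assumed to be in [0, 1]
  looksHandwritten: boolean; // assumption: output of some upstream heuristic
  lowConfidenceThreshold: number; // assumption: what counts as "low"
}

// TrOCR runs only if enabled AND OCR confidence is low AND the content looks handwritten.
function shouldRunHandwriting(g: HandwritingGateInput): boolean {
  return g.enabled && g.ocrConfidence < g.lowConfidenceThreshold && g.looksHandwritten;
}
```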
Optional documentAI: Enable document understanding via Florence-2 (@huggingface/transformers).
Produces a structured DocumentLayout with semantic block detection.
Default: false
Optional embedding: Enable CLIP image embeddings (@huggingface/transformers).
Runs in parallel with the other tiers and does not affect text extraction.
Default: false
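Because embedding runs in parallel and does not affect text extraction, one natural shape is a Promise.all over two independent tasks. The extractText and embedImage functions below are hypothetical stand-ins for the OCR tiers and the CLIP model, not part of the library's API.

```typescript
// Hypothetical sketch: the task functions are stand-ins, not the library's API.
type TextResult = { text: string; confidence: number };

async function runWithEmbedding(
  extractText: () => Promise<TextResult>,
  embedImage: () => Promise<number[]>,
  embeddingEnabled: boolean,
): Promise<{ text: TextResult; embedding?: number[] }> {
  // Start both tasks at once; the embedding never feeds into text extraction.
  const [text, embedding] = await Promise.all([
    extractText(),
    embeddingEnabled ? embedImage() : Promise.resolve(undefined),
  ]);
  return { text, embedding };
}
```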
Optional cloud: Cloud vision LLM provider name for the Tier 3 fallback.
Must match a provider known to generateText() (e.g. 'openai', 'anthropic', 'google').
When unset, cloud vision is disabled.
Optional cloud: Cloud model override. When unset, the provider's default vision model is used.
Examples: 'gpt-4o', 'claude-sonnet-4-20250514', 'gemini-2.0-flash'
Optional confidence: Minimum confidence required to accept an OCR result without escalating to cloud.
Applies only to the 'progressive' strategy: if OCR confidence is below
this threshold, the pipeline escalates to the next tier.
Default: 0.7
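The 'progressive' strategy with its confidence threshold can be sketched as a loop over tiers that returns the first result at or above the threshold. Assumptions: each tier is modeled as a synchronous function returning a result, or null when disabled, and if no tier reaches the threshold the sketch falls back to the most confident result seen; the real pipeline may handle that case differently.

```typescript
// Hypothetical sketch of the 'progressive' strategy; shapes are assumptions.
interface TierResult {
  tier: string;
  text: string;
  confidence: number; // assumed to be in [0, 1]
}

// Each tier returns a result, or null when it is disabled or unavailable.
type Tier = () => TierResult | null;

function progressive(tiers: Tier[], threshold = 0.7): TierResult | null {
  let best: TierResult | null = null;
  for (const runTier of tiers) {
    const result = runTier();
    if (result === null) continue; // skip disabled tiers
    // Accept the first result that meets the minimum confidence.
    if (result.confidence >= threshold) return result;
    // Otherwise remember the most confident attempt and escalate.
    if (best === null || result.confidence > best.confidence) best = result;
  }
  // No tier reached the threshold: return the best attempt (assumption).
  return best;
}
```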
Optional preprocessing: Image preprocessing options applied before any tier runs.
Uses sharp for resizing, grayscale conversion, sharpening,
and normalization.
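A sketch of how these preprocessing steps might be chained. To stay runnable without sharp installed, it targets a minimal Chain interface covering four methods sharp does provide (resize, grayscale, sharpen, normalize). The PreprocessOptions field names are hypothetical, since the actual preprocessing option shape is not listed in this reference.

```typescript
// Minimal structural interface for the sharp methods used below (an assumption
// for testability; a real sharp instance should satisfy it structurally).
interface Chain {
  resize(options: { width?: number; height?: number; fit?: "inside"; withoutEnlargement?: boolean }): Chain;
  grayscale(): Chain;
  sharpen(): Chain;
  normalize(): Chain;
}

// Hypothetical option names; not the library's real preprocessing shape.
interface PreprocessOptions {
  maxDimension?: number;
  grayscale?: boolean;
  sharpen?: boolean;
  normalize?: boolean;
}

function applyPreprocessing(img: Chain, opts: PreprocessOptions): Chain {
  let out = img;
  if (opts.maxDimension !== undefined) {
    // fit "inside" preserves aspect ratio; withoutEnlargement avoids upscaling small images.
    out = out.resize({ width: opts.maxDimension, height: opts.maxDimension, fit: "inside", withoutEnlargement: true });
  }
  if (opts.grayscale) out = out.grayscale();
  if (opts.sharpen) out = out.sharpen();
  if (opts.normalize) out = out.normalize();
  return out;
}
```

With sharp installed, you would pass sharp(input) as img and call .toBuffer() on the returned instance after casting it back to sharp's own type.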
Configuration for the VisionPipeline.
All fields are optional: the factory function createVisionPipeline auto-detects available providers and fills in sensible defaults.
Example
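A minimal sketch of the "fill in sensible defaults" behavior, covering only ocr and the boolean flags whose names appear in this reference. The VisionConfigSubset type, the withDefaults function, and the detectedOcr parameter are hypothetical; this is not how createVisionPipeline is actually implemented.

```typescript
// Hypothetical subset of the config; not the library's real type.
type OcrEngine = "paddle" | "tesseract" | "none";

interface VisionConfigSubset {
  ocr?: OcrEngine;
  handwriting?: boolean;
  documentAI?: boolean;
  embedding?: boolean;
}

// Explicit values win; detection and documented defaults fill the gaps.
function withDefaults(cfg: VisionConfigSubset, detectedOcr: OcrEngine): Required<VisionConfigSubset> {
  return {
    ocr: cfg.ocr ?? detectedOcr,
    handwriting: cfg.handwriting ?? false,
    documentAI: cfg.documentAI ?? false,
    embedding: cfg.embedding ?? false,
  };
}

// Example: enable CLIP embeddings and let OCR be auto-detected.
const config = withDefaults({ embedding: true }, "tesseract");
console.log(config.ocr, config.embedding); // tesseract true
```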