Interface VisionPipelineConfig

Configuration for the VisionPipeline.

Every field except strategy is declared optional, and strategy has a documented default of 'progressive'. The factory function createVisionPipeline auto-detects available providers and fills in sensible defaults.

Example

const config: VisionPipelineConfig = {
  strategy: 'progressive',
  ocr: 'paddle',
  handwriting: true,
  documentAI: true,
  embedding: true,
  cloudProvider: 'openai',
  cloudModel: 'gpt-4o',
  confidenceThreshold: 0.8,
  preprocessing: { grayscale: true, sharpen: true },
};

interface VisionPipelineConfig {
    strategy: VisionStrategy;
    ocr?: "paddle" | "tesseract" | "none";
    handwriting?: boolean;
    documentAI?: boolean;
    embedding?: boolean;
    cloudProvider?: string;
    cloudModel?: string;
    confidenceThreshold?: number;
    preprocessing?: VisionPreprocessingConfig;
}

Index

Properties

strategy ocr? handwriting? documentAI? embedding? cloudProvider? cloudModel? confidenceThreshold? preprocessing?

Properties

strategy

strategy: VisionStrategy

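How the pipeline combines its tiers. With 'progressive', the pipeline escalates to the next tier only when OCR confidence falls below confidenceThreshold.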

Default

'progressive'

`Optional` ocr

ocr?: "paddle" | "tesseract" | "none"

OCR engine for Tier 1 text extraction.

'paddle' — PaddleOCR (via ppu-paddle-ocr). Best accuracy for most scripts.
'tesseract' — Tesseract.js. Wider language support, slightly lower accuracy.
'none' — Skip OCR entirely (useful for photo-only pipelines).

Default

'paddle' (if installed), else 'tesseract' (if installed), else 'none'
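The default resolution order above can be sketched as a small pure function. This is an illustration only: resolveOcrEngine and OcrAvailability are hypothetical names, and how createVisionPipeline actually detects installed packages is not documented here.

```typescript
type OcrEngine = "paddle" | "tesseract" | "none";

// Hypothetical availability flags. The real factory presumably probes for the
// installed packages itself; that detection step is assumed, not shown.
interface OcrAvailability {
  paddle: boolean;
  tesseract: boolean;
}

// An explicit setting always wins. Otherwise fall back in the documented
// order: 'paddle', then 'tesseract', then 'none'.
function resolveOcrEngine(
  requested: OcrEngine | undefined,
  available: OcrAvailability,
): OcrEngine {
  if (requested !== undefined) return requested;
  if (available.paddle) return "paddle";
  if (available.tesseract) return "tesseract";
  return "none";
}
```

Keeping the detection result as an input makes the fallback order easy to check without installing either OCR package.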

`Optional` handwriting

handwriting?: boolean

Enable handwriting recognition via TrOCR (@huggingface/transformers). Only triggered when OCR confidence is low and content appears handwritten.

Default

false

`Optional` documentAI

documentAI?: boolean

Enable document understanding via Florence-2 (@huggingface/transformers). Produces structured DocumentLayout with semantic block detection.

Default

false

`Optional` embedding

embedding?: boolean

Enable CLIP image embeddings (@huggingface/transformers). Runs in parallel with other tiers — does not affect text extraction.

Default

false

`Optional` cloudProvider

cloudProvider?: string

Cloud vision LLM provider name for Tier 3 fallback. Must match a provider known to generateText() (e.g. 'openai', 'anthropic', 'google'). When unset, cloud vision is disabled.

`Optional` cloudModel

cloudModel?: string

Cloud model override. When unset, the provider's default vision model is used.

Example

'gpt-4o', 'claude-sonnet-4-20250514', 'gemini-2.0-flash'
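How cloudProvider and cloudModel interact can be sketched as follows. The helper resolveCloudTarget and the CloudTarget shape are hypothetical and are not part of the documented API; the sketch only encodes the two rules stated above (no provider means cloud vision is disabled, no model means the provider's default is used).

```typescript
// Only the two cloud-related fields of VisionPipelineConfig are reproduced here.
interface CloudFields {
  cloudProvider?: string;
  cloudModel?: string;
}

// Hypothetical result shape for illustration.
interface CloudTarget {
  provider: string;
  // undefined means "let the provider choose its default vision model"
  model: string | undefined;
}

// Returns null when cloud vision is disabled, i.e. when no provider is set.
// A cloudModel without a cloudProvider is ignored in this sketch.
function resolveCloudTarget(config: CloudFields): CloudTarget | null {
  if (!config.cloudProvider) return null;
  return { provider: config.cloudProvider, model: config.cloudModel };
}
```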

`Optional` confidenceThreshold

confidenceThreshold?: number

Minimum confidence at which an OCR result is accepted without escalation. Applies only to the 'progressive' strategy: when OCR confidence is below this threshold, the pipeline escalates to the next tier, up to the cloud fallback if one is configured.

Default

0.7
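The threshold check can be sketched as below. This is a minimal illustration under stated assumptions: shouldEscalate is a hypothetical name, the strategy is typed as a plain string because the members of VisionStrategy other than 'progressive' are not listed here, and a confidence exactly equal to the threshold is treated as accepted because the text says escalation happens when confidence is below it.

```typescript
// Documented default for confidenceThreshold.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;

// Decides whether the threshold triggers escalation to the next tier.
// For any strategy other than 'progressive' the threshold plays no role,
// so this sketch returns false.
function shouldEscalate(
  strategy: string,
  ocrConfidence: number,
  confidenceThreshold: number = DEFAULT_CONFIDENCE_THRESHOLD,
): boolean {
  if (strategy !== "progressive") return false;
  return ocrConfidence < confidenceThreshold;
}
```

With the example configuration at the top of this page (confidenceThreshold: 0.8), an OCR result with confidence 0.75 would escalate, whereas under the default of 0.7 it would be accepted.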

`Optional` preprocessing

preprocessing?: VisionPreprocessingConfig

Image preprocessing options applied before any tier runs. Uses sharp for resizing, grayscale conversion, sharpening, and normalization.
