Class VisionPipeline

Unified vision pipeline with progressive enhancement.

Processes images through up to three tiers of increasing capability:

  1. Local OCR (PaddleOCR / Tesseract.js) — fast, free, offline
  2. Local Vision Models (TrOCR / Florence-2 / CLIP) — offline but slower
  3. Cloud Vision LLMs (GPT-4o, Claude, Gemini) — best quality, API cost

All heavy dependencies are loaded lazily on first use. The pipeline never imports ML libraries at module load time, so it's safe to instantiate even when optional peer deps are missing — errors only surface when a tier that needs them actually runs.
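
The progressive strategy amounts to an ordered fallback over tiers: try the cheapest tier first, move to the next on failure, and throw only when every tier has failed. A minimal self-contained sketch of that control flow (`Tier` and `runProgressive` are hypothetical names for illustration, not part of this library):

```typescript
// Hypothetical sketch of progressive fallback: each tier is an async step;
// the first one that succeeds wins, and errors are collected along the way.
type Tier<T> = { name: string; run: () => Promise<T> };

async function runProgressive<T>(tiers: Tier<T>[]): Promise<T> {
  const errors: string[] = [];
  for (const tier of tiers) {
    try {
      return await tier.run(); // first successful tier short-circuits
    } catch (err) {
      errors.push(`${tier.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`All tiers failed:\n${errors.join('\n')}`);
}
```

This also explains the lazy-loading guarantee above: a tier's dependency error surfaces as one entry in the failure list rather than at construction time.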

See createVisionPipeline for automatic provider detection.

Constructors

  • Create a new vision pipeline.

    Parameters

    • config: VisionPipelineConfig

      Pipeline configuration. All heavy dependencies are loaded lazily, so construction is synchronous and never imports ML libraries.

    Returns VisionPipeline

    Example

    const pipeline = new VisionPipeline({
      strategy: 'progressive',
      ocr: 'paddle',
      handwriting: true,
      cloudProvider: 'openai',
    });

Methods

  • Process an image through the configured tiers.

    Automatically detects content type (printed text, handwritten, diagram, etc.) and routes through the appropriate processing tiers based on the configured VisionStrategy.

    Parameters

    • image: string | Buffer<ArrayBufferLike>

      Image data as a Buffer or file-path / URL string. Buffers are preprocessed with sharp (if configured). URL strings are passed directly to providers that support them.

    • Optional options: {
          forceCategory?: VisionContentCategory;
          tiers?: VisionTier[];
      }

      Optional overrides for this specific invocation.

      • Optional forceCategory?: VisionContentCategory

        Force a specific content category instead of auto-detecting from OCR confidence heuristics.

      • Optional tiers?: VisionTier[]

        Run only these specific tiers, ignoring the strategy's normal routing logic.

    Returns Promise<VisionResult>

    Aggregated vision result with text, confidence, embeddings, etc.

    Throws

    If all configured tiers fail to produce a result.

    Throws

    If a required dependency (e.g. ppu-paddle-ocr) is missing.

    Throws

    If dispose() was already called.

    Example

    // Full progressive pipeline
    const result = await pipeline.process(imageBuffer);

    // Force handwriting mode
    const hw = await pipeline.process(scanBuffer, {
      forceCategory: 'handwritten',
    });

    // Only run OCR and embedding, skip everything else
    const partial = await pipeline.process(imageBuffer, {
      tiers: ['ocr', 'embedding'],
    });
  • Extract text only — fastest path using OCR tier exclusively.

    Ignores all other tiers (handwriting, document-ai, cloud, embedding). Useful when you just need the text content and don't need confidence scoring, layout analysis, or embeddings.

    Parameters

    • image: string | Buffer<ArrayBufferLike>

      Image data as a Buffer or file-path / URL string.

    Returns Promise<string>

    Extracted text, or empty string if OCR produces no output.

    Throws

    If the configured OCR engine is missing.

    Example

    const text = await pipeline.extractText(receiptImage);
    console.log(text); // "ACME STORE\n...\nTotal: $42.99"
  • Generate an image embedding using CLIP — embedding tier only.

    Useful for building image similarity search indexes without running the full OCR + vision pipeline.

    Parameters

    • image: string | Buffer<ArrayBufferLike>

      Image data as a Buffer or file-path / URL string.

    Returns Promise<number[]>

    CLIP embedding vector (typically 512 or 768 dimensions).

    Throws

    If @huggingface/transformers is not installed.

    Throws

    If CLIP model loading fails.

    Example

    const embedding = await pipeline.embed(photoBuffer);
    await vectorStore.upsert('images', [{
      id: 'photo-1',
      embedding,
      metadata: { source: 'upload' },
    }]);
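  CLIP embeddings are typically compared by cosine similarity, so ranking a handful of images does not require a vector store at all. A small helper (`cosineSimilarity` is a hypothetical utility for illustration, not exported by this library):

  ```typescript
  // Cosine similarity between two embedding vectors: ~1 for near-identical
  // direction, ~0 for unrelated. Assumes equal-length vectors, as returned
  // by the same CLIP model.
  function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }
  ```

  For example, `cosineSimilarity(await pipeline.embed(imgA), await pipeline.embed(imgB))` scores how visually similar two images are.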
  • Analyze document layout using Florence-2 — document-ai tier only.

    Returns structured DocumentLayout with semantic blocks (text, tables, figures, headings, lists, code) and their bounding boxes within each page.

    Parameters

    • image: string | Buffer<ArrayBufferLike>

      Image data as a Buffer or file-path / URL string.

    Returns Promise<DocumentLayout>

    Structured document layout with pages and blocks.

    Throws

    If @huggingface/transformers is not installed.

    Throws

    If Florence-2 model loading fails.

    Example

    const layout = await pipeline.analyzeLayout(documentScan);
    for (const page of layout.pages) {
      for (const block of page.blocks) {
        console.log(`${block.type}: ${block.content.slice(0, 50)}...`);
      }
    }
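  A common follow-up is filtering the returned layout by semantic block type, e.g. collecting only the tables. A sketch using simplified stand-in shapes (`Block`, `Page`, and `Layout` here are assumptions mirroring the fields used above, not the library's actual DocumentLayout types):

  ```typescript
  // Simplified stand-ins for the real DocumentLayout types (assumed shapes).
  interface Block { type: string; content: string }
  interface Page { blocks: Block[] }
  interface Layout { pages: Page[] }

  // Collect every block of one semantic type across all pages.
  function blocksOfType(layout: Layout, type: string): Block[] {
    return layout.pages.flatMap((page) =>
      page.blocks.filter((block) => block.type === type),
    );
  }
  ```

  With the real result this would be, for instance, `blocksOfType(layout, 'table')` to feed only tabular blocks into downstream extraction.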
  • Shut down the pipeline and release all loaded model resources.

    After calling dispose(), any further calls to process(), extractText(), embed(), or analyzeLayout() will throw.

    Returns Promise<void>

    Example

    const pipeline = new VisionPipeline({ strategy: 'progressive' });
    try {
      const result = await pipeline.process(image);
    } finally {
      await pipeline.dispose();
    }