Interface VisionResult

Aggregated result from the vision pipeline after all configured tiers have run. Contains the best extracted text, content classification, optional embeddings, and detailed per-tier breakdowns.

Example

const result = await pipeline.process(imageBuffer);

// Best extracted text across all tiers
console.log(result.text);

// What kind of content was detected
console.log(result.category); // 'printed-text' | 'handwritten' | ...

// CLIP embedding for similarity search
if (result.embedding) {
  await vectorStore.upsert('images', [{ embedding: result.embedding }]);
}

// Inspect individual tier contributions
for (const tr of result.tierResults) {
  console.log(`${tr.tier} (${tr.provider}): ${tr.confidence}`);
}

interface VisionResult {
    text: string;
    confidence: number;
    category: VisionContentCategory;
    tiers: VisionTier[];
    tierResults: VisionTierResult[];
    embedding?: number[];
    layout?: DocumentLayout;
    regions?: VisionTextRegion[];
    durationMs: number;
}

Properties

text: string

Best extracted text (from OCR, handwriting, or vision description).

confidence: number

Overall confidence score 0–1, taken from the winning tier.
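A common use of this score is gating downstream processing. A minimal sketch (the helper name and the 0.6 threshold are illustrative, not part of this API):

```typescript
// Minimal shape of the fields this sketch needs from VisionResult.
interface VisionResultLike {
  text: string;
  confidence: number;
}

// Hypothetical helper: return the extracted text only when the winning
// tier's confidence clears a caller-chosen floor, otherwise null.
function acceptText(result: VisionResultLike, minConfidence = 0.6): string | null {
  return result.confidence >= minConfidence ? result.text : null;
}
```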

category: VisionContentCategory

What kind of content was detected.

tiers: VisionTier[]

Which tier(s) contributed to the final result.

tierResults: VisionTierResult[]

Detailed results from each tier that ran, ordered by execution.
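Since results are ordered by execution rather than by score, you can recover the highest-confidence tier yourself. A sketch, using the `tier`, `provider`, and `confidence` fields shown in the example above (the full `VisionTierResult` shape may carry more):

```typescript
// Minimal shape of a per-tier result for this sketch.
interface TierResultLike {
  tier: string;
  provider: string;
  confidence: number;
}

// Return the tier result with the highest confidence, or undefined
// when no tiers ran.
function winningTier(tierResults: TierResultLike[]): TierResultLike | undefined {
  return tierResults.reduce<TierResultLike | undefined>(
    (best, tr) => (best === undefined || tr.confidence > best.confidence ? tr : best),
    undefined,
  );
}
```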

embedding?: number[]

CLIP image embedding vector, present when the embedding tier is enabled.
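For small-scale comparisons you can score embeddings directly instead of going through a vector store. A sketch of cosine similarity over two `number[]` vectors (assumes equal length and non-zero norms):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Returns a value in [-1, 1]; closer to 1 means more similar images.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```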

layout?: DocumentLayout

Structured document layout, when Florence-2 ran.

regions?: VisionTextRegion[]

Bounding boxes for detected text regions from the winning tier.
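A typical consumer sorts regions by size to find the dominant text block. A sketch; note that the `width`/`height` field names below are an assumption about the `VisionTextRegion` bounding-box shape, not confirmed by this interface:

```typescript
// Assumed bounding-box fields on a text region (hypothetical names).
interface RegionLike {
  width: number;
  height: number;
}

// Return a new array of regions ordered largest-area-first, leaving
// the input untouched.
function byAreaDesc<T extends RegionLike>(regions: T[]): T[] {
  return [...regions].sort((a, b) => b.width * b.height - a.width * a.height);
}
```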

durationMs: number

Total wall-clock processing time in milliseconds.