Create a new pipeline vision provider.
An initialized VisionPipeline instance.
The caller retains ownership and is responsible for calling
pipeline.dispose() when done.
Throws if pipeline is null or undefined.
const pipeline = new VisionPipeline({ strategy: 'progressive' });
const provider = new PipelineVisionProvider(pipeline);
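Because the caller keeps ownership of the pipeline, construction is usually paired with disposal. The following is a minimal sketch of that pattern, not the library's code: the VisionPipeline and PipelineVisionProvider classes below are simplified stand-ins, dispose() is assumed to be async, the thrown error type is assumed, and withProvider is a hypothetical helper.

```typescript
// Stand-in classes for illustration only; the real VisionPipeline and
// PipelineVisionProvider have richer behavior and may differ in detail.
class VisionPipeline {
  disposed = false;

  constructor(readonly options: { strategy: string }) {}

  // Assumption: dispose() is async. The source only names the method.
  async dispose(): Promise<void> {
    this.disposed = true;
  }
}

class PipelineVisionProvider {
  constructor(private readonly pipeline: VisionPipeline) {
    // Mirrors the documented behavior: reject null or undefined.
    // The concrete error type is an assumption.
    if (pipeline == null) {
      throw new Error('pipeline is required');
    }
  }

  getPipeline(): VisionPipeline {
    return this.pipeline;
  }
}

// Hypothetical helper: creates the pipeline, hands a provider to `use`,
// and always disposes the pipeline afterwards, even if `use` throws.
async function withProvider<T>(
  use: (provider: PipelineVisionProvider) => Promise<T>,
): Promise<T> {
  const pipeline = new VisionPipeline({ strategy: 'progressive' });
  try {
    return await use(new PipelineVisionProvider(pipeline));
  } finally {
    await pipeline.dispose();
  }
}
```

The try/finally keeps disposal in one place, so a failing callback cannot leak the pipeline. The provider never calls dispose() itself, which matches the ownership rule stated above.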
Generate a text description of the provided image by running it through the full vision pipeline.
This satisfies the IVisionProvider contract. The image passes through all configured tiers (OCR, handwriting, document-ai, cloud) and the best extracted text is returned.
Image as a URL string (https://... or data:image/...).
Text description or extracted content from the image.
Throws if all pipeline tiers fail to produce output.
Throws if the pipeline has been disposed.
const description = await provider.describeImage(imageUrl);
console.log(description);
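How an adapter could reduce the rich pipeline result to the single string required by IVisionProvider can be sketched as follows. This is an illustration under assumptions rather than the actual implementation: VisionResult, IVisionProvider, and PipelineLike are simplified stand-in shapes, SketchVisionProvider is a hypothetical name, and the empty-text check with its error message is an assumed way of surfacing the "all tiers failed" case.

```typescript
// Simplified stand-ins (assumptions, not the real type definitions).
interface VisionResult {
  text: string;
  embedding?: number[];
}

interface IVisionProvider {
  describeImage(imageUrl: string): Promise<string>;
}

// Anything exposing a process() method that yields a VisionResult.
interface PipelineLike {
  process(image: string): Promise<VisionResult>;
}

// Hypothetical adapter: runs the full pipeline, returns only the text.
class SketchVisionProvider implements IVisionProvider {
  constructor(private readonly pipeline: PipelineLike) {}

  async describeImage(imageUrl: string): Promise<string> {
    const result = await this.pipeline.process(imageUrl);
    const text = result.text.trim();
    if (text.length === 0) {
      // Assumed failure mode: no tier produced usable text.
      throw new Error('Vision pipeline produced no text');
    }
    return text;
  }
}
```

Typing the constructor against the narrow PipelineLike shape keeps the sketch easy to exercise with a stub object in tests; the real provider takes a VisionPipeline instance.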
Process an image through the full pipeline and return the complete VisionResult, including embeddings, layout, confidence scores, and per-tier breakdowns.
Use this when you need more than just the text description (e.g. to store the CLIP embedding alongside the text embedding in the vector store).
Image data as a Buffer or URL string.
Full vision pipeline result.
Throws if all pipeline tiers fail.
Throws if the pipeline has been disposed.
const result = await provider.processWithFullResult(imageBuffer);
// Use both text embedding (via indexer) and image embedding (via CLIP)
if (result.embedding) {
  await imageVectorStore.upsert('images', [{
    id: docId,
    embedding: result.embedding,
    metadata: { text: result.text },
  }]);
}
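The guard on result.embedding matters because a pipeline run may not yield an image embedding. A rough sketch of that branching follows, assuming a simplified VisionResult and a hypothetical in-memory store (InMemoryVectorStore) standing in for the real vector store. Only the upsert(collection, records) call shape is taken from the example above; the record type, the get() method, and the storeImageEmbedding helper are invented for illustration.

```typescript
// Simplified stand-ins (assumptions, not the real type definitions).
interface VisionResult {
  text: string;
  embedding?: number[];
}

interface VectorRecord {
  id: string;
  embedding: number[];
  metadata: { text: string };
}

// Hypothetical in-memory store with the same upsert call shape
// as the example above.
class InMemoryVectorStore {
  private readonly collections = new Map<string, Map<string, VectorRecord>>();

  async upsert(collection: string, records: VectorRecord[]): Promise<void> {
    const bucket =
      this.collections.get(collection) ?? new Map<string, VectorRecord>();
    for (const record of records) {
      bucket.set(record.id, record); // insert or overwrite by id
    }
    this.collections.set(collection, bucket);
  }

  get(collection: string, id: string): VectorRecord | undefined {
    return this.collections.get(collection)?.get(id);
  }
}

// Hypothetical helper: returns true when an image embedding was stored,
// false when the result carried none and the write was skipped.
async function storeImageEmbedding(
  store: InMemoryVectorStore,
  docId: string,
  result: VisionResult,
): Promise<boolean> {
  if (!result.embedding || result.embedding.length === 0) {
    return false;
  }
  await store.upsert('images', [
    { id: docId, embedding: result.embedding, metadata: { text: result.text } },
  ]);
  return true;
}
```

Returning a boolean lets the caller decide whether a text-only index entry is sufficient or whether the missing image embedding should be logged.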
Get a reference to the underlying pipeline for direct access.
Useful when the caller needs to invoke pipeline-specific methods
like extractText(), embed(), or analyzeLayout() that aren't
exposed through the IVisionProvider interface.
The underlying VisionPipeline instance.
const layout = await provider.getPipeline().analyzeLayout(image);
Adapts the full VisionPipeline to the narrow IVisionProvider interface used by the multimodal indexer.
The pipeline's process() method runs all configured tiers and returns a rich VisionResult. This adapter extracts just the text field that the indexer needs for embedding generation.
For callers that need the full pipeline result (embeddings, layout, confidence, regions), use processWithFullResult() instead.
Example