Best extracted text, from OCR, handwriting recognition, or a vision-model description.
Overall confidence score in the range 0–1, taken from the winning tier.
The kind of content that was detected.
Which tier(s) contributed to the final result.
Detailed results from each tier that ran, in execution order.
embedding (optional): CLIP image embedding vector, present when the embedding tier is enabled.
layout (optional): structured document layout, present when Florence-2 ran.
regions (optional): bounding boxes for text regions detected by the winning tier.
Total wall-clock processing time in milliseconds.
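A minimal TypeScript sketch of how a consumer might model and read this result. Only `embedding`, `layout`, and `regions` are named above; every other property name (`text`, `confidence`, `contentType`, `tiers`, `tierResults`, `processingTimeMs`), the `Region` and `TierResult` shapes, the sample values, and the `describeResult` helper are hypothetical placeholders, not the library's actual API.

```typescript
// Hypothetical shapes: only `embedding`, `layout`, and `regions` are named in the docs.
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
  text?: string;
}

interface TierResult {
  tier: string;        // hypothetical tier identifier, e.g. "ocr"
  text: string;
  confidence: number;  // 0–1
  durationMs: number;
}

interface VisionResultSketch {
  text: string;               // best extracted text
  confidence: number;         // 0–1, from the winning tier
  contentType: string;        // detected content kind (values assumed)
  tiers: string[];            // tier(s) that contributed
  tierResults: TierResult[];  // in execution order
  embedding?: number[];       // CLIP vector, only when the embedding tier is enabled
  layout?: unknown;           // Florence-2 layout; shape not documented here
  regions?: Region[];         // from the winning tier
  processingTimeMs: number;   // total wall-clock time
}

// Build a one-line summary, checking each optional field before using it.
function describeResult(r: VisionResultSketch): string {
  const parts = [
    `${r.contentType} (${r.confidence.toFixed(2)}) via ${r.tiers.join("+")}`,
    `${r.processingTimeMs}ms`,
  ];
  if (r.embedding) parts.push(`embedding dim: ${r.embedding.length}`);
  if (r.regions) parts.push(`regions: ${r.regions.length}`);
  if (r.layout !== undefined) parts.push("layout present");
  return parts.join(", ");
}

// Illustrative sample value (hypothetical data).
const example: VisionResultSketch = {
  text: "Invoice #123",
  confidence: 0.91,
  contentType: "printed",
  tiers: ["ocr"],
  tierResults: [{ tier: "ocr", text: "Invoice #123", confidence: 0.91, durationMs: 120 }],
  regions: [{ x: 10, y: 20, width: 200, height: 30, text: "Invoice #123" }],
  processingTimeMs: 134,
};

console.log(describeResult(example));
```

Treating `embedding`, `layout`, and `regions` as optional forces callers to check for them, which matches the documented behavior that they appear only when the corresponding tier ran.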
Aggregated result from the vision pipeline after all configured tiers have run. Contains the best extracted text, content classification, optional embeddings, and detailed per-tier breakdowns.
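The aggregation step can be sketched as follows, under assumptions the text does not confirm: the winning tier is taken to be the one with the highest confidence (ties go to the earlier tier); the top-level text, confidence, and regions are copied from it; the contributing tiers are the winner plus any tier that supplied an embedding or layout; and the wall-clock time is measured around the whole run rather than summed from tiers, since tiers might overlap. The names `aggregate`, `TierOutput`, and `Aggregated` are hypothetical.

```typescript
// Hypothetical per-tier output; field names are illustrative only.
interface TierOutput {
  tier: string;
  text: string;
  confidence: number; // 0–1
  regions?: { x: number; y: number; width: number; height: number }[];
  embedding?: number[]; // e.g. from an embedding tier
  layout?: unknown;     // e.g. from a Florence-2 tier
}

interface Aggregated {
  text: string;
  confidence: number;
  tiers: string[];
  tierResults: TierOutput[];
  embedding?: number[];
  layout?: unknown;
  regions?: TierOutput["regions"];
  processingTimeMs: number;
}

// Assumption: the winner is the highest-confidence tier; `>` keeps the earlier tier on ties.
function aggregate(results: TierOutput[], processingTimeMs: number): Aggregated {
  if (results.length === 0) throw new Error("no tiers ran");
  const winner = results.reduce((best, r) => (r.confidence > best.confidence ? r : best));
  const embedder = results.find((r) => r.embedding !== undefined);
  const layoutTier = results.find((r) => r.layout !== undefined);

  // Contributing tiers: the winner plus any tier that supplied an optional artifact, deduplicated.
  const tiers: string[] = [];
  for (const r of [winner, embedder, layoutTier]) {
    if (r && tiers.indexOf(r.tier) === -1) tiers.push(r.tier);
  }

  return {
    text: winner.text,
    confidence: winner.confidence,
    tiers,
    tierResults: results, // assumed to already be in execution order
    embedding: embedder?.embedding,
    layout: layoutTier?.layout,
    regions: winner.regions,
    processingTimeMs,
  };
}

const start = Date.now();
const out = aggregate(
  [
    { tier: "ocr", text: "H3llo", confidence: 0.62 },
    { tier: "vision", text: "Hello", confidence: 0.88, layout: { blocks: 1 } },
    { tier: "embedding", text: "", confidence: 0, embedding: [0.1, 0.2] },
  ],
  Date.now() - start,
);
console.log(out.text, out.confidence, out.tiers);
```

Picking the winner by maximum confidence is only one plausible policy; a real pipeline might instead stop at the first tier that clears a threshold, which would change both the winner and the reported time.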
Example