Optional config: MultimodalConfig
Optional configuration. Omit to use in passthrough mode.
Enrich images with captions via the configured vision LLM.
Only images that have no existing caption field are processed. Images
that already carry a caption are left unchanged to avoid redundant LLM
calls.
When no describeImage function is configured, all images are returned
unchanged.
Array of ExtractedImage objects to process.
A promise resolving to an array of ExtractedImage objects of the same length as the input, with captions filled in where possible.
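The skip and passthrough rules described above can be sketched as a small selection step. This is a minimal sketch, not the library's implementation: the `id` field and the helper names `needsCaption` and `imagesToCaption` are hypothetical, and the shapes of `ExtractedImage` and `MultimodalConfig` are assumed, since the text names only `caption` and `describeImage`. It also assumes that "no existing caption field" means `caption` is `undefined`.

```typescript
// Assumed shapes: the text names only `caption` and `describeImage`.
interface ExtractedImage {
  id: string; // hypothetical identifier, for illustration only
  caption?: string;
}

interface MultimodalConfig {
  describeImage?: (image: ExtractedImage) => Promise<string>;
}

// Assumption: an image "has no caption" when the field is undefined.
function needsCaption(image: ExtractedImage): boolean {
  return image.caption === undefined;
}

// Returns the images that would be sent to the vision LLM.
// In passthrough mode (no describeImage) nothing is selected.
function imagesToCaption(
  images: ExtractedImage[],
  config?: MultimodalConfig,
): ExtractedImage[] {
  if (config?.describeImage === undefined) return [];
  return images.filter(needsCaption);
}

const sample: ExtractedImage[] = [
  { id: "chart" },
  { id: "logo", caption: "Company logo" },
];
const stubConfig: MultimodalConfig = {
  describeImage: async (image) => `Caption for ${image.id}`,
};

const selected = imagesToCaption(sample, stubConfig).map((image) => image.id);
const selectedInPassthrough = imagesToCaption(sample).map((image) => image.id);
console.log(selected, selectedInPassthrough);
```

With a `describeImage` function present, only the uncaptioned `chart` image is selected. With no configuration, nothing is selected, so no LLM calls would be made.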
Adds auto-generated captions to ExtractedImage objects that lack one, using a caller-supplied vision LLM function.
Images are processed in parallel via Promise.allSettled(), so a single failed captioning attempt does not block the rest. Images whose captioning fails retain their original (uncaptioned) state rather than propagating the error.
Example: with a vision LLM
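The original example code is not present in this text, so the following is a hedged sketch of how the behavior described above might look. The class name `ImageCaptioner`, the method name `enrich`, the `id` field, and the stub vision LLM are all hypothetical. Only `ExtractedImage`, `MultimodalConfig`, `describeImage`, `caption`, and the use of `Promise.allSettled()` come from the documentation.

```typescript
// Assumed shapes: the text names only `caption` and `describeImage`.
interface ExtractedImage {
  id: string; // hypothetical identifier, for illustration only
  caption?: string;
}

interface MultimodalConfig {
  describeImage?: (image: ExtractedImage) => Promise<string>;
}

// Hypothetical class and method names; the documented ones are not shown in the text.
class ImageCaptioner {
  private readonly config: MultimodalConfig | undefined;

  constructor(config?: MultimodalConfig) {
    this.config = config;
  }

  async enrich(images: ExtractedImage[]): Promise<ExtractedImage[]> {
    const describe = this.config?.describeImage;
    if (describe === undefined) return images; // passthrough mode

    // Start every captioning call at once; allSettled itself never rejects.
    const settled = await Promise.allSettled(
      images.map(async (image): Promise<ExtractedImage> =>
        image.caption === undefined
          ? { ...image, caption: await describe(image) }
          : image,
      ),
    );

    // Keep the original image wherever its captioning attempt was rejected.
    return images.map((image, i) => {
      const result = settled[i];
      return result !== undefined && result.status === "fulfilled"
        ? result.value
        : image;
    });
  }
}

// Stand-in for a real vision LLM: rejects for one image to show failure isolation.
const captioner = new ImageCaptioner({
  describeImage: async (image) => {
    if (image.id === "broken") throw new Error("vision LLM unavailable");
    return `Auto caption for ${image.id}`;
  },
});

const enriched = captioner.enrich([
  { id: "chart" },
  { id: "broken" },
  { id: "logo", caption: "Company logo" },
]);
enriched.then((images) => console.log(images));
```

In this sketch `chart` receives a generated caption, `broken` comes back without one because its call rejected, and `logo` keeps its existing caption without triggering a call.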
Example: passthrough (no LLM configured)
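The original example code is not present in this text. As a minimal sketch of passthrough mode, the snippet below constructs the captioner without any configuration and checks that the images come back untouched. As before, `ImageCaptioner`, `enrich`, and the `id` field are hypothetical names chosen for illustration, and the type shapes are assumed.

```typescript
// Assumed shapes: the text names only `caption` and `describeImage`.
interface ExtractedImage {
  id: string; // hypothetical identifier, for illustration only
  caption?: string;
}

interface MultimodalConfig {
  describeImage?: (image: ExtractedImage) => Promise<string>;
}

// Hypothetical class and method names; the documented ones are not shown in the text.
class ImageCaptioner {
  private readonly config: MultimodalConfig | undefined;

  constructor(config?: MultimodalConfig) {
    this.config = config;
  }

  async enrich(images: ExtractedImage[]): Promise<ExtractedImage[]> {
    const describe = this.config?.describeImage;
    // Passthrough: with no describeImage function, return the input as is.
    if (describe === undefined) return images;

    const settled = await Promise.allSettled(
      images.map(async (image): Promise<ExtractedImage> =>
        image.caption === undefined
          ? { ...image, caption: await describe(image) }
          : image,
      ),
    );
    return images.map((image, i) => {
      const result = settled[i];
      return result !== undefined && result.status === "fulfilled"
        ? result.value
        : image;
    });
  }
}

const input: ExtractedImage[] = [
  { id: "chart" },
  { id: "logo", caption: "Company logo" },
];

// No config is passed, so no vision LLM is ever called.
const passthrough = new ImageCaptioner().enrich(input);
passthrough.then((images) => console.log(images));
```

Passing a config object that lacks `describeImage` takes the same branch in this sketch, which matches the documented rule that images are returned unchanged whenever no `describeImage` function is configured.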