Minimal interface for a vision LLM that can describe images.
This is kept intentionally narrow to avoid coupling the multimodal
indexer to a specific LLM provider. Any service that can take an
image and return a text description satisfies this contract.
Example
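The following is a minimal sketch of what such a contract might look like, not the actual definition. It assumes Python and a synchronous call; the names `VisionLLM`, `describe_image`, `StubVisionLLM`, and `index_image`, as well as the default prompt, are hypothetical and chosen only for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class VisionLLM(Protocol):
    """Hypothetical contract: take image bytes, return a text description."""

    def describe_image(self, image: bytes, prompt: str = "Describe this image.") -> str:
        ...


class StubVisionLLM:
    """Stand-in provider for illustration; a real one would call a vision model."""

    def describe_image(self, image: bytes, prompt: str = "Describe this image.") -> str:
        return f"[stub] {len(image)}-byte image; prompt: {prompt}"


def index_image(llm: VisionLLM, image: bytes) -> dict[str, str]:
    """Hypothetical indexer step that depends only on the interface."""
    return {"text": llm.describe_image(image)}


# The indexer never imports a provider SDK; it only sees the protocol.
record = index_image(StubVisionLLM(), b"\x89PNG")
```

Because the indexer depends only on the one-method protocol, swapping providers would mean writing another small class with the same method rather than changing the indexer itself.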