Interface IVisionProvider

Minimal interface for a vision LLM that can describe images.

This is kept intentionally narrow to avoid coupling the multimodal indexer to a specific LLM provider. Any service that can take an image and return a text description satisfies this contract.

Example

const visionProvider: IVisionProvider = {
describeImage: async (image) => {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: [
{ type: 'text', text: 'Describe this image in detail.' },
{ type: 'image_url', image_url: { url: imageUrl } },
]}],
});
return response.choices[0].message.content!;
},
};
interface IVisionProvider {
    describeImage(image): Promise<string>;
}

Implemented by

Methods

  • Generate a text description of the provided image.

    Parameters

    • image: string

      Image as a URL string or base64 data URL.

    Returns Promise<string>

    A detailed text description of the image content.