Function performOCR

  • Extract text from an image using AgentOS's progressive vision pipeline.

    This is the recommended high-level API for OCR. It handles input resolution (file, URL, base64, Buffer), pipeline lifecycle, and result mapping so callers don't need to interact with VisionPipeline directly.

    When to use performOCR() vs VisionPipeline

    Use case Recommendation
    One-shot text extraction performOCR()
    Batch processing many images VisionPipeline (create once, reuse)
    Need embeddings or layout VisionPipeline (richer result)
    Simple scripts / quick integration performOCR()

    Parameters

    Returns Promise<OCRResult>

    A promise resolving to an OCRResult with extracted text, confidence, tier info, and optional bounding-box regions.

    Example

    // Basic usage — file path, auto-detect everything
    const { text, confidence } = await performOCR({
    image: '/path/to/receipt.png',
    });

    // Privacy-sensitive — never call cloud APIs
    const local = await performOCR({
    image: screenshotBuffer,
    strategy: 'local-only',
    });

    // Best quality — go straight to cloud
    const cloud = await performOCR({
    image: 'https://example.com/document.jpg',
    strategy: 'cloud-only',
    provider: 'openai',
    model: 'gpt-4o',
    });