Interface RagRetrievalOptions

Options controlling retrieval behavior.

interface RagRetrievalOptions {
    topK?: number;
    targetDataSourceIds?: string[];
    targetMemoryCategories?: RagMemoryCategory[];
    metadataFilter?: MetadataFilter;
    strategy?: "hybrid" | "similarity" | "mmr";
    strategyParams?: {
        mmrLambda?: number;
        hybridAlpha?: number;
        custom?: Record<string, any>;
    };
    rerankerConfig?: {
        enabled?: boolean;
        modelId?: string;
        providerId?: string;
        topN?: number;
        maxDocuments?: number;
        timeoutMs?: number;
        params?: Record<string, any>;
    };
    includeEmbeddings?: boolean;
    queryEmbeddingModelId?: string;
    hyde?: {
        enabled?: boolean;
        initialThreshold?: number;
        minThreshold?: number;
        hypothesis?: string;
    };
    tokenBudgetForContext?: number;
    userId?: string;
    includeAudit?: boolean;
}

Properties

topK?: number

Maximum number of chunks to return per query.

targetDataSourceIds?: string[]

Explicit IDs of the data sources to query.

targetMemoryCategories?: RagMemoryCategory[]

Memory categories to consult (maps to data sources via config).

metadataFilter?: MetadataFilter

Metadata filter applied at the vector-store layer.

strategy?: "hybrid" | "similarity" | "mmr"

Retrieval strategy (defaults to similarity search).

strategyParams?: {
    mmrLambda?: number;
    hybridAlpha?: number;
    custom?: Record<string, any>;
}

Strategy-specific parameters (MMR lambda, hybrid alpha, etc.).
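To illustrate what an MMR lambda conventionally controls, here is a minimal sketch of textbook Maximal Marginal Relevance selection. It is not taken from this library: `Candidate`, `cosine`, and `mmrSelect` are hypothetical names, and it is an assumption that `mmrLambda` follows this convention, where 1 favors pure relevance and lower values favor diversity.

```typescript
// Illustrative only: textbook MMR, not this library's implementation.
type Candidate = { id: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedily pick k candidates, scoring each as
// lambda * relevance - (1 - lambda) * redundancy.
function mmrSelect(
  query: number[],
  candidates: Candidate[],
  k: number,
  lambda: number,
): Candidate[] {
  const selected: Candidate[] = [];
  const remaining = [...candidates];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      const relevance = cosine(query, candidate.vector);
      // Redundancy is the highest similarity to anything already selected.
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(candidate.vector, s.vector)));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

const mmrQuery = [1, 0];
const mmrCandidates: Candidate[] = [
  { id: "a", vector: [1, 0] },
  { id: "b", vector: [0.99, 0.1] }, // near-duplicate of "a"
  { id: "c", vector: [0.6, 0.8] }, // less relevant, but different
];

const relevanceOnly = mmrSelect(mmrQuery, mmrCandidates, 2, 1.0).map((c) => c.id);
const diversified = mmrSelect(mmrQuery, mmrCandidates, 2, 0.3).map((c) => c.id);
```

With lambda at 1.0 the near-duplicate "b" is kept alongside "a"; at 0.3 the redundancy penalty pushes the selection toward "c" instead.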

Type declaration

  • Optional mmrLambda?: number
  • Optional hybridAlpha?: number
  • Optional custom?: Record<string, any>

rerankerConfig?: {
    enabled?: boolean;
    modelId?: string;
    providerId?: string;
    topN?: number;
    maxDocuments?: number;
    timeoutMs?: number;
    params?: Record<string, any>;
}

Cross-encoder reranking configuration.

When enabled, retrieved chunks are re-scored using a cross-encoder model for improved relevance ranking. Disabled by default due to added latency.

Recommended use cases:

  • Background analysis tasks (accuracy over speed)
  • Batch processing (no user waiting)
  • Knowledge-intensive tasks (reduces hallucination)

NOT recommended for real-time chat (latency sensitive).

Type declaration

  • Optional enabled?: boolean

    Enable cross-encoder reranking. Default: false

  • Optional modelId?: string

    Reranker model ID (e.g., 'rerank-v3.5', 'cross-encoder/ms-marco-MiniLM-L-6-v2')

  • Optional providerId?: string

    Provider ID ('cohere', 'local')

  • Optional topN?: number

    Number of top results to return after reranking

  • Optional maxDocuments?: number

    Max documents to send to reranker (limits cost/latency). Default: 100

  • Optional timeoutMs?: number

    Request timeout in ms. Default: 30000

  • Optional params?: Record<string, any>

    Provider-specific parameters
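The fields above might be combined as follows for a batch job. This is a sketch under stated assumptions: `RerankerConfig` is a local mirror of the documented shape rather than a type imported from the library, and `withRerankerDefaults` is a hypothetical helper that only restates the defaults listed above.

```typescript
// Local mirror of the documented rerankerConfig shape (assumption: not a library export).
type RerankerConfig = {
  enabled?: boolean;
  modelId?: string;
  providerId?: string;
  topN?: number;
  maxDocuments?: number;
  timeoutMs?: number;
  params?: Record<string, unknown>;
};

// Hypothetical helper: applies the documented defaults, then caller overrides.
function withRerankerDefaults(config: RerankerConfig = {}): RerankerConfig {
  return { enabled: false, maxDocuments: 100, timeoutMs: 30000, ...config };
}

// Batch processing: nobody is waiting on the response, so reranking is enabled.
const batchReranker = withRerankerDefaults({
  enabled: true,
  providerId: "cohere",
  modelId: "rerank-v3.5",
  topN: 5,
});

// Real-time chat: leave the config empty so reranking stays off.
const chatReranker = withRerankerDefaults();
```

Keeping `maxDocuments` and `timeoutMs` at their defaults bounds the cost and latency of the rerank step; lowering either is a reasonable trade when throughput matters more than ranking quality.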

includeEmbeddings?: boolean

Include chunk embeddings in the response.

queryEmbeddingModelId?: string

Query embedding model override.

hyde?: {
    enabled?: boolean;
    initialThreshold?: number;
    minThreshold?: number;
    hypothesis?: string;
}

HyDE (Hypothetical Document Embeddings) configuration. When enabled, a hypothetical answer is generated before embedding, which can improve retrieval quality. This adds one LLM call per retrieval.

Type declaration

  • Optional enabled?: boolean

    Enable HyDE for this retrieval. Default: false.

  • Optional initialThreshold?: number

    Initial similarity threshold for adaptive thresholding. Default: 0.7.

  • Optional minThreshold?: number

    Minimum threshold to step down to. Default: 0.3.

  • Optional hypothesis?: string

    Pre-generated hypothesis (skip LLM call if provided).
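One way to read `initialThreshold` and `minThreshold` is as the bounds of a step-down loop: start strict, and relax the similarity cutoff until something matches or the floor is reached. The sketch below shows that idea only. The step size of 0.1, the `Scored` type, and `adaptiveThresholdFilter` are assumptions for illustration; the actual stepping behavior is not documented here.

```typescript
// Illustrative only: the real step size and stopping rule are not documented.
type Scored = { id: string; score: number };

function adaptiveThresholdFilter(
  results: Scored[],
  initialThreshold = 0.7, // documented default
  minThreshold = 0.3, // documented default
  step = 0.1, // assumed step size
): { threshold: number; hits: Scored[] } {
  if (step <= 0) {
    throw new Error("step must be positive");
  }
  let threshold = initialThreshold;
  for (;;) {
    const hits = results.filter((r) => r.score >= threshold);
    // Stop as soon as something matches, or when the floor has been reached.
    if (hits.length > 0 || threshold <= minThreshold) {
      return { threshold, hits };
    }
    // Round to avoid floating-point drift, and never drop below the floor.
    const lowered = Math.round((threshold - step) * 1e6) / 1e6;
    threshold = Math.max(minThreshold, lowered);
  }
}

const relaxed = adaptiveThresholdFilter([
  { id: "doc-1", score: 0.45 },
  { id: "doc-2", score: 0.2 },
]);
const nothingFound = adaptiveThresholdFilter([{ id: "doc-3", score: 0.1 }]);
```

In the first call the cutoff relaxes from 0.7 until `doc-1` qualifies; in the second it stops at the 0.3 floor with no hits.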

tokenBudgetForContext?: number

Advisory token/character budget for final context construction.
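As a rough illustration of how a caller might honor such a budget when assembling context, the sketch below packs chunks in ranked order until the budget would be exceeded. `Chunk`, `estimateTokens`, and `packWithinBudget` are hypothetical names, and the four-characters-per-token estimate is a crude assumption, not the library's tokenizer.

```typescript
// Illustrative only: a greedy packer with a rough token estimate.
type Chunk = { id: string; text: string };

// Assumption: roughly 4 characters per token; swap in a real tokenizer if available.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep chunks in their given (ranked) order until the next one would exceed the budget.
function packWithinBudget(chunks: Chunk[], tokenBudget: number): Chunk[] {
  const packed: Chunk[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > tokenBudget) {
      break;
    }
    packed.push(chunk);
    used += cost;
  }
  return packed;
}

const rankedChunks: Chunk[] = [
  { id: "a", text: "x".repeat(40) }, // about 10 tokens
  { id: "b", text: "y".repeat(40) }, // about 10 tokens
  { id: "c", text: "z".repeat(40) }, // about 10 tokens
];

const packedIds = packWithinBudget(rankedChunks, 25).map((c) => c.id);
```

Stopping at the first chunk that does not fit preserves ranking order; a caller that prefers to fill the budget more tightly could skip oversized chunks and continue instead.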

userId?: string

Caller identity for logging/billing.

includeAudit?: boolean

When true, generates a RAGAuditTrail with per-operation transparency.