Class BM25Index

BM25 sparse keyword index for hybrid retrieval.

Dense embeddings excel at semantic similarity but miss exact keyword matches (e.g., error codes, function names, product IDs). BM25 catches these by scoring documents based on term frequency, inverse document frequency, and document length normalization.

Example: Basic usage

const index = new BM25Index({ k1: 1.5, b: 0.75 });

index.addDocuments([
  { id: 'doc-1', text: 'TypeScript compiler error TS2304' },
  { id: 'doc-2', text: 'JavaScript runtime TypeError explanation' },
  { id: 'doc-3', text: 'Fix error TS2304 by adding type declarations' },
]);

const results = index.search('error TS2304', 5);
// results[0].id === 'doc-1' (matches "error" + "TS2304"; shorter document, so length normalization ranks it first)
// results[1].id === 'doc-3' (matches "error" + "TS2304")

Example: Combined with HybridSearcher

const hybrid = new HybridSearcher(vectorStore, embeddingManager, bm25Index, {
  denseWeight: 0.7,
  sparseWeight: 0.3,
});
const results = await hybrid.search('What does error TS2304 mean?');
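
This page does not document how HybridSearcher combines the two result lists. As a rough illustration of what denseWeight and sparseWeight could control, the sketch below min-max normalizes each list and takes a weighted sum. The names Scored, minMaxNormalize, and fuse are hypothetical, and the normalization choice is an assumption rather than the library's confirmed behavior.

```typescript
// Hypothetical result shape used only for this sketch.
interface Scored {
  id: string;
  score: number;
}

// Rescales scores to [0, 1] so dense similarities and BM25 scores become comparable.
function minMaxNormalize(results: Scored[]): Map<string, number> {
  const out = new Map<string, number>();
  if (results.length === 0) return out;
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  for (const r of results) {
    // When all scores are equal, treat every result as a full match.
    out.set(r.id, max === min ? 1 : (r.score - min) / (max - min));
  }
  return out;
}

// Weighted sum of normalized scores; a document missing from one list contributes 0 for that list.
function fuse(
  dense: Scored[],
  sparse: Scored[],
  denseWeight = 0.7,
  sparseWeight = 0.3,
): Scored[] {
  const d = minMaxNormalize(dense);
  const s = minMaxNormalize(sparse);
  const ids = new Set<string>([...dense, ...sparse].map((r) => r.id));
  const fused: Scored[] = [];
  ids.forEach((id) => {
    fused.push({
      id,
      score: denseWeight * (d.get(id) ?? 0) + sparseWeight * (s.get(id) ?? 0),
    });
  });
  return fused.sort((a, b) => b.score - a.score);
}
```

Other fusion strategies, such as reciprocal rank fusion, avoid score normalization entirely by working on ranks; the weighted sum is shown here only because it maps directly onto the two weight options.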

Index

Constructors

constructor

Methods

addDocument addDocuments search removeDocument getStats

Constructors

constructor

new BM25Index(config?): BM25Index

Creates a new BM25 index.

Parameters

- Optional config: BM25Config
  BM25 tuning parameters (k1, b). Defaults are used when omitted.

Returns BM25Index

Example

// Use defaults (k1=1.2, b=0.75)
const index = new BM25Index();

// Custom parameters for short documents
const shortDocIndex = new BM25Index({ k1: 1.5, b: 0.5 });

Methods

addDocument

addDocument(id, text, metadata?): void
Adds a single document to the BM25 index.

The text is tokenized, stop words are removed, and term frequencies are recorded in the inverted index. IDF values are lazily recomputed on the next search.
Parameters
- id: string
  Unique document identifier.
- text: string
  Document text content to index.
- Optional metadata: Record<string, unknown>
  Optional metadata to store.
Returns void
Throws
An error if id or text is empty.

Example
```
index.addDocument('readme', 'AgentOS is a framework for building AI agents');
index.addDocument('changelog', 'v2.0: Added BM25 hybrid search', { version: '2.0' });
```
- Defined in src/rag/search/BM25Index.ts:298
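
The indexing steps described for addDocument (tokenize, drop stop words, record term frequencies in an inverted index) could look roughly like the sketch below. The tokenizer rule, the stop-word list, and the TinyInvertedIndex class are assumptions made for illustration; the actual BM25Index internals may differ.

```typescript
// Assumed stop-word list; the real list used by BM25Index is not shown on this page.
const STOP_WORDS = new Set(["a", "an", "the", "is", "for", "by", "of", "to", "and"]);

// Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((tok) => tok.length > 0 && !STOP_WORDS.has(tok));
}

// Hypothetical minimal index: term -> (docId -> term frequency).
class TinyInvertedIndex {
  readonly postings = new Map<string, Map<string, number>>();
  readonly docLengths = new Map<string, number>(); // docId -> token count

  // Note: re-adding an existing id is not handled in this sketch.
  add(id: string, text: string): void {
    if (id === "" || text === "") {
      throw new Error("id and text must be non-empty");
    }
    const tokens = tokenize(text);
    this.docLengths.set(id, tokens.length);
    for (const tok of tokens) {
      let posting = this.postings.get(tok);
      if (!posting) {
        posting = new Map<string, number>();
        this.postings.set(tok, posting);
      }
      posting.set(id, (posting.get(id) ?? 0) + 1);
    }
  }
}
```

Keeping identifiers such as TS2304 as single lowercase tokens is what lets a keyword index match them exactly.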

addDocuments

addDocuments(docs): void
Adds multiple documents to the index in a single batch.

As with addDocument, IDF recomputation is deferred until the next search, so a batch of any size triggers at most one recomputation.
Parameters
- docs: {
      id: string;
      text: string;
      metadata?: Record<string, unknown>;
  }[]
  Array of documents to index.
Returns void
Example
```
index.addDocuments([
  { id: 'doc-1', text: 'First document content' },
  { id: 'doc-2', text: 'Second document content', metadata: { source: 'api' } },
]);
```
- Defined in src/rag/search/BM25Index.ts:352
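
Deferred IDF recomputation is commonly implemented with a dirty flag: writes only update document frequencies, and the IDF cache is rebuilt on the first read after a change. The sketch below shows that pattern under stated assumptions. LazyIdfTable, its method names, and the IDF variant are hypothetical and are not taken from BM25Index.

```typescript
class LazyIdfTable {
  private docFreq = new Map<string, number>(); // term -> number of documents containing it
  private idfCache = new Map<string, number>();
  private docCount = 0;
  private dirty = false;
  recomputations = 0; // exposed only so the example can show how often the cache is rebuilt

  // Registers one document's distinct terms. No IDF work happens here.
  addDocumentTerms(distinctTerms: string[]): void {
    this.docCount += 1;
    for (const t of distinctTerms) {
      this.docFreq.set(t, (this.docFreq.get(t) ?? 0) + 1);
    }
    this.dirty = true;
  }

  // Called at search time; rebuilds the cache only if the index changed since the last call.
  idf(term: string): number {
    if (this.dirty) {
      this.idfCache.clear();
      this.docFreq.forEach((n, t) => {
        // Assumed IDF variant (always non-negative).
        this.idfCache.set(t, Math.log(1 + (this.docCount - n + 0.5) / (n + 0.5)));
      });
      this.dirty = false;
      this.recomputations += 1;
    }
    return this.idfCache.get(term) ?? 0; // unknown terms contribute nothing
  }
}
```

With this pattern, adding a thousand documents and then searching once costs a single pass over the vocabulary instead of a thousand.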

search

search(query, topK?): BM25Result[]
Searches the BM25 index for documents matching the query.

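Each document D is scored against query Q as follows, where tf(t,D) is the frequency of term t in D, |D| is the length of D in tokens, avgdl is the average document length in the index, and k1 and b are the tuning parameters passed to the constructor: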
```
score(D, Q) = sum_{t in Q} IDF(t) * (tf(t,D) * (k1 + 1)) / (tf(t,D) + k1 * (1 - b + b * |D| / avgdl))
```
Parameters
- query: string
  Search query text.
- Optional topK: number = 10
  Maximum number of results to return.
Returns BM25Result[]
Array of results sorted by BM25 score descending.
Example
```
const results = index.search('typescript error TS2304', 5);
for (const r of results) {
  console.log(`${r.id}: score=${r.score.toFixed(4)}`);
}
```
- Defined in src/rag/search/BM25Index.ts:380
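
The scoring formula given for search can be checked by hand with a small standalone function. The sketch below is not the BM25Index implementation: CorpusStats, idf, and bm25Score are hypothetical names, and the IDF variant is an assumption because this page does not state which one the library uses.

```typescript
// Hypothetical container for the corpus-level numbers the formula needs.
interface CorpusStats {
  docCount: number; // N: number of indexed documents
  avgDocLength: number; // avgdl: mean token count per document
  docFreq: Map<string, number>; // n(t): number of documents containing term t
}

// Assumed IDF variant (always non-negative).
function idf(term: string, stats: CorpusStats): number {
  const n = stats.docFreq.get(term) ?? 0;
  return Math.log(1 + (stats.docCount - n + 0.5) / (n + 0.5));
}

// Scores one tokenized document against one tokenized query.
function bm25Score(
  queryTerms: string[],
  docTerms: string[],
  stats: CorpusStats,
  k1 = 1.2,
  b = 0.75,
): number {
  // Term frequencies for this document.
  const tf = new Map<string, number>();
  for (const t of docTerms) {
    tf.set(t, (tf.get(t) ?? 0) + 1);
  }
  // Length normalization: 1 - b + b * |D| / avgdl.
  const lengthNorm = 1 - b + b * (docTerms.length / stats.avgDocLength);
  let score = 0;
  for (const t of queryTerms) {
    const f = tf.get(t) ?? 0;
    if (f === 0) continue; // term absent from the document
    score += (idf(t, stats) * (f * (k1 + 1))) / (f + k1 * lengthNorm);
  }
  return score;
}
```

When two documents contain the same query terms the same number of times, the shorter one gets the smaller lengthNorm and therefore the higher score, which is the effect b controls.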

removeDocument

removeDocument(id): boolean
Removes a document from the index by its ID.

Cleans up all term frequency entries in the inverted index and marks IDF for recomputation.
Parameters
- id: string
  Document ID to remove.
Returns boolean
true if the document existed and was removed, false otherwise.
Example
```
const removed = index.removeDocument('doc-obsolete');
console.log(removed ? 'Removed' : 'Not found');
```
- Defined in src/rag/search/BM25Index.ts:436
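
Removing a document from an inverted index means deleting its entry from every posting list it appears in and dropping terms that no longer occur anywhere. The sketch below illustrates this under assumptions: the Postings layout and the removeFromPostings helper are hypothetical, not the BM25Index internals.

```typescript
// Hypothetical layout: term -> (docId -> term frequency).
type Postings = Map<string, Map<string, number>>;

function removeFromPostings(
  postings: Postings,
  docLengths: Map<string, number>,
  id: string,
): boolean {
  if (!docLengths.has(id)) return false; // unknown document, nothing to do
  docLengths.delete(id);

  // Delete the document from every posting list and note which terms became empty.
  const emptied: string[] = [];
  postings.forEach((docs, term) => {
    docs.delete(id);
    if (docs.size === 0) emptied.push(term);
  });
  // Drop terms that no longer occur in any document.
  for (const term of emptied) {
    postings.delete(term);
  }
  return true;
}
```

This version scans the whole vocabulary on each removal. An index that stores each document's term list alongside it could visit only the affected posting lists instead.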

getStats

getStats(): BM25Stats
Returns current index statistics.

Returns BM25Stats
Object containing document count, term count, and average document length.
Example
```
const stats = index.getStats();
console.log(`${stats.documentCount} docs, ${stats.termCount} unique terms`);
```
- Defined in src/rag/search/BM25Index.ts:465
