Creates a new BM25 index.
Optional config: BM25Config. Optional BM25 tuning parameters.
// Use defaults (k1=1.2, b=0.75)
const index = new BM25Index();
// Custom parameters for short documents
const shortDocIndex = new BM25Index({ k1: 1.5, b: 0.5 });
Adds a single document to the BM25 index.
The text is tokenized, stop words are removed, and term frequencies are recorded in the inverted index. IDF values are lazily recomputed on the next search.
Unique document identifier.
Document text content to index.
Optional metadata: Record<string, unknown>. Optional metadata to store with the document.
Throws if id or text is empty.
index.addDocument('readme', 'AgentOS is a framework for building AI agents');
index.addDocument('changelog', 'v2.0: Added BM25 hybrid search', { version: '2.0' });
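The indexing steps described for addDocument (tokenize, drop stop words, record term frequencies in an inverted index) can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the stop-word list, the `tokenize` rule, and the `indexDocument` helper are all hypothetical names chosen for the example.

```typescript
// Hypothetical stop-word list; the list the real index uses is not shown in this reference.
const STOP_WORDS = new Set(['a', 'an', 'and', 'for', 'is', 'of', 'the', 'to']);

// Lowercase, split on non-alphanumeric characters, and drop stop words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// term -> (document id -> term frequency)
type InvertedIndex = Map<string, Map<string, number>>;

// Records term frequencies for one document and returns its token count,
// which length normalization needs at search time.
function indexDocument(index: InvertedIndex, id: string, text: string): number {
  const tokens = tokenize(text);
  for (const token of tokens) {
    let postings = index.get(token);
    if (!postings) {
      postings = new Map<string, number>();
      index.set(token, postings);
    }
    postings.set(id, (postings.get(id) ?? 0) + 1);
  }
  return tokens.length;
}

const inverted: InvertedIndex = new Map();
const readmeLength = indexDocument(
  inverted,
  'readme',
  'AgentOS is a framework for building AI agents',
);
```

Under this assumed stop-word list, "is", "a", and "for" are dropped, so only the remaining terms end up in the inverted index.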
Adds multiple documents to the index in a single batch.
More efficient than calling addDocument repeatedly because IDF recomputation is deferred until the next search.
Array of documents to index.
index.addDocuments([
  { id: 'doc-1', text: 'First document content' },
  { id: 'doc-2', text: 'Second document content', metadata: { source: 'api' } },
]);
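Deferred IDF recomputation is commonly implemented with a dirty flag: writes only mark the cached values stale, and the first read afterwards rebuilds them once. The sketch below illustrates that pattern under stated assumptions. `LazyIdf` is a hypothetical class, and the IDF formula used is one common BM25 variant, not necessarily the one BM25Index uses.

```typescript
// Hypothetical helper illustrating lazy IDF recomputation with a dirty flag.
class LazyIdf {
  private docFreq = new Map<string, number>(); // term -> number of docs containing it
  private idf = new Map<string, number>();
  private docCount = 0;
  private dirty = false;
  recomputations = 0; // exposed only so the deferral is observable

  // Record one document's terms; this only marks the cached IDF as stale.
  add(terms: string[]): void {
    new Set(terms).forEach((term) => {
      this.docFreq.set(term, (this.docFreq.get(term) ?? 0) + 1);
    });
    this.docCount += 1;
    this.dirty = true;
  }

  // Called at search time: rebuild the cache once if anything changed.
  get(term: string): number {
    if (this.dirty) {
      this.idf.clear();
      this.docFreq.forEach((df, t) => {
        // Assumed variant: ln(1 + (N - df + 0.5) / (df + 0.5))
        this.idf.set(t, Math.log(1 + (this.docCount - df + 0.5) / (df + 0.5)));
      });
      this.dirty = false;
      this.recomputations += 1;
    }
    return this.idf.get(term) ?? 0;
  }
}

const lazy = new LazyIdf();
lazy.add(['first', 'document', 'content']);
lazy.add(['second', 'document', 'content']);
const idfFirst = lazy.get('first');       // rarer term
const idfDocument = lazy.get('document'); // appears in both documents
```

Two adds followed by two lookups trigger a single recomputation here, and the rarer term receives the higher IDF.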
Searches the BM25 index for documents matching the query.
Scoring formula per document D and query Q:
score(D, Q) = sum_{t in Q} IDF(t) * (tf(t,D) * (k1 + 1)) / (tf(t,D) + k1 * (1 - b + b * |D| / avgdl))
Search query text.
Optional topK: number = 10. Maximum number of results to return.
Array of results sorted by BM25 score descending.
const results = index.search('typescript error TS2304', 5);
for (const r of results) {
  console.log(`${r.id}: score=${r.score.toFixed(4)}`);
}
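To make the scoring formula concrete, here is a small standalone function that evaluates it for one document. It is a sketch, not the library's internal code: `bm25Score`, `Bm25Params`, and the argument layout are hypothetical, and IDF values are assumed to be supplied by the caller.

```typescript
interface Bm25Params {
  k1: number;
  b: number;
}

// Evaluates score(D, Q) for a single document, following the formula in this section.
function bm25Score(
  queryTerms: string[],
  termFreqs: Map<string, number>, // tf(t, D) for this document
  docLength: number,              // |D| in tokens
  avgDocLength: number,           // avgdl across the index
  idf: Map<string, number>,       // IDF(t), computed elsewhere
  params: Bm25Params = { k1: 1.2, b: 0.75 },
): number {
  const { k1, b } = params;
  // Length normalization term: 1 - b + b * |D| / avgdl
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  let score = 0;
  for (const term of queryTerms) {
    const tf = termFreqs.get(term) ?? 0;
    if (tf === 0) continue; // terms absent from the document contribute nothing
    score += ((idf.get(term) ?? 0) * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
  }
  return score;
}

const tfs = new Map([['ts2304', 2]]);
const idfs = new Map([['ts2304', 1]]);
const averageLengthScore = bm25Score(['ts2304'], tfs, 100, 100, idfs);
const longDocScore = bm25Score(['ts2304'], tfs, 200, 100, idfs);
```

With the default parameters, a document of average length and tf = 2 scores 4.4 / 3.2 = 1.375 per unit of IDF, while the same term frequency in a document twice the average length scores lower because of the length normalization controlled by b.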
Removes a document from the index by its ID.
Cleans up all term frequency entries in the inverted index and marks IDF for recomputation.
Document ID to remove.
true if the document existed and was removed, false otherwise.
const removed = index.removeDocument('doc-obsolete');
console.log(removed ? 'Removed' : 'Not found');
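The cleanup that removeDocument performs can be pictured as deleting the document's entry from each term's posting list and dropping terms that no longer occur anywhere. The following is an illustrative sketch only: `removeFromIndex` and the per-document term list `docTerms` are hypothetical, and the real index may track this differently.

```typescript
type Postings = Map<string, Map<string, number>>; // term -> (doc id -> tf)

// Removes a document's postings; returns false if the id was never indexed.
function removeFromIndex(
  index: Postings,
  docTerms: Map<string, string[]>, // doc id -> distinct terms it contains (assumed bookkeeping)
  id: string,
): boolean {
  const terms = docTerms.get(id);
  if (!terms) return false;
  for (const term of terms) {
    const postings = index.get(term);
    if (!postings) continue;
    postings.delete(id);
    // Drop the term entirely once no document contains it.
    if (postings.size === 0) index.delete(term);
  }
  docTerms.delete(id);
  return true;
}

const postingsIndex: Postings = new Map([
  ['document', new Map([['doc-1', 1], ['doc-obsolete', 1]])],
  ['legacy', new Map([['doc-obsolete', 2]])],
]);
const termsByDoc = new Map<string, string[]>([
  ['doc-1', ['document']],
  ['doc-obsolete', ['document', 'legacy']],
]);

const firstRemoval = removeFromIndex(postingsIndex, termsByDoc, 'doc-obsolete');
const secondRemoval = removeFromIndex(postingsIndex, termsByDoc, 'doc-obsolete');
```

In a full index, a successful removal would also update the document count and average length and mark IDF as stale, as the description above notes.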
BM25 sparse keyword index for hybrid retrieval.
Dense embeddings excel at semantic similarity but miss exact keyword matches (e.g., error codes, function names, product IDs). BM25 catches these by scoring documents based on term frequency, inverse document frequency, and document length normalization.
Example: Basic usage
Example: Combined with HybridSearcher