Class TopicExtractor

Extracts a compact, deduplicated topic list from a set of corpus chunks.

Designed to feed into the QueryClassifier's system prompt so the LLM knows which documentation topics exist without receiving the full corpus.

Example

const extractor = new TopicExtractor();
const topics = extractor.extract(corpusChunks, { maxTopics: 30 });
const promptBlock = extractor.formatForPrompt(topics);
// "Authentication (docs/auth.md)\nDatabase (docs/database.md)\n..."

Constructors

Methods

  extract(chunks, options?): TopicEntry[]

  • Extract a deduplicated, sorted, and capped topic list from corpus chunks.

    Deduplication key: heading::sourcePath. Two chunks with the same heading from the same source file are collapsed into a single entry.

    Parameters

    • chunks: CorpusChunk[]

      Corpus chunks to scan for topics.

    • Optional options: TopicExtractorOptions

      Extraction parameters, such as maxTopics (the cap on the number of returned entries).

    Returns TopicEntry[]

    Alphabetically sorted array of unique TopicEntry items, limited to maxTopics entries.

  formatForPrompt(topics): string

  • Format a topic list into a compact multi-line string suitable for injection into a classifier system prompt.

    Each line follows the pattern: TopicName (source/path.md)

    Parameters

    • topics: TopicEntry[]

      Array of topic entries to format.

    Returns string

    Newline-separated string with one topic per line.
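
The behavior documented for both methods (deduplicate on heading::sourcePath, sort alphabetically, cap at maxTopics, then emit one "TopicName (source/path.md)" line per entry) can be illustrated with a minimal standalone sketch. This is not the library's implementation. The field names on CorpusChunk and TopicEntry, the default cap of 50, the tie-break on source path, and the function names extractTopics and formatTopicsForPrompt are all assumptions made for illustration; the real type definitions are not shown on this page.

```typescript
// Hypothetical shapes: the real CorpusChunk, TopicEntry, and
// TopicExtractorOptions definitions are not shown in this reference.
interface CorpusChunk {
  heading: string;
  sourcePath: string;
  text: string;
}

interface TopicEntry {
  topic: string;
  sourcePath: string;
}

interface TopicExtractorOptions {
  maxTopics?: number;
}

// Sketch of the documented extract() behavior: dedupe, sort, cap.
function extractTopics(
  chunks: CorpusChunk[],
  options: TopicExtractorOptions = {}
): TopicEntry[] {
  const maxTopics = options.maxTopics ?? 50; // assumed default, not documented
  const seen = new Map<string, TopicEntry>();

  for (const chunk of chunks) {
    // Deduplication key as documented: heading::sourcePath
    const key = `${chunk.heading}::${chunk.sourcePath}`;
    if (!seen.has(key)) {
      seen.set(key, { topic: chunk.heading, sourcePath: chunk.sourcePath });
    }
  }

  return Array.from(seen.values())
    // Sort by topic name; breaking ties on source path is an assumption.
    .sort(
      (a, b) =>
        a.topic.localeCompare(b.topic) ||
        a.sourcePath.localeCompare(b.sourcePath)
    )
    .slice(0, maxTopics);
}

// Sketch of the documented formatForPrompt() behavior: one topic per line.
function formatTopicsForPrompt(topics: TopicEntry[]): string {
  return topics.map((t) => `${t.topic} (${t.sourcePath})`).join("\n");
}

// Usage: the duplicate "Authentication" chunk collapses into one entry.
const sampleChunks: CorpusChunk[] = [
  { heading: "Database", sourcePath: "docs/database.md", text: "..." },
  { heading: "Authentication", sourcePath: "docs/auth.md", text: "..." },
  { heading: "Authentication", sourcePath: "docs/auth.md", text: "..." },
  { heading: "Caching", sourcePath: "docs/cache.md", text: "..." },
];

const sampleTopics = extractTopics(sampleChunks, { maxTopics: 2 });
const promptBlock = formatTopicsForPrompt(sampleTopics);
console.log(promptBlock);
```

Sorting before capping keeps the output deterministic for a given corpus, so the prompt block does not change with chunk order. Note that the same heading appearing in two different source files produces two entries, since the source path is part of the key.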