Class SemanticChunker

Semantic text chunker that splits at natural boundaries (headings, paragraphs, sentences) instead of at fixed character counts.

Produces chunks that are more semantically coherent than those from fixed-size splitting, which improves retrieval quality by keeping related ideas together.

Example: Basic usage

```
const chunker = new SemanticChunker({ targetSize: 800, overlap: 50 });
const chunks = chunker.chunk(markdownDocument);
for (const c of chunks) {
  console.log(`Chunk ${c.index} (${c.boundaryType}): ${c.text.length} chars`);
}
```

Example: Preserving code blocks

```
const chunker = new SemanticChunker({
  targetSize: 1000,
  maxSize: 3000, // Allow larger chunks for code blocks
  preserveCodeBlocks: true,
});
const chunks = chunker.chunk(technicalDoc);
```

Index

Constructors

constructor

Methods

chunk

Constructors

constructor

new SemanticChunker(config?): SemanticChunker
Creates a new SemanticChunker.
Parameters
- Optional config: SemanticChunkerConfig
  Chunking configuration.
Returns SemanticChunker
Example
```
const chunker = new SemanticChunker({
  targetSize: 800,
  maxSize: 1500,
  overlap: 80,
});
```
- Defined in src/rag/chunking/SemanticChunker.ts:147
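
The option names used on this page (targetSize, maxSize, minSize, overlap, preserveCodeBlocks, respectHeadings) suggest a config shape along the following lines. This is a sketch only: `SketchChunkerConfig`, `SKETCH_DEFAULTS`, and `resolveConfig` are hypothetical names, and the types, default values, and validation rules are assumptions, not the actual `SemanticChunkerConfig` definition.

```typescript
// Hypothetical shape: field names come from this page; types and defaults are assumed.
interface SketchChunkerConfig {
  targetSize?: number;
  maxSize?: number;
  minSize?: number;
  overlap?: number;
  preserveCodeBlocks?: boolean;
  respectHeadings?: boolean;
}

// Assumed defaults, chosen only for illustration.
const SKETCH_DEFAULTS: Required<SketchChunkerConfig> = {
  targetSize: 800,
  maxSize: 1500,
  minSize: 100,
  overlap: 80,
  preserveCodeBlocks: true,
  respectHeadings: true,
};

// Merge caller options over the defaults and reject inconsistent sizes.
function resolveConfig(config: SketchChunkerConfig = {}): Required<SketchChunkerConfig> {
  const resolved = { ...SKETCH_DEFAULTS, ...config };
  if (resolved.minSize > resolved.targetSize || resolved.targetSize > resolved.maxSize) {
    throw new RangeError('expected minSize <= targetSize <= maxSize');
  }
  if (resolved.overlap < 0 || resolved.overlap >= resolved.targetSize) {
    throw new RangeError('expected 0 <= overlap < targetSize');
  }
  return resolved;
}
```

Validating the size relationships up front is one possible design choice; whether the real constructor performs such checks is not stated on this page.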

Methods

chunk

chunk(text, metadata?): SemanticChunk[]
Splits text into semantically coherent chunks.

Pipeline:
1. Pre-process: extract code blocks (if preserveCodeBlocks)
2. Split by headings (if respectHeadings); each heading starts a new section
3. Within sections, split by paragraphs (double newline)
4. If a paragraph exceeds maxSize, split by sentences
5. If a sentence exceeds maxSize, split at word boundaries (fixed fallback)
6. Merge small fragments (< minSize) with the previous chunk
7. Add overlap from the end of the previous chunk to each chunk
Parameters
- text: string
  The full text to chunk.
- Optional metadata: Record<string, unknown>
  Optional metadata attached to all chunks.
Returns SemanticChunk[]
Array of chunks in order.
Throws
If text is empty.

Example
```
const chunks = chunker.chunk(
  '# Introduction\n\nFirst paragraph.\n\n## Details\n\nSecond paragraph.',
  { source: 'docs/readme.md' },
);
// chunks[0].boundaryType === 'heading'
// chunks[0].text includes "# Introduction\n\nFirst paragraph."
```
- Defined in src/rag/chunking/SemanticChunker.ts:185
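
To make the pipeline above concrete, here is a minimal, self-contained sketch of steps 2 to 7. It is not the library's implementation: `sketchChunk`, `splitOversized`, `splitWords`, and the `SketchChunk` and `SketchOptions` types are hypothetical, code-block extraction (step 1) is left out, merging is limited to fragments within the same heading section, and overlap is assumed to be measured in characters.

```typescript
type Boundary = 'heading' | 'paragraph' | 'sentence' | 'fixed';
type Piece = [text: string, boundary: Boundary];
interface SketchChunk { index: number; text: string; boundaryType: Boundary; }
interface SketchOptions { maxSize: number; minSize: number; overlap: number; }

// Step 5: fixed fallback that packs whole words into pieces of at most maxSize
// characters. A single word longer than maxSize is kept whole.
function splitWords(text: string, maxSize: number): string[] {
  const pieces: string[] = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current && current.length + 1 + word.length > maxSize) {
      pieces.push(current);
      current = word;
    } else {
      current = current ? `${current} ${word}` : word;
    }
  }
  if (current) pieces.push(current);
  return pieces;
}

// Steps 4-5: split an oversized paragraph into sentences, then into words if needed.
function splitOversized(paragraph: string, maxSize: number): Piece[] {
  const sentences = paragraph.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [paragraph];
  const out: Piece[] = [];
  for (const s of sentences.map((x) => x.trim()).filter(Boolean)) {
    if (s.length <= maxSize) out.push([s, 'sentence']);
    else for (const w of splitWords(s, maxSize)) out.push([w, 'fixed']);
  }
  return out;
}

function sketchChunk(text: string, opts: SketchOptions): SketchChunk[] {
  if (text.trim() === '') throw new Error('text is empty');
  const pieces: Piece[] = [];
  // Step 2: start a new section before every Markdown heading line.
  for (const section of text.split(/\n(?=#{1,6} )/)) {
    const sectionPieces: Piece[] = [];
    const startsWithHeading = /^#{1,6} /.test(section);
    // Step 3: paragraphs are separated by blank lines.
    const paragraphs = section.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
    paragraphs.forEach((p, i) => {
      const parts: Piece[] = p.length > opts.maxSize
        ? splitOversized(p, opts.maxSize)
        : [[p, startsWithHeading && i === 0 ? 'heading' : 'paragraph']];
      for (const [t, b] of parts) {
        // Step 6: merge fragments shorter than minSize into the previous chunk
        // of the same section (the merged chunk may exceed maxSize).
        const prev = sectionPieces[sectionPieces.length - 1];
        if (prev && t.length < opts.minSize) prev[0] = `${prev[0]}\n\n${t}`;
        else sectionPieces.push([t, b]);
      }
    });
    pieces.push(...sectionPieces);
  }
  // Step 7: prefix each chunk with the last `overlap` characters of the previous one.
  return pieces.map(([t, b], index) => {
    const tail = index > 0 && opts.overlap > 0 ? pieces[index - 1][0].slice(-opts.overlap) : '';
    return { index, text: tail ? `${tail}\n${t}` : t, boundaryType: b };
  });
}
```

Running the sketch on the example input above with `{ maxSize: 200, minSize: 20, overlap: 0 }` yields two chunks, the first being the Introduction heading merged with its paragraph. Character-based overlap can cut a word in half; a real chunker might align the overlap to word or sentence boundaries instead.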
