Creates a new SemanticChunker.
Optional config: SemanticChunkerConfig
Chunking configuration.
const chunker = new SemanticChunker({
  targetSize: 800,
  maxSize: 1500,
  overlap: 80,
});
Splits text into semantically coherent chunks.
Pipeline:
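- Code blocks are kept intact (controlled by preserveCodeBlocks)
- Split on headings (controlled by respectHeadings): each heading starts a new section
- Text that still exceeds maxSize is split by sentences
- Sentences that still exceed maxSize are split at word boundaries (fixed fallback)
- Chunks that are too small (see minSize) are merged with the previous chunk

text: The full text to chunk.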
Optional metadata: Record<string, unknown>
Optional metadata attached to all chunks.
Returns: Array of chunks in order.
Throws: If text is empty.
const chunks = chunker.chunk(
  '# Introduction\n\nFirst paragraph.\n\n## Details\n\nSecond paragraph.',
  { source: 'docs/readme.md' },
);
// chunks[0].boundaryType === 'heading'
// chunks[0].text includes "# Introduction\n\nFirst paragraph."
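To make the pipeline above more concrete, here is a minimal standalone sketch of the heading, sentence, word-boundary, and merge steps. It is not the library's implementation: `sketchChunk`, `SketchOptions`, `SketchChunk`, and the `'sentence'` and `'word'` boundary types are hypothetical names chosen for illustration (only `'heading'` appears in the example above), sizes are assumed to be measured in characters, and throwing on empty input is an assumption. Code-block preservation, `targetSize`, and `overlap` are left out.

```typescript
// Hypothetical option and chunk shapes for this sketch only.
interface SketchOptions {
  maxSize: number; // assumed: maximum chunk length in characters
  minSize: number; // assumed: chunks shorter than this are merged backwards
}

interface SketchChunk {
  text: string;
  boundaryType: "heading" | "sentence" | "word"; // only "heading" is from the docs
}

// Each line that starts with 1-6 '#' characters and a space begins a new section.
function splitByHeadings(text: string): string[] {
  return text
    .split(/^(?=#{1,6}\s)/m)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Naive sentence split: cut after runs of '.', '!' or '?'.
function splitBySentences(text: string): string[] {
  const parts = text.match(/[^.!?]*[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Greedily joins parts with `sep` while the result stays within maxSize.
// A single part longer than maxSize is emitted on its own, still oversized.
function pack(parts: string[], maxSize: number, sep: string): string[] {
  const out: string[] = [];
  let current = "";
  for (const part of parts) {
    const candidate = current ? current + sep + part : part;
    if (current === "" || candidate.length <= maxSize) {
      current = candidate;
    } else {
      out.push(current);
      current = part;
    }
  }
  if (current) out.push(current);
  return out;
}

function sketchChunk(text: string, opts: SketchOptions): SketchChunk[] {
  if (text.trim() === "") throw new Error("text is empty"); // assumed behavior
  const chunks: SketchChunk[] = [];
  for (const section of splitByHeadings(text)) {
    if (section.length <= opts.maxSize) {
      chunks.push({ text: section, boundaryType: "heading" });
      continue;
    }
    // Section too large: fall back to sentences, then to word boundaries.
    for (const group of pack(splitBySentences(section), opts.maxSize, " ")) {
      if (group.length <= opts.maxSize) {
        chunks.push({ text: group, boundaryType: "sentence" });
      } else {
        for (const piece of pack(group.split(/\s+/), opts.maxSize, " ")) {
          chunks.push({ text: piece, boundaryType: "word" });
        }
      }
    }
  }
  // Merge chunks below minSize into the previous chunk (may exceed maxSize).
  const merged: SketchChunk[] = [];
  for (const chunk of chunks) {
    const prev = merged[merged.length - 1];
    if (prev && chunk.text.length < opts.minSize) {
      prev.text += "\n\n" + chunk.text;
    } else {
      merged.push(chunk);
    }
  }
  return merged;
}

const demo = sketchChunk(
  "# Introduction\n\nFirst paragraph.\n\n## Details\n\nSecond paragraph.",
  { maxSize: 1500, minSize: 10 },
);
// demo has two chunks, both with boundaryType "heading"
```

The fallback order mirrors the list above: a section is only broken into sentences when it is too large, and a sentence is only broken at word boundaries when it is still too large. Because this sketch ignores code blocks, a line starting with `#` inside a code block would be misread as a heading.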
Semantic text chunker that splits on natural boundaries instead of fixed character counts.
Produces chunks that are more semantically coherent than fixed-size splitting, improving retrieval quality by keeping related ideas together.
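The difference from fixed-size splitting can be seen in a small standalone comparison. This is an illustrative sketch only: `splitFixed` and `splitOnParagraphs` are hypothetical helpers, not part of SemanticChunker, and paragraph breaks stand in for the richer set of boundaries the chunker uses.

```typescript
// Fixed-size splitting: cut every `size` characters, regardless of content.
function splitFixed(text: string, size: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    out.push(text.slice(i, i + size));
  }
  return out;
}

// Boundary-based splitting (simplified): cut only at blank lines.
function splitOnParagraphs(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

const sample = "Cats purr when content.\n\nDogs wag their tails.";

const fixedChunks = splitFixed(sample, 16);
// fixedChunks[0] is "Cats purr when c": the cut lands in the middle of a word

const paragraphChunks = splitOnParagraphs(sample);
// paragraphChunks is ["Cats purr when content.", "Dogs wag their tails."]
```

With the fixed-size cut, the second chunk mixes the tail of the first sentence with the head of the second, so neither idea is retrievable as a unit. The boundary-based split keeps each sentence whole.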
Example: Basic usage
Example: Preserving code blocks
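As a rough illustration of what preserving code blocks can involve, the sketch below separates fenced code blocks from the surrounding prose so that later splitting steps never cut inside them. It is a standalone approximation under stated assumptions, not the library's code: `separateCodeBlocks` and `Segment` are hypothetical names, and only triple-backtick fences are recognised.

```typescript
// Hypothetical segment type for this sketch only.
interface Segment {
  kind: "code" | "prose";
  text: string;
}

// Three backticks, built at runtime to keep this example easy to embed.
const FENCE = "`".repeat(3);

// Separates fenced code blocks from prose. Each code block is returned as
// one segment, including its fences, so it can be treated as an atomic unit.
function separateCodeBlocks(markdown: string): Segment[] {
  const pattern = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");
  const segments: Segment[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const before = markdown.slice(last, match.index).trim();
    if (before) segments.push({ kind: "prose", text: before });
    segments.push({ kind: "code", text: match[0] });
    last = match.index + match[0].length;
  }
  const rest = markdown.slice(last).trim();
  if (rest) segments.push({ kind: "prose", text: rest });
  return segments;
}

const doc = [
  "Install the package first.",
  "",
  FENCE + "ts",
  "const x = 1;",
  "",
  "const y = 2;",
  FENCE,
  "",
  "Then run it.",
].join("\n");

const segments = separateCodeBlocks(doc);
// segments has kinds: prose, code, prose
```

The blank line inside the code block would trigger a split under naive paragraph splitting. Here it stays inside a single `code` segment, and only the `prose` segments would be passed on to further splitting.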