Interface DocumentChunk

A single chunk produced by splitting a document. Used internally and returned in LoadedDocument.chunks.

interface DocumentChunk {
    content: string;
    index: number;
    pageNumber?: number;
    heading?: string;
    metadata?: Record<string, unknown>;
}

Properties

content: string

Text content of this chunk after extraction and cleaning.

index: number

Zero-based chunk index within the parent document.
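
Because index gives each chunk's position within its parent document, it can be used to restore document order when chunks arrive unordered. The following is a minimal sketch, not part of the documented API: orderChunks and joinChunkText are hypothetical helper names, and the interface is repeated locally only so the example is self-contained. Joining the chunks may not reproduce the source text exactly if the splitter overlaps or cleans them.

```typescript
// Local copy of the documented interface so the sketch compiles on its own.
interface DocumentChunk {
    content: string;
    index: number;
    pageNumber?: number;
    heading?: string;
    metadata?: Record<string, unknown>;
}

// Hypothetical helper: returns a sorted copy, leaving the input array untouched.
function orderChunks(chunks: DocumentChunk[]): DocumentChunk[] {
    return [...chunks].sort((a, b) => a.index - b.index);
}

// Hypothetical helper: concatenates chunk text in document order.
function joinChunkText(chunks: DocumentChunk[], separator = "\n\n"): string {
    return orderChunks(chunks)
        .map((chunk) => chunk.content)
        .join(separator);
}
```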

pageNumber?: number

1-based page number this chunk originates from. Applies to PDF and DOCX sources.
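
Since pageNumber is optional, callers need to handle chunks that carry no page information. A minimal sketch of one way to do that, assuming a hypothetical helper named groupChunksByPage (not part of the documented API); the interface is repeated locally so the example stands alone.

```typescript
// Local copy of the documented interface so the sketch compiles on its own.
interface DocumentChunk {
    content: string;
    index: number;
    pageNumber?: number;
    heading?: string;
    metadata?: Record<string, unknown>;
}

interface PagedChunks {
    // Chunks keyed by their 1-based page number.
    byPage: Map<number, DocumentChunk[]>;
    // Chunks for which no page number was recorded.
    unpaged: DocumentChunk[];
}

// Hypothetical helper: buckets chunks by page and keeps page-less chunks separate.
function groupChunksByPage(chunks: DocumentChunk[]): PagedChunks {
    const byPage = new Map<number, DocumentChunk[]>();
    const unpaged: DocumentChunk[] = [];
    for (const chunk of chunks) {
        if (chunk.pageNumber === undefined) {
            unpaged.push(chunk);
            continue;
        }
        const bucket = byPage.get(chunk.pageNumber) ?? [];
        bucket.push(chunk);
        byPage.set(chunk.pageNumber, bucket);
    }
    return { byPage, unpaged };
}
```

Keeping page-less chunks in their own list avoids inventing a placeholder page number for them.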

heading?: string

Heading or section title that precedes this chunk, if detected.

metadata?: Record<string, unknown>

Chunk-level metadata, for example a bounding box or a column number when layout mode is used.
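
Because metadata values are typed as unknown, they have to be narrowed before use. The sketch below shows one way to do that with a typeof check. The key name "column" and the helper readColumn are assumptions made for illustration; the actual keys written by the loader are not specified here.

```typescript
// Local copy of the documented interface so the sketch compiles on its own.
interface DocumentChunk {
    content: string;
    index: number;
    pageNumber?: number;
    heading?: string;
    metadata?: Record<string, unknown>;
}

// Hypothetical helper: reads a numeric column value from chunk metadata.
// "column" is an assumed key name, not one confirmed by the documentation.
function readColumn(chunk: DocumentChunk): number | undefined {
    const value: unknown = chunk.metadata?.["column"];
    // Narrow from unknown: only accept real numbers, reject strings and missing keys.
    return typeof value === "number" ? value : undefined;
}
```

Returning undefined for missing or wrongly typed values lets callers treat "no metadata" and "unexpected metadata" the same way.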