Class PdfLoader

Document loader for PDF files.

Extraction tiers

unpdf: the built-in, always-available extraction engine. It extracts the PDF text layer in pure JavaScript and requires no native binaries.
OCR fallback (optional): supplied at construction time and engaged automatically when unpdf yields sparse text (fewer than 50 characters per page on average), which indicates a scanned document.
Docling (optional): when provided, takes precedence over both unpdf and OCR, yielding the highest-fidelity extraction at the cost of requiring a Python runtime.
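The tier selection described above can be sketched as follows. This is a minimal illustration, not PdfLoader's actual code: `Extraction`, `Extractor`, `isSparse`, and `extractWithTiers` are hypothetical names, each tier is modelled as a plain async function, and the sketch assumes Docling is used outright whenever it is supplied (the real loader might, for example, fall back to unpdf if Docling fails).

```typescript
// Hypothetical stand-ins; the real LoadedDocument / IDocumentLoader types are richer.
interface Extraction {
  text: string;
  pageCount: number;
}
type Extractor = (source: string | Uint8Array) => Promise<Extraction>;

// Threshold from the description: fewer than 50 characters per page on average.
const SPARSE_CHARS_PER_PAGE = 50;

function isSparse(result: Extraction): boolean {
  // Assumption: a document reporting no pages is treated as sparse (avoids dividing by zero).
  if (result.pageCount <= 0) return true;
  return result.text.length / result.pageCount < SPARSE_CHARS_PER_PAGE;
}

async function extractWithTiers(
  source: string | Uint8Array,
  unpdf: Extractor,
  ocr: Extractor | null,
  docling: Extractor | null,
): Promise<{ engine: 'docling' | 'unpdf' | 'ocr'; result: Extraction }> {
  // Docling, when supplied, takes precedence over both other tiers.
  if (docling) return { engine: 'docling', result: await docling(source) };

  // Otherwise run the pure-JS text-layer extraction first.
  const primary = await unpdf(source);

  // Sparse text suggests a scanned PDF: hand off to OCR if it was supplied.
  if (ocr && isSparse(primary)) {
    return { engine: 'ocr', result: await ocr(source) };
  }
  return { engine: 'unpdf', result: primary };
}
```

Note that OCR only runs after unpdf has already produced a sparse result, so born-digital PDFs never pay the OCR cost.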

Example

```typescript
const ocrLoader = createOcrPdfLoader();      // null if tesseract.js is not installed
const doclingLoader = createDoclingLoader(); // null if Docling is not installed
const pdfLoader = new PdfLoader(ocrLoader, doclingLoader);
const doc = await pdfLoader.load('/reports/q3.pdf');
```

Implements

IDocumentLoader

Constructors

constructor

new PdfLoader(ocrLoader?, doclingLoader?): PdfLoader
Creates a new PdfLoader.
Parameters
- ocrLoader: null | IDocumentLoader = null
  Optional OCR fallback (for example from createOcrPdfLoader()).
- doclingLoader: null | IDocumentLoader = null
  Optional Docling loader (for example from createDoclingLoader()).
Returns PdfLoader
- Defined in src/memory/io/ingestion/PdfLoader.ts:116

Methods

canLoad

canLoad(source): boolean
Returns true when this loader is capable of handling source.

For string sources the check is purely extension-based. For Buffer sources the loader may inspect magic bytes when relevant.
Parameters
- source: string | Buffer<ArrayBufferLike>
  Absolute file path or raw bytes.
Returns boolean
Implementation of IDocumentLoader.canLoad
- Defined in src/memory/io/ingestion/PdfLoader.ts:129
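A minimal sketch of such a check, assuming the loader accepts the `.pdf` extension case-insensitively and recognises raw bytes by the `%PDF-` header. `canLoadPdf` is a hypothetical standalone function, and `Uint8Array` is used in place of `Buffer` (a `Uint8Array` subclass) so the example has no Node.js dependency.

```typescript
// Assumptions: '.pdf' is the handled extension and '%PDF-' is the magic header
// inspected for raw bytes; the actual PdfLoader may differ in either respect.
const PDF_EXTENSIONS = ['.pdf'];
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function canLoadPdf(source: string | Uint8Array): boolean {
  if (typeof source === 'string') {
    // Purely extension-based: compare the lower-cased suffix from the last dot.
    const dot = source.lastIndexOf('.');
    if (dot < 0) return false;
    return PDF_EXTENSIONS.includes(source.slice(dot).toLowerCase());
  }
  // Raw bytes: compare the leading bytes against the magic number.
  const bytes: Uint8Array = source;
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}
```

Checking only offset 0 keeps the test simple; some PDF readers tolerate a few bytes of leading junk before the header, so a more lenient loader might search a short prefix instead.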

load

load(source, options?): Promise<LoadedDocument>
Parses source and returns a normalised LoadedDocument.

When source is a string the loader treats it as an absolute (or resolvable) file path and reads the file from disk. When source is a Buffer the loader parses the bytes directly and derives as much metadata as possible from the buffer content alone.
Parameters
- source: string | Buffer<ArrayBufferLike>
  Absolute file path OR raw document bytes.
- options?: LoadOptions
  Optional hints such as a format override.
Returns Promise<LoadedDocument>
A promise resolving to the fully-populated LoadedDocument.

Throws
When the file cannot be read or the format is not parsable.
Implementation of IDocumentLoader.load
- Defined in src/memory/io/ingestion/PdfLoader.ts:148
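The string/Buffer split and the documented failure modes can be sketched as follows. This assumes that "not parsable" is detected, at minimum, by a missing `%PDF-` header; `resolvePdfBytes`, `readOrThrow`, and the injected `ReadFile` function (standing in for something like Node's `fs.promises.readFile`) are hypothetical, and the extraction step itself is omitted.

```typescript
// Hypothetical reader type, injected so the sketch has no Node.js dependency.
type ReadFile = (path: string) => Promise<Uint8Array>;

const PDF_HEADER = '%PDF-';

async function resolvePdfBytes(
  source: string | Uint8Array,
  readFile: ReadFile,
): Promise<Uint8Array> {
  // Raw bytes (a Buffer is a Uint8Array subclass) are used as-is;
  // a string is treated as a file path and read from disk.
  const bytes = typeof source === 'string' ? await readOrThrow(source, readFile) : source;

  // Minimal parsability check: the first five bytes must spell "%PDF-".
  const header = Array.from(bytes.subarray(0, PDF_HEADER.length), (b) =>
    String.fromCharCode(b),
  ).join('');
  if (header !== PDF_HEADER) {
    throw new Error('Source is not a parsable PDF (missing %PDF- header)');
  }
  return bytes;
}

async function readOrThrow(path: string, readFile: ReadFile): Promise<Uint8Array> {
  try {
    return await readFile(path);
  } catch (err) {
    // Surface unreadable files as a descriptive error, matching the documented Throws case.
    throw new Error(`Cannot read PDF at ${path}: ${String(err)}`);
  }
}
```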

Properties

`Readonly` supportedExtensions

supportedExtensions: string[] = ...

File extensions this loader handles, each with a leading dot.

Used by LoaderRegistry to route file paths to the correct loader.

Example (format only: these are a Markdown loader's extensions, not PdfLoader's)

['.md', '.mdx']
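Routing by extension might look like the following sketch. `ExtensionRouter` and `ExtensionLoader` are hypothetical stand-ins: the real `LoaderRegistry` API is not documented on this page, and this version assumes lookups are case-insensitive and that the last registered loader wins when two loaders declare the same extension.

```typescript
// Minimal stand-in for the documented loader contract (only what routing needs).
interface ExtensionLoader {
  readonly supportedExtensions: string[];
}

// Hypothetical registry; not the actual LoaderRegistry implementation.
class ExtensionRouter {
  private readonly byExtension = new Map<string, ExtensionLoader>();

  register(loader: ExtensionLoader): void {
    // Index each leading-dot extension in lower case; later registrations overwrite earlier ones.
    for (const ext of loader.supportedExtensions) {
      this.byExtension.set(ext.toLowerCase(), loader);
    }
  }

  resolve(path: string): ExtensionLoader | undefined {
    // Take the suffix from the last dot, so '/a/b.tar.pdf' resolves via '.pdf'.
    const dot = path.lastIndexOf('.');
    if (dot < 0) return undefined;
    return this.byExtension.get(path.slice(dot).toLowerCase());
  }
}
```

Because each extension carries its leading dot, the registry can compare the path suffix directly without normalising it first.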
