Creates a new registry pre-populated with the built-in loaders.
Loader registration order determines conflict resolution: later registrations override earlier ones for the same extension.
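Registration order: the built-in loaders are registered first, followed by the conditional loaders (an OCR loader when tesseract.js is installed, and a Docling loader when python3 -m docling is available). The Docling loader handles .pdf and .docx, so it supersedes both PdfLoader and DocxLoader when present.

Register a loader for all extensions it declares.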
If a previously registered loader already handles one of those extensions, it is replaced for that extension. This makes it trivial to swap in a higher-fidelity implementation for any format.
The loader instance to register.
registry.register(new PdfLoader());
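The "later registrations win" rule can be illustrated with a minimal sketch that keeps one Map entry per extension. It assumes each loader exposes an `extensions` array; `SketchRegistry`, `stubLoader`, the `name` field, and the `DoclingStub` loader are hypothetical stand-ins rather than the library's API, and case-insensitive matching is an assumption.

```typescript
// Hypothetical shapes: the real IDocumentLoader and LoadedDocument types may differ.
interface LoadedDocument {
  path: string;
  content: string;
}

interface IDocumentLoader {
  readonly name: string; // assumption: used here only to show which loader won
  readonly extensions: readonly string[];
  load(filePath: string): Promise<LoadedDocument>;
}

class SketchRegistry {
  // One entry per extension. Map.set overwrites, so the last registration wins.
  private readonly byExtension = new Map<string, IDocumentLoader>();

  register(loader: IDocumentLoader): void {
    for (const ext of loader.extensions) {
      // Assumption: extensions are compared case-insensitively.
      this.byExtension.set(ext.toLowerCase(), loader);
    }
  }

  getLoader(ext: string): IDocumentLoader | undefined {
    return this.byExtension.get(ext.toLowerCase());
  }
}

// Builds a stub loader that returns an empty document.
function stubLoader(name: string, extensions: string[]): IDocumentLoader {
  return {
    name,
    extensions,
    load: async (filePath) => ({ path: filePath, content: '' }),
  };
}

const registry = new SketchRegistry();
registry.register(stubLoader('PdfLoader', ['.pdf']));
registry.register(stubLoader('DocxLoader', ['.docx']));
// Registered later and declaring both extensions, so it replaces both earlier entries.
registry.register(stubLoader('DoclingStub', ['.pdf', '.docx']));

console.log(registry.getLoader('.pdf')?.name); // prints DoclingStub
console.log(registry.getLoader('.docx')?.name); // prints DoclingStub
```

Replacement happens per extension: a loader that declares only .pdf would leave an earlier .docx entry untouched.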
Retrieve the loader registered for extensionOrPath.
Both bare extensions (.md, md) and full file paths
(/docs/guide.md) are accepted.
extensionOrPath: File extension or full path.
Returns the matching IDocumentLoader, or undefined when no
loader is registered for the detected extension.
const loader = registry.getLoader('.md');
const loader2 = registry.getLoader('README.md');
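Because getLoader accepts both bare extensions and paths, the input has to be reduced to a single extension before lookup. A minimal sketch of one way to do that with Node's `path.extname`; `normalizeExtension` is a hypothetical helper, and lower-casing is an assumption rather than documented behaviour. Note that `path.extname('.md')` returns an empty string, so a leading-dot input falls through to the bare-extension branch.

```typescript
import * as path from 'node:path';

/**
 * Hypothetical helper: reduce a bare extension (".md", "md") or a file path
 * ("/docs/guide.md", "README.md") to a lower-cased, dot-prefixed extension.
 * Returns "" when a path has no extension.
 */
function normalizeExtension(extensionOrPath: string): string {
  // path.extname handles full paths and file names: "README.md" -> ".md".
  // It returns "" for ".md" and "md", which are treated as bare extensions below.
  const fromPath = path.extname(extensionOrPath);
  if (fromPath !== '') {
    return fromPath.toLowerCase();
  }
  // A path separator with no extension (e.g. "/docs/LICENSE") leaves nothing to match.
  if (extensionOrPath.includes('/') || extensionOrPath.includes('\\')) {
    return '';
  }
  // Bare extension: add the leading dot if it is missing.
  const bare = extensionOrPath.startsWith('.') ? extensionOrPath : `.${extensionOrPath}`;
  return bare.toLowerCase();
}

for (const input of ['.md', 'md', '/docs/guide.md', 'README.md', 'Notes.MDX']) {
  console.log(`${input} -> ${normalizeExtension(input)}`);
}
```

A bare word such as `md` cannot be told apart from an extensionless file name, so this sketch always treats it as an extension.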
Convenience method: detect format from filePath, find the matching
loader, and delegate to its load() method.
filePath: Absolute (or resolvable relative) file path.
options (optional, LoadOptions): Load hints forwarded to the loader.
Returns a promise resolving to the LoadedDocument.
Throws when no loader is registered for the file's extension.
Throws when the underlying loader's load() throws.
const doc = await registry.loadFile('/notes/architecture.md');
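A minimal sketch of how loadFile could combine lookup and delegation, covering the two failure modes listed for it: a missing loader is reported as an error, and a failure inside load() propagates unchanged. The interface shapes, `SketchRegistry`, and the simplified extension detection via `path.extname` are assumptions for illustration.

```typescript
import * as path from 'node:path';

// Hypothetical shapes standing in for the real LoadOptions / LoadedDocument / IDocumentLoader.
interface LoadOptions {
  encoding?: string;
}
interface LoadedDocument {
  path: string;
  content: string;
}
interface IDocumentLoader {
  readonly extensions: readonly string[];
  load(filePath: string, options?: LoadOptions): Promise<LoadedDocument>;
}

class SketchRegistry {
  private readonly byExtension = new Map<string, IDocumentLoader>();

  register(loader: IDocumentLoader): void {
    for (const ext of loader.extensions) this.byExtension.set(ext.toLowerCase(), loader);
  }

  // Simplified lookup: only the path's extension is considered.
  getLoader(filePath: string): IDocumentLoader | undefined {
    return this.byExtension.get(path.extname(filePath).toLowerCase());
  }

  async loadFile(filePath: string, options?: LoadOptions): Promise<LoadedDocument> {
    const loader = this.getLoader(filePath);
    if (loader === undefined) {
      // Failure mode 1: no loader registered for the detected extension.
      throw new Error(`No loader registered for "${path.extname(filePath) || filePath}"`);
    }
    // Failure mode 2: a rejection from loader.load() reaches the caller unchanged.
    return loader.load(filePath, options);
  }
}

const registry = new SketchRegistry();
registry.register({
  extensions: ['.md'],
  load: async (filePath) => ({ path: filePath, content: '# stub' }),
});

registry.loadFile('/notes/architecture.md').then((doc) => console.log(doc.content));
registry.loadFile('/notes/diagram.vsdx').catch((err: Error) => console.log(err.message));
```

Because loadFile is async, both failure modes surface as a rejected promise, so callers handle them with the same try/catch around await.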
Central registry mapping file extensions to IDocumentLoader implementations.
Built-in loaders (registered automatically)
.txt, .csv, .tsv, .json, .yaml, .yml
.md, .mdx
.html, .htm
.pdf
.docx

Conditional loaders (registered when available)
When tesseract.js is installed
When python3 -m docling is available

Registering a custom loader
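A sketch of a custom loader for plain-text .log files. The IDocumentLoader shape (an `extensions` list plus `load(filePath, options)`) and the LoadedDocument fields are assumptions inferred from the method descriptions in this section, and `LogFileLoader` is hypothetical; check the real interface before adapting it.

```typescript
import { mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import * as path from 'node:path';

// Hypothetical shapes; the real LoadOptions / LoadedDocument / IDocumentLoader may differ.
interface LoadOptions {
  encoding?: BufferEncoding;
}

interface LoadedDocument {
  path: string;
  content: string;
  metadata: Record<string, unknown>;
}

interface IDocumentLoader {
  readonly extensions: readonly string[];
  load(filePath: string, options?: LoadOptions): Promise<LoadedDocument>;
}

// Hypothetical loader for plain-text log files.
class LogFileLoader implements IDocumentLoader {
  readonly extensions = ['.log'] as const;

  async load(filePath: string, options?: LoadOptions): Promise<LoadedDocument> {
    const raw = await readFile(filePath, { encoding: options?.encoding ?? 'utf8' });
    // Drop blank lines so consumers only see log entries.
    const lines = raw.split(/\r?\n/).filter((line) => line.trim() !== '');
    return {
      path: path.resolve(filePath),
      content: lines.join('\n'),
      metadata: { format: 'log', lineCount: lines.length },
    };
  }
}

// Usage: write a small log to a temporary directory and load it directly.
// With the real registry, the loader would instead be passed to registry.register().
async function demo(): Promise<LoadedDocument> {
  const dir = await mkdtemp(path.join(tmpdir(), 'log-loader-'));
  const file = path.join(dir, 'app.log');
  await writeFile(file, 'start\n\nready\n');
  return new LogFileLoader().load(file);
}

demo().then((doc) => console.log(doc.metadata.lineCount)); // prints 2
```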
Using loadFile