Class HtmlLoader

Basic document loader for HTML (.html, .htm) files.

Text extraction strategy

<script> and <style> blocks are removed entirely.
Block-level elements (<p>, <div>, <h1>–<h6>, etc.) are replaced with newline characters to preserve paragraph structure.
All remaining HTML tags are stripped.
A common subset of HTML entities is decoded.
Excessive whitespace is collapsed.
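
The five steps above can be approximated with a chain of regular-expression replacements. This is a minimal sketch, not the loader's actual implementation: the function name `extractText`, the list of block-level tags, and the entity table are all assumptions made for illustration.

```typescript
// Hypothetical helper illustrating the strategy; not part of the HtmlLoader API.
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
};

// Assumed set of block-level tags; the real loader may use a different list.
const BLOCK_TAGS = 'p|div|h[1-6]|li|ul|ol|br|tr|section|article|blockquote';

function extractText(html: string): string {
  return html
    // 1. Remove <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // 2. Replace opening and closing block-level tags with a newline.
    .replace(new RegExp(`<\\/?(?:${BLOCK_TAGS})\\b[^>]*>`, 'gi'), '\n')
    // 3. Strip every remaining tag.
    .replace(/<[^>]+>/g, '')
    // 4. Decode a small, fixed subset of entities in a single pass.
    .replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, (m) => ENTITIES[m] ?? m)
    // 5. Collapse runs of spaces/tabs and cap consecutive blank lines.
    .replace(/[ \t]+/g, ' ')
    .replace(/ ?\n ?/g, '\n')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}
```

Regex-based stripping is only suitable for plain-text extraction like this; it is not a substitute for a real HTML parser when the markup is malformed or untrusted.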

Metadata

title — extracted from the <title> element when present.
wordCount — approximate count of words in the extracted text.
source — absolute file path (when loaded from disk).
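
As a rough illustration of how these three fields could be derived, the sketch below pulls the title with a regular expression and counts whitespace-separated tokens. The `HtmlMetadata` interface and `buildMetadata` function are hypothetical names, and splitting on whitespace is only one plausible reading of "approximate count".

```typescript
// Hypothetical shape mirroring the three documented metadata fields.
interface HtmlMetadata {
  title?: string;
  wordCount: number;
  source?: string;
}

// `html` is the raw markup, `text` the already-extracted plain text.
// `source` is assumed to be an absolute path supplied by the caller.
function buildMetadata(html: string, text: string, source?: string): HtmlMetadata {
  // Take the first <title> element, if any, and normalise its whitespace.
  const match = /<title\b[^>]*>([\s\S]*?)<\/title\s*>/i.exec(html);
  const title = match?.[1]?.replace(/\s+/g, ' ').trim();

  // Approximate word count: number of non-empty whitespace-separated tokens.
  const wordCount = text.split(/\s+/).filter(Boolean).length;

  return {
    ...(title ? { title } : {}),
    wordCount,
    ...(source ? { source } : {}),
  };
}
```

With this sketch, `title` and `source` are simply absent from the result when there is no `<title>` element or the document came from a buffer.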

Example

const loader = new HtmlLoader();
const doc = await loader.load('/public/index.html');
console.log(doc.metadata.title); // e.g. 'Welcome to AgentOS'

Implements

IDocumentLoader

Constructors

constructor

new HtmlLoader(): HtmlLoader
Returns HtmlLoader

Methods

canLoad

canLoad(source): boolean
Returns true when this loader is capable of handling source.

For string sources the check is purely extension-based. For Buffer sources the loader may inspect magic bytes when relevant.
Parameters
- source: string | Buffer<ArrayBufferLike>
  Absolute file path or raw bytes.
Returns boolean
Implementation of IDocumentLoader.canLoad
- Defined in src/memory/io/ingestion/HtmlLoader.ts:184
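
The two branches described above might look roughly like the following. This is a sketch under stated assumptions, not the code at the location referenced above: `canLoadSketch` is a hypothetical name, the case-insensitive extension match is an assumption, and the byte sniffing (looking for a leading `<!doctype html` or `<html`) is only one guess at what inspecting magic bytes could mean for HTML. `Uint8Array` is used so the sketch runs without Node typings; a Node `Buffer` is a `Uint8Array` subclass.

```typescript
// Extensions taken from the class description (.html, .htm).
const SUPPORTED = ['.html', '.htm'];

// Hypothetical stand-in for canLoad; the sniffing rule is an assumption.
function canLoadSketch(source: string | Uint8Array): boolean {
  if (typeof source === 'string') {
    // String sources: purely extension-based (case-insensitive here).
    const lower = source.toLowerCase();
    return SUPPORTED.some((ext) => lower.slice(-ext.length) === ext);
  }

  // Byte sources: decode the first bytes and look for an HTML marker.
  let head = '';
  const limit = Math.min(source.length, 256);
  for (let i = 0; i < limit; i++) {
    head += String.fromCharCode(source[i]);
  }
  return /^\s*<(?:!doctype\s+html|html)\b/i.test(head);
}
```

Because the string branch never touches the file system, a `true` result here does not guarantee that a later `load` call will succeed.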

load

load(source, _options?): Promise<LoadedDocument>
Parses source and returns a normalised LoadedDocument.

When source is a string the loader treats it as an absolute (or resolvable) file path and reads the file from disk. When source is a Buffer the loader parses the bytes directly and derives as much metadata as possible from the buffer content alone.
Parameters
- source: string | Buffer<ArrayBufferLike>
  Absolute file path OR raw document bytes.
- Optional _options: LoadOptions
  Optional hints such as a format override.
Returns Promise<LoadedDocument>
A promise resolving to the fully-populated LoadedDocument.

Throws
When the file cannot be read or the format is not parsable.
Implementation of IDocumentLoader.load
- Defined in src/memory/io/ingestion/HtmlLoader.ts:196

Properties

`Readonly` supportedExtensions

supportedExtensions: string[] = ...

File extensions this loader handles, each with a leading dot.

Used by LoaderRegistry to route file paths to the correct loader.

Example

['.html', '.htm']
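
To illustrate how a registry could use `supportedExtensions` for routing, here is a minimal sketch. `SimpleRegistry` and `ExtensionLoader` are hypothetical names invented for this example; the real LoaderRegistry API is not shown on this page and may differ.

```typescript
// Minimal structural type: anything exposing supportedExtensions.
interface ExtensionLoader {
  readonly supportedExtensions: string[];
}

// Hypothetical registry; not the actual LoaderRegistry.
class SimpleRegistry<T extends ExtensionLoader> {
  private readonly byExtension = new Map<string, T>();

  // Index the loader under each of its extensions (leading dot included).
  register(loader: T): void {
    for (const ext of loader.supportedExtensions) {
      this.byExtension.set(ext.toLowerCase(), loader);
    }
  }

  // Look up a loader by the extension of the final path segment.
  resolve(filePath: string): T | undefined {
    const lastSep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
    const base = filePath.slice(lastSep + 1);
    const dot = base.lastIndexOf('.');
    if (dot <= 0) return undefined; // no extension, or a dotfile
    return this.byExtension.get(base.slice(dot).toLowerCase());
  }
}

// Usage with plain objects standing in for real loaders.
const htmlLoader: ExtensionLoader = { supportedExtensions: ['.html', '.htm'] };
const registry = new SimpleRegistry<ExtensionLoader>();
registry.register(htmlLoader);
const picked = registry.resolve('/public/index.html'); // the htmlLoader object
```

Storing the leading dot in each extension, as the property does, lets the registry compare against the path suffix directly without extra normalisation.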
