Interface CreationVerdict

Evaluation verdict produced by the LLM-as-judge after a tool is forged.

The judge runs the tool against its declared test cases and reports five numeric scores in the range [0, 1]: an overall confidence plus safety, correctness, determinism, and bounded execution. A tool is registered only when approved is true.

interface CreationVerdict {
    approved: boolean;
    confidence: number;
    safety: number;
    correctness: number;
    determinism: number;
    bounded: number;
    reasoning: string;
}
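
As a rough illustration of how a caller might consume a verdict, the sketch below gates registration on approved. The registry map, the registerIfApproved helper, the tool name, and all score values are hypothetical and are not part of the documented API.

```typescript
interface CreationVerdict {
    approved: boolean;
    confidence: number;
    safety: number;
    correctness: number;
    determinism: number;
    bounded: number;
    reasoning: string;
}

// Hypothetical in-memory registry, used only for this illustration.
const registry = new Map<string, CreationVerdict>();

// Stores the tool's verdict only when the judge approved it.
// Returns true if the tool was registered, false if it was rejected.
function registerIfApproved(toolName: string, verdict: CreationVerdict): boolean {
    if (!verdict.approved) {
        return false;
    }
    registry.set(toolName, verdict);
    return true;
}

// Illustrative values only; real scores come from the judge.
const exampleVerdict: CreationVerdict = {
    approved: true,
    confidence: 0.9,
    safety: 0.95,
    correctness: 1,
    determinism: 1,
    bounded: 0.8,
    reasoning: "All declared test cases passed; no unsafe patterns flagged.",
};

registerIfApproved("slugify", exampleVerdict);
```

A rejected verdict (approved set to false) leaves the registry untouched, which matches the rule that no tool is registered on rejection.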

Index

Properties

approved confidence safety correctness determinism bounded reasoning

Properties

approved

approved: boolean

Whether the judge approves the tool for registration at its initial tier. false means the forge request is rejected and no tool is registered.

confidence

confidence: number

Overall confidence the judge has in its verdict, in the range [0, 1]. Low confidence may trigger a second judge pass or human review.
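
A minimal sketch of how low confidence could be routed to a second pass or human review. The threshold of 0.6, the NextStep type, and the triageByConfidence helper are assumptions made for illustration; the source does not state a cutoff.

```typescript
// Hypothetical outcomes for a verdict after looking at its confidence.
type NextStep = "accept-verdict" | "escalate";

// Escalates (second judge pass or human review) when confidence falls
// below the threshold. The default of 0.6 is an assumed value.
function triageByConfidence(confidence: number, threshold: number = 0.6): NextStep {
    return confidence < threshold ? "escalate" : "accept-verdict";
}
```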

safety

safety: number

Safety score in the range [0, 1]. Assesses whether the tool's implementation could cause unintended harm, data exfiltration, or resource exhaustion.

correctness

correctness: number

Correctness score in the range [0, 1]. Measures how well the tool's outputs match the expected outputs in the declared test cases.
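
The source does not say how the judge derives this score. One simple proxy, sketched here under that assumption, is the fraction of declared test cases whose actual output matches the expected output. The TestCase shape and the passFraction helper are hypothetical.

```typescript
// Hypothetical shape of a declared test case.
interface TestCase {
    input: unknown;
    expected: unknown;
}

// Returns the fraction of test cases whose output matches the expected value.
// Outputs are compared via JSON.stringify, which is adequate for plain
// JSON-like data but sensitive to object key order.
function passFraction(tool: (input: unknown) => unknown, cases: TestCase[]): number {
    if (cases.length === 0) {
        return 0; // no evidence of correctness without test cases
    }
    let passed = 0;
    for (const testCase of cases) {
        try {
            const actual = tool(testCase.input);
            if (JSON.stringify(actual) === JSON.stringify(testCase.expected)) {
                passed++;
            }
        } catch {
            // A tool that throws on a test case counts as a failure.
        }
    }
    return passed / cases.length;
}
```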

determinism

determinism: number

Determinism score in the range [0, 1]. Gauges whether repeated invocations with identical inputs produce consistent outputs. Lower scores flag non-deterministic behaviour.
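
As an illustration of the idea, and not the judge's actual method, the sketch below invokes a tool several times per input and reports the fraction of inputs for which every run produced the same output. The consistencyFraction helper and the default of three runs are assumptions.

```typescript
// Returns the fraction of inputs for which repeated invocations agree.
// Outputs are compared via JSON.stringify, so this suits plain JSON-like data.
function consistencyFraction(
    tool: (input: unknown) => unknown,
    inputs: unknown[],
    runs: number = 3,
): number {
    if (inputs.length === 0) {
        return 1; // nothing observed that contradicts determinism
    }
    let consistent = 0;
    for (const input of inputs) {
        const first = JSON.stringify(tool(input));
        let allSame = true;
        for (let i = 1; i < runs; i++) {
            if (JSON.stringify(tool(input)) !== first) {
                allSame = false;
                break;
            }
        }
        if (allSame) {
            consistent++;
        }
    }
    return consistent / inputs.length;
}
```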

bounded

bounded: number

Bounded execution score in the range [0, 1]. Indicates whether the tool reliably completes within its declared resource limits (memory, time). Scores are derived from sandbox telemetry.

reasoning

reasoning: string

Free-text explanation of the verdict, including any failure reasons, flagged patterns, or suggestions for improvement.
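
Since every numeric property is documented as lying in [0, 1], a consumer may want to check a verdict before trusting it. The following is a sketch under that assumption; the SCORE_FIELDS constant and the outOfRangeFields helper are hypothetical names.

```typescript
interface CreationVerdict {
    approved: boolean;
    confidence: number;
    safety: number;
    correctness: number;
    determinism: number;
    bounded: number;
    reasoning: string;
}

// The numeric properties documented as lying in the range [0, 1].
const SCORE_FIELDS = ["confidence", "safety", "correctness", "determinism", "bounded"] as const;

// Returns the names of score fields that are not finite numbers in [0, 1].
// An empty array means all scores are within the documented range.
function outOfRangeFields(verdict: CreationVerdict): string[] {
    return SCORE_FIELDS.filter((field) => {
        const value = verdict[field];
        return !(Number.isFinite(value) && value >= 0 && value <= 1);
    });
}
```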
