Input arguments object passed to the tool's run / execution entry point.
Must conform to the tool's declared inputSchema.
Expected output value used for correctness scoring. The judge treats this as a reference rather than an exact target, so partial matches may still score well.
A single test case used by the LLM judge to evaluate a newly forged tool.
The judge invokes the tool with input and compares the result against expectedOutput using semantic equivalence (not strict equality).
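The description above could be sketched as follows. This is only an illustration under stated assumptions: the type name ToolTestCase, the slugify tool, and the roughlyEquivalent helper are hypothetical, and the helper is a crude deterministic stand-in for the LLM judge's semantic comparison, not the judge itself. Only the field names input and expectedOutput come from the text.

```typescript
// Hypothetical shape of a test case; only the two field names are from the docs.
interface ToolTestCase {
  /** Arguments passed to the tool's entry point; must conform to its inputSchema. */
  input: Record<string, unknown>;
  /** Reference value for scoring; compared semantically, not with ===. */
  expectedOutput: unknown;
}

// Crude stand-in for semantic equivalence: ignores letter case and whitespace.
// A real judge would delegate this comparison to an LLM.
function roughlyEquivalent(actual: unknown, expected: unknown): boolean {
  const norm = (v: unknown): string =>
    String(JSON.stringify(v)).toLowerCase().replace(/\s+/g, "");
  return norm(actual) === norm(expected);
}

// Hypothetical tool under test: turns free text into a lowercase slug.
function slugify(args: Record<string, unknown>): unknown {
  return String(args.text).trim().toLowerCase().replace(/\s+/g, "-");
}

const testCase: ToolTestCase = {
  input: { text: " Hello World " },
  expectedOutput: "Hello-World",
};

// Invoke the tool with `input`, then compare against `expectedOutput`.
const actual = slugify(testCase.input);
const passed = roughlyEquivalent(actual, testCase.expectedOutput);
```

Here the tool's result differs from expectedOutput only in letter case, so a strict === check would reject it while the looser comparison accepts it, which is the distinction the description draws between semantic equivalence and strict equality.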