Evaluation Cases drawer

Opened from the Evaluation drawer.

What's a case?

A case pairs one input with one assertion. The suite passes only when every case passes.
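
The all-cases-must-pass rule can be sketched as follows. This is a minimal illustration, not the drawer's actual code: the `CaseResult` shape and the `suitePasses` name are assumptions introduced here.

```typescript
// Hypothetical result shape: one entry per evaluated case.
interface CaseResult {
  caseId: string;
  passed: boolean;
}

// The suite passes only when every case passed.
// Array.prototype.every returns true for an empty array, so an empty suite
// would count as passing here; a real runner may want to reject that.
function suitePasses(results: CaseResult[]): boolean {
  return results.every((r) => r.passed);
}

const results: CaseResult[] = [
  { caseId: "a", passed: true },
  { caseId: "b", passed: false },
];
console.log(suitePasses(results)); // false
```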

Fields per case

| Field | Notes |
| --- | --- |
| Input | JSON payload sent to the agent |
| Expected | The assertion (see types below) |
| Type | substring / regex / jsonpath / score_gte |
| Weight | Multiplier on the pass/fail score (default 1) |
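
As a rough sketch, the fields above could be modeled like this. The lowercase property names, the `string | number` type for `expected`, and the normalized weighted aggregation are assumptions for illustration; the page only states that Weight multiplies the pass/fail score and defaults to 1.

```typescript
type AssertionType = "substring" | "regex" | "jsonpath" | "score_gte";

// Hypothetical case shape mirroring the Fields table.
interface EvaluationCase {
  input: unknown;            // JSON payload sent to the agent
  expected: string | number; // assertion value; its meaning depends on `type`
  type: AssertionType;
  weight?: number;           // multiplier on the pass/fail score; defaults to 1
}

// Assumed aggregation: each case contributes its weight if it passed,
// and the sum is normalized by the total weight.
function weightedScore(cases: EvaluationCase[], passed: boolean[]): number {
  let earned = 0;
  let total = 0;
  cases.forEach((c, i) => {
    const w = c.weight ?? 1;
    total += w;
    if (passed[i]) earned += w;
  });
  return total === 0 ? 0 : earned / total;
}

const exampleCases: EvaluationCase[] = [
  { input: { q: "refund?" }, expected: "approved", type: "substring" },
  { input: { q: "status?" }, expected: 0.8, type: "score_gte", weight: 3 },
];
console.log(weightedScore(exampleCases, [true, false])); // 0.25
```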

Assertion types

| Type | Expected value example |
| --- | --- |
| substring | "approved" |
| regex | ^OK \d{3}$ |
| jsonpath | $.status == "ok" |
| score_gte | 0.8 (judge model returns score ≥ 0.8) |
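
The four assertion types could be checked along these lines. This is a hedged sketch: `checkAssertion` and `resolvePath` are hypothetical names, the agent output is assumed to be a string, the judge score is assumed to be supplied by the caller, and the jsonpath branch handles only simple dotted paths compared with `==` against a JSON literal, not the full JSONPath language.

```typescript
type AssertionType = "substring" | "regex" | "jsonpath" | "score_gte";

// Resolve a dotted path such as "$.status" or "$.a.b" against a parsed JSON value.
function resolvePath(root: unknown, path: string): unknown {
  const keys = path.replace(/^\$\.?/, "").split(".").filter((k) => k.length > 0);
  let current: unknown = root;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function checkAssertion(
  type: AssertionType,
  expected: string | number,
  output: string,
  judgeScore?: number, // assumed to come from a separate judge-model call
): boolean {
  switch (type) {
    case "substring":
      return output.includes(String(expected));
    case "regex":
      return new RegExp(String(expected)).test(output);
    case "jsonpath": {
      // Supports only `<dotted path> == <JSON literal>`.
      const match = /^(\$[\w.]*)\s*==\s*(.+)$/.exec(String(expected));
      if (!match) return false;
      try {
        // Strict equality, so only primitive values compare as equal.
        return resolvePath(JSON.parse(output), match[1]) === JSON.parse(match[2]);
      } catch {
        return false; // the output or the literal was not valid JSON
      }
    }
    case "score_gte":
      return judgeScore !== undefined && judgeScore >= Number(expected);
  }
}

console.log(checkAssertion("substring", "approved", "Request approved")); // true
console.log(checkAssertion("regex", "^OK \\d{3}$", "OK 200")); // true
console.log(checkAssertion("jsonpath", '$.status == "ok"', '{"status":"ok"}')); // true
console.log(checkAssertion("score_gte", 0.8, "", 0.75)); // false
```

Note that the regex pattern needs a doubled backslash only because it is written inside a TypeScript string literal; the pattern itself is `^OK \d{3}$`.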

Bulk import

Paste a JSON array of cases into the Import field. The drawer validates each row and reports the first failure with its line number.
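
A first-failure validator for a pasted array might look like the sketch below. The property names (`input`, `expected`, `type`, `weight`), the specific checks, and the function names are assumptions, and for simplicity this version reports the 1-based row index rather than the line number in the pasted text.

```typescript
const ASSERTION_TYPES = ["substring", "regex", "jsonpath", "score_gte"];

interface ImportError {
  row: number; // 1-based row index; 0 means the text as a whole was rejected
  message: string;
}

// Returns a description of the problem, or null when the row looks valid.
function validateRow(row: unknown): string | null {
  if (row === null || typeof row !== "object" || Array.isArray(row)) {
    return "case must be an object";
  }
  const c = row as Record<string, unknown>;
  if (!("input" in c)) return "missing input";
  if (!("expected" in c)) return "missing expected";
  const type = c["type"];
  if (typeof type !== "string" || !ASSERTION_TYPES.includes(type)) {
    return "unknown type";
  }
  const weight = c["weight"];
  if (weight !== undefined && typeof weight !== "number") {
    return "weight must be a number";
  }
  return null;
}

// Stops at the first invalid row, mirroring the "first failure" behavior.
function firstImportError(text: string): ImportError | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { row: 0, message: "not valid JSON" };
  }
  if (!Array.isArray(parsed)) return { row: 0, message: "expected a JSON array" };
  for (let i = 0; i < parsed.length; i++) {
    const message = validateRow(parsed[i]);
    if (message !== null) return { row: i + 1, message };
  }
  return null;
}

const pasted = `[
  {"input": {"q": "hi"}, "expected": "approved", "type": "substring"},
  {"input": {"q": "hi"}, "expected": "ok", "type": "exact"}
]`;
console.log(firstImportError(pasted)); // { row: 2, message: 'unknown type' }
```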

Code-backed workflow

Source of truth: EvaluationCasesDrawer.tsx.

  1. List, create, and delete evaluation cases for one suite.
  2. Cases include input, expected output or assertion data, and scoring metadata.
  3. Verify cases exist before running the evaluation suite.
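
The workflow above can be illustrated with an in-memory stand-in. This is only a sketch under stated assumptions: `InMemoryCaseStore`, `StoredCase`, and `assertSuiteHasCases` are hypothetical names and do not reflect the actual API used by EvaluationCasesDrawer.tsx.

```typescript
// Hypothetical stored-case shape; scoring metadata is reduced to a weight.
interface StoredCase {
  id: string;
  suiteId: string;
  input: unknown;
  expected: string | number;
  weight: number;
}

class InMemoryCaseStore {
  private cases = new Map<string, StoredCase>();
  private nextId = 1;

  // List the cases that belong to one suite.
  list(suiteId: string): StoredCase[] {
    return Array.from(this.cases.values()).filter((c) => c.suiteId === suiteId);
  }

  create(suiteId: string, input: unknown, expected: string | number, weight = 1): StoredCase {
    const created: StoredCase = { id: `case-${this.nextId++}`, suiteId, input, expected, weight };
    this.cases.set(created.id, created);
    return created;
  }

  // Returns true when a case with that id existed and was removed.
  delete(id: string): boolean {
    return this.cases.delete(id);
  }
}

// Guard to call before running a suite: refuse to run when it has no cases.
function assertSuiteHasCases(store: InMemoryCaseStore, suiteId: string): void {
  if (store.list(suiteId).length === 0) {
    throw new Error(`Suite ${suiteId} has no evaluation cases`);
  }
}

const store = new InMemoryCaseStore();
const first = store.create("suite-1", { q: "refund?" }, "approved");
assertSuiteHasCases(store, "suite-1"); // does not throw
store.delete(first.id);
console.log(store.list("suite-1").length); // 0
```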