Failed Case drawer
Opened from the Evaluations Section or the Evaluation drawer for a case that failed.
Sections
| Section | Content |
|---|---|
| Expected | The assertion's expected value (rendered by type) |
| Actual | The agent's actual output for this case |
| Diff | Inline character-level diff for substring / regex types |
| Run trace | Link to the originating run (opens Run Detail) |
| Token cost | Prompt + completion tokens consumed |
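For substring and regex assertions, the drawer shows an inline character-level diff. The sketch below shows one common way to compute such a diff: walk a longest-common-subsequence (LCS) table. It is an illustration under assumptions, not code taken from FailedCaseDrawer.tsx. The names `DiffOp`, `charDiff`, and `renderInline` are hypothetical.

```typescript
// Hypothetical diff segment shape; the real drawer's types are not documented here.
type DiffOp = { kind: "equal" | "insert" | "delete"; text: string };

// Character-level diff of expected vs. actual via an LCS table.
// O(n*m) time and memory: fine for short assertion strings.
function charDiff(expected: string, actual: string): DiffOp[] {
  const a = Array.from(expected);
  const b = Array.from(actual);
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(0),
  );
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const ops: DiffOp[] = [];
  // Append one character, merging it into the previous segment of the same kind.
  const push = (kind: DiffOp["kind"], ch: string): void => {
    const last = ops[ops.length - 1];
    if (last && last.kind === kind) last.text += ch;
    else ops.push({ kind, text: ch });
  };
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      push("equal", a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push("delete", a[i]); // in expected, missing from actual
      i++;
    } else {
      push("insert", b[j]); // in actual, absent from expected
      j++;
    }
  }
  while (i < n) push("delete", a[i++]);
  while (j < m) push("insert", b[j++]);
  return ops;
}

// Plain-text rendering with [-removed-] and {+added+} markers.
function renderInline(ops: DiffOp[]): string {
  return ops
    .map((op) =>
      op.kind === "equal" ? op.text : op.kind === "delete" ? `[-${op.text}-]` : `{+${op.text}+}`,
    )
    .join("");
}
```

For example, `renderInline(charDiff("cat", "cut"))` yields `c[-a-]{+u+}t`. The quadratic table is acceptable for short expected values. Long outputs would call for a more efficient algorithm such as Myers' O(ND) diff.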
Triage flow
- Read the diff: is the actual output close to the expected value, or far off?
- Open the run trace to inspect tool calls and intermediate states.
- Decide which artifact to fix:
  - Update the case (the assertion was wrong).
  - Update the instruction lens (prompt regression).
  - Update the model profile (model regression).
  - Update tooling (tool regression).
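The decision step can be written as an ordered set of checks. The sketch below is hypothetical: the `TriageNotes` fields, the check order, and the choice of "instruction lens" as the fallback are assumptions made for illustration, not product behavior.

```typescript
// Hypothetical notes a reviewer records after reading the diff and run trace.
interface TriageNotes {
  assertionWasWrong: boolean; // the expected value itself is incorrect
  toolCallFailed: boolean; // run trace shows a tool error or bad tool output
  passesOnPreviousModel: boolean; // same case passes under the prior model profile
}

type FixTarget = "case" | "instruction lens" | "model profile" | "tooling";

// Assumed order: a wrong assertion makes the other signals moot, so check it first.
function chooseFixTarget(notes: TriageNotes): FixTarget {
  if (notes.assertionWasWrong) return "case";
  if (notes.toolCallFailed) return "tooling";
  if (notes.passesOnPreviousModel) return "model profile";
  return "instruction lens"; // assumed default: treat as a prompt regression
}
```

Checking the assertion first avoids changing prompts, models, or tools to satisfy an expected value that was wrong.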
Code-backed workflow
Source of truth: FailedCaseDrawer.tsx.
- Inspect the failed evaluation's output, expected result, score, and diff context.
- Use the drawer for diagnosis only; fixes belong in cases, rubrics, prompts, models, or workflows.
- Verify the fix by rerunning the suite and comparing against the baseline.
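Comparing a rerun against the baseline amounts to a per-case pass/fail comparison. This sketch assumes each suite run can be reduced to a map from case id to a pass flag. `SuiteResult`, `BaselineComparison`, and `compareToBaseline` are hypothetical names and do not come from the product.

```typescript
// Hypothetical reduced form of a suite run: case id -> passed?
type SuiteResult = Map<string, boolean>;

interface BaselineComparison {
  fixed: string[]; // failed in baseline, passes in rerun
  regressed: string[]; // passed in baseline, fails in rerun
  stillFailing: string[]; // failed in both
}

function compareToBaseline(baseline: SuiteResult, rerun: SuiteResult): BaselineComparison {
  const out: BaselineComparison = { fixed: [], regressed: [], stillFailing: [] };
  rerun.forEach((passedNow, id) => {
    const passedBefore = baseline.get(id);
    if (passedBefore === undefined) return; // new case: nothing to compare against
    if (!passedBefore && passedNow) out.fixed.push(id);
    else if (passedBefore && !passedNow) out.regressed.push(id);
    else if (!passedBefore && !passedNow) out.stillFailing.push(id);
    // passed in both: unchanged, not reported
  });
  return out;
}
```

Under this sketch, a fix counts as verified when the triaged case appears in `fixed` and `regressed` is empty.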