Evaluation drawer
Opened from the Evaluations section.
Fields
| Field | Required | Notes |
|---|---|---|
| Name | yes | Unique within the agent |
| Description | no | Free-text rationale |
| Model profile | yes | Binding the suite runs against; defaults to the agent's default profile |
| Cases | yes (≥1) | Managed via the Evaluation Cases drawer |
| Schedule (cron) | no | When set, the suite runs automatically on this cron expression |
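
The rules in the table can be sketched as a few small checks. This is a minimal illustration, not the code in EvaluationDrawer.tsx: the `EvaluationForm` shape and the function names are hypothetical. It also assumes the "at least one case" rule applies when the suite is run, since a newly created suite starts empty.

```typescript
// Hypothetical form state; field names mirror the table, not the real component props.
interface EvaluationForm {
  name: string;
  description?: string;
  modelProfile?: string; // omitted means "use the agent default"
  caseIds: string[];
  scheduleCron?: string;
}

// Returns human-readable problems that would block a save.
function validateEvaluationForm(form: EvaluationForm, existingNames: string[]): string[] {
  const errors: string[] = [];
  const name = form.name.trim();
  if (name === "") {
    errors.push("Name is required");
  } else if (existingNames.includes(name)) {
    errors.push("Name must be unique within the agent");
  }
  return errors;
}

// Falls back to the agent default when no profile was chosen.
function resolveModelProfile(form: EvaluationForm, agentDefault: string): string {
  return form.modelProfile ?? agentDefault;
}

// Assumption: a suite needs at least one case before it can run.
function canRun(form: EvaluationForm): boolean {
  return form.caseIds.length >= 1;
}
```

Keeping the three checks separate lets the drawer show a name error at save time while deferring the case-count check until a run is requested.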
Lifecycle
- Create — empty suite, no cases.
- Add cases — open the cases drawer.
- Run — dispatches the suite, returns a run_id.
- Review — passes/fails per case; click a failure to open the Failed Case drawer.
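
The lifecycle above can be modelled in memory as follows. Everything here is a hypothetical sketch: the type names, the `run-N` identifier format, and the synchronous `evaluate` callback are assumptions made for illustration, and a real dispatch is likely asynchronous.

```typescript
interface Suite {
  name: string;
  caseIds: string[];
}

interface CaseResult {
  caseId: string;
  passed: boolean;
}

interface Run {
  run_id: string;
  results: CaseResult[];
}

// Create: a new suite starts with no cases.
function createSuite(name: string): Suite {
  return { name, caseIds: [] };
}

// Add cases: returns a new suite rather than mutating the old one.
function addCase(suite: Suite, caseId: string): Suite {
  return { ...suite, caseIds: [...suite.caseIds, caseId] };
}

let runCounter = 0;

// Run: evaluates every case and returns a run with a fresh run_id.
function runSuite(suite: Suite, evaluate: (caseId: string) => boolean): Run {
  runCounter += 1;
  return {
    run_id: `run-${runCounter}`,
    results: suite.caseIds.map((caseId) => ({ caseId, passed: evaluate(caseId) })),
  };
}

// Review: the failures are what a reviewer would click through to.
function failedCases(run: Run): string[] {
  return run.results.filter((result) => !result.passed).map((result) => result.caseId);
}
```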
When to schedule
- Nightly regression on a critical workflow.
- Pre-deploy gate triggered by webhook.
- Weekly cohort audit across agents.
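
For the nightly and weekly cases, the schedule values might look like the ones below. This sketch assumes the Schedule field accepts standard five-field cron syntax (minute, hour, day of month, month, day of week), which this page does not confirm. The constant names, the chosen times, and the `looksLikeFiveFieldCron` helper are hypothetical, and the helper checks only the shape of the expression, not the value ranges.

```typescript
// Example expressions, assuming five-field cron syntax.
const NIGHTLY_REGRESSION = "0 2 * * *"; // every day at 02:00
const WEEKLY_AUDIT = "0 6 * * 1"; // every Monday at 06:00

// Shape check only: five whitespace-separated fields made of digits, *, /, comma, or hyphen.
function looksLikeFiveFieldCron(expression: string): boolean {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    return false;
  }
  return fields.every((field) => /^[0-9*\/,-]+$/.test(field));
}
```

A check like this catches an obviously malformed value before save, but it would still accept out-of-range fields such as an hour of 99.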
Code-backed workflow
Source of truth: EvaluationDrawer.tsx.
- Create or edit the evaluation suite and its target object.
- The drawer parses case JSON defensively before save.
- Verify the suite appears in Evaluations, then add cases and run it.
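
Defensive parsing of case JSON before save could look like the sketch below. It is not taken from EvaluationDrawer.tsx: the `parseCaseJson` name, the result type, and the requirement that a case be a JSON object are assumptions made for illustration.

```typescript
type ParseResult =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; error: string };

// Never throws: every failure is returned as a value the drawer can display.
function parseCaseJson(raw: string): ParseResult {
  const text = raw.trim();
  if (text === "") {
    return { ok: false, error: "Case JSON is empty" };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid JSON: ${message}` };
  }

  // Assumption: a case must be a plain object, not an array, null, or a primitive.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "Case JSON must be an object" };
  }
  return { ok: true, value: parsed as Record<string, unknown> };
}
```

Returning a result value instead of throwing lets the save handler stay on one code path: it either persists `value` or shows `error` next to the field.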