Evaluation drawer

Opened from the Evaluations section.

Fields

Field           | Required | Notes
Name            | yes      | Unique within the agent
Description     | no       | Free-text rationale
Model profile   | yes      | Binding the suite runs against; defaults to agent default
Cases           | yes (≥1) | Managed via Evaluation Cases drawer
Schedule (cron) | no       | When set, suite auto-runs on this cron expression
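
The field rules above can be sketched as a small type plus a validation function. This is only an illustration: the interface, property names, and error messages are hypothetical and are not taken from EvaluationDrawer.tsx. It checks the required Name, its uniqueness within the agent, and the minimum of one case; the optional fields are left unchecked.

```typescript
// Hypothetical shape of the drawer's form values (illustrative names only).
interface EvaluationSuiteForm {
  name: string;
  description?: string;
  modelProfile?: string; // assumption: undefined means "use the agent default"
  caseIds: string[];
  scheduleCron?: string;
}

// Returns a list of human-readable problems; an empty list means the form is valid.
function validateSuite(form: EvaluationSuiteForm, existingNames: string[]): string[] {
  const errors: string[] = [];
  const name = form.name.trim();
  if (name === "") {
    errors.push("Name is required");
  } else if (existingNames.includes(name)) {
    errors.push("Name must be unique within the agent");
  }
  if (form.caseIds.length < 1) {
    errors.push("At least one case is required");
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, lets a form surface all field errors in a single pass.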

Lifecycle

  1. Create: the suite starts empty, with no cases.
  2. Add cases: open the Evaluation Cases drawer and add at least one case.
  3. Run: dispatches the suite and returns a run_id.
  4. Review: passes and fails are listed per case; click a failure to open the Failed Case drawer.
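
As a rough sketch of the Review step, the per-case results of a run can be reduced to a pass count and a list of failed cases. The RunResult and CaseResult types below are assumptions made for illustration; only run_id comes from the lifecycle above, and the real response shape may differ.

```typescript
// Hypothetical result types; only `run_id` is named in the documentation.
type CaseResult = { caseId: string; passed: boolean };

interface RunResult {
  run_id: string;
  results: CaseResult[];
}

interface RunSummary {
  run_id: string;
  passed: number;
  failed: string[]; // ids of the cases a reviewer would open next
}

// Collapses per-case results into the counts shown during review.
function summarizeRun(run: RunResult): RunSummary {
  const failed = run.results.filter((r) => !r.passed).map((r) => r.caseId);
  return {
    run_id: run.run_id,
    passed: run.results.length - failed.length,
    failed,
  };
}
```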

When to schedule

  • Nightly regression on a critical workflow.
  • Pre-deploy gate triggered by webhook.
  • Weekly cohort audit across agents.
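
For the two time-based cases, the schedules might look like the following, assuming the Schedule field accepts standard five-field cron syntax (minute, hour, day of month, month, day of week). The specific times are arbitrary examples, and the time zone depends on the scheduler. The webhook-triggered gate is event-driven, so it has no cron expression here. The helper is a hypothetical shape check, not a full cron parser.

```typescript
// Example expressions in standard five-field cron syntax (times are arbitrary).
const exampleSchedules: Record<string, string> = {
  nightlyRegression: "0 2 * * *", // every day at 02:00
  weeklyCohortAudit: "0 6 * * 1", // every Monday at 06:00
};

// Hypothetical helper: checks only that the expression has five
// whitespace-separated fields. It does not validate field values.
function hasFiveCronFields(expr: string): boolean {
  const trimmed = expr.trim();
  return trimmed !== "" && trimmed.split(/\s+/).length === 5;
}
```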

Code-backed workflow

Source of truth: EvaluationDrawer.tsx.

  1. Create or edit the evaluation suite and its target object.
  2. The drawer parses case JSON defensively before save.
  3. Verify the suite appears in Evaluations, then add cases and run it.
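
One way to read "parses case JSON defensively" is sketched below: catch syntax errors instead of letting them propagate, then check the parsed value's shape before saving. This is an assumption about the behavior, not the code in EvaluationDrawer.tsx; the function name, the result type, and the rule that cases form an array of objects are all hypothetical.

```typescript
// Hypothetical result type: either the parsed cases or an error message.
type ParseResult =
  | { ok: true; cases: Record<string, unknown>[] }
  | { ok: false; error: string };

// Parses raw case JSON without throwing, and rejects unexpected shapes.
function parseCasesJson(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${(err as Error).message}` };
  }
  // Assumption: cases are stored as a JSON array of objects.
  if (!Array.isArray(value)) {
    return { ok: false, error: "Expected a JSON array of cases" };
  }
  const bad = value.findIndex(
    (v) => typeof v !== "object" || v === null || Array.isArray(v),
  );
  if (bad !== -1) {
    return { ok: false, error: `Case at index ${bad} is not an object` };
  }
  return { ok: true, cases: value as Record<string, unknown>[] };
}
```

Returning a tagged result keeps the save handler free of try/catch and makes the failure message available to show next to the input.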