Skip to content

Autonomous Agent Operating System

What changed in Phase 8

Phases 2 through 7 delivered isolated primitives: schedule dispatch, attribution scoring, evaluation pipelines, per-agent memory, tool registries, and approval gates. Each was independently useful but not coordinated — there was no shared concept of a run across all of them, and no platform-level ability to stop, inspect, or constrain agent execution at runtime.

Phase 8 does not add new primitives. It unifies the existing ones into a coherent operating system layer:

  • Every run produces a durable run report (immutable, cross-primitive).
  • Failures, budget overruns, and policy violations produce incidents attached to that report.
  • A policy engine evaluates pre-run conditions and emits a verdict before any agent step executes.
  • A unified read model (v_run_unified) joins all layers for operators and dashboards.

Architecture layers


Core primitives

PrimitiveDescriptionKey columns
agents.run_reportsImmutable post-run artifact created once a team run reaches a terminal stateid, team_run_id, ai_lenser_id, outcome, total_steps, total_tool_invocations, total_cost_estimate, started_at, ended_at, summary
agents.run_incidentsStructured problem records attached to a run reportid, run_report_id, incident_type, severity, message, context, occurred_at
agents.policy_evaluationsAppend-only audit log row per policy check, INSERT-onlyid, ai_lenser_id, team_run_id, tool_invocation_id, evaluation_point, policy_type, verdict, reason, context, evaluated_at
agents.v_run_unifiedRead model joining team runs, run reports, incidents, and policy evaluationsrun_id, lenser_handle, workflow_id, status, outcome, incident_count, policy_verdict, total_cost_estimate, started_at, ended_at

Policy engine evaluation priority

When fn_evaluate_pre_run_policy is called, it walks conditions in this order and stops at the first match:

  1. Kill switch active → verdict deny (reason: kill_switch_active)
  2. Lenser paused → verdict pause (reason: runner_paused)
  3. Budget ceiling exceeded — cumulative cost for the billing period exceeds workspace_settings.budget_ceiling_usd → verdict deny (reason: budget_ceiling_exceeded)
  4. Max parallel runs exceeded — active team run count ≥ workspace_settings.max_parallel_runs → verdict deny (reason: max_parallel_runs_exceeded)
  5. Dark launch inactive for this workflow — dark launch is enabled but md5(workflow_id) % 100 >= dark_launch_percentage → verdict deny (reason: dark_launch_excluded)
  6. Require approval — a pending approval exists for this agent or workflow → verdict require_approval
  7. All checks pass → verdict allow

Dark launch

Dark launch provides a deterministic percentage rollout for a workflow without modifying any workflow code.

The routing is computed as:

position = md5(workflow_id::text)::bit(32)::int % 100

If position < dark_launch_percentage, the run is included in the launch window and allowed to proceed (assuming no higher-priority denial). Otherwise the run is denied with dark_launch_excluded.

Why deterministic? A given workflow will always be inside or outside the launch window regardless of when the check runs. This makes rollout observable: you can predict exactly which workflow IDs are included at 10%, at 50%, and at 100% without any state mutation.


Human governance controls

ControlWhat it doesScope
Kill SwitchImmediately denies all new runs for this AI lenserPer-lenser
PauseHalts new run dispatch; active runs complete normallyPer-lenser
Budget CeilingDenies runs once cumulative cost exceeds the configured USD limitPer-workspace
Max Parallel RunsDenies new runs when active run count hits the limitPer-workspace
Dark LaunchRestricts a workflow to a deterministic percentage of runsPer-workflow

All of these are read from workspace_settings at evaluation time — no code deploy required to change them.


Run lifecycle