Cloud Battles — Operator Runbook

This page is the operator runbook for running the Cloud Battles surface. It assumes the deployment has the Phase O webhook outbox migration applied and a hosted Supabase instance with pg_cron and pg_net available.

For the full set of integrity checks the surface must pass before enabling cloud battles for external users, see Battle Integrity Checklist.

Preflight

Environment variables

| Variable | Purpose | Required |
| --- | --- | --- |
| SUPABASE_URL / SUPABASE_ANON_KEY | Hosted Supabase project backing battles and workers | yes |
| API_URL | apps/platform-api origin used by the web app and workers | yes |
| ANTHROPIC_API_KEY (edge function env) | Used by the AI judge edge function. | yes |
| CHAINABIT_API_URL | Used when battles dispatch through the Chainabit execution bridge. | only if Chainabit bridge is enabled |

Postgres GUCs

Set these on the deployment with ALTER DATABASE postgres SET …. A database-level setting applies only to sessions started after the change, so restart the pg_cron workers afterwards to make the new value visible inside the cron job's session.

| GUC | Purpose | Default | Required |
| --- | --- | --- | --- |
| app.approval_timeout_hours | Threshold for the expire-stale-approvals job. | 24 | recommended |
| app.approval_webhook_url | Best-effort POST URL for newly-pending approvals (drives the operator pager). | empty | yes |
| app.moderation_webhook_url | Best-effort POST URL for moderation events (flagged submissions, override decisions). | empty | yes |
| app.webhook_signing_secret | HMAC signing key for audit.webhook_outbox deliveries. Reject deliveries on the receiver side when X-Lenserfight-Signature does not match. | empty | yes |
```sql
ALTER DATABASE postgres SET app.approval_timeout_hours = 24;
ALTER DATABASE postgres SET app.approval_webhook_url    = 'https://example.com/approvals';
ALTER DATABASE postgres SET app.moderation_webhook_url  = 'https://example.com/moderation';
ALTER DATABASE postgres SET app.webhook_signing_secret  = '<32-byte hex>';
```
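
To confirm the settings landed, one option is to read them back from a freshly opened session. This is a minimal sketch that relies only on the built-in current_setting(name, missing_ok) function; the output column aliases are illustrative, and the signing secret is reported as a boolean so its value never appears in query output or logs.

```sql
-- Run in a NEW session: ALTER DATABASE ... SET does not change sessions
-- that were already open when the statement ran.
SELECT
  current_setting('app.approval_timeout_hours', true) AS approval_timeout_hours,
  current_setting('app.approval_webhook_url', true)   AS approval_webhook_url,
  current_setting('app.moderation_webhook_url', true) AS moderation_webhook_url,
  -- missing_ok = true yields NULL for an unset GUC; treat NULL and '' alike.
  COALESCE(current_setting('app.webhook_signing_secret', true), '') <> ''
    AS signing_secret_set;
```

A NULL or empty value in any of the first three columns, or signing_secret_set = false, suggests the corresponding ALTER DATABASE statement was not applied to this database.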

Cron jobs

The dispatcher and timeout enforcement run inside Postgres via pg_cron. Verify both are scheduled:

```sql
SELECT jobname, schedule, active
FROM cron.job
WHERE jobname IN ('expire-stale-approvals', 'webhook-outbox-dispatcher');
```

Both rows must show active = true.
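
A job that was never scheduled does not show up as an inactive row; it is simply absent from the result. The sketch below makes that case explicit by left-joining from the list of expected job names. It assumes only the standard pg_cron cron.job columns (jobid, jobname, schedule, active); the expected alias and the missing column are illustrative names.

```sql
-- One row per expected job, even when the job is not scheduled at all.
SELECT
  expected.jobname,
  j.schedule,
  COALESCE(j.active, false) AS active,
  (j.jobid IS NULL)         AS missing
FROM (VALUES
        ('expire-stale-approvals'),
        ('webhook-outbox-dispatcher')
     ) AS expected(jobname)
LEFT JOIN cron.job AS j
       ON j.jobname = expected.jobname;
```

Preflight passes when every row has active = true and missing = false.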

Monitoring

| Signal | Where to look | What to watch for |
| --- | --- | --- |
| Webhook outbox backlog | audit.webhook_outbox | count(*) WHERE delivered_at IS NULL should drift toward zero. A growing undelivered count means receivers are returning 5xx responses or the dispatcher is not running. |
| Approval timeout job health | cron.job_run_details joined to cron.job on jobid, WHERE jobname = 'expire-stale-approvals' | Job runs every 5 minutes. Multiple consecutive failed rows indicate a configuration regression or a long-running locking transaction. |
| Webhook dispatcher health | cron.job_run_details joined to cron.job on jobid, WHERE jobname = 'webhook-outbox-dispatcher' | Job runs frequently; consecutive failures mean the dispatcher RPC is misconfigured or pg_net is unavailable. |
| Battle moderation overrides | battles.moderation_decisions (when present) or audit.action_logs WHERE action LIKE 'battle_moderation_%' | A spike in overrides signals either a model regression or coordinated abuse. Cross-check against the battles.battle_submissions rejection rate. |
| ELO change log | battles.elo_changes or the equivalent log table (gate O3 below names reputation.elo_battle_log) | Every leaderboard mutation must produce a row. Gaps mean the ELO writer is bypassing the log path. |
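
The first three signals can be checked with queries along these lines. This is a sketch, not the project's own tooling: it assumes audit.webhook_outbox has the delivered_at column referenced above, and it uses the standard pg_cron layout in which cron.job_run_details carries jobid, status, start_time, and end_time but no jobname, hence the join to cron.job. The one-hour window is an arbitrary illustrative choice.

```sql
-- Webhook outbox backlog: should trend toward zero between checks.
SELECT count(*) AS undelivered
FROM audit.webhook_outbox
WHERE delivered_at IS NULL;

-- Job health over the last hour, one row per monitored job.
SELECT
  j.jobname,
  count(*) FILTER (WHERE d.status = 'failed')    AS failed_runs,
  count(*) FILTER (WHERE d.status = 'succeeded') AS succeeded_runs,
  max(d.end_time) FILTER (WHERE d.status = 'succeeded') AS last_success
FROM cron.job AS j
LEFT JOIN cron.job_run_details AS d
       ON d.jobid = j.jobid
      AND d.start_time > now() - interval '1 hour'
WHERE j.jobname IN ('expire-stale-approvals', 'webhook-outbox-dispatcher')
GROUP BY j.jobname;
```

A job with failed_runs > 0 and a NULL or stale last_success is the pattern the table describes as consecutive failures; the return_message column of cron.job_run_details usually carries the error text for the failed runs.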

Rollback

A rollback is non-destructive: battles already in flight finish on whatever path claimed them. The rollback only stops new cloud battles from being created.

```bash
# 1. Stop serving cloud battle routes to the public (reverse-proxy or redeploy web + workers)
#    so new cloud battles cannot be created from untrusted clients.
```

```sql
-- 2. Stop the webhook outbox dispatcher
SELECT cron.unschedule('webhook-outbox-dispatcher');

-- 3. Optional: clear the webhook URLs so failed retries don't fan out
ALTER DATABASE postgres SET app.approval_webhook_url   = '';
ALTER DATABASE postgres SET app.moderation_webhook_url = '';
```
Local battles (lf battle local) continue to work — they do not depend on any of the above.

To re-enable later, restore routing and configuration, re-set the GUCs, and re-schedule the dispatcher with the same expression used in the original migration.
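
Because cron.unschedule removes the job row, it can help to record the dispatcher's schedule and command before step 2 so the job can be re-created verbatim without digging through the migration. The sketch below uses only the standard pg_cron cron.job table and the cron.schedule(job_name, schedule, command) function; the bracketed values are placeholders to fill from the recorded row or the original migration, not known values.

```sql
-- Before step 2: capture what will be needed to re-schedule later.
SELECT jobname, schedule, command
FROM cron.job
WHERE jobname = 'webhook-outbox-dispatcher';

-- After step 2: the dispatcher row should be gone, the timeout job untouched.
SELECT jobname, active
FROM cron.job
WHERE jobname IN ('expire-stale-approvals', 'webhook-outbox-dispatcher');

-- Re-enable (placeholders, left commented out on purpose):
-- SELECT cron.schedule(
--   'webhook-outbox-dispatcher',
--   '<recorded schedule>',
--   '<recorded command>'
-- );
```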

Escalation

Use these channels when an automated control fails to contain a real-world incident (abusive submission, leaked credentials in a prompt, sustained moderation bypass).

  • Primary: email moderation@lenserfight.org. Expected first response within 24 hours.
  • GitHub Issue (sensitive): open a private security advisory if the report contains user data or credentials.
  • GitHub Issue (general): label the issue incident so it surfaces in the maintainer triage queue.

Integrity gate verification (Phase BV — 2026-05-12)

The five gates required by OSS Launch Scope are verified by automated tests at the SHA listed in Announcement Readiness.

| Gate | Test reference | Status |
| --- | --- | --- |
| K4: /health probe returns ok | pnpm announcement:dashboard --once, row GET /health | ✅ verified |
| J1: fn_battles_create enforces per-lenser daily cap | supabase/tests/59_battles_create_rate_limit.sql plan(3) | ✅ verified |
| J2: Battle creator can override moderation | supabase/tests/60_moderation_admin_override.sql plan(2) | ✅ verified |
| O1: audit.webhook_outbox dispatcher exists and drains | supabase/tests/61_webhook_outbox_drain.sql plan(3); end-to-end smoke step 14 in scripts/smoke.sh | ✅ verified |
| O3: Every leaderboard mutation writes to ELO change log | supabase/tests/62_elo_change_log.sql plan(2); table reputation.elo_battle_log | ✅ verified |

All four pgTAP files are included in the critical-RPC checks in scripts/coverage-gate.sh. The gate fails the PR if any of these tests is removed, or if any of fn_battles_create, fn_decide_moderation_override, fn_dispatch_webhook_outbox, or fn_compute_elo_after_battle loses all test references.