Code Walk: workflow-execution.service.ts
This page walks through libs/infra/execution/src/lib/workflow-execution.service.ts top-to-bottom. Sections are ordered the way the file is laid out so you can read the file alongside this page. For sequence diagrams of the same logic, see Execution Engine — Internals.
1. Public types — lines 20–120
The first ~100 lines of the file are the public type surface. Everything downstream of the engine consumes these types.
NodeStatus(line 22) — every state a node can be in. Mirrors the Postgres CHECK constraint insupabase/migrations/20260420000000_workflow_status_alignment.sql:64.MergeStrategy(line 37),OnParentFailurePolicy(line 39),RetryCause(line 41) — narrow union types that show up on per-node config.RetryConfig(line 43) andModerationConfig(line 50) — per-node config slots, both optional.MemoryWritePolicy(line 67) —'on_success'(default, drop-on-failure) or'checkpoint'(per-node durability). The doc comment at lines 54–66 is the canonical explanation; copy it into any user-facing memory doc rather than restating.WorkflowNodeConfig(line 69),WorkflowNode(line 79),WorkflowEdge(line 89),EdgeCondition(line 100),NodeResult(line 106) — the engine's input shapes.
2. Engine context and event surface — lines 121–235
This section defines what the engine emits and what callers must supply.
WorkflowRunResult(line 121) — the value returned byexecuteWorkflow.EngineEventName(line 127) — every event the engine can publish. The status enum at line 22 and this list at 127 must stay in sync; a new node status that doesn't have an event is invisible to consumers.WaitingReason(line 148) — populated onawaiting_dependency,queued, andretryingresults so the run-state projection can explain why a node is paused.WorkflowExecutionContext(line 180) — the per-run input bag the caller hands to the engine. Themoderationslot at line 201 is optional and omitted by default.MemoryFlushSink(line 233) — the two-method hook for memory persistence. Implementations live inlibs/data/repositories/.
3. Defaults — lines 238–250
A short block of const defaults the engine falls back to when the per-node config does not pin a value:
DEFAULT_RETRY(line 240) — 3 attempts, 250 ms base backoff, retry ontimeout,provider_error,rate_limit. Notably does not includecontract_violated— contract violations are non-retryable on purpose.DEFAULT_TIMEOUT_MS = 60_000(line 247).DEFAULT_ON_PARENT_FAILURE = 'skip'(line 248).DEFAULT_MODERATION = 'off'(line 250) — moderation is opt-in.
4. executeWorkflow entry point — lines 267–290
WorkflowExecutionService (line 267) is a stateless façade. The class exists to give the engine a stable identity for DI; the work is in the methods, not in instance state.
executeWorkflow(input, ctx) (line 287) is the only public method. It takes the workflow definition (nodes + edges) plus a WorkflowExecutionContext and returns a WorkflowRunResult. Everything from the wave loop to the run-terminal hook is invoked from this method.
5. Wave loop — lines 380–795
The body of executeWorkflow is the wave loop. Read it as: build initial wave, run the wave concurrently, build next wave, repeat until empty.
- The initial wave is the set of nodes with in-degree zero (line 382).
while (wave.length > 0)at line 384 is the outer loop.- The concurrent dispatch is
Promise.all(wave.map(async (nodeId) => …))starting at line 393. - The next-wave build at line 779 only decrements in-degrees for edges whose
conditionevaluated truthy. Skipped edges keep dependents inawaiting_dependency.
6. Per-node lifecycle inside the wave — lines 405–795
For each node in a wave, the engine runs the same ordered pipeline. The section comments inside the file mark the boundaries:
- Parent-failure gate (line 405) — applies
OnParentFailurePolicy. Askipshort-circuits tostatus='skipped';propagatecauses the node to fail;substitute_defaultsubstitutes the contract's default value and continues. - Input contract validation (line 549) — runs
validateInputsfromcontract-validator. A failure short-circuits tostatus='failed',error='contract_violated'. - Input moderation (line 569) — only fires when
config.moderationis'input'or'both'. Flagged inputs short-circuit toerror='moderation_blocked'. - Provider call with retry + timeout (line 591) — the inner retry loop at line 596 wraps the provider call.
computeBackoffat line 839 sets the inter-attempt delay. - Output moderation (line 687) — symmetric to input moderation. Only fires when
config.moderationis'output'or'both'. - Output contract validation (line 713) — runs
validateOutput. A failure short-circuits witherror='contract_violated'. - Success (line 741) — persists the envelope, calls
onNodeCompletedif aMemoryFlushSinkis registered (line 760), and emits the success event.
7. Helpers — lines 797–end
Below line 797 is utility code that the wave loop calls but readers rarely need to dive into:
safeStringify/errorMessage(lines 799, 809) — defensive serialization for log lines.isAbortError,isTimeoutError,isRateLimitError(lines 813, 822, 828) — error-class detectors that map raw errors toRetryCause.computeBackoff(line 839) — exponential backoff with jitter, capped atmaxBackoffMs.createLinkedController(line 917) — wires per-node abort signals to the run-level controller so cancelling a run cancels in-flight provider calls.toEnvelope/envelopeToExecutionResult/envelopeToOutputData(lines 941, 977, 994) — provider response normalization.resolveRenderedInputs(line 1015),applyMerge(line 1072),isEdgeConditionSatisfied(line 1091) — the input merge pipeline.renderPrompt(line 1136) —[[label]]parameter substitution. Optional labels are detected byisLabelOptionalat line 1159.collectParentStatuses(line 1171) — used by the parent-failure gate.finish(line 1208) — the run-terminal helper that callsonRunCompletedand assembles theWorkflowRunResult.
When to skim this file vs. read it
Skim it when:
- You're touching memory hooks or run-terminal behavior — read sections 1, 2, 6, 7.
- You're adding a new
RetryCause— touch sections 1, 6.4 (line 624shouldRetry). - You're adding a new node
status— touch line 22 here, the migration atsupabase/migrations/20260420000000_workflow_status_alignment.sql:64, and theEngineEventNamelist at line 127.
Read it end-to-end when you are debugging an "impossible" wave-ordering bug — the answer is almost always in the parent-failure gate (section 6.1) or the next-wave build (section 5).
Related
- Execution Engine — Internals — sequence diagrams for the same logic.
- Workflow Engine Architecture — high-level model.
- Workflow status migration — the Postgres CHECK constraints that mirror
NodeStatus.