
Jason Stanley

Great piece, Jonathan. I like the premature completion diagnosis. And the output manifest with bash verification is an interesting response. I want to push on a few assumptions in the piece:

Workflows are not fixed deterministic pipelines. A workflow defines required steps with verification at each transition. Within those steps, judgment happens freely. What determines advancement from step 1 to step 2 can itself involve judgment: did the output meet a quality bar, does this case require escalation. That's sequenced accountability with gates, not rigidity. The runbook's end-of-run verification checks final state. It cannot enforce that step A completed before step B consumed its output, or that evaluation at step 3 occurred before step 4 ran. A workflow enforces ordering and gate-passing as structural properties of execution. A runbook encodes them as instructions the agent may or may not follow.
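
To make the ordering point concrete, here is a minimal Python sketch. It is plain Python rather than any particular framework, and `Step`, `run_workflow`, and `final_state_ok` are hypothetical names. The runner advances only when a step's gate passes, so ordering is a property of execution. The end-of-run check, by contrast, accepts a final state even if it was produced out of order.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout; illustrative only, not from the post or a library.

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]   # produces new state from the current state
    gate: Callable[[dict], bool]  # must pass before the next step consumes the output

class GateFailed(Exception):
    pass

def run_workflow(steps: list[Step], state: dict) -> tuple[dict, list[str]]:
    """Run steps strictly in order; a failing gate halts execution.

    Ordering lives in this loop, not in instructions the agent reads.
    The returned trace lists each step whose gate actually passed.
    """
    trace: list[str] = []
    for step in steps:
        state = step.run(state)
        if not step.gate(state):
            raise GateFailed(f"gate after {step.name!r} failed")
        trace.append(step.name)
    return state, trace

def final_state_ok(state: dict) -> bool:
    # End-of-run check: sees only what exists at the end, not how it got there.
    return "draft" in state and "review" in state

steps = [
    Step("draft", lambda s: {**s, "draft": "text"}, lambda s: bool(s.get("draft"))),
    Step("review", lambda s: {**s, "review": f"ok: {s['draft']}"},
         lambda s: s["review"].startswith("ok")),
]
state, trace = run_workflow(steps, {})

# A run where the review was written before any draft existed still
# satisfies the end-of-run check, because that check cannot see ordering.
out_of_order = {"review": "ok: (nothing yet)", "draft": "text"}
print(trace, final_state_ok(out_of_order))
```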

It's true that the bash verification script is not advisory in the way prose instructions are: a deterministic check of file existence and schema correctness produces a hard pass/fail the agent cannot rationalize away. But the real question is what acts on that measurement. If the agent reads the result and decides what to do, you have a hard measurement inside an advisory enforcement loop. If the harness blocks delivery on failure, that's genuine enforcement. Either way, the remaining advisory surface is ordering and intermediate evaluation, which stay as prose the agent interprets. So the architecture is a hybrid: hard verification at the boundary, advisory governance over the trajectory.
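
A rough sketch of the two loops, assuming the verification script signals pass/fail through its exit code. The commands below are stand-ins built with the Python interpreter so the example runs anywhere; in practice it might be something like `bash verify.sh` (a hypothetical name). `DeliveryBlocked`, `advisory_loop`, and `enforced_delivery` are illustrative, not part of any tool.

```python
import subprocess
import sys

class DeliveryBlocked(Exception):
    """Raised by the harness, not the agent, when verification fails."""

def verify(cmd: list[str]) -> subprocess.CompletedProcess:
    # Run the verifier as a separate process; its exit code is the measurement.
    return subprocess.run(cmd, capture_output=True, text=True)

def advisory_loop(cmd: list[str]) -> str:
    # Hard measurement, advisory enforcement: the result is only reported
    # back to the agent, which decides what to do with it.
    result = verify(cmd)
    return f"verifier exited {result.returncode}; stderr: {result.stderr.strip()}"

def enforced_delivery(cmd: list[str], deliver) -> None:
    # Hard measurement, hard enforcement: the harness refuses to deliver.
    result = verify(cmd)
    if result.returncode != 0:
        raise DeliveryBlocked(result.stderr.strip() or f"exit {result.returncode}")
    deliver()

# Stand-in verifiers (hypothetical): one fails with a message, one passes.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('missing report.json'); sys.exit(1)"]
passing = [sys.executable, "-c", "pass"]

delivered: list[str] = []
print(advisory_loop(failing))  # the agent is told, but nothing stops it
enforced_delivery(passing, lambda: delivered.append("report"))
try:
    enforced_delivery(failing, lambda: delivered.append("should not happen"))
except DeliveryBlocked as exc:
    print("blocked:", exc)
```

The measurement is identical in both functions; only the component that acts on it differs.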

That hybrid is well-suited to some kinds of work and less so to others. Plenty of real institutional processes already use end-of-sequence verification: check completeness and quality at the end, and don't prescribe intermediate ordering. But step-gated governance emerged as an institutional response to work with specific properties: known failure modes at transitions, volume that justified the overhead, and auditability requirements demanding evidence of sequenced accountability. Organizations built procedural governance because certain work punished you for skipping steps in ways final-state checks couldn't catch. Even with identical agent capability, different work will demand different governance paradigms based on criticality, failure modes, and audit needs. Runbooks fit the end-of-sequence category well. Whether the form stretches to step-gated work, or whether you end up reinventing orchestration, is the open question.

That raises the following question: what distinguishes a runbook from an orchestrated agent loop in LangGraph or Temporal? Built as an orchestrated graph, the same steps and evaluation criteria would get structural enforcement of ordering, separation of executor and evaluator, and deterministic gate logic. The runbook's advantage is portability: markdown works everywhere, with no runtime dependency. The tradeoff is that trajectory governance stays advisory. The composable design is probably to use runbooks for judgment-heavy work where flexibility matters and final-state verification suffices, and orchestration for work where enforcement has to be structural across the full execution path.
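
For contrast, a bare-bones sketch of what an orchestrated loop adds: the executor and evaluator are separate functions, and only the evaluator's deterministic verdict lets the run advance. This is plain Python under my own assumptions, not the LangGraph or Temporal API; `orchestrate`, `Verdict`, `toy_executor`, and `toy_evaluator` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Verdict:
    passed: bool
    reason: str

def orchestrate(
    execute: Callable[[str, str], str],  # executor: (task, feedback) -> output
    evaluate: Callable[[str], Verdict],  # evaluator: output -> verdict, a separate role
    task: str,
    max_attempts: int = 3,
) -> tuple[str, list[Verdict]]:
    """Retry until the evaluator passes the output or attempts run out.

    The executor never sees the gate logic and cannot mark its own work
    as done; advancement depends only on the evaluator's verdict.
    """
    feedback = ""
    history: list[Verdict] = []
    for _ in range(max_attempts):
        output = execute(task, feedback)
        verdict = evaluate(output)
        history.append(verdict)
        if verdict.passed:
            return output, history
        feedback = verdict.reason
    raise RuntimeError(f"gate not passed after {max_attempts} attempts: {feedback}")

def toy_executor(task: str, feedback: str) -> str:
    # Hypothetical executor: adds a summary only once the evaluator asks for one.
    return f"{task} + summary" if "summary" in feedback else task

def toy_evaluator(output: str) -> Verdict:
    # Deterministic gate: pass only if a summary is present.
    ok = "summary" in output
    return Verdict(ok, "" if ok else "missing summary")

output, history = orchestrate(toy_executor, toy_evaluator, "report")
print(output, [v.passed for v in history])
```

In an actual graph framework the gate would likely become an edge condition and the retry state would be persisted by the runtime, but the division of roles is the same.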
