Human-in-the-Loop, Approvals, and Versioning

Policy statement

Automation is the default. Humans intervene for judgment, compliance, access, and exceptions — never for routine tasks that agents can perform within guardrails.

Approval types

ID	Type	Required approver	Blocks
A1	Media plan approval	Planner	Execution
A2	Launch confirmation	Operator / Planner	First go-live
A3	Spend guardrail breach	Operator + optional client	Budget change
A4	Access / BM / DNS	Admin / client	Onboarding completion
A5	Plan revise / replan approval	Planner + client if budget increases	New plan version `vN+1`
A6	Manual override	Admin	Reconciliation
A7	Compliance / policy	Planner + legal flag	Creative launch
A8	Agent loop exhausted (red flag)	Operator / admin	Same `run_id` auto-retry; downstream mutations on that task
A9	Cost Guard tripped (≥3× estimate)	Operator / admin	All LLM calls on `run_id`; downstream agent work

A8 is raised when any loop limit (QC correction, tool-call, orchestrator retry, or the global per-run_id limit) is hit without resolution; see Loop limits.

A9 is raised by the deterministic Cost Guard service (not an agent) when actual token spend on a run_id reaches 3× the pre-computed estimate — see Cost Guard.
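
The A9 trip condition can be sketched as a plain comparison. This is a minimal illustration, assuming spend and estimate are both tracked in USD per run_id; the names RunSpend, estimate_usd, actual_usd, and cost_guard_tripped are hypothetical and not part of the spec.

```python
from dataclasses import dataclass

TRIP_MULTIPLIER = 3.0  # mirrors cost_guard_policy.trip_multiplier


@dataclass(frozen=True)
class RunSpend:
    """Hypothetical per-run spend record (field names are assumptions)."""
    run_id: str
    estimate_usd: float  # pre-computed estimate for the run
    actual_usd: float    # actual token spend so far


def cost_guard_tripped(spend: RunSpend, multiplier: float = TRIP_MULTIPLIER) -> bool:
    """Return True when actual spend has reached multiplier x estimate."""
    if spend.estimate_usd <= 0:
        # Assumption: a run without a positive estimate is a caller error.
        raise ValueError("estimate_usd must be positive")
    return spend.actual_usd >= multiplier * spend.estimate_usd
```

Because the check is deterministic and uses no model call, it can run before every LLM request on the run_id. The comparison is inclusive, matching the "≥3× estimate" wording in the approval table.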

Dashboard requirements (Phase 0 spec)

Every human touch must appear in the Human Touch Dashboard (operator surface — not System Ops) with:

Unique ticket ID
Tenant, vertical, platform
Requesting agent or service
Linked plan_version or change_set_id
Diff preview (structured JSON + human summary)
Approve / Reject / Request changes actions
Comment thread
SLA timer
Red flag items (A8 / A9): shown with distinct visual priority and never buried in the generic inbox. A8 shows loop type and attempt count; A9 shows an estimated vs actual USD summary. The full token/QC drill-down lives in the System Ops Dashboard.
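
One way to picture the fields listed above is as a single ticket record. The sketch below is illustrative only: the class name HumanTouchTicket, the field names, and the use of a deadline timestamp for the SLA timer are assumptions, not a schema defined by this spec.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional


class Action(Enum):
    """The three reviewer actions the dashboard must offer."""
    APPROVE = "approve"
    REJECT = "reject"
    REQUEST_CHANGES = "request_changes"


@dataclass
class HumanTouchTicket:
    """Hypothetical ticket shape covering the dashboard requirements."""
    ticket_id: str
    approval_type: str                 # "A1" .. "A9"
    tenant: str
    vertical: str
    platform: str
    requested_by: str                  # requesting agent or service
    diff_json: dict                    # structured diff preview
    diff_summary: str                  # human-readable summary of the diff
    sla_deadline: datetime             # assumption: SLA timer stored as a deadline
    plan_version: Optional[int] = None
    change_set_id: Optional[str] = None
    comments: List[str] = field(default_factory=list)
    decision: Optional[Action] = None  # None while the ticket is open

    @property
    def is_red_flag(self) -> bool:
        # A8 and A9 get distinct visual priority in the dashboard.
        return self.approval_type in {"A8", "A9"}

    def sla_remaining(self, now: datetime) -> timedelta:
        """Time left on the SLA timer; negative once the deadline has passed."""
        return self.sla_deadline - now
```

A dashboard query could then sort open tickets by is_red_flag first and sla_remaining second, which keeps A8 / A9 items out of the generic inbox ordering.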

Versioning rules

Media plans

Monotonic integer version per tenant
Immutable approved payloads
Superseded plans remain queryable
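
The three media plan rules can be illustrated with a small in-memory store. This is a sketch under stated assumptions: PlanStore and its methods are hypothetical names, versions are assumed to start at 1, and a real implementation would sit on a database rather than a dict.

```python
import copy
from typing import Dict, List


class PlanStore:
    """In-memory stand-in for a per-tenant media plan store (illustrative)."""

    def __init__(self) -> None:
        self._plans: Dict[str, List[dict]] = {}

    def approve(self, tenant: str, payload: dict) -> int:
        """Store an approved payload and return its new version number."""
        versions = self._plans.setdefault(tenant, [])
        # Deep-copy so later edits by the caller cannot alter the stored payload.
        versions.append(copy.deepcopy(payload))
        # Versions are per tenant and only ever increase by one.
        return len(versions)

    def get(self, tenant: str, version: int) -> dict:
        """Return a copy of any version, including superseded ones."""
        versions = self._plans.get(tenant, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"no plan version {version} for tenant {tenant}")
        return copy.deepcopy(versions[version - 1])

    def latest_version(self, tenant: str) -> int:
        """Return the highest approved version, or 0 if none exists."""
        return len(self._plans.get(tenant, []))
```

Nothing is ever overwritten or deleted: a new approval appends, so version 1 stays readable after version 2 supersedes it.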

Campaign state

execution_manifest links platform IDs → plan_version (live state; may drift until revise)
optimization_log / change_set_id append-only per applied change (input to plan revise)
Plan revise creates vN+1 with type revise; replan creates vN+1 with type replan. Both require A5 approval.
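
A sketch of how a revise or replan proposal might be assembled before it goes to A5. The function name propose_plan_version, the returned dict shape, and the "pending_approval" status value are assumptions; combining the "client if budget up" rule with the tenant flag client_must_approve_budget_increase is also an assumption about how the two interact.

```python
from typing import List


def propose_plan_version(
    current_version: int,
    kind: str,
    old_budget: float,
    new_budget: float,
    client_must_approve_budget_increase: bool = True,
) -> dict:
    """Build a pending vN+1 proposal for a revise or replan."""
    if kind not in ("revise", "replan"):
        raise ValueError("kind must be 'revise' or 'replan'")

    # The planner always approves; the client is added only on a budget increase.
    approvers: List[str] = ["planner"]
    if client_must_approve_budget_increase and new_budget > old_budget:
        approvers.append("client")

    return {
        "version": current_version + 1,  # vN+1
        "type": kind,
        "approval": "A5",
        "required_approvers": approvers,
        "status": "pending_approval",    # hypothetical status value
    }
```

The new version number is reserved in the proposal, but the payload would only become an immutable approved plan once every listed approver has signed off.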

Audit log

Append-only:

timestamp, actor_type, actor_id, action, resource, plan_version, payload_hash

Retention: minimum 7 years for financial audit (TBD with legal).
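
A minimal sketch of an append-only audit entry with the fields above. The spec does not say how payload_hash is computed; SHA-256 over key-sorted JSON is an assumption here, as are the AuditLog class and its method names.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, List, Optional, Tuple


def payload_hash(payload: Any) -> str:
    """Hash a JSON-serializable payload (assumed scheme: SHA-256 of sorted-key JSON)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AuditLog:
    """Illustrative in-memory log that exposes append and read only."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(
        self,
        actor_type: str,
        actor_id: str,
        action: str,
        resource: str,
        plan_version: Optional[int],
        payload: Any,
    ) -> dict:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor_type": actor_type,
            "actor_id": actor_id,
            "action": action,
            "resource": resource,
            "plan_version": plan_version,
            "payload_hash": payload_hash(payload),
        }
        self._entries.append(entry)
        return dict(entry)  # return a copy so callers cannot edit the stored entry

    def entries(self) -> Tuple[dict, ...]:
        """Return copies of all entries in insertion order."""
        return tuple(dict(e) for e in self._entries)
```

Sorting keys before hashing means two payloads with the same content but different key order produce the same hash, which makes the hash usable for later integrity checks against the stored plan payload.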

Escalation

Client-visible approvals

Optional client portal actions:

Approve budget increases
Approve creative themes
Reject proposed replans

Internal ops retain veto on compliance.

Configuration per tenant

approval_policy:
  first_launch_requires_human: true
  auto_approve_optimization_under_pct: 10
  client_must_approve_budget_increase: true
  vertical_compliance_profile: health_strict

loop_policy:  # hard ceilings — tenant may be stricter, not looser
  qc_correction_max: 2
  tool_rounds_max: 5
  orchestrator_retry_max: 1
  global_steps_per_run_id_max: 8
  on_exhaustion: red_flag_a8  # never infinite_retry

cost_guard_policy:
  trip_multiplier: 3.0
  block_on_trip: true

qc_threshold_policy:
  success_floor: 0.80
  window_hours: 24
  min_sample_global: 20
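
The "stricter, not looser" rule for loop_policy can be sketched as a validation step applied when a tenant config is loaded. The function names below are hypothetical, and treating a counter that equals its maximum as exhausted is an assumption about how "hit" is counted.

```python
from typing import Dict, Optional

# Hard ceilings from loop_policy above.
HARD_CEILINGS: Dict[str, int] = {
    "qc_correction_max": 2,
    "tool_rounds_max": 5,
    "orchestrator_retry_max": 1,
    "global_steps_per_run_id_max": 8,
}


def effective_loop_policy(tenant_policy: Dict[str, int]) -> Dict[str, int]:
    """Apply tenant overrides, rejecting any value looser than the ceiling."""
    effective: Dict[str, int] = {}
    for key, ceiling in HARD_CEILINGS.items():
        value = tenant_policy.get(key, ceiling)
        if value < 0:
            raise ValueError(f"{key} must not be negative")
        if value > ceiling:
            raise ValueError(f"{key}={value} exceeds hard ceiling {ceiling}")
        effective[key] = value
    return effective


def exhausted_limit(counters: Dict[str, int], policy: Dict[str, int]) -> Optional[str]:
    """Return the first limit whose counter has reached its maximum, else None.

    A non-None result is the point where an A8 red flag would be raised
    instead of retrying again.
    """
    for key, limit in policy.items():
        if counters.get(key, 0) >= limit:
            return key
    return None
```

For example, a tenant may lower tool_rounds_max to 3, but a config that sets qc_correction_max to 5 is rejected at load time rather than silently clamped.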

Metrics (ops)

Mean time to approve by type
% changes auto-approved
QC first-pass rate by main_task_id / agent / model (from agent_qc_results)
Top failure_codes and correction-loop depth (from agent_qc_loops)
A8 rate correlated with QC fail patterns (same run_id trace)
% task/subtask grains below 80% success floor (alert rate, time-to-recovery)
Rollback rate post-approval
Manual override count (target: decreasing over time)
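
The success-floor metric can be sketched as a windowed pass rate. This is an illustration only: the function name and the (timestamp, passed) input shape are assumptions, and min_sample_global is read here as the minimum number of results needed before the floor is evaluated.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple


def below_success_floor(
    results: Iterable[Tuple[datetime, bool]],
    now: datetime,
    success_floor: float = 0.80,
    window_hours: int = 24,
    min_sample: int = 20,
) -> Optional[bool]:
    """Check one task/subtask grain against the QC success floor.

    Returns None when the window holds too few results to judge,
    True when the pass rate is below the floor, False otherwise.
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [passed for ts, passed in results if cutoff <= ts <= now]
    if len(recent) < min_sample:
        return None
    pass_rate = sum(recent) / len(recent)
    return pass_rate < success_floor
```

Returning None for small samples keeps low-traffic grains from alerting on one or two failures; the share of grains returning True gives the alert rate listed above.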