Architecture · Draft
Human Control Plane
Purpose
Not every step should be fully autonomous. The human control plane is where Kobi operators and planners review, approve, reject, or override agent actions — with full audit and version linkage.
This document covers the Human Touch Dashboard only: business operations, approvals, and client-workflow surfaces. System health, logs, and engineering statistics live in a separate System Ops Dashboard behind IAP and/or VPN — not mixed into operator views.
Two surfaces (do not merge)
| Surface | Audience | Access | Contains |
|---|---|---|---|
| Human Touch Dashboard | Operators, planners, admins, auditors | SSO + business RBAC; WAF on public endpoints | Approvals, plan diffs, tenant timeline, tracking health signals, manual overrides |
| System Ops Dashboard | Engineering, SRE, system admins | IAP (required) + VPN (recommended for prod) | Logs, system status, QC/Cost Guard statistics, run telemetry, infra health |
Principle: operators see what to decide (diff, SLA, summary). System users see why it failed (token rows, failure-code heatmaps, playbook versions, infra errors). Deep links from Human Touch tickets into System Ops are allowed for users with both roles — never embed BQ/GCS explorers in the operator inbox.
Design goals
- No silent human work — If a human touches a platform manually, it must be logged or migrated into the system.
- Single approval queue — All pending human touches in one inbox, filterable by tenant, platform, urgency.
- Version binding — Every approval references a `plan_version` or `change_set_id`.
- Rollback — Approved changes that caused issues can be reverted to the prior version where platforms allow.
- No engineering noise — Operators are not exposed to raw logs, model statistics, or infra panels.
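The version-binding goal can be illustrated with a small record type that refuses to exist without a version reference. This is a minimal sketch, not the platform's schema: the `Approval` class, its `approver` and `decision` fields, and the allowed decision strings are hypothetical; only `plan_version` and `change_set_id` come from the goals above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical decision values, mirroring the inbox actions in this document.
ALLOWED_DECISIONS = {"approve", "reject", "request_changes"}


@dataclass(frozen=True)
class Approval:
    """One human decision, bound to the version it was made against."""
    approver: str
    decision: str
    plan_version: Optional[int] = None
    change_set_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.decision not in ALLOWED_DECISIONS:
            raise ValueError(f"unknown decision: {self.decision}")
        # Version binding: at least one reference must be present.
        if self.plan_version is None and self.change_set_id is None:
            raise ValueError("approval must reference plan_version or change_set_id")
```

Rejecting the record at construction time means an unbound approval can never reach the audit log, which is one way to enforce the goal rather than merely document it.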
Human touch categories
| Category | Examples | Default policy |
|---|---|---|
| Access & trust | BM partner invite, Kobi-entity business verify (PRE), client domain verify (guide), agency billing (PRE) | Kobi ops / legal for PRE; client DNS optional; client steps are handled in the onboarding client portal |
| Plan approval | New media plan, replan, budget reallocation > threshold | Always human |
| Launch | First campaign go-live per tenant | Human confirm (configurable auto after first) |
| Spend guardrail breach | Budget +20%, new geo | Human approve |
| Compliance | Health claims, school enrollment copy | Human + optional legal flag |
| Exception | Manual fix as fallback after an API failure | Human with post-hoc entry form |
| Red flag (A8) | Agent loop exhausted — QC/tool/retry/global cap hit | Always human; blocks auto-retry until resolved |
| Red flag (A9) | Cost Guard tripped — actual spend ≥3× estimate | Always human; all LLM calls on run_id blocked |
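The two numeric thresholds in the table (budget +20% and the Cost Guard trip at 3× estimate) reduce to simple predicates. The sketch below is illustrative only: the function names are hypothetical, and treating "+20%" as "an increase of 20% or more" is an assumption, since the table does not say whether the boundary is inclusive.

```python
from decimal import Decimal

BUDGET_INCREASE_THRESHOLD = Decimal("0.20")  # "Budget +20%" from the table
COST_GUARD_TRIP_RATIO = Decimal("3")         # "actual spend >= 3x estimate"


def budget_needs_approval(current: Decimal, proposed: Decimal) -> bool:
    """True when the proposed budget breaches the spend guardrail."""
    if current <= 0:
        # Assumption: with no positive baseline, always route to a human.
        return True
    increase = (proposed - current) / current
    return increase >= BUDGET_INCREASE_THRESHOLD


def cost_guard_tripped(estimated_usd: Decimal, actual_usd: Decimal) -> bool:
    """True when actual spend reaches the A9 trip ratio."""
    if estimated_usd <= 0:
        # Assumption: any spend against a zero estimate counts as a trip.
        return actual_usd > 0
    return actual_usd >= COST_GUARD_TRIP_RATIO * estimated_usd
```

`Decimal` is used so that boundary cases such as exactly +20% or exactly 3× are not affected by binary floating-point rounding.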
Human Touch Dashboard views
1. Approval inbox
- Pending items sorted by SLA; red flags (A8 / A9) pinned above standard approvals
- Fields: tenant, vertical, agent, requested action summary, diff preview, plan version
- Actions: Approve, Reject (with reason), Request changes
- A8 summary: loop type, attempt count, last QC failure reason (one line) — link to System Ops trace for engineers
- A9 summary: estimated vs actual USD, trip ratio — link to System Ops token breakdown
- A8 actions: Resolve (authorize new `run_id`), Manual fix, Admin override (A6), Cancel task — no "retry in place" on the same `run_id`
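The inbox ordering rule (red flags pinned first, everything sorted by SLA) can be expressed as a single composite sort key. This is a sketch under assumptions: `InboxItem`, its field names, and the category strings other than A8 and A9 are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

RED_FLAG_CATEGORIES = {"A8", "A9"}


@dataclass
class InboxItem:
    ticket_id: str
    category: str      # "A8", "A9", or a standard category such as "plan_approval"
    sla_due: datetime  # when the SLA for this item expires


def sort_inbox(items: list[InboxItem]) -> list[InboxItem]:
    """Red flags first, then standard approvals; earliest SLA first in each group."""
    # False sorts before True, so red-flag items (not-in-set == False) come first.
    return sorted(
        items,
        key=lambda item: (item.category not in RED_FLAG_CATEGORIES, item.sla_due),
    )
```

With this key, a standard approval that is due sooner than a red flag still appears below it, which matches the "pinned above" wording.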
2. Red flag queue
Operational view of open A8 / A9 tickets — not a statistics console.
- Open tickets across tenants; SLA and assignee
- Filters: loop type (QC / tool / timeout / global), cost trip, agent, platform, age
- Actions: resolve, escalate to engineering, cancel
- No BigQuery explorers, QC leaderboards, or Cost Guard ledgers here — those are System Ops
3. Tenant timeline
- Chronological audit: onboarding steps, plans, launches, optimizations, reports
- Filter by platform and actor (agent vs human)
- Business-readable events only; raw `run_id` traces open in System Ops
4. Plan diff viewer
Side-by-side comparison of plan versions:
- Channel budget split
- Target KPIs
- Campaign structure summary
- Tracking dependencies
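For the channel budget split, the comparison behind the side-by-side view could look like the sketch below. It assumes each plan version exposes its split as a mapping from channel name to budget; that shape, the function name, and the channel names in the usage example are all hypothetical.

```python
from typing import Optional


def diff_budget_split(
    old: dict[str, float], new: dict[str, float]
) -> dict[str, tuple[Optional[float], Optional[float]]]:
    """Return {channel: (old_budget, new_budget)} for channels that changed.

    A value of None means the channel is absent from that plan version.
    """
    channels = sorted(set(old) | set(new))
    return {
        channel: (old.get(channel), new.get(channel))
        for channel in channels
        if old.get(channel) != new.get(channel)
    }


v1 = {"search": 5000, "social": 3000}
v2 = {"search": 5000, "social": 2000, "video": 1000}
changes = diff_budget_split(v1, v2)
# changes == {"social": (3000, 2000), "video": (None, 1000)}
```

Unchanged channels are omitted so the operator sees only what needs a decision.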
5. Tracking health
- GA4 event volume, tag coverage, CAPI match rates (client-workflow signals)
- Blocks launch/optimization when red
- Detailed tag-debug and server-side logs → System Ops
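The "blocks when red" rule can be sketched as a gate over the current signal statuses. The signal names follow the list above, but the `Health` enum, the two-state model, and the rule that a single red signal is enough to block are assumptions; the source does not define thresholds or intermediate states.

```python
from enum import Enum


class Health(Enum):
    GREEN = "green"
    RED = "red"


def blocking_signals(signals: dict[str, Health]) -> list[str]:
    """Names of the signals that are currently red, in sorted order."""
    return sorted(name for name, status in signals.items() if status is Health.RED)


def launch_blocked(signals: dict[str, Health]) -> bool:
    """Assumption: any single red signal blocks launch and optimization."""
    return bool(blocking_signals(signals))
```

Returning the list of red signals alongside the boolean lets the operator view say which signal is blocking without exposing tag-debug detail.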
6. Manual override form
When ops must act outside agents:
- Record platform, action, reason, ticket ID
- System schedules reconciliation job to sync state
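A minimal sketch of the override form's backend follows: validate the four recorded fields, then enqueue a reconciliation job. The `ManualOverride` class, the in-memory `reconciliation_queue`, and the job name `reconcile_platform_state` are hypothetical stand-ins for whatever store and scheduler the platform actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ManualOverride:
    platform: str
    action: str
    reason: str
    ticket_id: str
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Stand-in for a real job queue (assumption: jobs are plain dicts).
reconciliation_queue: list[dict] = []


def submit_override(override: ManualOverride) -> dict:
    """Reject incomplete entries, then schedule a state-sync job."""
    for name in ("platform", "action", "reason", "ticket_id"):
        if not getattr(override, name).strip():
            raise ValueError(f"{name} is required")
    job = {
        "job": "reconcile_platform_state",  # hypothetical job name
        "platform": override.platform,
        "ticket_id": override.ticket_id,
    }
    reconciliation_queue.append(job)
    return job
```

Requiring a non-empty reason and ticket ID at submission is one way to uphold the "no silent human work" goal.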
Approval workflow
Roles (RBAC — Human Touch)
| Role | Permissions |
|---|---|
| Viewer | Read reports and timeline |
| Operator | Approve routine optimizations within policy; resolve A8/A9 with playbook |
| Planner | Approve/create media plans |
| Admin | Access grants, tenant config, guardrail edits |
| Auditor | Read-only audit export |
System Ops roles (system_developer, system_admin, sre) are defined in System Ops Dashboard — separate IAM group; may overlap with Admin for senior ops.
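The Human Touch role table maps naturally onto a role-to-permission lookup. In the sketch below the permission strings are hypothetical names derived from the table, and roles are treated as non-inheriting (an Admin does not automatically get Planner rights), which is an assumption the table neither confirms nor rules out.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    OPERATOR = "operator"
    PLANNER = "planner"
    ADMIN = "admin"
    AUDITOR = "auditor"


# Hypothetical permission names, one set per row of the role table.
PERMISSIONS: dict[Role, set[str]] = {
    Role.VIEWER: {"read_reports", "read_timeline"},
    Role.OPERATOR: {"approve_routine_optimization", "resolve_red_flag"},
    Role.PLANNER: {"approve_plan", "create_plan"},
    Role.ADMIN: {"grant_access", "edit_tenant_config", "edit_guardrails"},
    Role.AUDITOR: {"export_audit"},
}


def can(roles: set[Role], permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(permission in PERMISSIONS[role] for role in roles)
```

A user holding several roles gets the union of their permissions, which covers the senior-ops case where one person carries more than one role.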
Versioning model
    MediaPlan
      id: uuid
      tenant_id
      version: integer (monotonic)
      status: draft | pending_approval | approved | superseded
      approved_by, approved_at
      payload: channels, budgets, structures, rules
Campaign execution stores `plan_version` with every platform mutation for traceability.
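The model above could be represented as follows. The field names and status values come from the schema; everything else is an assumption made for illustration, in particular the `approve` and `record_mutation` helpers and the rule that approving a new version marks the tenant's previously approved version as superseded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class PlanStatus(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    APPROVED = "approved"
    SUPERSEDED = "superseded"


@dataclass
class MediaPlan:
    tenant_id: str
    version: int   # monotonic per tenant
    payload: dict  # channels, budgets, structures, rules
    status: PlanStatus = PlanStatus.DRAFT
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None
    id: UUID = field(default_factory=uuid4)


def approve(plan: MediaPlan, history: list[MediaPlan], approver: str) -> None:
    """Approve a pending plan; assumed to supersede the tenant's prior approved one."""
    if plan.status is not PlanStatus.PENDING_APPROVAL:
        raise ValueError("only pending plans can be approved")
    for older in history:
        if (older is not plan and older.tenant_id == plan.tenant_id
                and older.status is PlanStatus.APPROVED):
            older.status = PlanStatus.SUPERSEDED
    plan.status = PlanStatus.APPROVED
    plan.approved_by = approver
    plan.approved_at = datetime.now(timezone.utc)


def record_mutation(plan: MediaPlan, platform: str, action: str) -> dict:
    """Audit entry for one platform mutation, tagged with the plan version."""
    return {"platform": platform, "action": action,
            "tenant_id": plan.tenant_id, "plan_version": plan.version}
```

Keeping superseded versions rather than overwriting them is what makes a later rollback or audit against a specific version possible.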
SLAs (planning defaults)
| Touch type | Target response |
|---|---|
| Launch approval | 4 business hours |
| Plan approval | 1 business day |
| Access / verification | 2 business days |
| Optimization over threshold | 4 business hours |
| Red flag (A8 / A9) | 4 business hours |
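Turning these targets into due times requires a definition of "business hours" that this document does not give. The sketch below assumes Monday to Friday, 09:00 to 17:00, no holidays, naive local datetimes, and one business day equal to eight business hours; all of those, along with the touch-type keys and function names, are assumptions rather than platform policy.

```python
from datetime import datetime, time, timedelta

DAY_START, DAY_END = time(9, 0), time(17, 0)  # assumed working window
HOURS_PER_BUSINESS_DAY = 8                    # assumed: 09:00 to 17:00

# Targets from the SLA table, expressed in business hours.
SLA_BUSINESS_HOURS = {
    "launch_approval": 4,
    "plan_approval": 1 * HOURS_PER_BUSINESS_DAY,
    "access_verification": 2 * HOURS_PER_BUSINESS_DAY,
    "optimization_over_threshold": 4,
    "red_flag": 4,
}


def add_business_hours(start: datetime, hours: float) -> datetime:
    """Advance `start` by `hours`, counting only time inside the working window."""
    remaining = timedelta(hours=hours)
    current = start
    while True:
        # Weekend or after closing: jump to the next day's opening time.
        if current.weekday() >= 5 or current.time() >= DAY_END:
            current = datetime.combine(current.date() + timedelta(days=1), DAY_START)
            continue
        # Before opening on a working day: wait until opening time.
        if current.time() < DAY_START:
            current = datetime.combine(current.date(), DAY_START)
        end_of_day = datetime.combine(current.date(), DAY_END)
        available = end_of_day - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = end_of_day


def due_at(touch_type: str, created: datetime) -> datetime:
    """SLA due time for a touch of the given type created at `created`."""
    return add_business_hours(created, SLA_BUSINESS_HOURS[touch_type])
```

Under these assumptions, a red flag raised on a Friday at 15:00 is due the following Monday at 11:00, because only two of its four business hours fit before Friday's close.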
Related documents
- System Ops Dashboard — logs, statistics, system health (IAP / VPN)
- 05-human-in-the-loop.md — extended approval policy
- 07-security-access-governance.md — access tiers
- 04-lifecycle/plan-update.md