Operations & Data · Draft
Security, Access, and Governance
Objectives
- Protect client and platform credentials
- Enforce least-privilege access for humans and automation
- Maintain auditability across tenants
- Isolate multi-tenant data
Credential management
| Secret type | Storage | Rotation |
|---|---|---|
| Google Ads / DV360 OAuth | Secret Manager | 90 days |
| Meta system user token | Secret Manager | Per Meta policy |
| TikTok access token | Secret Manager | On expiry |
| CRM webhook signing key | Secret Manager | 180 days |
| GA4 Measurement Protocol secret | Secret Manager | On compromise |
Never commit secrets to git; the repository `.gitignore` blocks common secret-file patterns.
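The rotation column mixes fixed intervals with event-driven policies. A scheduled job could flag overdue secrets as sketched below. The secret-type keys are hypothetical. Only the 90- and 180-day rows are calendar-checkable; "per Meta policy", "on expiry", and "on compromise" are modeled as `None` and have to be triggered by other signals.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical keys for the secret types in the table above. A number is a fixed
# rotation interval in days; None marks event-driven policies (per Meta policy,
# on expiry, on compromise) that a calendar check cannot decide.
ROTATION_INTERVAL_DAYS = {
    "google_ads_dv360_oauth": 90,
    "meta_system_user_token": None,
    "tiktok_access_token": None,
    "crm_webhook_signing_key": 180,
    "ga4_mp_secret": None,
}


def rotation_due(secret_type: str, last_rotated: date) -> Optional[date]:
    """Date a fixed-interval secret must be rotated by, or None if event-driven."""
    days = ROTATION_INTERVAL_DAYS[secret_type]
    return None if days is None else last_rotated + timedelta(days=days)


def is_overdue(secret_type: str, last_rotated: date, today: date) -> bool:
    """True once today reaches the due date of a fixed-interval secret."""
    due = rotation_due(secret_type, last_rotated)
    return due is not None and today >= due
```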
Access model
Clients
- Access Kobi dashboard only
- No direct ad platform admin
- Optional GA4 viewer on their property
Kobi operators (Human Touch Dashboard)
- RBAC: Viewer, Operator, Planner, Admin, Auditor
- SSO (future) with MFA required for Admin
- Approvals, plan diffs, tenant timeline — no raw logs or engineering statistics
Kobi system users (System Ops Dashboard)
- RBAC: `system_viewer`, `system_developer`, `sre`, `system_admin` (see System Ops Dashboard)
- IAP required on all routes; VPN recommended for production
- Logs, QC/Cost Guard statistics, infra health — disjoint IAM from operator roles by default
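The two role families above can be enforced as separate grant tables with a startup check that they share no permission, matching the "disjoint by default" rule. This is a minimal sketch: the role names come from this section, but every permission string (`approval:decide`, `logs:read`, and so on) and which role holds it are assumptions. Admin grants only count in an MFA-verified session.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Role names come from the access model; every permission string is hypothetical.
OPERATOR_ROLES = {
    "Viewer": {"plan:read", "timeline:read"},
    "Operator": {"plan:read", "timeline:read", "approval:decide", "run:abort"},
    "Planner": {"plan:read", "timeline:read", "plan:edit"},
    "Admin": {"plan:read", "timeline:read", "plan:edit", "approval:decide",
              "run:abort", "user:manage"},
    "Auditor": {"plan:read", "timeline:read", "audit:read"},
}
SYSTEM_ROLES = {
    "system_viewer": {"logs:read", "stats:read"},
    "system_developer": {"logs:read", "stats:read", "infra:read"},
    "sre": {"logs:read", "stats:read", "infra:read", "infra:operate"},
    "system_admin": {"logs:read", "stats:read", "infra:read", "infra:operate",
                     "sysuser:manage"},
}
ROLE_GRANTS = {**OPERATOR_ROLES, **SYSTEM_ROLES}

# Default posture: operator and system permission sets must share nothing.
if set().union(*OPERATOR_ROLES.values()) & set().union(*SYSTEM_ROLES.values()):
    raise ValueError("operator and system permissions must stay disjoint")


@dataclass(frozen=True)
class Principal:
    roles: FrozenSet[str]
    mfa_verified: bool = False


def allowed(principal: Principal, permission: str) -> bool:
    """True if any of the principal's roles grants the permission."""
    for role in principal.roles:
        if role == "Admin" and not principal.mfa_verified:
            continue  # Admin grants are ignored without an MFA-verified session
        if permission in ROLE_GRANTS.get(role, set()):
            return True
    return False
```

Keeping the check at startup means a grant edit that leaks `logs:read` into an operator role fails deployment instead of silently widening access.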
Automation
- Dedicated service accounts per environment
- Scoped IAM: only required API permissions
- No shared tokens across tenants
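The "no shared tokens across tenants" rule can be checked mechanically by scanning which secret each tenant's configuration references. A minimal sketch, assuming the bindings are available as hypothetical `(tenant_id, secret_name)` pairs loaded from tenant config:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_credentials(bindings: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each secret bound to more than one tenant to the sorted tenants using it."""
    tenants_by_secret = defaultdict(set)
    for tenant_id, secret_name in bindings:
        tenants_by_secret[secret_name].add(tenant_id)
    # Any secret with more than one tenant violates the isolation rule.
    return {s: sorted(t) for s, t in tenants_by_secret.items() if len(t) > 1}
```

An empty result means every credential is tenant-exclusive; a non-empty one could fail CI or block onboarding.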
Platform access patterns
| Platform | Human access | Automation access |
|---|---|---|
| Google Ads | Break-glass MCC admin | API service user |
| Meta | BM admin (ops only) | System user |
| TikTok | BC admin (ops only) | Marketing API app |
| DV360 | Partner admin | API service account |
| GA4 | Editor for automation account | Measurement Protocol |
Scope matrices, rate-limit tiers, app review requirements, and billing activation are covered in Platform access & API readiness.
Multi-tenant isolation
- `tenant_id` on every API request and database row
- Separate encryption keys per tenant for sensitive config (optional premium)
- Cross-tenant queries forbidden at the application layer
- BigQuery row-level security by `tenant_id`
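At the application layer, the isolation rules amount to "every row carries `tenant_id`, and every read is scoped to the caller's tenant". The in-memory store below is a hypothetical stand-in for the real data layer and only illustrates the checks; in BigQuery the equivalent guard is a row-level access policy (`CREATE ROW ACCESS POLICY`) filtering on `tenant_id`.

```python
from typing import Dict, List, Optional


class CrossTenantAccess(Exception):
    """Raised when a caller reaches for rows outside its own tenant."""


class TenantScopedStore:
    """Hypothetical in-memory stand-in for the application data layer."""

    def __init__(self) -> None:
        self._rows: List[Dict] = []

    def insert(self, tenant_id: str, row: Dict) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required on every row")
        if row.get("tenant_id", tenant_id) != tenant_id:
            raise CrossTenantAccess("row belongs to a different tenant")
        self._rows.append({**row, "tenant_id": tenant_id})

    def query(self, tenant_id: str, filters: Optional[Dict] = None) -> List[Dict]:
        filters = dict(filters or {})
        # A filter naming another tenant is rejected rather than silently ignored.
        if filters.pop("tenant_id", tenant_id) != tenant_id:
            raise CrossTenantAccess("cross-tenant queries are forbidden")
        return [
            dict(r) for r in self._rows
            if r["tenant_id"] == tenant_id
            and all(r.get(k) == v for k, v in filters.items())
        ]
```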
Audit
- Append-only audit log for approvals, executions, credential use
- Cloud Logging retention aligned with compliance requirements (minimum 1 year for operational logs; 7 years for financial records, TBD)
- Export for external audit on request
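One way to make an append-only audit log tamper-evident for external audit is to hash-chain the entries, so editing or deleting any past record breaks verification. This is a swapped-in technique, not something the section specifies; the entry fields and action names below are assumptions.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

GENESIS = "0" * 64


def _digest(body: Dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical hash-chained log: append and verify only, no update or delete."""
    entries: List[Dict] = field(default_factory=list)

    def append(self, tenant_id: str, action: str, actor: str, detail: Dict) -> Dict:
        body = {
            "seq": len(self.entries),
            "tenant_id": tenant_id,
            "action": action,  # e.g. "approval", "execution", "credential_use"
            "actor": actor,
            "detail": detail,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered, or removed entry fails."""
        prev_hash = GENESIS
        for i, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["seq"] != i or body["prev_hash"] != prev_hash:
                return False
            if _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Truncating the tail of the chain is not detectable this way; anchoring the latest hash in a separate store (for example, the Cloud Logging sink) would cover that case.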
Network security
- Private connectivity to CRM where possible
- Egress allowlist for platform APIs
- WAF on Human Touch dashboard endpoints (implementation phase)
- System Ops Dashboard — not public; IAP-only (+ VPN for prod); separate Cloud Run service from Human Touch BFF
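The egress allowlist is normally enforced at the network layer (firewall or egress proxy), but the same rule can be mirrored in connector code as a second check. The hostnames below are an illustrative list of public API endpoints for the platforms in this section and should be confirmed against each platform's current documentation. Matching is exact, so look-alike hosts such as `graph.facebook.com.example.net` are refused.

```python
from urllib.parse import urlsplit

# Illustrative allowlist; confirm hostnames against current platform API docs.
EGRESS_ALLOWLIST = frozenset({
    "googleads.googleapis.com",     # Google Ads API
    "displayvideo.googleapis.com",  # Display & Video 360 API
    "oauth2.googleapis.com",        # Google OAuth token refresh
    "graph.facebook.com",           # Meta Marketing API
    "business-api.tiktok.com",      # TikTok Marketing API
    "www.google-analytics.com",     # GA4 Measurement Protocol
})


def egress_allowed(url: str) -> bool:
    """HTTPS only, and the exact (lowercased) hostname must be on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in EGRESS_ALLOWLIST
```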
Incident response
| Severity | Example | Response |
|---|---|---|
| P1 | Token leak | Rotate all tenant tokens; pause automation |
| P2 | Wrong tenant campaign mutation | Rollback manifest; notify client |
| P3 | Tracking outage | Pause optimization increases |
Reliability, backup & disaster recovery
Planning-level targets: confirm them with the implementation team and the parent platform's SRE standards before GA. Phase 1 pilots may run against looser targets; the figures below are the GA targets.
Service-level objectives (internal SLO; client SLA set commercially by parent/VC)
| Surface | Availability SLO | Notes |
|---|---|---|
| Human Touch dashboard (approvals) | 99.5% | Operators must be able to approve/abort; degrade read-only before full outage |
| Orchestrator / connectors (automation) | 99.0% | Async + retried; brief outages self-heal via Pub/Sub redelivery |
| System Ops dashboard | best-effort | Internal tool; not client-facing |
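An availability SLO translates directly into a downtime budget. The helper below does that arithmetic; the 30-day window is an assumption, since the table does not fix a measurement window.

```python
def downtime_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability an availability SLO allows over a window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1.0 - slo) * window_days * 24 * 60
```

Over 30 days, the 99.5% dashboard target allows about 216 minutes (3.6 hours) of downtime, and the 99.0% automation target about 432 minutes (7.2 hours).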
Agent runs are async and idempotent (campaign-execution) — a control-plane outage delays work, it does not corrupt campaigns. No external client uptime SLA is promised by this module unless the parent/VC contracts one.
RTO / RPO targets by data class
| Data class | Store | RPO (max data loss) | RTO (max downtime) | Backup mechanism |
|---|---|---|---|---|
| Tenant registry / config | Cloud SQL or Firestore | ≤ 1 h | ≤ 4 h | PITR + daily export to GCS (cross-region) |
| Approvals / audit log | append-only store + Logging | 0 (no loss) | ≤ 4 h | Append-only + log sink; immutable |
| Run-cost / Cost-Guard ledger | BigQuery | ≤ 24 h | ≤ 8 h | BigQuery snapshot / table backup |
| Agent/QC telemetry | BigQuery | ≤ 24 h | best-effort | Partitioned tables; recreatable |
| Secrets | Secret Manager | 0 | ≤ 1 h | Versioned; IaC re-provision |
| Platform state (campaigns) | source of truth = the ad platform | n/a | re-fetch | Reconcile from platform APIs (not Kobi-owned) |
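The RTO/RPO table can double as a machine-readable target set, so each DR drill's measured figures are compared against it automatically. A minimal sketch: the data-class keys are hypothetical, `None` stands for "best-effort", and platform state is omitted because it is re-fetched from the platforms rather than restored.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Target:
    rpo: Optional[timedelta]  # None = best-effort / not applicable
    rto: Optional[timedelta]


# Values from the table above; keys are hypothetical identifiers.
TARGETS = {
    "tenant_registry": Target(rpo=timedelta(hours=1), rto=timedelta(hours=4)),
    "approvals_audit": Target(rpo=timedelta(0), rto=timedelta(hours=4)),
    "cost_guard_ledger": Target(rpo=timedelta(hours=24), rto=timedelta(hours=8)),
    "agent_qc_telemetry": Target(rpo=timedelta(hours=24), rto=None),
    "secrets": Target(rpo=timedelta(0), rto=timedelta(hours=1)),
}


def drill_misses(measured: Dict[str, Tuple[timedelta, timedelta]]) -> List[Tuple[str, str]]:
    """Return (data_class, "RPO" or "RTO") for every measured value over its target."""
    misses = []
    for data_class, (rpo, rto) in measured.items():
        target = TARGETS[data_class]
        if target.rpo is not None and rpo > target.rpo:
            misses.append((data_class, "RPO"))
        if target.rto is not None and rto > target.rto:
            misses.append((data_class, "RTO"))
    return misses
```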
DR posture
- Region: primary in the agreed EU/TR-aligned region (GCP topology — data residency); cross-region backups for registry + audit.
- Recovery: infra is IaC-defined → redeploy stateless Cloud Run services to a recovery region; restore registry from PITR/export; secrets re-provisioned from Secret Manager.
- DR drill: restore-from-backup rehearsal before GA and at least annually; document actual measured RTO/RPO vs targets.
- Idempotency safety net: because mutations dedupe by `run_id`, replaying queued work after recovery will not double-create campaigns or double-spend.
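The dedupe guarantee can be illustrated with an executor that records each applied mutation under its `run_id` and returns the recorded result when a redelivered message arrives. This is a minimal sketch: the `step` sub-key and the in-memory record are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class MutationExecutor:
    """Hypothetical executor: each (run_id, step) reaches the platform at most once."""
    applied: Dict[Tuple[str, str], object] = field(default_factory=dict)
    platform_calls: int = 0

    def execute(self, run_id: str, step: str, mutate: Callable[[], object]) -> object:
        key = (run_id, step)
        if key in self.applied:
            # Replayed or redelivered work: return the recorded result, skip the platform.
            return self.applied[key]
        result = mutate()
        self.platform_calls += 1
        self.applied[key] = result
        return result
```

In production the record must be durable and survive the same failure the replay recovers from. A crash between `mutate()` and recording the result can still duplicate work, so the platform call would also need its own idempotency guard, or the executor would need a check-then-reconcile step against platform state.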
Platform account suspension & business continuity
The single largest concentration risk: an agency-level suspension (Google MCC, Meta BM, TikTok BC) can take all tenants on that platform dark at once. This is the operational runbook for gameplan B8.
| Lever | Detail |
|---|---|
| Prevention | Maintain good standing: business verification kept current; policy-compliant creatives via compliance QC; avoid mass-identical mutations that trip abuse heuristics; stay under rate/creation quotas |
| Early warning | Monitor account-status fields + disapproval webhooks/polling; alert on account_status changes, rising disapproval rate, or rate-limit/RESOURCE_EXHAUSTED spikes → System Ops |
| Blast-radius sharding | Don't concentrate all tenants under one master where avoidable: Meta child-BM-per-client (2-Tier, PRE-10) already isolates Meta tenants; consider >1 Google MCC beyond a threshold so one suspension ≠ total outage |
| Appeal path | Per-platform appeal owners + escalation contacts (Meta rep, Google/TikTok account managers) documented; appeal SLAs tracked; break-glass human admins retained on every platform |
| Client comms | Pre-drafted suspension comms template; status surfaced in Human Touch dashboard; pause optimization + invoicing impact noted |
| Continuity | If one platform is down, continue on others (plans are multi-platform); never let a single platform suspension block onboarding or reporting for unaffected channels |
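The early-warning lever compares successive polls of each ad account. A minimal sketch of that comparison: the normalized status strings, the snapshot fields, and both thresholds are assumptions to tune per platform. Each platform reports account status in its own format, so statuses would be normalized before comparison.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; tune per platform from observed baselines.
DISAPPROVAL_RATE_RISE = 0.05  # absolute rise in disapproved share between polls
RATE_LIMIT_SPIKE_FACTOR = 3   # errors at least this multiple of the previous window


@dataclass(frozen=True)
class AccountSnapshot:
    account_status: str       # normalized status, e.g. "ENABLED" (assumed values)
    disapproval_rate: float   # disapproved / reviewed ads in the window, 0..1
    resource_exhausted: int   # rate-limit / RESOURCE_EXHAUSTED errors in the window


def suspension_warnings(prev: AccountSnapshot, cur: AccountSnapshot) -> List[str]:
    """Compare two polls of one account and return warnings to route to System Ops."""
    warnings = []
    if cur.account_status != prev.account_status:
        warnings.append(f"account_status changed: {prev.account_status} -> {cur.account_status}")
    if cur.disapproval_rate - prev.disapproval_rate >= DISAPPROVAL_RATE_RISE:
        warnings.append(f"disapproval rate rose to {cur.disapproval_rate:.0%}")
    # max(..., 1) keeps a zero-error baseline from alerting on a single error.
    if cur.resource_exhausted >= RATE_LIMIT_SPIKE_FACTOR * max(prev.resource_exhausted, 1):
        warnings.append(f"rate-limit errors spiked to {cur.resource_exhausted}")
    return warnings
```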
Compliance
- Module data boundary (ADR 0004): this module does not store PII or client financial data; CRM and billing live in sibling systems. CAPI / offline conversion pipelines deferred — revisit hash-only rules when built.
- Legal confirmation required: KVKK/GDPR applicability, entity jurisdiction (TR vs EU), and production region lock. Check with the legal team before go-live despite the narrow datastore boundary above.
- Consent Mode and privacy policy linkage (when relay/CAPI land)
- Health and education ad policies enforced in agent guardrails
- Client approval of data sharing with sub-processors/vendors and marketing use — see Vision & scope — Client agreements
- Client bears responsibility for lawfulness of data they permit to flow to ad networks and analytics; Kobi operates only on approved, versioned configurations
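"Operates only on approved, versioned configurations" can be enforced by recording a content digest when a client approves a configuration version and refusing to execute anything that does not match it. A minimal sketch: the approvals mapping, the digest scheme, and the config fields are assumptions.

```python
import hashlib
import json
from typing import Dict, Tuple


class UnapprovedConfig(Exception):
    """Raised when a configuration version lacks a matching client approval."""


def config_digest(config: Dict) -> str:
    """Stable SHA-256 over canonical JSON (sorted keys, no extra whitespace)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def require_approved(tenant_id: str, version: int, config: Dict,
                     approvals: Dict[Tuple[str, int], str]) -> str:
    """Return the approved digest, or raise if the version is unapproved or altered."""
    expected = approvals.get((tenant_id, version))
    if expected is None:
        raise UnapprovedConfig(f"{tenant_id} v{version} has no recorded approval")
    if config_digest(config) != expected:
        raise UnapprovedConfig(f"{tenant_id} v{version} differs from the approved content")
    return expected
```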
Vendor access
- No third-party agency access to tenant accounts without contract
- Sub-processor / vendor access limited to client-approved scope recorded at onboarding and plan approval