Architecture · Draft
GCP Deployment Topology (Illustrative)
Note: This topology is a reference architecture for the implementation phase. The tech stack is locked to all-TypeScript (ADR 0001); IaC starts as `gcloud` scripts and moves to Terraform once the deployment goes multi-region or a second engineer joins.
Goals
- Run agent orchestration and domain services on Google Cloud
- Support async event processing, analytics warehouse, and secret management
- Enable handoff to implementation team with clear environment boundaries
Environment tiers
| Environment | Purpose |
|---|---|
| dev | Integration testing, sandbox ad accounts |
| staging | Pre-prod with production-like configs |
| prod | Live client workloads |
Reference topology
AI & agent runtime (where the agentic network lives)
The agentic network is not a single monolith. It spans three GCP surfaces:
| Layer | GCP service | Role |
|---|---|---|
| Model inference | Vertex AI Model Garden + Generative AI API | All LLM calls — Gemini (primary), Anthropic Claude, Meta Llama (selective) |
| Agent runtime | Vertex AI Agent Engine | Managed, serverless execution for ADK / LangGraph agents — scaling, tracing, sessions |
| Domain glue | Cloud Run + Pub/Sub | Orchestrator control plane, platform connectors, HITL API, event bus, non-LLM business logic |
Dashboard split: Human Touch (operator approvals) vs System Ops (logs, statistics, health) — separate BFFs, IAP on System Ops only.
Default region: europe-west1 (or tenant-configurable). Use global Vertex endpoints for Gemini 3.x where latency allows; use regional endpoints when data residency requires it (non-global endpoints may carry a ~10% premium after July 2026 — see Vertex AI pricing).
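The endpoint rule above can be sketched as a small pure function. This is a minimal illustration, not a settled design: the `TenantInferenceConfig` shape and its field names are hypothetical, and the only values taken from this document are the `europe-west1` default and the choice between a global and a regional location.

```typescript
// Hypothetical per-tenant settings; field names are assumptions for illustration.
interface TenantInferenceConfig {
  /** True when data residency requires inference to stay in a region. */
  requiresDataResidency: boolean;
  /** True when the latency of the global endpoint is acceptable for this tenant. */
  globalLatencyAcceptable: boolean;
  /** Optional tenant-specific region override. */
  region?: string;
}

// Default region named in this document.
const DEFAULT_REGION = "europe-west1";

/** Picks the Vertex AI location for a model call. */
function resolveVertexLocation(tenant: TenantInferenceConfig): string {
  const regional = tenant.region ?? DEFAULT_REGION;
  // Residency always wins: stay on the regional endpoint.
  if (tenant.requiresDataResidency) {
    return regional;
  }
  // Otherwise use the global endpoint only where latency allows.
  return tenant.globalLatencyAcceptable ? "global" : regional;
}
```

Keeping this decision in one function would let the orchestrator log which location each call used, which may help when reconciling any regional pricing difference later.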
Framework choice (implementation phase): Agent Development Kit (ADK) for agent primitives; deploy to Agent Engine via adk deploy. LangGraph acceptable for complex state machines. All models accessed through Vertex — no direct third-party API keys in application code.
Service mapping (conceptual)
| Logical component | Illustrative GCP hosting |
|---|---|
| API + Dashboard BFF | Cloud Run |
| Agent orchestrator (state machine, routing) | Cloud Run + Vertex AI Agent Engine |
| Domain agents (Onboarding, Plan, Execution, …) | Agent Engine (ADK / LangGraph) |
| Sub-agents & quality-check agents | Agent Engine — lightweight invocations |
| LLM inference | Vertex AI Model Garden (Gemini primary; Claude for optional QC escalation) |
| Model routing / classifier | gemini-3.1-flash-lite on Vertex (low-latency classifiers) |
| Cost Guard (deterministic spend circuit breaker) | Cloud Run service — gates every Vertex call; 3× estimate kill switch |
| Run cost ledger | Firestore or Redis — run_id estimated vs actual USD |
| QC loop telemetry | BigQuery agent_qc_results, agent_qc_loops + GCS structured input/QC JSON |
| QC threshold alert job | Cloud Scheduler + Cloud Run — 80% success floor; emits agent.qc.threshold.breached |
| Context cache (system prompts, tool schemas) | Vertex AI context caching |
| Platform connector workers | Cloud Run Jobs + Pub/Sub |
| Event bus | Pub/Sub topics per domain |
| Tenant registry & plans | Cloud SQL or Firestore |
| Analytics & reporting | BigQuery + Looker Studio or embedded charts |
| Feed files | Cloud Storage + scheduled transfers |
| First-party tag relay (Phase 2+) | Cloud Run origin; Cloudflare edge for client hostnames — see First-party tag relay |
| Secrets / OAuth tokens | Secret Manager |
| IaC | Terraform in future repo path |
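As an illustration of the QC threshold alert job in the table above, the check itself could be a pure function that the scheduled Cloud Run job calls after querying BigQuery. The 80% floor and the `agent.qc.threshold.breached` event name come from the table; the `QcWindow` and `QcBreachEvent` shapes, the `agentId` field, and the choice to skip empty windows are assumptions made for this sketch.

```typescript
// Hypothetical aggregate over a reporting window (e.g. rows from agent_qc_results).
interface QcWindow {
  passed: number;
  total: number;
}

const QC_SUCCESS_FLOOR = 0.8; // 80% success floor from the service mapping
const BREACH_EVENT = "agent.qc.threshold.breached";

// Hypothetical event payload; only the event name is taken from the document.
interface QcBreachEvent {
  type: typeof BREACH_EVENT;
  agentId: string;
  successRate: number;
  floor: number;
}

/** Returns a breach event when the success rate is below the floor, else null. */
function checkQcThreshold(agentId: string, window: QcWindow): QcBreachEvent | null {
  // Assumption: a window with no QC runs does not raise an alert.
  if (window.total === 0) {
    return null;
  }
  const successRate = window.passed / window.total;
  if (successRate >= QC_SUCCESS_FLOOR) {
    return null;
  }
  return { type: BREACH_EVENT, agentId, successRate, floor: QC_SUCCESS_FLOOR };
}
```

In this sketch the caller would publish a non-null result to the relevant Pub/Sub topic; publishing is left out so the threshold logic stays testable without GCP credentials.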
Networking
- Private services where possible; VPC connector for Cloud Run → private CRM endpoints
- Egress allowlist for platform APIs (Google Ads, Meta, TikTok, etc.)
- No platform credentials in container images
CI/CD (future phase)
Data residency
- Default region: `europe-west1` or tenant-configurable (TBD with legal)
- Kobi BigQuery datasets partitioned by `tenant_id`: includes Google Ads export (agency-owned accounts), agent telemetry, audit rollups
- Not in Kobi BQ: client GA4 export tables, MMP export tables (client-operated on their GCP if used)
- CRM PII stays in CRM; warehouse stores hashed keys where possible
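One possible way to produce the hashed keys mentioned above is a keyed hash, so the warehouse can join on a stable identifier without holding the raw CRM value. This sketch swaps in HMAC-SHA-256 from Node's built-in `node:crypto` module as an assumption; the document does not specify the hashing scheme. The `hashCrmKey` name, the tenant-scoped input format, and the lowercase normalization are likewise hypothetical.

```typescript
import { createHmac } from "node:crypto";

/**
 * Derives a stable pseudonymous key for a CRM identifier (hypothetical helper).
 * The secret would come from Secret Manager in a real deployment; it is a
 * plain parameter here to keep the sketch self-contained.
 */
function hashCrmKey(tenantId: string, rawId: string, secret: string): string {
  // Assumption: identifiers are compared case-insensitively and trimmed.
  const normalized = rawId.trim().toLowerCase();
  // Scope the hash by tenant so the same raw ID never collides across tenants.
  return createHmac("sha256", secret)
    .update(`${tenantId}:${normalized}`)
    .digest("hex");
}
```

A keyed hash rather than a bare SHA-256 makes it harder to recover values such as email addresses by hashing guesses, provided the secret stays out of the warehouse.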
Cost controls
- Cloud Run min instances = 0 for non-critical paths
- Budget alerts per project/environment
- BigQuery slot or on-demand caps per environment
- LLM budget alerts per environment (Vertex AI billing export → BigQuery)
- Model routing policy enforced in orchestrator — never default to Pro/Opus (see Agentic orchestration — Model strategy)
- Vertex context caching for stable system prompts, tool schemas, and tenant playbooks
- Batch API for non-urgent reporting summarization (50% discount on eligible models)
- Cost Guard: deterministic 3× estimate trip per `run_id`; not agentic (see the Cost Guard section of Agentic orchestration)
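The Cost Guard rule above is simple enough to sketch without any model in the loop. The version below is a minimal illustration under stated assumptions: an in-memory `Map` stands in for the Firestore or Redis ledger, the class and method names are hypothetical, and "trip" is interpreted as denying any call that would push actual spend past 3× the run's estimate.

```typescript
// Ledger row per run: estimated vs actual USD, as described in the service mapping.
interface RunLedgerEntry {
  estimatedUsd: number;
  actualUsd: number;
}

const KILL_MULTIPLIER = 3; // 3× estimate kill switch

// Hypothetical in-memory guard; a real ledger would live in Firestore or Redis.
class CostGuard {
  private ledger = new Map<string, RunLedgerEntry>();

  /** Registers a run with its up-front cost estimate. */
  startRun(runId: string, estimatedUsd: number): void {
    this.ledger.set(runId, { estimatedUsd, actualUsd: 0 });
  }

  /** Returns true if a call with the given projected cost may proceed. */
  authorize(runId: string, nextCallUsd: number): boolean {
    const entry = this.ledger.get(runId);
    // Assumption: unknown runs fail closed.
    if (!entry) {
      return false;
    }
    const ceiling = entry.estimatedUsd * KILL_MULTIPLIER;
    return entry.actualUsd + nextCallUsd <= ceiling;
  }

  /** Records the actual cost of a completed call. */
  record(runId: string, actualUsd: number): void {
    const entry = this.ledger.get(runId);
    if (entry) {
      entry.actualUsd += actualUsd;
    }
  }
}
```

For example, a run estimated at 1.00 USD that has already spent 2.50 USD would still be allowed a 0.50 USD call but denied a 0.60 USD one. Because the decision is plain arithmetic on the ledger, the same inputs always give the same answer.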
Skeleton deliverable (later phase — not Phase 0)
When implementation starts, skeleton includes:
- Terraform modules: project, IAM, Pub/Sub, Cloud Run, Secret Manager, BigQuery
- Empty service stubs with health checks
- GitHub Actions deploy pipeline to `dev` only
Related documents
- First-party tag relay — deferred; hybrid Cloudflare + GCP cost bands
- System overview
- Security & governance
- Roadmap Phase 0