GCP Deployment Topology (Illustrative)

Note: This topology is a reference architecture for the implementation phase. The tech stack is locked to all-TypeScript (ADR 0001). IaC starts as gcloud scripts and moves to Terraform once the deployment goes multi-region or a second engineer joins.

Goals

Run agent orchestration and domain services on Google Cloud
Support async event processing, analytics warehouse, and secret management
Enable handoff to implementation team with clear environment boundaries

Environment tiers

Environment	Purpose
`dev`	Integration testing, sandbox ad accounts
`staging`	Pre-prod with production-like configs
`prod`	Live client workloads
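
The tier boundaries above could be captured as a typed lookup so that services fail fast on an unknown environment name. This is a minimal sketch: the project IDs and the `liveClientWorkloads` flag are hypothetical placeholders, not names defined in this document.

```typescript
// Hypothetical per-environment settings. Project IDs are placeholders.
type Environment = "dev" | "staging" | "prod";

interface EnvironmentConfig {
  purpose: string;
  projectId: string; // assumed naming scheme, not defined in this document
  liveClientWorkloads: boolean;
}

const ENVIRONMENTS: Record<Environment, EnvironmentConfig> = {
  dev: {
    purpose: "Integration testing, sandbox ad accounts",
    projectId: "example-dev",
    liveClientWorkloads: false,
  },
  staging: {
    purpose: "Pre-prod with production-like configs",
    projectId: "example-staging",
    liveClientWorkloads: false,
  },
  prod: {
    purpose: "Live client workloads",
    projectId: "example-prod",
    liveClientWorkloads: true,
  },
};

// Narrow an arbitrary string (for example an env var) to a known tier.
function parseEnvironment(value: string): Environment {
  if (value === "dev" || value === "staging" || value === "prod") {
    return value;
  }
  throw new Error(`Unknown environment: ${value}`);
}
```

A service could call `parseEnvironment` once at startup and read everything else from `ENVIRONMENTS`, which keeps environment-specific branching out of business logic.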

Reference topology

AI & agent runtime (where the agentic network lives)

The agentic network is not a single monolith. It spans three GCP surfaces:

Layer	GCP service	Role
Model inference	Vertex AI Model Garden + Generative AI API	All LLM calls — Gemini (primary), Anthropic Claude, Meta Llama (selective)
Agent runtime	Vertex AI Agent Engine	Managed, serverless execution for ADK / LangGraph agents — scaling, tracing, sessions
Domain glue	Cloud Run + Pub/Sub	Orchestrator control plane, platform connectors, HITL API, event bus, non-LLM business logic

Dashboard split: Human Touch (operator approvals) vs System Ops (logs, statistics, health) — separate BFFs, IAP on System Ops only.

Default region: europe-west1 (or tenant-configurable). Use global Vertex endpoints for Gemini 3.x where latency allows; use regional endpoints when data residency requires it (non-global endpoints may carry a ~10% premium after July 2026 — see Vertex AI pricing).
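
The endpoint rule above can be sketched as a small pure function. The `TenantLocationPolicy` shape and its field names are assumptions made for illustration, and the sketch ignores the latency condition, which would need measurement rather than a static rule.

```typescript
// Hypothetical tenant policy shape; field names are illustrative only.
interface TenantLocationPolicy {
  requiresDataResidency: boolean;
  region?: string; // tenant-configured region, if any
}

const DEFAULT_REGION = "europe-west1";

// Returns the Vertex location to call: a regional endpoint when data
// residency is required, otherwise the global endpoint.
function resolveVertexLocation(policy: TenantLocationPolicy): string {
  if (policy.requiresDataResidency) {
    return policy.region ?? DEFAULT_REGION;
  }
  return "global";
}
```

Keeping this decision in one function makes it straightforward to audit which tenants are pinned to a region and which are billed at global rates.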

Framework choice (implementation phase): Agent Development Kit (ADK) for agent primitives, deployed to Agent Engine via `adk deploy`. LangGraph is acceptable for complex state machines. All models are accessed through Vertex; no direct third-party API keys in application code.

Service mapping (conceptual)

Logical component	Illustrative GCP hosting
API + Dashboard BFF	Cloud Run
Agent orchestrator (state machine, routing)	Cloud Run + Vertex AI Agent Engine
Domain agents (Onboarding, Plan, Execution, …)	Agent Engine (ADK / LangGraph)
Sub-agents & quality-check agents	Agent Engine — lightweight invocations
LLM inference	Vertex AI Model Garden (Gemini primary; Claude for optional QC escalation)
Model routing / classifier	`gemini-3.1-flash-lite` on Vertex (low-latency classifiers)
Cost Guard (deterministic spend circuit breaker)	Cloud Run service — gates every Vertex call; 3× estimate kill switch
Run cost ledger	Firestore or Redis — `run_id` estimated vs actual USD
QC loop telemetry	BigQuery `agent_qc_results`, `agent_qc_loops` + GCS structured input/QC JSON
QC threshold alert job	Cloud Scheduler + Cloud Run — 80% success floor; emits `agent.qc.threshold.breached`
Context cache (system prompts, tool schemas)	Vertex AI context caching
Platform connector workers	Cloud Run Jobs + Pub/Sub
Event bus	Pub/Sub topics per domain
Tenant registry & plans	Cloud SQL or Firestore
Analytics & reporting	BigQuery + Looker Studio or embedded charts
Feed files	Cloud Storage + scheduled transfers
First-party tag relay (Phase 2+)	Cloud Run origin; Cloudflare edge for client hostnames — see First-party tag relay
Secrets / OAuth tokens	Secret Manager
IaC	gcloud scripts initially; Terraform in future repo path
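
The QC threshold alert job in the table above could evaluate each window with a check like the following. This is a sketch under stated assumptions: the `QcWindow` and event payload shapes are hypothetical, and treating an empty window as "no alert" is a choice made here, not a rule from this document. Only the 80% floor and the event name come from the table.

```typescript
// Hypothetical aggregate of QC outcomes for one agent over one window.
interface QcWindow {
  agentId: string;
  passed: number;
  failed: number;
}

// Hypothetical payload; only the event name is taken from the document.
interface ThresholdBreachedEvent {
  type: "agent.qc.threshold.breached";
  agentId: string;
  successRate: number;
  floor: number;
}

const SUCCESS_FLOOR = 0.8; // 80% success floor

// Returns an event to publish when the success rate falls below the floor,
// or null when the window is healthy or empty.
function evaluateQcWindow(
  window: QcWindow,
  floor: number = SUCCESS_FLOOR,
): ThresholdBreachedEvent | null {
  const total = window.passed + window.failed;
  if (total === 0) {
    return null; // assumption: no data means no alert
  }
  const successRate = window.passed / total;
  if (successRate >= floor) {
    return null;
  }
  return {
    type: "agent.qc.threshold.breached",
    agentId: window.agentId,
    successRate,
    floor,
  };
}
```

In the scheduled Cloud Run job, the non-null result would be published to the relevant Pub/Sub topic; the publish call is omitted here to keep the check testable in isolation.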

Networking

Private services where possible; VPC connector for Cloud Run → private CRM endpoints
Egress allowlist for platform APIs (Google Ads, Meta, TikTok, etc.)
No platform credentials in container images
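
An application-level guard can complement the network egress allowlist. The sketch below is illustrative only: the hostnames listed are examples, the HTTPS-only rule is an assumption, and network-level enforcement would still be needed alongside it.

```typescript
// Illustrative allowlist; the real list would be maintained per environment.
const EGRESS_ALLOWLIST: readonly string[] = [
  "googleads.googleapis.com",
  "graph.facebook.com",
  "business-api.tiktok.com",
];

// Returns true only for HTTPS URLs whose hostname is exactly on the allowlist.
function isEgressAllowed(
  url: string,
  allowlist: readonly string[] = EGRESS_ALLOWLIST,
): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // unparseable URLs are rejected
  }
  if (parsed.protocol !== "https:") {
    return false; // assumption: plain HTTP is never allowed
  }
  return allowlist.includes(parsed.hostname.toLowerCase());
}
```

Exact hostname matching is deliberate here: suffix matching would let a lookalike such as `graph.facebook.com.example.net` through.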

CI/CD (future phase)

Data residency

Default region: europe-west1 or tenant-configurable (TBD with legal)
Kobi BigQuery datasets partitioned by tenant_id — includes Google Ads export (agency-owned accounts), agent telemetry, audit rollups
Not in Kobi BQ: client GA4 export tables, MMP export tables (client-operated on their GCP if used)
CRM PII stays in CRM; warehouse stores hashed keys where possible
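
One way to store hashed keys rather than raw CRM identifiers is a keyed hash. This is a minimal sketch, assuming a per-tenant secret (for example, loaded from Secret Manager) and assuming identifiers are trimmed and lowercased before hashing; neither detail is specified in this document.

```typescript
import { createHmac } from "node:crypto";

// Derives a stable, non-reversible join key from a CRM identifier.
// `tenantSecret` is a hypothetical per-tenant key; using HMAC rather than a
// bare hash makes dictionary attacks on common identifiers harder.
function hashCrmKey(crmId: string, tenantSecret: string): string {
  const normalized = crmId.trim().toLowerCase(); // assumed normalization
  return createHmac("sha256", tenantSecret).update(normalized).digest("hex");
}
```

Because the output is deterministic per tenant, warehouse tables can still be joined on the hashed key without the raw identifier ever leaving the CRM.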

Cost controls

Cloud Run min instances = 0 for non-critical paths
Budget alerts per project/environment
BigQuery slot or on-demand caps per environment
LLM budget alerts per environment (Vertex AI billing export → BigQuery)
Model routing policy enforced in orchestrator — never default to Pro/Opus (see Agentic orchestration — Model strategy)
Vertex context caching for stable system prompts, tool schemas, and tenant playbooks
Batch API for non-urgent reporting summarization (50% discount on eligible models)
Cost Guard — deterministic 3× estimate trip per run_id; not agentic (Agentic orchestration — Cost Guard)
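
The Cost Guard trip rule can be sketched as a small deterministic class. This is an illustration, not the service's design: the in-memory `Map` stands in for the Firestore or Redis ledger, the method names are hypothetical, and tripping at exactly 3× (rather than strictly above it) is an assumption.

```typescript
// Hypothetical ledger entry: estimated vs actual USD for one run_id.
interface RunLedgerEntry {
  estimatedUsd: number;
  actualUsd: number;
  tripped: boolean;
}

class CostGuard {
  private readonly ledger = new Map<string, RunLedgerEntry>();
  private readonly multiplier: number;

  constructor(multiplier: number = 3) {
    this.multiplier = multiplier;
  }

  // Register a run with its up-front cost estimate.
  startRun(runId: string, estimatedUsd: number): void {
    if (estimatedUsd <= 0) {
      throw new Error("estimatedUsd must be positive");
    }
    this.ledger.set(runId, { estimatedUsd, actualUsd: 0, tripped: false });
  }

  // Gate to check before each model call. Unknown runs are denied.
  allowCall(runId: string): boolean {
    const entry = this.ledger.get(runId);
    return entry !== undefined && !entry.tripped;
  }

  // Record actual spend; trips the breaker once spend reaches
  // multiplier x estimate. The trip is permanent for that run.
  recordSpend(runId: string, usd: number): void {
    const entry = this.ledger.get(runId);
    if (entry === undefined) {
      throw new Error(`Unknown run_id: ${runId}`);
    }
    entry.actualUsd += usd;
    if (entry.actualUsd >= entry.estimatedUsd * this.multiplier) {
      entry.tripped = true;
    }
  }
}
```

For example, a run estimated at 1.00 USD stays open after 2.00 USD of recorded spend and is blocked once the total reaches 3.00 USD. No model is consulted in the decision, which is what keeps the guard deterministic.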

Skeleton deliverable (later phase — not Phase 0)

When implementation starts, the skeleton includes:

Terraform modules: project, IAM, Pub/Sub, Cloud Run, Secret Manager, BigQuery
Empty service stubs with health checks
GitHub Actions deploy pipeline to dev only

Related documents

First-party tag relay: deferred; hybrid Cloudflare + GCP cost bands
System overview
Security & governance
Roadmap Phase 0