Architecture · Draft

GCP Deployment Topology (Illustrative)

Created 9 Jun 2026 · Updated 11 Jun 2026

Latest change: Stack locked; IaC via gcloud then Terraform

Draft document: deep-dive spec incomplete; content will be updated before and during build. Do not treat as signed-off implementation detail.

Note: This topology is a reference architecture for the implementation phase. The tech stack is locked to all-TypeScript (ADR 0001). IaC starts as gcloud scripts and moves to Terraform once the deployment goes multi-region or a second engineer joins.

Goals

  • Run agent orchestration and domain services on Google Cloud
  • Support async event processing, analytics warehouse, and secret management
  • Enable handoff to implementation team with clear environment boundaries

Environment tiers

Environment | Purpose
dev | Integration testing, sandbox ad accounts
staging | Pre-prod with production-like configs
prod | Live client workloads

Reference topology

Diagram (reference topology), by layer:
  • Ingress: Cloud Load Balancing, API Gateway
  • Compute: Cloud Run Services, Cloud Run Jobs
  • Vertex AI Agent Platform: Agent Engine runtime, Model Garden endpoints, Gemini + partner models, context caching
  • Messaging: Cloud Pub/Sub
  • Data: BigQuery, Firestore or Cloud SQL, Cloud Storage
  • Security: Secret Manager, Cloud IAM
  • Observability: Cloud Logging, Cloud Monitoring, Cloud Trace

AI & agent runtime (where the agentic network lives)

The agentic network is not a single monolith. It spans three GCP surfaces:

Layer | GCP service | Role
Model inference | Vertex AI Model Garden + Generative AI API | All LLM calls: Gemini (primary), Anthropic Claude, Meta Llama (selective)
Agent runtime | Vertex AI Agent Engine | Managed, serverless execution for ADK / LangGraph agents (scaling, tracing, sessions)
Domain glue | Cloud Run + Pub/Sub | Orchestrator control plane, platform connectors, HITL API, event bus, non-LLM business logic
Diagram (AI & agent runtime):
  • Cloud Run: Orchestrator control plane, Platform connectors, Human Touch BFF, System Ops BFF (behind IAP)
  • Vertex AI: Agent Engine; Model Garden with gemini-3.1-flash-lite, gemini-3.5-flash, gemini-3.1-pro-preview, and claude-sonnet-4-6 (escalation)
  • Pub/Sub; BigQuery telemetry

Dashboard split: Human Touch (operator approvals) vs. System Ops (logs, statistics, health), served by separate BFFs. Identity-Aware Proxy (IAP) fronts System Ops only.

Default region: europe-west1 (or tenant-configurable). Use global Vertex endpoints for Gemini 3.x where latency allows; use regional endpoints when data residency requires it (non-global endpoints may carry a ~10% premium after July 2026 — see Vertex AI pricing).
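The endpoint rule above can be sketched as follows. This is a minimal sketch that assumes a hypothetical per-tenant `TenantConfig` with an optional `residencyRegion`. Tenants without a residency requirement use the `global` location, and tenants with one get their regional host. The latency trade-off and authentication are not modeled here. The host pattern (`aiplatform.googleapis.com` for global, `{region}-aiplatform.googleapis.com` for regional) and the `projects/.../locations/.../publishers/.../models/...:method` path follow Vertex AI REST conventions. The function names are illustrative.

```typescript
// Hypothetical tenant shape; residencyRegion is set only when data residency applies.
interface TenantConfig {
  tenantId: string;
  residencyRegion?: string; // e.g. "europe-west1"
}

// Global by default; pinned to the residency region when one is required.
function resolveVertexLocation(tenant: TenantConfig): string {
  return tenant.residencyRegion ?? "global";
}

// Global endpoint has no region prefix; regional endpoints do.
function vertexHost(location: string): string {
  return location === "global"
    ? "aiplatform.googleapis.com"
    : `${location}-aiplatform.googleapis.com`;
}

// Builds the REST URL for a publisher model method (auth header not shown).
function vertexModelUrl(
  project: string,
  location: string,
  publisher: string,
  model: string,
  method: string,
): string {
  return (
    `https://${vertexHost(location)}/v1/projects/${project}` +
    `/locations/${location}/publishers/${publisher}/models/${model}:${method}`
  );
}
```

For example, `vertexModelUrl("my-project", resolveVertexLocation({ tenantId: "t1" }), "google", "gemini-3.5-flash", "generateContent")` targets the global host. A tenant with `residencyRegion: "europe-west1"` would instead hit `europe-west1-aiplatform.googleapis.com`.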

Framework choice (implementation phase): Agent Development Kit (ADK) for agent primitives, deployed to Agent Engine via adk deploy. LangGraph is acceptable for complex state machines. All models are accessed through Vertex, with no direct third-party API keys in application code.
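The example below illustrates "all models through Vertex" together with the model routing policy listed under Cost controls. It is a hypothetical model catalog in which every entry, including Claude, is a Vertex publisher model called with Google Cloud credentials rather than a vendor API key. The tier names and the tier-to-model mapping are assumptions. The model IDs are the ones named in this document. Gemini models are invoked with `generateContent`, and Anthropic models on Vertex with `rawPredict`.

```typescript
// Hypothetical routing tiers; the mapping below is illustrative, not a decided policy.
type Tier = "classify" | "standard" | "complex" | "qcEscalation";

interface ModelRef {
  publisher: "google" | "anthropic"; // every model resolves to a Vertex publisher
  model: string;
  method: "generateContent" | "rawPredict";
}

const CATALOG: Record<Tier, ModelRef> = {
  classify: { publisher: "google", model: "gemini-3.1-flash-lite", method: "generateContent" },
  standard: { publisher: "google", model: "gemini-3.5-flash", method: "generateContent" },
  complex: { publisher: "google", model: "gemini-3.1-pro-preview", method: "generateContent" },
  qcEscalation: { publisher: "anthropic", model: "claude-sonnet-4-6", method: "rawPredict" },
};

// Routing policy: the default never selects Pro; callers must opt in explicitly.
function pickModel(tier: Tier = "standard", allowPro = false): ModelRef {
  if (tier === "complex" && !allowPro) return CATALOG.standard;
  return CATALOG[tier];
}
```

In this design the Pro tier is reachable only with an explicit `allowPro` flag. A missing or defaulted argument therefore falls back to Flash and cannot select Pro.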

Service mapping (conceptual)

Logical component | Illustrative GCP hosting
API + Dashboard BFF | Cloud Run
Agent orchestrator (state machine, routing) | Cloud Run + Vertex AI Agent Engine
Domain agents (Onboarding, Plan, Execution, …) | Agent Engine (ADK / LangGraph)
Sub-agents & quality-check agents | Agent Engine (lightweight invocations)
LLM inference | Vertex AI Model Garden (Gemini primary; Claude for optional QC escalation)
Model routing / classifier | gemini-3.1-flash-lite on Vertex (low-latency classification)
Cost Guard (deterministic spend circuit breaker) | Cloud Run service; gates every Vertex call; kill switch at 3× the run estimate
Run cost ledger | Firestore or Redis; estimated vs. actual USD per run_id
QC loop telemetry | BigQuery agent_qc_results, agent_qc_loops + GCS structured input/QC JSON
QC threshold alert job | Cloud Scheduler + Cloud Run; 80% success floor; emits agent.qc.threshold.breached
Context cache (system prompts, tool schemas) | Vertex AI context caching
Platform connector workers | Cloud Run Jobs + Pub/Sub
Event bus | Pub/Sub topics per domain
Tenant registry & plans | Cloud SQL or Firestore
Analytics & reporting | BigQuery + Looker Studio or embedded charts
Feed files | Cloud Storage + scheduled transfers
First-party tag relay (Phase 2+) | Cloud Run origin; Cloudflare edge for client hostnames (see First-party tag relay)
Secrets / OAuth tokens | Secret Manager
IaC | gcloud scripts first; Terraform later, in a future repo path
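The QC threshold alert job reduces to a per-agent pass-rate check against the 80% floor. The sketch below is minimal and assumes the rows were already fetched from `agent_qc_results` for the evaluation window. The query, the `QcResult` row shape, and the `minSamples` guard are hypothetical. Publishing the resulting events to Pub/Sub is left out.

```typescript
// Hypothetical row shape for one QC outcome.
interface QcResult {
  agent: string;
  passed: boolean;
}

interface ThresholdBreach {
  event: "agent.qc.threshold.breached";
  agent: string;
  successRate: number;
  samples: number;
}

const SUCCESS_FLOOR = 0.8; // 80% success floor

// Returns one breach payload per agent whose pass rate falls below the floor.
function findBreaches(rows: QcResult[], minSamples = 20): ThresholdBreach[] {
  // Tally passes and totals per agent.
  const tally = new Map<string, { pass: number; total: number }>();
  for (const r of rows) {
    const t = tally.get(r.agent) ?? { pass: 0, total: 0 };
    t.total += 1;
    if (r.passed) t.pass += 1;
    tally.set(r.agent, t);
  }

  const breaches: ThresholdBreach[] = [];
  tally.forEach(({ pass, total }, agent) => {
    if (total < minSamples) return; // too few runs to judge reliably
    const successRate = pass / total;
    if (successRate < SUCCESS_FLOOR) {
      breaches.push({ event: "agent.qc.threshold.breached", agent, successRate, samples: total });
    }
  });
  return breaches;
}
```

A rate of exactly 80% does not breach here, so the floor is treated as inclusive. The `minSamples` guard keeps a handful of early failures from paging anyone.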

Networking

  • Private services where possible; VPC connector for Cloud Run → private CRM endpoints
  • Egress allowlist for platform APIs (Google Ads, Meta, TikTok, etc.)
  • No platform credentials in container images
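The primary egress allowlist would sit at the network layer, for example firewall policy on the VPC that Cloud Run egresses through. The sketch below shows only an application-level, defense-in-depth check. The hostnames are illustrative public endpoints for the Google Ads, Meta, and TikTok marketing APIs, and the constant and function names are hypothetical.

```typescript
// Illustrative allowlist; the real list would live in config, not code.
const EGRESS_ALLOWLIST: readonly string[] = [
  "googleads.googleapis.com",
  "graph.facebook.com",
  "business-api.tiktok.com",
];

// Exact hostname match over HTTPS only; suffix tricks such as
// "graph.facebook.com.evil.example" do not match.
function isEgressAllowed(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are rejected
  }
  if (url.protocol !== "https:") return false;
  return EGRESS_ALLOWLIST.includes(url.hostname); // URL lowercases hostnames
}
```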

CI/CD (future phase)

GitHub Actions: Build and test → Artifact Registry → Deploy dev → Deploy staging → Manual approval → Deploy prod

Data residency

  • Default region: europe-west1 or tenant-configurable (TBD with legal)
  • Kobi BigQuery datasets partitioned by tenant_id — includes Google Ads export (agency-owned accounts), agent telemetry, audit rollups
  • Not in Kobi BQ: client GA4 export tables, MMP export tables (client-operated on their GCP if used)
  • CRM PII stays in CRM; warehouse stores hashed keys where possible
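One way to realize "hashed keys where possible" is a keyed hash (HMAC-SHA256) over a normalized identifier. The warehouse can then join on the key without ever holding the raw value. A keyed hash is used rather than a bare SHA-256 because common identifiers such as emails can be recovered from unkeyed hashes by hashing guessed candidates. The per-tenant key is assumed to come from Secret Manager (fetch not shown), and the function names are hypothetical. Keys destined for ad-platform match uploads would follow each platform's own normalization and hashing rules instead.

```typescript
import { createHmac } from "node:crypto";

// Normalize before hashing so trivially different spellings map to one key.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// tenantKey: hypothetical per-tenant secret; different tenants yield unrelated keys.
function pseudonymousKey(email: string, tenantKey: string): string {
  return createHmac("sha256", tenantKey).update(normalizeEmail(email)).digest("hex");
}
```

Scoping the key per tenant means warehouse keys cannot be joined across tenants. That fits the tenant_id partitioning above, but it is a design choice to confirm with legal.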

Cost controls

  • Cloud Run min instances = 0 for non-critical paths
  • Budget alerts per project/environment
  • BigQuery slot or on-demand caps per environment
  • LLM budget alerts per environment (Vertex AI billing export → BigQuery)
  • Model routing policy enforced in orchestrator — never default to Pro/Opus (see Agentic orchestration — Model strategy)
  • Vertex context caching for stable system prompts, tool schemas, and tenant playbooks
  • Batch API for non-urgent reporting summarization (50% discount on eligible models)
  • Cost Guard — deterministic 3× estimate trip per run_id; not agentic (Agentic orchestration — Cost Guard)
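The Cost Guard trip rule is deterministic: before each Vertex call, compare actual spend plus the call's projected cost against 3× the run's estimate. Here is a minimal in-memory sketch. The `Map` stands in for the Firestore or Redis run cost ledger, and the class and method names are hypothetical.

```typescript
// Hypothetical ledger entry per run_id (USD).
interface RunLedger {
  estimatedUsd: number;
  actualUsd: number;
}

class CostGuardTripped extends Error {}

class CostGuard {
  private readonly ledger = new Map<string, RunLedger>();
  private readonly tripMultiplier: number;

  constructor(tripMultiplier = 3) {
    this.tripMultiplier = tripMultiplier;
  }

  // Register a run with its up-front cost estimate.
  startRun(runId: string, estimatedUsd: number): void {
    if (!(estimatedUsd > 0)) throw new Error("estimate must be positive");
    this.ledger.set(runId, { estimatedUsd, actualUsd: 0 });
  }

  // Gate: call before every Vertex request with that request's projected cost.
  authorize(runId: string, projectedUsd: number): void {
    const run = this.mustGet(runId);
    const limit = this.tripMultiplier * run.estimatedUsd;
    if (run.actualUsd + projectedUsd > limit) {
      throw new CostGuardTripped(`run ${runId} would exceed ${this.tripMultiplier}x its estimate`);
    }
  }

  // Ledger update: call after the request with its metered cost.
  record(runId: string, actualUsd: number): void {
    this.mustGet(runId).actualUsd += actualUsd;
  }

  // Unknown runs fail closed rather than bypassing the guard.
  private mustGet(runId: string): RunLedger {
    const run = this.ledger.get(runId);
    if (!run) throw new CostGuardTripped(`unknown run_id ${runId}`);
    return run;
  }
}
```

Checking before the call means the breaker trips ahead of the overspend rather than after it. Failing closed on an unknown `run_id` keeps untracked calls from slipping past the gate.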

Skeleton deliverable (later phase — not Phase 0)

When implementation starts, skeleton includes:

  • Terraform modules: project, IAM, Pub/Sub, Cloud Run, Secret Manager, BigQuery
  • Empty service stubs with health checks
  • GitHub Actions deploy pipeline to dev only
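An empty service stub with a health check might look like the sketch below, which uses only Node's built-in `http` module. The `/health` path and response bodies are hypothetical. The path deliberately avoids a trailing `z` (as in `/healthz`), because Cloud Run documents that some paths ending in `z` are reserved. Cloud Run supplies the `PORT` and `K_SERVICE` environment variables. The routing is kept in a pure function so it can be tested without opening a socket.

```typescript
import { createServer } from "node:http";

interface StubResponse {
  status: number;
  body: string;
}

// Pure routing function: the whole stub contract, testable without a server.
function handle(method: string | undefined, path: string): StubResponse {
  if (method === "GET" && path === "/health") {
    return { status: 200, body: JSON.stringify({ status: "ok" }) };
  }
  return { status: 404, body: JSON.stringify({ error: "not_found" }) };
}

const server = createServer((req, res) => {
  // req.url may carry a query string; route on the pathname only.
  const path = new URL(req.url ?? "/", "http://localhost").pathname;
  const { status, body } = handle(req.method, path);
  res.writeHead(status, { "content-type": "application/json" });
  res.end(body);
});

// Cloud Run sets K_SERVICE and PORT; skip listening elsewhere (e.g. in tests).
if (process.env.K_SERVICE) {
  server.listen(Number(process.env.PORT ?? 8080));
}
```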