
Cost Model & Estimates (Total Cost to Operate the Module)

Created 11 Jun 2026 · Updated 12 Jun 2026

Latest change: ADR 0004: S4–S9 scope — per-tenant limits, no organic posting, data boundary, onboarding business type

Purpose: A single, auditable view of what it costs Kobi to run the Digital Ads module — LLM inference + Vertex Agent Engine + core GCP infra + data warehouse + (optional) first-party relay + one-time build. This complements the detailed LLM cost catalog (kept as the source for token-level numbers) by consolidating every cost layer into per-tenant and portfolio totals.

Audience: Parent-project / VC finance and engineering. Commercial figures (pricing, media take-rate, revenue) are owned at the parent/VC level — this doc is the cost side only, so those numbers can be slotted against it.

Confidence: Planning bands. Unit prices verified June 2026 against the sources in §1; re-verify before any commitment — cloud and model prices drift, and several Gemini/Agent-Engine SKUs are new in 2026.

Related: Agentic orchestration — LLM cost catalog · GCP topology · First-party tag relay — cost · Execution gameplan


0. Scope — what is and isn't counted

| Counted (Kobi-incurred operating cost) | Excluded (and why) |
|---|---|
| LLM inference (Vertex Model Garden tokens) | Media spend pass-through: client budget that Kobi fronts/invoices; it is working capital, not opex (see §7) |
| Vertex AI Agent Engine runtime (compute/memory/sessions) | Parent Kobi platform shared services (core CRM product, identity, billing ERP): separate budget |
| Cloud Run (orchestrator, connectors, BFFs, Cost Guard, jobs) | Client GA4 → BigQuery / MMP export: client-operated, out of scope |
| BigQuery (Google Ads export + agent/QC telemetry) | Human ops/eng salaries: headcount, not infra |
| Pub/Sub, Firestore/Cloud SQL, Secret Manager, GCS, Logging | DV360 minimum-spend commitment: a commercial contract term, not infra |
| First-party relay (Phase 2+, optional SKU) | Platform API fees: Google/Meta/TikTok ad APIs are free to call (rate-limited, not metered) |
| One-time engineering build (§6) | Third-party CMP licenses (only if Kobi resells one) |

Tenant profiles (from media-planning) and portfolio sizes (Pilot 5 / Growth 50 / Scale 200) match the existing docs so numbers reconcile.


1. Unit prices (verified — source table)

Vertex global standard (non-batch) rates; GCP Tier-1 region (e.g. europe-west1). First-tier free allowances noted.

| Resource | Unit price | Free tier / month | Source |
|---|---|---|---|
| Gemini 3.1 Flash-Lite | $0.25 in / $1.50 out per 1M tok | | Vertex pricing |
| Gemini 3.5 Flash | $1.50 in / $9.00 out per 1M tok | | Vertex pricing |
| Gemini 3.1 Pro Preview | $2.00 in / $12.00 out per 1M tok | | Vertex pricing |
| Claude Sonnet 4.x / Opus 4.x (escalation) | $3/$15 · $5/$25 per 1M tok | | Anthropic on Vertex |
| Vertex Batch API | −50% on eligible Gemini | | Vertex pricing |
| Context caching | ~−30–50% on cached input | | Vertex pricing |
| Agent Engine (compute) | $0.0864 / vCPU-hour | 50 vCPU-hours | Agent Engine pricing |
| Agent Engine (memory) | $0.0090 / GiB-hour | 100 GiB-hours | Agent Engine pricing |
| Agent Engine (sessions) | $0.25 / 1,000 events | | Agent Engine pricing (billing since Feb 11 2026) |
| Cloud Run (vCPU) | $0.000024 / vCPU-second (~$0.0864/hr) | 180,000 vCPU-s | Cloud Run pricing |
| Cloud Run (memory) | $0.0000025 / GiB-second (~$0.009/hr) | 360,000 GiB-s | Cloud Run pricing |
| Cloud Run (requests) | $0.40 / million | 2,000,000 | Cloud Run pricing |
| BigQuery (on-demand query) | $6.25 / TiB scanned | 1 TiB | BigQuery pricing |
| BigQuery (active storage) | $0.02 / GiB-month (long-term $0.01) | 10 GiB | BigQuery pricing |
| Pub/Sub | $40 / TiB throughput | 10 GiB | Pub/Sub pricing |
| Secret Manager | $0.06 / active version-month; $0.03 / 10k access | 6 versions | Secret Manager pricing (verify) |
| Cloud Logging | $0.50 / GiB ingested | 50 GiB | Logging pricing (verify) |
| Cloud Storage (standard) | ~$0.020 / GB-month | 5 GB | GCS pricing (verify) |
| Network egress | ~$0.12 / GB (internet) | 1 GB | (verify) |
| Platform ad APIs (Google/Meta/TikTok/DV360) | $0 (free, rate-limited) | | per platform docs |
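As a rough illustration of how the token prices above turn into a per-call cost, here is a minimal sketch. The dictionary keys are hypothetical labels (not official model IDs), and the 20k-in / 2k-out call is an assumed example, not a figure from the task catalog.

```python
# USD per 1M tokens (input, output), copied from the unit-price table.
# The keys are illustrative labels only, not official Vertex model IDs.
PRICES_PER_MTOK = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-3.5-flash": (1.50, 9.00),
    "gemini-3.1-pro-preview": (2.00, 12.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int,
              batch: bool = False) -> float:
    """USD cost of one call at standard rates, or at -50% if run via Batch API."""
    price_in, price_out = PRICES_PER_MTOK[model]
    cost = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return cost * 0.5 if batch else cost


# Assumed example call: 20k input tokens, 2k output tokens.
flash_cost = call_cost("gemini-3.5-flash", 20_000, 2_000)
flash_batch_cost = call_cost("gemini-3.5-flash", 20_000, 2_000, batch=True)
```

Context caching is left out on purpose: its discount applies only to the cached part of the input, which depends on prompt layout.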

2. Recurring cost by layer

2.1 LLM inference (the dominant variable cost)

From the per-task catalog. Expected (standard, no cache), per tenant / month:

| Profile | Expected | + caching (~−25–30%) | High band (thinking + tool-loops) |
|---|---|---|---|
| Starter | $8.55 | ~$6.40 | ~$15–18 |
| Standard | $25.60 | ~$19–22 | ~$48–55 |
| Ecommerce | $37.15 | ~$27–31 | ~$72–82 |
| Onboarding (one-time) | $1.60–2.00 | | |

2.2 Vertex Agent Engine runtime (agent compute, separate from tokens)

Agent Engine bills the container wall-time of agent invocations (model tokens billed separately in §2.1). Idle is not billed; first 50 vCPU-h + 100 GiB-h/month are free (absorbs the pilot almost entirely).

Assumptions: ~1 vCPU + 2 GiB per invocation; avg ~15 s wall-time/invocation; invocation counts from the task catalog; ~3 session events/invocation.

| Profile | Invocations/mo | vCPU-h | Compute+mem | Sessions | Agent Engine /tenant |
|---|---|---|---|---|---|
| Starter | ~500 | ~2.1 | ~$0.20 | ~$0.38 | ~$0.6 |
| Standard | ~1,400 | ~5.8 | ~$0.61 | ~$1.05 | ~$1.7 |
| Ecommerce | ~2,300 | ~9.6 | ~$1.00 | ~$1.70 | ~$2.7 |

The free tier (50 vCPU-h + 100 GiB-h per month) covers roughly the first 8 Standard tenants' compute and memory before billing starts, so pilot-scale runtime is effectively free; the per-tenant numbers above apply at steady-state scale.
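The per-tenant figures in this section follow directly from the stated assumptions (1 vCPU + 2 GiB, ~15 s per invocation, ~3 session events per invocation). A minimal sketch of that arithmetic, ignoring the free tier; the function and field names are illustrative:

```python
VCPU_HOUR_USD = 0.0864      # Agent Engine compute, per vCPU-hour
GIB_HOUR_USD = 0.0090       # Agent Engine memory, per GiB-hour
PER_1000_EVENTS_USD = 0.25  # Agent Engine sessions


def agent_engine_cost(invocations: int, wall_seconds: float = 15.0,
                      vcpu: float = 1.0, gib: float = 2.0,
                      events_per_invocation: float = 3.0) -> dict:
    """Monthly Agent Engine cost for one tenant, before any free-tier credit."""
    hours = invocations * wall_seconds / 3600
    compute_mem = hours * (vcpu * VCPU_HOUR_USD + gib * GIB_HOUR_USD)
    sessions = invocations * events_per_invocation / 1000 * PER_1000_EVENTS_USD
    return {
        "vcpu_hours": hours * vcpu,
        "compute_mem": compute_mem,
        "sessions": sessions,
        "total": compute_mem + sessions,
    }


starter = agent_engine_cost(500)
standard = agent_engine_cost(1_400)
ecommerce = agent_engine_cost(2_300)
```

Under these assumptions session events, not compute, make up most of the Agent Engine line, so events per invocation is the parameter to check first against pilot logs.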

2.3 Cloud Run (orchestrator control plane, connectors, BFFs, Cost Guard, jobs)

Two parts: fixed warm services (see §4) and per-tenant request/compute (connector mutations, polling, reporting). Requests are negligible at $0.40/M; compute is short bursts.

| Profile | Per-tenant Cloud Run (variable) |
|---|---|
| Starter | ~$0.20 |
| Standard | ~$0.40 |
| Ecommerce | ~$0.60 |

2.4 Data warehouse (BigQuery: Google Ads export + telemetry)

Per tenant: Google Ads export storage is tiny (tens of MB/mo for an SMB); reporting/optimization joins scan a few GB/mo; QC/Cost-Guard telemetry adds rows. Partitioning by tenant_id + clustering keeps scans small (the docs already specify this).

| Profile | Per-tenant BigQuery (storage + query) |
|---|---|
| Starter | ~$0.15 |
| Standard | ~$0.30 |
| Ecommerce | ~$0.50 |

Risk lever: un-partitioned queries can inflate this 10–100×. Enforce partitioning/clustering and SELECT only the needed columns (already in the playbook).
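To make the scan-size risk concrete, the sketch below prices on-demand scans at portfolio level. The 5 GiB per tenant is an assumed stand-in for "a few GB/mo", and the 1 TiB free allowance is treated as applying once to the whole portfolio, which is also an assumption.

```python
ON_DEMAND_USD_PER_TIB = 6.25
FREE_TIB_PER_MONTH = 1.0
GIB_PER_TIB = 1024


def monthly_query_cost(scanned_gib: float) -> float:
    """On-demand query cost in USD after the monthly free allowance."""
    billable_tib = max(0.0, scanned_gib / GIB_PER_TIB - FREE_TIB_PER_MONTH)
    return billable_tib * ON_DEMAND_USD_PER_TIB


tenants = 200
gib_per_tenant = 5  # assumed figure for partitioned, column-pruned scans

well_partitioned = monthly_query_cost(tenants * gib_per_tenant)
unpartitioned_100x = monthly_query_cost(tenants * gib_per_tenant * 100)
```

With these inputs the partitioned portfolio stays inside the free allowance, while a 100× scan blow-up costs several hundred dollars a month.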

2.5 Other GCP (Pub/Sub, Secret Manager, GCS, Logging, egress) — per tenant

| Item | Per-tenant /mo | Note |
|---|---|---|
| Secret Manager | ~$0.25 | ~4 platform-token versions/tenant |
| Pub/Sub | <$0.05 | small events; $40/TiB |
| GCS (artifacts/feeds) | <$0.10 | $0.02/GB |
| Logging | ~$0.05–0.20 | grows at scale; 50 GiB free |
| Egress (fan-out) | <$0.10 | small payloads |
| Subtotal "other" | ~$0.5–0.7 | |

2.6 First-party relay (Phase 2+, optional SKU)

From relay cost: ~$1–3/tenant (low traffic) to $6–12 (high traffic); ~$80–150/mo total at 50 tenants (hybrid Cloudflare + Cloud Run). Deferred / not in base run-cost below unless the relay SKU is sold.

2.7 Platform API costs

$0. Google Ads, Meta Marketing, TikTok Marketing, DV360, GA4 Data/Admin APIs are free to call — the constraint is rate limits and approval tiers, not metered fees (see platform access).


3. Per-tenant total cost of ownership (blended, excl. relay & media)

Sum of §2.1 (expected) + §2.2 + §2.3 + §2.4 + §2.5:

| Profile | LLM | Agent Eng. | Cloud Run | BQ | Other | Per-tenant /mo (expected) | With caching | High band |
|---|---|---|---|---|---|---|---|---|
| Starter | $8.55 | $0.6 | $0.2 | $0.15 | $0.6 | ~$10.1 | ~$8 | ~$20 |
| Standard | $25.60 | $1.7 | $0.4 | $0.30 | $0.6 | ~$28.6 | ~$22–25 | ~$58 |
| Ecommerce | $37.15 | $2.7 | $0.6 | $0.50 | $0.7 | ~$41.7 | ~$31–35 | ~$87 |

Non-LLM infra is ~10–15% of per-tenant cost; LLM dominates. Optimization frequency and tool-loop depth are the real levers (§5). Add +$1–12/tenant if the relay SKU is enabled.
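The expected totals and the non-LLM share can be re-derived from the layer figures in §2. This is a small sketch with illustrative names; the numbers are the ones in the table above.

```python
# Per-tenant monthly cost by layer (USD, expected band), from §2.1–§2.5.
LAYERS = {
    "Starter":   {"llm": 8.55,  "agent_engine": 0.6, "cloud_run": 0.2, "bigquery": 0.15, "other": 0.6},
    "Standard":  {"llm": 25.60, "agent_engine": 1.7, "cloud_run": 0.4, "bigquery": 0.30, "other": 0.6},
    "Ecommerce": {"llm": 37.15, "agent_engine": 2.7, "cloud_run": 0.6, "bigquery": 0.50, "other": 0.7},
}


def per_tenant_total(profile: str) -> float:
    """Sum of all cost layers for one tenant profile."""
    return sum(LAYERS[profile].values())


def non_llm_share(profile: str) -> float:
    """Fraction of the per-tenant total that is not LLM inference."""
    total = per_tenant_total(profile)
    return (total - LAYERS[profile]["llm"]) / total


totals = {profile: round(per_tenant_total(profile), 2) for profile in LAYERS}
shares = {profile: round(non_llm_share(profile), 3) for profile in LAYERS}
```

Replacing the constants with measured values from the Cost Guard ledger would turn the same calculation into a pilot reconciliation check.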


4. Fixed baseline infra (independent of tenant count)

Costs incurred even at zero/low tenants — mostly warm prod services + non-prod environments.

| Item | Monthly band |
|---|---|
| Cloud Run warm services in prod (dashboard BFF warm; rest scale-to-zero, see §4.1) | $30–70 |
| dev + staging environments (min-instances 0; light use) | $30–60 |
| Tenant registry / run-cost ledger (Cloud SQL small or Firestore) | $30–80 |
| Logging + Monitoring baseline | $20–50 |
| BigQuery baseline storage + scheduled jobs | $5–20 |
| Secret Manager base, Scheduler, misc | $10–20 |
| Fixed baseline total | ~$130–300 / mo (plan ~$220) |

Vertex Agent Engine + Cloud Run + BigQuery free tiers absorb most pilot-scale usage, so at 1–5 tenants the bill is dominated by this fixed baseline, not per-tenant cost.

4.1 Cutting the warm-service baseline

Warm (min-instance) services are the largest fixed cost, so they're worth optimizing. Conclusion: cut warm cost aggressively, but stay on Cloud Run — do not move the control plane to VMs.

Why not VMs (the serverless control plane wins here):

  • The control plane is mostly async/event-driven — orchestrator, connectors, Cost Guard, and jobs are Pub/Sub-triggered, so they don't need to be warm at all; they should scale to zero. Only the human-dashboard BFF is latency-sensitive enough to keep warm.
  • "Scale resources by client amount" is already what Cloud Run does — per-request, concurrency-based autoscaling with per-tenant isolation. A hand-resized VM is worse at this: noisy-neighbor across tenants, manual vertical scaling (downtime), and no scale-to-zero.
  • A VM saves only ~$50–150/mo but adds OS patching, container orchestration, health checks, capacity planning — ops/eng time that costs far more than the saving and breaks the serverless-first principle (system overview). Startup credits (§9.1) zero this line in Year 1 anyway.

The real levers (all within Cloud Run) — together ~−60–70% on the warm line:

| Lever | How | Effect |
|---|---|---|
| Scale-to-zero for async services | min-instances=0 on orchestrator workers, connectors, Cost Guard, jobs (cold start fine for Pub/Sub work) | removes most warm floors |
| Warm only the dashboard BFF | min-instances=1 only on the synchronous user-facing surface | one floor, not four |
| Schedule warmth to business hours | Cloud Scheduler sets BFF min-instances 1 in work hours, 0 nights/weekends | ~−70% on that service (warm 50 of 168 h/wk) |
| Request-based billing (CPU throttled idle) | warm idle instances bill mostly memory ($0.009/GiB-h), not full vCPU ($0.0864/vCPU-h) | ~−60–80% on idle cost |
| Consolidate services | each warm service has its own min-instance floor, so fewer services = less baseline | linear on floor count |
| Cloud Run CUD on the residual warm floor | instance-based services 28%/46% (1/3-yr), see §9.2 | −28–46% on what's left, post-credits |
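The warm-floor levers reduce to simple arithmetic on instance-hours. The sketch below assumes a warm instance bills at the full Cloud Run vCPU and memory rates for every warm hour and uses a 730-hour month; the 0.5 GiB instance size is a hypothetical example.

```python
VCPU_HOUR_USD = 0.0864  # Cloud Run vCPU rate from the unit-price table
GIB_HOUR_USD = 0.009    # Cloud Run memory rate from the unit-price table
HOURS_PER_MONTH = 730   # average month length (assumption)
HOURS_PER_WEEK = 168


def warm_floor_cost(vcpu: float, gib: float,
                    warm_hours_per_week: float = HOURS_PER_WEEK) -> float:
    """Monthly USD cost of one min-instance kept warm for part of each week."""
    warm_fraction = warm_hours_per_week / HOURS_PER_WEEK
    hourly = vcpu * VCPU_HOUR_USD + gib * GIB_HOUR_USD
    return hourly * HOURS_PER_MONTH * warm_fraction


# Hypothetical dashboard BFF instance: 1 vCPU, 0.5 GiB.
always_on = warm_floor_cost(vcpu=1, gib=0.5)
business_hours = warm_floor_cost(vcpu=1, gib=0.5, warm_hours_per_week=50)
saving = 1 - business_hours / always_on  # equals 1 - 50/168
```

The saving from scheduling depends only on the warm-hours ratio, so it holds for any instance size; the vCPU-only term (`warm_floor_cost(1, 0)`) gives the cost of one always-warm vCPU.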

Where a VM does pay: long-running batch (e.g. heavy continuous optimization sweeps) on Spot VMs (~−60–91%). But Cloud Run Jobs already scale to zero, so a Spot VM only wins if the batch runs continuously for hours — not the case at pilot/growth scale. Revisit only if a sustained batch workload emerges.


5. Portfolio totals (all-in monthly, expected, excl. relay & media)

Variable = Σ per-tenant (§3); plus fixed baseline (§4). Mix mirrors the existing docs.

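| Portfolio | Mix | LLM | Non-LLM infra (variable) | Fixed | Total /mo (no cache) | With caching |
|---|---|---|---|---|---|---|
| Pilot (5) | 3 Starter + 2 Standard | ~$77 | ~$11 | ~$150 | ~$240 | ~$220 |
| Growth (50) | 25 Starter + 20 Standard + 5 Ecommerce | ~$912 | ~$120 | ~$250 | ~$1,280 | ~$1,080 |
| Scale (200) | 100 Starter + 80 Standard + 20 Ecommerce | ~$3,650 | ~$480 | ~$350 | ~$4,480 | ~$3,675 |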

Fixed column reflects the trimmed warm baseline from §4.1 (dashboard BFF warm, rest scale-to-zero); it grows modestly with logging/registry as tenants scale.

Blended all-in per tenant (incl. amortized fixed): Pilot ~$48 (fixed-heavy), Growth ~$26, Scale ~$22 (no cache) / ~$18 (cached).

Add relay if sold: +$80–150/mo at Growth, +$350–650/mo at Scale. Reference point: at Scale, total infra+LLM ≈ $4–5K/mo for 200 clients — a rounding error against the media budgets being managed (see §7).
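The portfolio rows can be reproduced from the per-tenant figures in §3 and the fixed column above. A minimal sketch of the no-cache case; the structure and names are illustrative:

```python
# (LLM, non-LLM variable infra) per tenant per month in USD, from §3.
PER_TENANT = {
    "Starter": (8.55, 1.55),
    "Standard": (25.60, 3.00),
    "Ecommerce": (37.15, 4.50),
}

# Tenant mix and fixed monthly baseline (USD) per portfolio, from the table above.
PORTFOLIOS = {
    "Pilot": ({"Starter": 3, "Standard": 2}, 150),
    "Growth": ({"Starter": 25, "Standard": 20, "Ecommerce": 5}, 250),
    "Scale": ({"Starter": 100, "Standard": 80, "Ecommerce": 20}, 350),
}


def portfolio_total(name: str) -> dict:
    """Monthly LLM, variable infra, total, and blended per-tenant cost (no cache)."""
    mix, fixed = PORTFOLIOS[name]
    llm = sum(count * PER_TENANT[profile][0] for profile, count in mix.items())
    infra = sum(count * PER_TENANT[profile][1] for profile, count in mix.items())
    total = llm + infra + fixed
    return {"llm": llm, "infra": infra, "total": total,
            "per_tenant": total / sum(mix.values())}


results = {name: portfolio_total(name) for name in PORTFOLIOS}
```

The table rounds these results (for example the Scale total to ~$4,480), so small differences from the exact sums are expected.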


6. One-time build cost (engineer-weeks)

Derived from the roadmap workstreams (summed effort, not calendar). Expressed in engineer-weeks so the parent/VC can apply their own loaded rate.

| Block | Engineer-weeks |
|---|---|
| Phase 1: foundations + Google + GA4 + onboarding + planning/exec + reporting | ~15 |
| Cross-cutting: orchestrator + agents + HITL + optimization agent | ~17 |
| Phase 2: Meta + TikTok + onboarding extend + feed + optimization v1 | ~11 |
| Phase 4: CRM loop + CAPI + match monitoring + closed-loop reporting | ~8 |
| Core subtotal | ~51 |
| Phase 3: DV360 (only if contract) | +6 |
| First-party relay (optional SKU) | +10–12 |
| With DV360 + relay | ~67–69 |

Calendar: ~18–26 weeks with 2–4 engineers (matches roadmap). Illustrative cost = engineer-weeks × your loaded weekly rate (rate is parent/VC-owned). The MVP cut (gameplan §4) defers ~15–20 of these weeks (Cost-Guard/QC-telemetry/model-promotion/relay/DV360) to protect the pilot date.

Build-phase cloud cost is negligible (free tiers + dev min-instances 0): plan ~$100–300/mo during build for dev/staging + test inference.


7. Media float / working capital (not opex)

The single largest cash item is not in the tables above because it is not Kobi's cost — it is client media budget Kobi fronts and recovers under the agency-billing model:

  • Kobi pays platforms (often on extended credit) and invoices clients monthly → Kobi carries ~1 month of every client's media spend as a receivable.
  • At Growth (50 clients × e.g. ₺50–150K/mo media), the float can be ₺millions — orders of magnitude larger than the ~$1.2K/mo infra cost.
  • This is a balance-sheet/financing question, not an operating cost, but it must be planned. Engineering lock (ADR 0004): every tenant has a per-tenant spend limit (credit_sub_limit + platform caps). If clients prepay, the limit tracks prepaid balance from the parent billing module's internal API; if payment works differently, finance / parent billing must define the limit source before prod.
  • Flagged in gameplan B3.
  • Tax on media spend (VAT/KDV + foreign-platform digital-services/withholding tax) further grosses up the invoiced amount and the float; tax treatment is parent/VC finance-owned — see vision & scope — billing. Not in the infra cost tables.
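The size of the float is easy to bound from the figures above. This sketch assumes one month of spend outstanding per client and ignores tax gross-up and platform credit terms; the function name is illustrative.

```python
def media_float(clients: int, monthly_media_low: float, monthly_media_high: float,
                months_outstanding: float = 1.0) -> tuple:
    """Low/high receivable carried, in the same currency as the media budgets."""
    low = clients * monthly_media_low * months_outstanding
    high = clients * monthly_media_high * months_outstanding
    return (low, high)


# Growth portfolio: 50 clients at the example ₺50–150K/mo media budgets.
growth_float_try = media_float(50, 50_000, 150_000)
```

For the Growth example this gives ₺2.5–7.5M outstanding, which is why the per-tenant spend limit matters more to cash planning than any infra line.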

8. Sensitivity & levers

| Lever | Effect | Where |
|---|---|---|
| Context caching on system prompts/playbooks | −25–50% LLM input cost | biggest single saving |
| Optimization frequency (cycles/mo per track) | ~linear on the largest LLM line | tune per plan tier |
| Model tier discipline (Flash vs Pro/Opus) | 3–50× per call | router + promotion policy |
| Batch API for scheduled reports/feed sweeps | −50% on ~2–4% of total | small but free |
| BigQuery partition/cluster | 10–100× on query scans | enforce in code |
| Cloud Run min-instances | each warm vCPU ≈ $63/mo | keep 0 where latency allows |
| Thinking level at high | output tokens ×1.5–2.5 | reserve for gated QC |
| Tool-loop depth | input tokens ×2–4 over rounds | cap loops (already 5/8) |
| Startup credits (Google for Startups) | ~100% of GCP+Gemini, Year 1 | biggest lever, see §9.1 |
| Cloud Run CUD (commit baseline) | −17% to −46% on Cloud Run only | steady-state, post-credits (§9.2) |

Three-point band (per tenant, Standard, incl. infra): Low ~$22 · Expected ~$28 · High ~$60. Design budgets to the High band, then replace assumptions with real run_id token logs from the pilot (Cost Guard ledger).

Optimize cost per accepted output, not $/1M tokens. When tuning model tiers, the objective is the lowest cost per QC-passed, human-accepted task — a cheaper model that fails QC triggers correction loops/retries/escalations that can cost more end-to-end. Start cheap and escalate on QC failure, demote on sustained success (model promotion/demotion); never demote safety/compliance-critical tasks on price (Special Ad Category, policy QC, health/education stay pinned to the higher tier).
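A toy model of the "cost per accepted output" objective: every task pays for a cheap attempt, and tasks that fail QC pay for one further escalated attempt that is assumed to pass. All prices and pass rates below are hypothetical placeholders, to be replaced with pilot data.

```python
def cost_per_accepted_task(cheap_cost: float, cheap_pass_rate: float,
                           escalated_cost: float) -> float:
    """Expected USD per accepted task under a start-cheap, escalate-on-failure policy."""
    return cheap_cost + (1 - cheap_pass_rate) * escalated_cost


# Hypothetical per-attempt costs: $0.01 on the cheap tier, $0.05 on the escalation tier.
good_fit = cost_per_accepted_task(0.01, cheap_pass_rate=0.6, escalated_cost=0.05)
poor_fit = cost_per_accepted_task(0.01, cheap_pass_rate=0.1, escalated_cost=0.05)
always_escalated = 0.05
```

In the second case the cheap-first policy costs more per accepted task than going straight to the higher tier, which is the situation the promotion/demotion policy is meant to detect.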


9. Commitment & credit savings (GCP)

Three distinct mechanisms — only two are worth taking at this scale. Verify all rates/eligibility before committing.

9.1 Optional — Google for Startups Cloud Program (credits)

Not assumed in the run-cost tables (§3–§5). Leadership decides whether to apply; confirm with VC/board before pursuing.

Because this module is part of a VC/equity-backed project, it may qualify for Google's startup credits — which would dwarf run cost if granted:

| Tier | Coverage | Notes |
|---|---|---|
| AI-first (Scale/AI tier) | Up to $350K over 2 yrs. Yr1: 100% of eligible usage up to $250K; Yr2: 20% up to $100K | Best fit (this is an AI-agent product) |
| Equity-backed (standard) | Up to $200K. Yr1 up to $100K; Yr2 20% up to $100K | Fallback if AI tier not granted |
| Pre-funding | Up to $2K / yr | Not relevant once VC-backed |

  • Covers: Gemini models + GCP services — Vertex AI, Agent Engine, BigQuery, Cloud Run, Pub/Sub, etc. (i.e. essentially the entire model in §3/§5).
  • Excludes: third-party models (Claude Sonnet/Opus escalation, Llama) — billed directly — and Marketplace. Escalation is ~2% of LLM spend, so coverage is ~98%+ of the bill.
  • Effect: at the §5 list totals, run cost is ~$3K/yr (pilot) → ~$15K/yr (growth) → ~$54K/yr (scale), all far under the $250K Year-1 cap. So Year 1 GCP+Gemini cost is effectively ~$0; only the ~2% third-party escalation (tens of $/mo) is billed. In Year 2, credits cover 20%.
  • Caveat: credits are runway, not a permanent discount — they expire at end of term. Use them to absorb the build + early-scale window, and budget steady-state to the §9.2/§8 numbers for after credits lapse.
  • Action: apply via the Google for Startups portal (no traditional grant application); confirm AI-tier eligibility with the account team while opening the Meta/DV360 relationships (gameplan PF track).
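A sketch of how the credit tiers interact with the run cost, using the Scale portfolio figures from §5. It treats the Year-2 terms as "credits equal 20% of eligible usage, capped at $100K", which is an interpretation to confirm with the account team; the function name is illustrative.

```python
def billed_after_credits(monthly_total: float, monthly_llm: float,
                         coverage: float, credit_cap: float,
                         third_party_share: float = 0.02) -> float:
    """Annual USD still billed after credits; third-party models are not covered."""
    annual_total = monthly_total * 12
    third_party = monthly_llm * 12 * third_party_share
    eligible = annual_total - third_party
    credits = min(eligible * coverage, credit_cap)
    return annual_total - credits


# Scale portfolio list prices from §5: ~$4,480/mo total, of which ~$3,650 is LLM.
year1_billed = billed_after_credits(4_480, 3_650, coverage=1.0, credit_cap=250_000)
year2_billed = billed_after_credits(4_480, 3_650, coverage=0.2, credit_cap=100_000)
```

Under these assumptions Year 1 leaves only the third-party escalation slice on the bill, while Year 2 already bills roughly 80% of list, so steady-state budgets should not lean on credits.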

9.2 Committed Use Discounts (CUDs) — steady-state, after credits

Spend-based Compute Flexible CUD: commit a minimum hourly spend; overage bills on-demand.

| Eligible spend | 1-year | 3-year | Relevance here |
|---|---|---|---|
| Cloud Run: request-based services / functions | 17% | 17% | Connectors, BFFs, jobs on request billing |
| Cloud Run: instance-based services, jobs, worker pools | 28% | 46% | Best on the always-on warm baseline (§4); run those instance-based |
| Compute Engine / GKE (N/C/E families) | 28% | 46% | n/a (serverless-first) |

  • Net effect: applies only to the Cloud Run slice ($150–340/mo at scale), so the absolute saving is $40–90/mo at Scale: real but small, since compute isn't the cost driver.
  • Low risk if you size the commitment to the always-on baseline only (not per-tenant burst).
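The CUD saving is the committed slice times the discount rate. A minimal sketch, assuming the whole Cloud Run slice is committed at the 1-year instance-based rate (28%); in practice only the always-on baseline should be committed, so this is an upper bound for that rate.

```python
def cud_saving(monthly_committed_spend: float, discount: float) -> float:
    """Monthly USD saved on the committed portion of Cloud Run spend."""
    return monthly_committed_spend * discount


ONE_YEAR_INSTANCE_BASED = 0.28

saving_low = cud_saving(150, ONE_YEAR_INSTANCE_BASED)   # low end of the Cloud Run slice
saving_high = cud_saving(340, ONE_YEAR_INSTANCE_BASED)  # high end of the Cloud Run slice
```

This lands at roughly $42–95/mo, in line with the $40–90/mo band quoted above.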

9.3 Where commitment does NOT pay at this scale (avoid the trap)

| Mechanism | Headline discount | Why to skip (for now) |
|---|---|---|
| BigQuery Editions slot commitment | 20% (1yr) / ~37–40% (3yr) | Requires switching from on-demand to slot/capacity (≥50-slot min). Our query scans are tiny and largely inside the 1-TiB free tier, so on-demand ($6.25/TiB) is cheaper. Committing here would raise cost. Revisit only if telemetry query volume grows large. |
| Vertex AI Provisioned Throughput (PT) | Reserved GSU capacity | Pays off for high, steady low-latency inference. Agent load here is spiky/low → reserved capacity sits idle; on-demand + caching + batch is cheaper. Revisit at high Scale with steady volume. |
| Enterprise Discount Program (EDP) | Negotiated custom | Absolute spend (~$4–5K/mo) is too small to negotiate a meaningful committed-spend contract. |

9.4 Stacked effect (illustrative)

| Portfolio | List price /mo | Year 1 (startup credits) | Steady-state (caching + Cloud Run CUD) |
|---|---|---|---|
| Pilot (5) | ~$240 | ~$0–20 (only 3rd-party escalation) | ~$210 |
| Growth (50) | ~$1,280 | ~$30–60 | ~$1,030 |
| Scale (200) | ~$4,480 | ~$60–120 | ~$3,540 |

Order of magnitude of the levers: startup credits ≫ context caching > batch ≈ Cloud Run CUD ≫ BigQuery/PT commitments (negative here). Keep the design Gemini-first (already the policy) to maximize credit coverage, since third-party models are the only uncovered slice.


10. Bottom line

  • Per active client: ~$8–42/mo (profile-dependent, expected) all-in infra+LLM; ~$18–22 blended at Scale, including amortized fixed cost (§5).
  • Portfolio: ~$240/mo (pilot) → ~$1.1–1.3K/mo (50) → ~$3.7–4.5K/mo (200), excluding optional relay.
  • Fixed baseline ~$220/mo (trimmed warm services, §4.1) dominates until ~12–18 tenants.
  • Build — assumed go-live DDL (core, solo sprint): ~13 weeks calendar; G+M+T ~W9. Full-module accounting: ~50 engineer-weeks core (~67–69 with DV360 + relay) — capacity planning only, not MVP calendar.
  • The dominant cash item is media float (working capital), not infra — plan it separately.
  • All platform ad APIs are free; cost risk is concentrated in LLM optimization frequency — directly governed by Cost Guard + model routing already in the design.
  • GCP commitment/credits (§9): the Google for Startups Cloud Program (up to $350K / 2 yrs) effectively zeros the GCP+Gemini bill in Year 1 (only ~2% third-party escalation billed). Cloud Run CUDs save a further ~17–46% on the small compute slice afterward. Do not commit BigQuery slots or Vertex Provisioned Throughput at this scale; on-demand is cheaper.