Cost Model & Estimates (Total Cost to Operate the Module)

Purpose: A single, auditable view of what it costs Kobi to run the Digital Ads module — LLM inference + Vertex Agent Engine + core GCP infra + data warehouse + (optional) first-party relay + one-time build. This complements the detailed LLM cost catalog (kept as the source for token-level numbers) by consolidating every cost layer into per-tenant and portfolio totals.

Audience: Parent-project / VC finance and engineering. Commercial figures (pricing, media take-rate, revenue) are owned at the parent/VC level — this doc is the cost side only, so those numbers can be slotted against it.

Confidence: Planning bands. Unit prices verified June 2026 against the sources in §1; re-verify before any commitment — cloud and model prices drift, and several Gemini/Agent-Engine SKUs are new in 2026.

0. Scope — what is and isn't counted

Counted (Kobi-incurred operating cost)	Excluded (and why)
LLM inference (Vertex Model Garden tokens)	Media spend pass-through — this is client budget Kobi fronts/invoices; it is working capital, not opex (see §7)
Vertex AI Agent Engine runtime (compute/memory/sessions)	Parent Kobi platform shared services (core CRM product, identity, billing ERP) — separate budget
Cloud Run (orchestrator, connectors, BFFs, Cost Guard, jobs)	Client GA4 → BigQuery / MMP export — client-operated, out of scope
BigQuery (Google Ads export + agent/QC telemetry)	Human ops/eng salaries — headcount, not infra
Pub/Sub, Firestore/Cloud SQL, Secret Manager, GCS, Logging	DV360 minimum-spend commitment — a commercial contract term, not infra
First-party relay (Phase 2+, optional SKU)	Platform API fees — Google/Meta/TikTok ad APIs are free to call (rate-limited, not metered)
One-time engineering build (§6)	Third-party CMP licenses (only if Kobi resells one)

Tenant profiles (from media-planning) and portfolio sizes (Pilot 5 / Growth 50 / Scale 200) match the existing docs so numbers reconcile.

1. Unit prices (verified — source table)

Vertex global standard (non-batch) rates; GCP Tier-1 region (e.g. europe-west1). First-tier free allowances noted.

Resource	Unit price	Free tier / month	Source
Gemini 3.1 Flash-Lite	$0.25 in / $1.50 out per 1M tok	—	Vertex pricing
Gemini 3.5 Flash	$1.50 in / $9.00 out per 1M tok	—	Vertex pricing
Gemini 3.1 Pro Preview	$2.00 in / $12.00 out per 1M tok	—	Vertex pricing
Claude Sonnet 4.x / Opus 4.x (escalation)	$3/$15 · $5/$25 per 1M tok	—	Anthropic on Vertex
Vertex Batch API	−50% on eligible Gemini	—	Vertex pricing
Context caching	~−30–50% on cached input	—	Vertex pricing
Agent Engine — compute	$0.0864 / vCPU-hour	50 vCPU-hours	Agent Engine pricing
Agent Engine — memory	$0.0090 / GiB-hour	100 GiB-hours	Agent Engine pricing
Agent Engine — sessions	$0.25 / 1,000 events	—	Agent Engine pricing (billing since Feb 11 2026)
Cloud Run — vCPU	$0.000024 / vCPU-second (~$0.0864/hr)	180,000 vCPU-s	Cloud Run pricing
Cloud Run — memory	$0.0000025 / GiB-second (~$0.009/hr)	360,000 GiB-s	Cloud Run pricing
Cloud Run — requests	$0.40 / million	2,000,000	Cloud Run pricing
BigQuery — query (on-demand)	$6.25 / TiB scanned	1 TiB	BigQuery pricing
BigQuery — active storage	$0.02 / GiB-month (long-term $0.01)	10 GiB	BigQuery pricing
Pub/Sub	$40 / TiB throughput	10 GiB	Pub/Sub pricing
Secret Manager	$0.06 / active version-month; $0.03 / 10k access	6 versions	Secret Manager pricing (verify)
Cloud Logging	$0.50 / GiB ingested	50 GiB	Logging pricing (verify)
Cloud Storage (standard)	~$0.020 / GB-month	5 GB	GCS pricing (verify)
Network egress	~$0.12 / GB (internet)	1 GB	(verify)
Platform ad APIs (Google/Meta/TikTok/DV360)	$0 (free, rate-limited)	—	per platform docs
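
A minimal sketch of how the token prices above turn into a per-call cost. The model keys are shorthand labels for the table rows (not confirmed API model IDs), and the 8,000-in / 1,000-out call shape is a hypothetical example rather than a figure from the task catalog.

```python
# Per-1M-token list prices in USD, copied from the table above: (input, output).
PRICES_PER_MTOK = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-3.5-flash": (1.50, 9.00),
    "gemini-3.1-pro-preview": (2.00, 12.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """USD cost of one call at standard rates; batch=True applies the -50% Batch API rate."""
    price_in, price_out = PRICES_PER_MTOK[model]
    cost = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return cost * 0.5 if batch else cost

# Hypothetical call shape: 8,000 input tokens, 1,000 output tokens.
for model in PRICES_PER_MTOK:
    print(model, round(call_cost(model, 8_000, 1_000), 5))
```

Context caching is left out on purpose: the discount applies only to the cached share of input tokens, which depends on prompt structure.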

2. Recurring cost by layer

2.1 LLM inference (the dominant variable cost)

From the per-task catalog. Expected (standard, no cache), per tenant / month:

Profile	Expected	+ caching (~−25–30%)	High band (thinking + tool-loops)
Starter	$8.55	~$6.40	~$15–18
Standard	$25.60	~$19–22	~$48–55
Ecommerce	$37.15	~$27–31	~$72–82
Onboarding (one-time)	$1.60–2.00	—	—

2.2 Vertex Agent Engine runtime (agent compute, separate from tokens)

Agent Engine bills the container wall-time of agent invocations (model tokens billed separately in §2.1). Idle is not billed; first 50 vCPU-h + 100 GiB-h/month are free (absorbs the pilot almost entirely).

Assumptions: ~1 vCPU + 2 GiB per invocation; avg ~15 s wall-time/invocation; invocation counts from the task catalog; ~3 session events/invocation.

Profile	Invocations/mo	vCPU-h	Compute+mem	Sessions	Agent Engine /tenant
Starter	~500	~2.1	~$0.20	~$0.38	~$0.6
Standard	~1,400	~5.8	~$0.61	~$1.05	~$1.7
Ecommerce	~2,300	~9.6	~$1.00	~$1.70	~$2.7

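The free tier (50 vCPU-h + 100 GiB-h per month, §1) is consumed once across all tenants, not per tenant. At the profile sizes above it absorbs the compute of roughly the first 8 Standard tenants (or 20+ Starter tenants), so the pilot mix (~18 vCPU-h) pays only for session events, which have no free allowance. The per-tenant numbers above apply at steady-state scale.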
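
The per-tenant rows in the table can be reproduced from the stated assumptions. This is a sketch only: the function and field names are illustrative, and the free tier is deliberately not applied.

```python
# Unit prices from §1 (USD).
VCPU_HOUR = 0.0864
GIB_HOUR = 0.0090
PER_1K_EVENTS = 0.25

def agent_engine_monthly(invocations: int, wall_s: float = 15.0,
                         vcpu: float = 1.0, gib: float = 2.0,
                         events_per_invocation: int = 3) -> dict:
    """Per-tenant monthly Agent Engine cost before any free-tier offset."""
    hours = invocations * wall_s / 3600  # container wall-time in hours
    compute_mem = hours * (vcpu * VCPU_HOUR + gib * GIB_HOUR)
    sessions = invocations * events_per_invocation / 1000 * PER_1K_EVENTS
    return {"vcpu_h": hours * vcpu, "compute_mem": compute_mem,
            "sessions": sessions, "total": compute_mem + sessions}

# Invocation counts per profile, from the table above.
for profile, n in {"Starter": 500, "Standard": 1400, "Ecommerce": 2300}.items():
    row = agent_engine_monthly(n)
    print(profile, {k: round(v, 2) for k, v in row.items()})
```

The results agree with the table to within rounding. Wall-time per invocation is the most sensitive input: doubling it doubles the compute and memory columns while leaving sessions unchanged.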

2.3 Cloud Run (orchestrator control plane, connectors, BFFs, Cost Guard, jobs)

Two parts: fixed warm services (see §4) and per-tenant request/compute (connector mutations, polling, reporting). Requests are negligible at $0.40/M; compute is short bursts.

Profile	Per-tenant Cloud Run (variable)
Starter	~$0.20
Standard	~$0.40
Ecommerce	~$0.60

2.4 Data warehouse (BigQuery: Google Ads export + telemetry)

Per tenant: Google Ads export storage is tiny (tens of MB/mo for an SMB); reporting/optimization joins scan a few GB/mo; QC/Cost-Guard telemetry adds rows. Partitioning by tenant_id + clustering keeps scans small (the docs already specify this).

Profile	Per-tenant BigQuery (storage + query)
Starter	~$0.15
Standard	~$0.30
Ecommerce	~$0.50

Risk lever: un-partitioned queries can 10–100× this. Enforce partition/cluster + SELECT only needed columns (already in the playbook).
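
To make the risk lever concrete, here is a rough sketch of on-demand scan cost. The 5 GiB/month baseline is a hypothetical stand-in for "a few GB/mo", and the 1 TiB free allowance is ignored so the multiplier is visible.

```python
ON_DEMAND_USD_PER_TIB = 6.25  # §1, on-demand query price
GIB_PER_TIB = 1024

def scan_cost(gib_scanned: float) -> float:
    """On-demand query cost in USD, ignoring the 1 TiB/month free allowance."""
    return gib_scanned / GIB_PER_TIB * ON_DEMAND_USD_PER_TIB

baseline_gib = 5.0  # hypothetical per-tenant monthly scan volume
for factor in (1, 10, 100):
    print(f"{factor:>3}x scan -> ${scan_cost(baseline_gib * factor):.2f} per tenant per month")
```

Under these assumptions a partition-pruned tenant costs a few cents a month in scans, while a 100× un-pruned tenant costs about $3, which would exceed every other non-LLM line in §2.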

2.5 Other GCP (Pub/Sub, Secret Manager, GCS, Logging, egress) — per tenant

Item	Per-tenant /mo	Note
Secret Manager	~$0.25	~4 platform-token versions/tenant
Pub/Sub	<$0.05	small events; $40/TiB
GCS (artifacts/feeds)	<$0.10	$0.02/GB
Logging	~$0.05–0.20	grows at scale; 50 GiB free
Egress (fan-out)	<$0.10	small payloads
Subtotal "other"	~$0.5–0.7

2.6 First-party relay (Phase 2+, optional SKU)

From relay cost: ~$1–3/tenant (low traffic) to $6–12 (high traffic); ~$80–150/mo total at 50 tenants (hybrid Cloudflare + Cloud Run). Deferred / not in base run-cost below unless the relay SKU is sold.

2.7 Platform API costs

$0. Google Ads, Meta Marketing, TikTok Marketing, DV360, GA4 Data/Admin APIs are free to call — the constraint is rate limits and approval tiers, not metered fees (see platform access).

3. Per-tenant total cost of ownership (blended, excl. relay & media)

Sum of §2.1 (expected) + §2.2 + §2.3 + §2.4 + §2.5:

Profile	LLM	Agent Eng.	Cloud Run	BQ	Other	Per-tenant /mo (expected)	With caching	High band
Starter	$8.55	$0.6	$0.2	$0.15	$0.6	~$10.1	~$8	~$20
Standard	$25.60	$1.7	$0.4	$0.30	$0.6	~$28.6	~$22–25	~$58
Ecommerce	$37.15	$2.7	$0.6	$0.50	$0.7	~$41.7	~$31–35	~$87

Non-LLM infra is ~10–15% of per-tenant cost; LLM dominates. Optimization frequency and tool-loop depth are the real levers (§5). Add +$1–12/tenant if the relay SKU is enabled.
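
The expected totals and the non-LLM share can be checked directly from the table rows. A small sketch, with illustrative names:

```python
# Per-tenant monthly lines from the table above (USD):
# LLM, Agent Engine, Cloud Run, BigQuery, other.
LINES = {
    "Starter": (8.55, 0.6, 0.2, 0.15, 0.6),
    "Standard": (25.60, 1.7, 0.4, 0.30, 0.6),
    "Ecommerce": (37.15, 2.7, 0.6, 0.50, 0.7),
}

def tco(profile: str) -> tuple[float, float]:
    """Return (expected per-tenant total, non-LLM share of that total)."""
    llm, *infra = LINES[profile]
    total = llm + sum(infra)
    return total, sum(infra) / total

for profile in LINES:
    total, share = tco(profile)
    print(f"{profile}: ${total:.2f}/mo, non-LLM {share:.0%}")
```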

4. Fixed baseline infra (independent of tenant count)

Costs incurred even at zero/low tenants — mostly warm prod services + non-prod environments.

Item	Monthly band
Cloud Run warm services in prod (dashboard BFF warm; rest scale-to-zero — see §4.1)	$30–70
dev + staging environments (min-instances 0; light use)	$30–60
Tenant registry / run-cost ledger (Cloud SQL small or Firestore)	$30–80
Logging + Monitoring baseline	$20–50
BigQuery baseline storage + scheduled jobs	$5–20
Secret Manager base, Scheduler, misc	$10–20
Fixed baseline total	~$130–300 / mo (plan ~$220)

Vertex Agent Engine + Cloud Run + BigQuery free tiers absorb most pilot-scale usage, so at 1–5 tenants the bill is dominated by this fixed baseline, not per-tenant cost.

4.1 Cutting the warm-service baseline

Warm (min-instance) services are the largest fixed cost, so they're worth optimizing. Conclusion: cut warm cost aggressively, but stay on Cloud Run — do not move the control plane to VMs.

Why not VMs (the serverless control plane wins here):

The control plane is mostly async/event-driven — orchestrator, connectors, Cost Guard, and jobs are Pub/Sub-triggered, so they don't need to be warm at all; they should scale to zero. Only the human-dashboard BFF is latency-sensitive enough to keep warm.
"Scale resources by client amount" is already what Cloud Run does — per-request, concurrency-based autoscaling with per-tenant isolation. A hand-resized VM is worse at this: noisy-neighbor across tenants, manual vertical scaling (downtime), and no scale-to-zero.
A VM saves only ~$50–150/mo but adds OS patching, container orchestration, health checks, capacity planning — ops/eng time that costs far more than the saving and breaks the serverless-first principle (system overview). Startup credits (§9.1) zero this line in Year 1 anyway.

The real levers (all within Cloud Run) — together ~−60–70% on the warm line:

Lever	How	Effect
Scale-to-zero for async services	`min-instances=0` on orchestrator workers, connectors, Cost Guard, jobs (cold start fine for Pub/Sub work)	removes most warm floors
Warm only the dashboard BFF	`min-instances=1` only on the synchronous user-facing surface	one floor, not four
Schedule warmth to business hours	Cloud Scheduler sets BFF `min-instances` 1 in work hours, 0 nights/weekends	~−70% on that service (warm ~50 of 168 h/wk)
Request-based billing (CPU throttled idle)	warm idle instances bill mostly memory ($0.009/GiB-h), not full vCPU ($0.0864/vCPU-h)	~−60–80% on idle cost
Consolidate services	each warm service has its own min-instance floor — fewer services = less baseline	linear on floor count
Cloud Run CUD on the residual warm floor	instance-based services 28%/46% (1/3-yr) — §9.2	−28–46% on what's left, post-credits

Where a VM does pay: long-running batch (e.g. heavy continuous optimization sweeps) on Spot VMs (~−60–91%). But Cloud Run Jobs already scale to zero, so a Spot VM only wins if the batch runs continuously for hours — not the case at pilot/growth scale. Revisit only if a sustained batch workload emerges.
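
A back-of-envelope sketch of what a warm floor costs and what the scheduling lever does to it. It counts vCPU only at the full §1 rate; memory and the reduced idle rate under request-based billing are left out, so real figures should come in lower.

```python
VCPU_HOUR = 0.0864     # USD per vCPU-hour (§1)
HOURS_PER_MONTH = 730  # average hours in a month
HOURS_PER_WEEK = 168

def warm_floor_monthly(vcpus: float, warm_hours_per_week: float = HOURS_PER_WEEK) -> float:
    """Monthly vCPU cost of keeping `vcpus` warm for part of each week."""
    warm_fraction = warm_hours_per_week / HOURS_PER_WEEK
    return vcpus * VCPU_HOUR * HOURS_PER_MONTH * warm_fraction

always_on = warm_floor_monthly(1)                               # one warm vCPU, 24/7
business_hours = warm_floor_monthly(1, warm_hours_per_week=50)  # scheduled warmth
four_floors = warm_floor_monthly(4)                             # four always-warm services

print(round(always_on, 2), round(business_hours, 2), round(four_floors, 2))
```

One always-on vCPU comes to about $63/mo, so going from four warm floors to one scheduled floor removes most of the warm line before any CUD is considered.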

5. Portfolio totals (all-in monthly, expected, excl. relay & media)

Variable = Σ per-tenant (§3); plus fixed baseline (§4). Mix mirrors the existing docs.

Portfolio	Mix	LLM	Non-LLM infra (variable)	Fixed	Total /mo (no cache)	With caching
Pilot — 5	3 Starter + 2 Standard	~$77	~$11	~$150	~$240	~$220
Growth — 50	25 + 20 + 5 Ecom	~$912	~$120	~$250	~$1,280	~$1,080
Scale — 200	100 + 80 + 20 Ecom	~$3,650	~$480	~$350	~$4,480	~$3,675

Fixed column reflects the trimmed warm baseline from §4.1 (dashboard BFF warm, rest scale-to-zero); it grows modestly with logging/registry as tenants scale.

Blended all-in per tenant (incl. amortized fixed): Pilot ~$48 (fixed-heavy), Growth ~$26, Scale ~$22 (no cache) / ~$18 (cached).

Add relay if sold: +~$80–150/mo at Growth, +~$350–650/mo at Scale. Reference point: at Scale, total infra+LLM ≈ $4–5K/mo for 200 clients, a rounding error against the media budgets being managed (see §7).
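
The no-cache totals can be rebuilt from the tenant mix, the §3 per-tenant figures, and the fixed column. A sketch with illustrative names; the non-LLM values are the sums of the Agent Engine, Cloud Run, BigQuery, and "other" columns in §3.

```python
# (LLM, non-LLM variable infra) per tenant per month, USD, from §3.
PER_TENANT = {
    "Starter": (8.55, 1.55),
    "Standard": (25.60, 3.00),
    "Ecommerce": (37.15, 4.50),
}
# (tenant mix, fixed baseline in USD/mo) per portfolio, from the table above.
PORTFOLIOS = {
    "Pilot": ({"Starter": 3, "Standard": 2}, 150),
    "Growth": ({"Starter": 25, "Standard": 20, "Ecommerce": 5}, 250),
    "Scale": ({"Starter": 100, "Standard": 80, "Ecommerce": 20}, 350),
}

def portfolio_total(mix: dict, fixed: float) -> dict:
    """Monthly LLM, variable infra, total, and blended per-tenant cost (no caching)."""
    llm = sum(n * PER_TENANT[p][0] for p, n in mix.items())
    infra = sum(n * PER_TENANT[p][1] for p, n in mix.items())
    total = llm + infra + fixed
    return {"llm": llm, "infra": infra, "total": total,
            "per_tenant": total / sum(mix.values())}

for name, (mix, fixed) in PORTFOLIOS.items():
    print(name, {k: round(v, 1) for k, v in portfolio_total(mix, fixed).items()})
```

The table rounds these results to the nearest $10 or so. Swapping in a different mix or fixed baseline is the quickest way to test a pricing scenario against the cost side.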

6. One-time build cost (engineer-weeks)

Derived from the roadmap workstreams (summed effort, not calendar). Expressed in engineer-weeks so the parent/VC can apply their own loaded rate.

Block	Engineer-weeks
Phase 1 — foundations + Google + GA4 + onboarding + planning/exec + reporting	~15
Cross-cutting — orchestrator + agents + HITL + optimization agent	~17
Phase 2 — Meta + TikTok + onboarding extend + feed + optimization v1	~11
Phase 4 — CRM loop + CAPI + match monitoring + closed-loop reporting	~8
Core subtotal	~51
Phase 3 — DV360 (only if contract)	+6
First-party relay (optional SKU)	+10–12
With DV360 + relay	~67–69

Calendar: ~18–26 weeks with 2–4 engineers (matches roadmap). Illustrative cost = engineer-weeks × your loaded weekly rate (rate is parent/VC-owned). The MVP cut (gameplan §4) defers ~15–20 of these weeks (Cost-Guard/QC-telemetry/model-promotion/relay/DV360) to protect the pilot date.

Build-phase cloud cost is negligible (free tiers + dev min-instances 0): plan ~$100–300/mo during build for dev/staging + test inference.
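
A sketch of the engineer-weeks arithmetic for slotting in a loaded rate. The rate below is a placeholder in arbitrary currency units, not a figure from this document; the real rate is parent/VC-owned.

```python
# Engineer-weeks from the table above.
CORE_BLOCKS = {"Phase 1": 15, "Cross-cutting": 17, "Phase 2": 11, "Phase 4": 8}
OPTIONAL_BLOCKS = {"Phase 3 (DV360)": (6, 6), "First-party relay": (10, 12)}  # (low, high)

def build_cost(engineer_weeks: float, loaded_weekly_rate: float) -> float:
    """One-time build cost = summed effort x loaded weekly rate."""
    return engineer_weeks * loaded_weekly_rate

core = sum(CORE_BLOCKS.values())
full_low = core + sum(low for low, _ in OPTIONAL_BLOCKS.values())
full_high = core + sum(high for _, high in OPTIONAL_BLOCKS.values())

PLACEHOLDER_RATE = 4_000  # hypothetical; replace with the parent/VC loaded weekly rate
print(core, full_low, full_high)
print(build_cost(core, PLACEHOLDER_RATE), build_cost(full_high, PLACEHOLDER_RATE))
```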

7. Media float / working capital (not opex)

The single largest cash item is not in the tables above because it is not Kobi's cost — it is client media budget Kobi fronts and recovers under the agency-billing model:

Kobi pays platforms (often on extended credit) and invoices clients monthly → Kobi carries ~1 month of every client's media spend as a receivable.
At Growth (50 clients × e.g. ₺50–150K/mo media), the float can be ₺millions — orders of magnitude larger than the ~$1.2K/mo infra cost.
This is a balance-sheet/financing question, not an operating cost, but it must be planned. Engineering lock (ADR 0004): every tenant has a per-tenant spend limit (credit_sub_limit + platform caps). If clients prepay, the limit tracks prepaid balance from the parent billing module's internal API; if payment works differently, finance / parent billing must define the limit source before prod.
Flagged in gameplan B3.
Tax on media spend (VAT/KDV + foreign-platform digital-services/withholding tax) further grosses up the invoiced amount and the float; tax treatment is parent/VC finance-owned — see vision & scope — billing. Not in the infra cost tables.
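
A rough sizing sketch for the float, using the Growth example above (50 clients, ₺50–150K/mo media each, about one month outstanding). Tax gross-up is not modeled because the rates are finance-owned; the function name is illustrative.

```python
def media_float(clients: int, media_per_client: float, months_outstanding: float = 1.0) -> float:
    """Receivable carried by Kobi, in the same currency as media_per_client."""
    return clients * media_per_client * months_outstanding

low = media_float(50, 50_000)    # TRY, lower end of the Growth example
high = media_float(50, 150_000)  # TRY, upper end of the Growth example
print(f"float range: {low:,.0f} to {high:,.0f} TRY")
```

The range works out to ₺2.5M–7.5M before tax. The same function gives a prepaid case by setting `months_outstanding` to 0.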

8. Sensitivity & levers

Lever	Effect	Where
Context caching on system prompts/playbooks	−25–50% LLM input cost	biggest single saving
Optimization frequency (cycles/mo per track)	~linear on the largest LLM line	tune per plan tier
Model tier discipline (Flash vs Pro/Opus)	3–50× per call	router + promotion policy
Batch API for scheduled reports/feed sweeps	−50% on ~2–4% of total	small but free
BigQuery partition/cluster	10–100× on query scans	enforce in code
Cloud Run min-instances	each warm vCPU ≈ $63/mo	keep 0 where latency allows
Thinking level at `high`	output tokens ×1.5–2.5	reserve for gated QC
Tool-loop depth	input tokens ×2–4 over rounds	cap loops (already 5/8)
Startup credits (Google for Startups)	~100% of GCP+Gemini, Year 1	biggest lever — see §9.1
Cloud Run CUD (commit baseline)	−17% to −46% on Cloud Run only	steady-state, post-credits (§9.2)

Three-point band (per tenant, Standard, incl. infra): Low ~$22 · Expected ~$28 · High ~$60. Design budgets to the High band, then replace assumptions with real run_id token logs from the pilot (Cost Guard ledger).

Optimize cost per accepted output, not $/1M tokens. When tuning model tiers, the objective is the lowest cost per QC-passed, human-accepted task — a cheaper model that fails QC triggers correction loops/retries/escalations that can cost more end-to-end. Start cheap and escalate on QC failure, demote on sustained success (model promotion/demotion); never demote safety/compliance-critical tasks on price (Special Ad Category, policy QC, health/education stay pinned to the higher tier).
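
The "cost per accepted output" objective can be sketched as follows. The model is an assumption: attempts are treated as independent, a failed task is either retried on the same tier or handed to a higher tier, and all per-attempt costs and pass rates below are hypothetical.

```python
def cost_per_accepted(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend per accepted output when failed attempts are simply retried."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_attempt / pass_rate

def escalate_on_failure(cheap_cost: float, cheap_pass_rate: float,
                        premium_cost: float, premium_pass_rate: float) -> float:
    """Try the cheap tier once; on QC failure, hand the task to the premium tier until it passes."""
    fallback = cost_per_accepted(premium_cost, premium_pass_rate)
    return cheap_cost + (1 - cheap_pass_rate) * fallback

# Hypothetical tiers: cheap = $0.010 per attempt, premium = $0.030 per attempt at 95% pass.
premium_only = cost_per_accepted(0.030, 0.95)
for cheap_pass in (0.20, 0.80):
    hybrid = escalate_on_failure(0.010, cheap_pass, 0.030, 0.95)
    print(f"cheap pass rate {cheap_pass:.0%}: hybrid ${hybrid:.4f} vs premium-only ${premium_only:.4f}")
```

With these placeholder numbers, starting cheap wins when the cheap tier passes QC most of the time and loses when it rarely passes, which is the case for demoting only on sustained success.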

9. Commitment & credit savings (GCP)

Three distinct mechanisms — only two are worth taking at this scale. Verify all rates/eligibility before committing.

9.1 Optional — Google for Startups Cloud Program (credits)

Not assumed in the run-cost tables above (§3, §5); §9.4 shows the effect separately. Leadership decides whether to apply; confirm with VC/board before pursuing.

Because this module is part of a VC/equity-backed project, it may qualify for Google's startup credits — which would dwarf run cost if granted:

Tier	Coverage	Notes
AI-first (Scale/AI tier)	Up to $350K over 2 yrs — Yr1: 100% of eligible usage up to $250K; Yr2: 20% up to $100K	Best fit (this is an AI-agent product)
Equity-backed (standard)	Up to $200K — Yr1 up to $100K; Yr2 20% up to $100K	Fallback if AI tier not granted
Pre-funding	Up to $2K / yr	Not relevant once VC-backed

Covers: Gemini models + GCP services — Vertex AI, Agent Engine, BigQuery, Cloud Run, Pub/Sub, etc. (i.e. essentially the entire model in §3/§5).
Excludes: third-party models (Claude Sonnet/Opus escalation, Llama) — billed directly — and Marketplace. Escalation is ~2% of LLM spend, so coverage is ~98%+ of the bill.
Effect: list-price run cost is ~$3K/yr (pilot) → ~$15K/yr (growth) → ~$54K/yr (scale), i.e. 12 × the §5 monthly totals, all far under the $250K Year-1 cap. So Year 1 GCP+Gemini cost is effectively ~$0; only the ~2% third-party escalation (~tens of $/mo) is billed. In Year 2 the credit covers 20% of eligible usage.
Caveat: credits are runway, not a permanent discount — they expire at end of term. Use them to absorb the build + early-scale window, and budget steady-state to the §9.2/§8 numbers for after credits lapse.
Action: apply via the Google for Startups portal (no traditional grant application); confirm AI-tier eligibility with the account team while opening the Meta/DV360 relationships (gameplan PF track).
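
A sketch of how the AI-tier credits would net against the Scale portfolio. It assumes "Yr2: 20% up to $100K" means 20% of eligible usage is credited, capped at $100K; that reading should be confirmed with the program terms. Function and variable names are illustrative.

```python
def billed_after_credits(eligible: float, third_party: float,
                         coverage: float, credit_cap: float) -> float:
    """Annual amount still billed after credits; third-party models are never covered."""
    credit = min(eligible * coverage, credit_cap)
    return eligible - credit + third_party

# Scale portfolio at list price (§5): ~$4,480/mo total, of which ~$3,650/mo is LLM.
annual_total = 4_480 * 12
annual_third_party = 0.02 * 3_650 * 12  # ~2% of LLM spend is third-party escalation
annual_eligible = annual_total - annual_third_party

year1 = billed_after_credits(annual_eligible, annual_third_party, 1.00, 250_000)
year2 = billed_after_credits(annual_eligible, annual_third_party, 0.20, 100_000)
print(round(year1), round(year2))
```

Under these assumptions Year 1 leaves under $1K for the year at Scale, while Year 2 is already close to list price. That is why steady-state budgets should not lean on the credits.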

9.2 Committed Use Discounts (CUDs) — steady-state, after credits

Spend-based Compute Flexible CUD: commit a minimum hourly spend; overage bills on-demand.

Eligible spend	1-year	3-year	Relevance here
Cloud Run — request-based services / functions	17%	17%	Connectors, BFFs, jobs on request billing
Cloud Run — instance-based services, jobs, worker pools	28%	46%	Best on the always-on warm baseline (§4) — run those instance-based
Compute Engine / GKE (N/C/E families)	28%	46%	n/a — serverless-first

Net effect: applies only to the Cloud Run slice ($150–340/mo at scale), so the absolute saving is $40–90/mo at Scale: real but small, since compute isn't the cost driver.
Low risk if you size the commitment to the always-on baseline only (not per-tenant burst).
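
The sizing advice can be illustrated with a simplified model of a spend-based commitment: the committed hourly amount is paid at the discounted rate whether or not it is used, and usage above it bills on-demand. This is a sketch, not Google's exact billing logic, and the $0.30/h and $0.10/h usage levels are hypothetical.

```python
def monthly_bill(hourly_usage: list[float], commit_per_hour: float, discount: float) -> float:
    """Bill for hourly on-demand-equivalent usage under a spend-based commitment."""
    committed_fee = commit_per_hour * (1 - discount)  # paid every hour, used or not
    return sum(committed_fee + max(0.0, used - commit_per_hour) for used in hourly_usage)

HOURS = 730
steady = [0.30] * HOURS  # always-on baseline, about $219/mo on-demand
light = [0.10] * HOURS   # usage well below the commitment

sized_right = monthly_bill(steady, commit_per_hour=0.30, discount=0.28)
oversized = monthly_bill(light, commit_per_hour=0.30, discount=0.28)
print(round(sum(steady), 2), round(sized_right, 2))
print(round(sum(light), 2), round(oversized, 2))
```

A commitment matched to the always-on baseline saves the full discount, whereas one sized above actual usage costs more than plain on-demand. That is the reason to commit the warm floor only and leave per-tenant burst uncommitted.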

9.3 Where commitment does NOT pay at this scale (avoid the trap)

Mechanism	Headline discount	Why to skip (for now)
BigQuery Editions slot commitment	20% (1yr) / ~37–40% (3yr)	Requires switching from on-demand to slot/capacity (≥50-slot min). Our query scans are tiny and largely inside the 1-TiB free tier — on-demand ($6.25/TiB) is cheaper. Committing here would raise cost. Revisit only if telemetry query volume grows large.
Vertex AI Provisioned Throughput (PT)	Reserved GSU capacity	Pays off for high, steady low-latency inference. Agent load here is spiky/low → reserved capacity sits idle; on-demand + caching + batch is cheaper. Revisit at high Scale with steady volume.
Enterprise Discount Program (EDP)	Negotiated custom	Absolute spend (~$4–5K/mo) is too small to negotiate a meaningful committed-spend contract.

9.4 Stacked effect (illustrative)

Portfolio	List price /mo	Year 1 (startup credits)	Steady-state (caching + Cloud Run CUD)
Pilot (5)	~$240	~$0–20 (only 3rd-party escalation)	~$210
Growth (50)	~$1,280	~$30–60	~$1,030
Scale (200)	~$4,480	~$60–120	~$3,540

Order of magnitude of the levers: startup credits ≫ context caching > batch ≈ Cloud Run CUD ≫ BigQuery/PT commitments (negative here). Keep the design Gemini-first (already the policy) to maximize credit coverage, since third-party models are the only uncovered slice.

10. Bottom line

Per active client: ~$10–42/mo (profile-dependent, expected; ~$8 for a cached Starter) all-in infra+LLM; blended incl. amortized fixed, ~$26 at Growth and ~$18–22 at Scale (cached / no cache).
Portfolio: ~$240/mo (pilot) → ~$1.1–1.3K/mo (50) → ~$3.7–4.5K/mo (200), excluding optional relay.
Fixed baseline ~$220/mo (trimmed warm services, §4.1) dominates until ~12–18 tenants.
Build: the assumed go-live deadline (core, solo sprint) is ~13 calendar weeks, with Google + Meta + TikTok live by ~week 9. Full-module accounting: ~51 engineer-weeks core (~67–69 with DV360 + relay); this is for capacity planning only, not the MVP calendar.
The dominant cash item is media float (working capital), not infra — plan it separately.
All platform ad APIs are free; cost risk is concentrated in LLM optimization frequency — directly governed by Cost Guard + model routing already in the design.
GCP commitment/credits (§9): the Google for Startups Cloud Program (up to $350K / 2 yrs) effectively zeros the GCP+Gemini bill in Year 1 (only ~2% third-party escalation billed). Cloud Run CUDs then save ~17–46% on the small compute slice. Do not commit BigQuery slots or Vertex Provisioned Throughput at this scale; on-demand is cheaper.
