Cloud Run Deployment

Appendix C: Cloud Run Deployment

Env templates: backend/.env.example, frontend/.env.example, marketing/.env.example.

This appendix is the complete setup checklist for deploying the DEML platform on Cloud Run (project: deml). Every hostname, broker address, and cross-site URL is env-driven — never hardcode domains in application code.

Pre-Deploy Checklist

Before creating services, prepare:

  1. Secrets in Cloud Run Variables or Infisical (recommended for SOC 2 / CMMC): SECRET_KEY, FIREBASE_SERVICE_ACCOUNT_JSON, GCP_SERVICE_ACCOUNT_JSON, Stripe, Resend, threat-intel API keys, HF_TOKEN, SENTRY_DSN.
  2. Cross-site URL trio (same names on backend, frontend, and marketing builds):
     Variable       Production value                               Purpose
     FRONTEND_URL   https://deml.app                               Angular app, widgets, status
     BACKEND_URL    https://backend.deml.app                       Django API, OAuth callbacks
     MARKETING_URL  https://dataengineeringformachinelearning.com  Astro site, auth handoff, CORS
  3. CORS / CSRF must list every public origin (app + marketing + backend + local dev). Copy from backend/.env.example and extend for your domains.
  4. Production guards: DEBUG=False, unique SECRET_KEY. On Cloud Run, the backend fails fast if these are insecure (backend/utils/env.py).
  5. Privacy defaults: SENTRY_SEND_PII=false, STRUCTURED_LOGS=true (JSON logs with correlation IDs).
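
A minimal sketch of the fail-fast idea behind the production guards, assuming nothing about the real contents of backend/utils/env.py. The function names and the placeholder list are hypothetical; only the variable names come from this checklist.

```python
import os
from typing import Mapping

# Placeholder secrets treated as insecure. This set is an assumption for illustration.
PLACEHOLDER_SECRETS = {"changeme", "secret"}


def production_guard_errors(env: Mapping[str, str]) -> list[str]:
    """Return the problems that should block startup in production."""
    errors = []
    if env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"}:
        errors.append("DEBUG must be False in production")
    secret = env.get("SECRET_KEY", "")
    if not secret or secret in PLACEHOLDER_SECRETS or secret.startswith("django-insecure"):
        errors.append("SECRET_KEY is missing or uses a placeholder value")
    if env.get("SENTRY_SEND_PII", "false").strip().lower() != "false":
        errors.append("SENTRY_SEND_PII should stay false")
    return errors


def enforce_production_guards(env: Mapping[str, str] = os.environ) -> None:
    """Raise at startup instead of serving traffic with insecure settings."""
    errors = production_guard_errors(env)
    if errors:
        raise RuntimeError("Insecure configuration: " + "; ".join(errors))
```

Calling enforce_production_guards() once at process start stops the container before it accepts traffic, so a misconfiguration shows up as a failed deploy rather than a live but insecure service.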

How to Deploy in One Project

  1. Create a New Project: GCP dashboard → New Project → Empty Project (name: deml).
  2. Add Postgres: New → Database → PostgreSQL. Note the internal DATABASE_URL.
  3. Add Services: For each component below, New Service → GitHub Repo → select this repository.
  4. Configure each service (Settings tab):
    • Root Directory as specified below.
    • Start Command when overridden.
    • Watch Paths (e.g. /frontend/**, /backend/**) so unrelated changes do not trigger rebuilds.
  5. Variables tab: Set env vars per service (see per-service sections). Workers share nearly the same bundle as deml-backend.
  6. Redeploy frontend after changing build-time vars (FRONTEND_URL, BACKEND_URL, MARKETING_URL, FIREBASE_*, SENTRY_DSN) so set-env.js regenerates environment.ts.
  7. Marketing site (Astro) is hosted outside this Cloud Run project (Firebase Hosting). Set the same URL trio at build time on that host.
  8. Firebase Cloud Functions + Firestore rules deploy via GitHub Actions (.github/workflows/firebase-backend-deploy.yml), not Cloud Run.

Infisical Integration

To satisfy strict secret management guidelines (SOC 2 CC6.1/CC6.2, CMMC 2.0, NIST SP 800-171 Rev. 3), all secret keys, passwords, and API credentials are kept out of raw service settings and stored in Infisical.

  1. Set up an Infisical organization and create a project for dataengineeringformachinelearning.
  2. Connect your Cloud Run services to Infisical via the official Cloud Run Infisical Integration.
  3. For local development, run tasks using the Infisical CLI:
    infisical run -- python manage.py runserver
    

Services Overview

1. Web Frontend (deml-frontend)

Angular SPA — dashboard, status pages, widgets.

  • Root Directory: /frontend
  • Builder: Dockerfile (gcr.io/distroless/nodejs22-debian12 runtime; multi-stage from node:22-alpine)
  • Start Command: node dist/frontend/server/server.mjs (Angular SSR; Dockerfile default)
  • Public URL: https://deml.app
  • Private Internal DNS: deml-frontend.internal
  • Build step: set-env.js runs at deploy and writes src/environments/environment.ts.

Required build-time variables (see frontend/.env.example):

Variable                      Example                                        Notes
FRONTEND_URL                  https://deml.app                               Widget + status links
BACKEND_URL                   https://backend.deml.app                       API base
MARKETING_URL                 https://dataengineeringformachinelearning.com  Auth handoff
FIREBASE_API_KEY              (secret)                                       Web auth
FIREBASE_PROJECT_ID           demldotcom
FIREBASE_APP_ID               (from Firebase console)
FIREBASE_AUTH_DOMAIN          demldotcom.firebaseapp.com
FIREBASE_STORAGE_BUCKET       demldotcom.firebasestorage.app
FIREBASE_MESSAGING_SENDER_ID  (from Firebase console)
SANITY_PROJECT_ID             hj5wtuct                                       CMS content
SANITY_DATASET                production
SENTRY_DSN                    (optional)                                     Client error reporting; omit to disable

2. Web Backend (deml-backend)

Django + Ninja API — auth, outbox writes, billing, monitor.

  • Root Directory: /backend
  • Builder: Dockerfile (gcr.io/distroless/python3-debian12)
  • Start Command: /opt/venv/bin/python start.py
  • Public URL: https://backend.deml.app
  • Private Internal DNS: deml-backend.internal

Required variables (see backend/.env.example):

Category         Variables
Core             SECRET_KEY, DEBUG=False, ALLOWED_HOSTS, DATABASE_URL
Cross-site URLs  FRONTEND_URL, BACKEND_URL, MARKETING_URL
CORS / CSRF      CORS_ALLOWED_ORIGINS, CSRF_TRUSTED_ORIGINS, CORS_ALLOW_CREDENTIALS=True
Event bus        REDPANDA_BROKERS=deml-queue.internal:9092, DRAGONFLY_HOST=deml-dragonfly.internal
Firebase         FIREBASE_SERVICE_ACCOUNT_JSON, FIREBASE_PROJECT_ID, GOOGLE_CLOUD_PROJECT
OAuth / AI       GOOGLE_API_KEY, GOOGLE_OAUTH_CLIENT_ID, GOOGLE_OAUTH_CLIENT_SECRET, GOOGLE_OAUTH_REDIRECT_URI
Threat intel     ABUSEIPDB_API_KEY, IPINFO_API_KEY, OTX_API_KEY, ISAC_API_KEY, CISA_TAXII_ENDPOINT
Email / alerts   RESEND_API_KEY, ALERT_EMAIL_TARGET, ALERT_EMAIL_FROM, DISCORD_WEBHOOK_URL
Observability    SENTRY_DSN, SENTRY_SEND_PII=false, STRUCTURED_LOGS=true, GCP_LOGGING_ENABLED
Billing          STRIPE_PUBLIC_KEY, STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET
ML / encryption  HF_TOKEN, HF_REPO_ID, GCP_KMS_*, GCP_SERVICE_ACCOUNT_JSON
CVE pipeline     SCANNER_SERVICE_URL, CPE_GUESSER_URL, CLICKHOUSE_URI, CVE_DICT_DB_URL
Dark web OSINT   TOR_PROXY_URL=socks5h://deml-tor-proxy.internal:9050

CORS example (production):

CORS_ALLOWED_ORIGINS=https://deml.app,https://dataengineeringformachinelearning.com,https://backend.deml.app,https://backend.dataengineeringformachinelearning.com
CSRF_TRUSTED_ORIGINS=https://deml.app,https://dataengineeringformachinelearning.com,https://backend.deml.app,https://backend.dataengineeringformachinelearning.com
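
As a hedged illustration of the "list every public origin" rule, the sketch below checks that the origins of the URL trio appear in both allow-lists. The helper names are hypothetical and this is not the backend's own validation code.

```python
from urllib.parse import urlsplit


def parse_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and trailing slashes."""
    return [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port], the form browsers send as Origin."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def missing_origins(env: dict[str, str]) -> dict[str, list[str]]:
    """Report which URL-trio origins are absent from each allow-list."""
    trio = ("FRONTEND_URL", "BACKEND_URL", "MARKETING_URL")
    required = [origin_of(env[name]) for name in trio]
    report = {}
    for key in ("CORS_ALLOWED_ORIGINS", "CSRF_TRUSTED_ORIGINS"):
        allowed = set(parse_origins(env.get(key, "")))
        report[key] = [origin for origin in required if origin not in allowed]
    return report
```

Running missing_origins() against a service's variables before deploy catches the common case where a new domain was added to one list but not the other.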

3. Redpanda Broker (Message Queue)

This is the Redpanda message broker itself, which stores the streaming event data.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/queue
  • Builder: Dockerfile
  • Start Command: Uses default Docker entrypoint
  • Target Port: 9092 (Kafka API)
  • Private Internal DNS: deml-queue.internal:9092
  • Public URL: None (Strictly internal for security)
  • Compute Limits: 24 vCPU / 24 GB Memory
  • Persistent Storage: You MUST attach a Cloud Run Persistent Volume mounted to /var/lib/redpanda/data. Without it, Redpanda runs on the container's ephemeral disk and loses all topics, messages, and consumer offsets on every restart/redeploy. This silently breaks Event Projections (events produced just before a restart never reach the worker, and the in-app verification times out). Add it in Cloud Run → deml-queue → Variables/Settings → Volumes (the gcloud compute disks create CLI currently panics with a project token, so use the dashboard).
  • Deployment Trigger: Scoped via infrastructure/queue/cloudbuild.yaml build.watchPatterns to only redeploy when infrastructure/queue/** changes — so unrelated merges to main don't restart (and, until a volume is attached, wipe) the broker.
  • Environment Variables:
    • REDPANDA_BROKERS: Not needed on the broker itself; just ensure port 9092 is exposed internally.

Public Authenticated Redpanda Listener (for Firebase Cloud Functions)

To achieve the fastest client command path (Angular → ingestEvent → direct to Redpanda → worker consume → Firestore projection with no polling), the queue exposes a second listener:

  • Internal (9092, PLAINTEXT): used by all Cloud Run services (backend, workers, outbox_relay).
  • External (9093, SASL + SCRAM-SHA-256 over plain TCP): used only by Firebase Cloud Functions.

Critical: the public endpoint must be a Cloud Run TCP Proxy (raw TCP), not an HTTP custom domain (e.g. queue.deml.app). An HTTP/HTTPS domain terminates TLS and speaks HTTP — it cannot carry the raw Kafka protocol, so the function connection is reset and ingestEvent silently falls back to the Firestore inbox (slow polled projection instead of the fast path). This was the original cause of the public path never working.

Setup on the deml-queue service (production):

  1. In Cloud Run → deml-queue → Settings → Networking, add a TCP Proxy targeting container port 9093. Cloud Run returns an address like xxxx.proxy.rlwy.net:NNNNN (the current production proxy is zephyr.proxy.rlwy.net:32253).
    • CLI equivalent: gcloud compute target-tcp-proxies create --port 9093 --service deml-queue
  2. Set service variables so the broker advertises that reachable address:
    • PUBLIC_REDPANDA_HOST=xxxx.proxy.rlwy.net (e.g. zephyr.proxy.rlwy.net)
    • PUBLIC_REDPANDA_PORT=NNNNN (the proxy's external port, e.g. 32253; the container keeps listening on 9093, which is the proxy target)
    • REDPANDA_SASL_USERNAME=admin (or a dedicated user)
    • REDPANDA_SASL_PASSWORD=...

The entrypoint (infrastructure/queue/entrypoint.sh) handles dual listeners, advertises PUBLIC_REDPANDA_HOST:PUBLIC_REDPANDA_PORT on the external listener, and auto-creates the SASL user.

On the Firebase side, ingestEvent is a 2nd-gen (Cloud Run) function, so its config must be provided as environment variables (process.env) — the legacy firebase functions:config:set does not apply to v2 functions. The deploy workflow (.github/workflows/firebase-backend-deploy.yml) writes a functions/.env from these GitHub repository secrets before firebase deploy:

  • REDPANDA_PUBLIC_BROKERS → REDPANDA_BROKERS, e.g. zephyr.proxy.rlwy.net:32253
  • REDPANDA_PUBLIC_SASL_USERNAME → REDPANDA_SASL_USERNAME, e.g. admin
  • REDPANDA_PUBLIC_SASL_PASSWORD → REDPANDA_SASL_PASSWORD (same value as the deml-queue service)
  • REDPANDA_PUBLIC_SSL (optional) → REDPANDA_SSL; leave unset/false for a Cloud Run TCP Proxy (plain TCP). Only true if TLS is terminated at the edge (e.g. Cloudflare Spectrum).

Set those secrets, then re-run the "Deploy Firebase Backend" workflow. (For stricter secret handling you may instead bind REDPANDA_SASL_PASSWORD via Google Secret Manager / firebase functions:secrets:set and defineSecret in code.)
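
For smoke-testing the external listener from outside Cloud Run, the variables above map onto a Kafka client configuration roughly as follows. This is a sketch, not the ingestEvent implementation (which lives in functions/src/index.ts). The function name is hypothetical, and the keys follow librdkafka's property names as used by the confluent-kafka Python client.

```python
from typing import Mapping


def kafka_client_config(env: Mapping[str, str]) -> dict[str, str]:
    """Build a librdkafka-style config for the SASL-authenticated public listener."""
    # REDPANDA_SSL stays unset/false for a plain TCP proxy; true only when
    # TLS is terminated at the edge.
    use_tls = env.get("REDPANDA_SSL", "false").strip().lower() == "true"
    return {
        "bootstrap.servers": env["REDPANDA_BROKERS"],
        "security.protocol": "SASL_SSL" if use_tls else "SASL_PLAINTEXT",
        "sasl.mechanism": "SCRAM-SHA-256",
        "sasl.username": env["REDPANDA_SASL_USERNAME"],
        "sasl.password": env["REDPANDA_SASL_PASSWORD"],
    }
```

The resulting dict can be passed to a producer or admin client to confirm that the proxy address, advertised listener, and SASL credentials agree before re-running the Firebase deploy workflow.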

If the public path is unavailable the system still works via the resilient fallback: ingestEvent writes to the Firestore frontend_command_inbox and the telemetry worker's poll_firestore_inbox task (every ~10s) projects it — slower, but the verification still passes.
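
The fallback behaviour can be pictured with a small sketch. It is written in Python for consistency with the other examples here, whereas the real function is TypeScript. publish_direct and write_inbox are hypothetical callables standing in for the Redpanda producer and the Firestore inbox write.

```python
from typing import Callable


def ingest_event(
    event: dict,
    publish_direct: Callable[[dict], None],
    write_inbox: Callable[[dict], None],
) -> str:
    """Try the direct broker publish; on a connection failure use the inbox."""
    try:
        publish_direct(event)
        return "direct"
    except OSError:
        # Connection resets and timeouts are OSError subclasses in Python.
        # The inbox is polled by the telemetry worker, so the event still lands,
        # only later.
        write_inbox(event)
        return "inbox"
```

Returning which path was taken makes the silent fallback visible in logs or metrics, which helps spot a misconfigured proxy early.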

4. Telemetry Worker (deml-telemetry-worker)

Consumes Redpanda topics (app-events, frontend-events, user-issues), projects to Postgres + Firestore deml DB, runs health pings and analytics rollups.

  • Root Directory: /backend
  • Start Command: python manage.py telemetry_worker
  • Private Internal DNS: deml-telemetry-worker.internal
  • Public URL: None (internal only)

4b. Background Daemon (deml-daemon)

The background daemon is a high-performance compiled Rust service that manages the outbox relay, health pinger cycles, and cron scheduler tasks. It connects directly to the PostgreSQL database and Redpanda event queue.

  • Root Directory: /rust
  • Docker File: deml-daemon/Dockerfile
  • Start Command: Runs the compiled binary natively (configured as ENTRYPOINT)
  • Private Internal DNS: deml-daemon.internal
  • Public URL: None (internal only)

Variables:

  • DATABASE_URL (standard Postgres / Neon connection string)
  • REDPANDA_BROKERS (e.g. deml-queue.internal:9092)
  • BACKEND_INTERNAL_URL (points to the backend container, e.g. http://deml-backend.internal:8080)
  • INTERNAL_SECRET (matching backend's validation header)
  • BATCH_SIZE=100, POLL_INTERVAL_SECS=5, PINGER_INTERVAL_SECS=30
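
To show how BATCH_SIZE and POLL_INTERVAL_SECS typically interact in an outbox relay, here is a sketch in Python. The actual daemon is compiled Rust and its internals are not documented here. The three callables are hypothetical stand-ins for the Postgres query, the Redpanda producer, and the "mark as sent" update, and sleeping only when the outbox is empty is an assumed design choice.

```python
import os
import time
from typing import Callable, Sequence


def relay_once(
    fetch_unsent: Callable[[int], Sequence[dict]],
    publish: Callable[[dict], None],
    mark_sent: Callable[[list], None],
    batch_size: int,
) -> int:
    """Publish one batch of outbox rows and mark the published ones as sent."""
    sent_ids = []
    try:
        for event in fetch_unsent(batch_size):
            publish(event)
            sent_ids.append(event["id"])
    finally:
        # Rows published before a failure are still marked; the rest are
        # retried on the next cycle (at-least-once delivery).
        if sent_ids:
            mark_sent(sent_ids)
    return len(sent_ids)


def run_relay(fetch_unsent, publish, mark_sent) -> None:
    """Loop forever, sleeping only when the outbox is empty."""
    batch_size = int(os.environ.get("BATCH_SIZE", "100"))
    poll_interval = float(os.environ.get("POLL_INTERVAL_SECS", "5"))
    while True:
        if relay_once(fetch_unsent, publish, mark_sent, batch_size) == 0:
            time.sleep(poll_interval)
```

Because delivery is at-least-once, consumers need idempotent handling, which is why the projectors use idempotency keys.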

5. Consolidated Background Workers (deml-workers)

Consolidates the machine learning and security execution logic into a single Python container. It spawns the ML thread, the security execution thread, and the internal-tasks Redpanda consumer concurrently.

  • Root Directory: /backend
  • Start Command: python deml_workers_start.py
  • Private Internal DNS: deml-workers.internal

Variables:

  • Core backend bundle (DATABASE_URL, SECRET_KEY, DEBUG=False)
  • REDPANDA_BROKERS, DRAGONFLY_HOST
  • HF_TOKEN, HF_REPO_ID
  • Threat intelligence API keys, GCP_KMS_* credentials, and TOR_PROXY_URL (SOCKS5 Tor proxy)

Shared Worker Environment Bundle

All background tasks (deml-telemetry-worker, deml-workers) inherit environment configurations from the core deml-backend settings. Use Cloud Run shared variables or Infisical to avoid drift. Centralized reads go through backend/utils/env.py.

Event flow (for operators):

  1. API writes → OutboxEvent (Postgres) → outbox_relay → Redpanda
  2. Angular/Firebase → frontend-events topic → telemetry_worker → Postgres + Firestore
  3. Idempotency keys + DLQ handled inside telemetry_worker projectors
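
The idempotency and DLQ behaviour in step 3 can be sketched as below. This is an illustration under stated assumptions, not the telemetry_worker code: the in-memory set and list stand in for a persistent idempotency table and a dead-letter topic, and the idempotency_key field name is hypothetical.

```python
from typing import Callable


class Projector:
    """Apply each event at most once; park repeatedly failing events in a DLQ."""

    def __init__(self, apply: Callable[[dict], None], max_attempts: int = 3):
        self.apply = apply
        self.max_attempts = max_attempts
        self.seen: set[str] = set()          # stand-in for an idempotency table
        self.dead_letters: list[dict] = []   # stand-in for a DLQ topic

    def handle(self, event: dict) -> str:
        key = event["idempotency_key"]
        if key in self.seen:
            return "duplicate"  # redelivery after a relay retry is a no-op
        last_error = None
        for _ in range(self.max_attempts):
            try:
                self.apply(event)
            except Exception as exc:
                last_error = exc
            else:
                self.seen.add(key)
                return "projected"
        self.dead_letters.append({"event": event, "error": repr(last_error)})
        return "dead-lettered"
```

Parking a poison event after a bounded number of attempts keeps one bad message from blocking the rest of the topic.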

Firebase (separate from Cloud Run): Cloud Functions (ingestEvent) and Firestore rules deploy via .github/workflows/firebase-backend-deploy.yml using FIREBASE_SERVICE_ACCOUNT_DEMLDOTCOM. For the fastest path (direct publish, no polling) give the Functions a public SASL-authenticated Redpanda listener (port 9093, SCRAM-SHA-256). See infrastructure/queue/, the REDPANDA_* guidance in the Redpanda Broker section above, and functions/src/index.ts. The inbox fallback remains as defence in depth.

Marketing Site (not a Cloud Run service)

Hosted separately (Firebase Hosting / static). Build with the same URL trio:

Variable       Example
FRONTEND_URL   https://deml.app
BACKEND_URL    https://backend.deml.app
MARKETING_URL  https://dataengineeringformachinelearning.com

See marketing/.env.example. Legacy PUBLIC_MAIN_APP_URL / PUBLIC_API_BASE still work but are deprecated.

7. ClickHouse Database (Telemetry Storage)

ClickHouse is used to securely store all high-volume OpenTelemetry data from the widget and backend services.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/clickhouse
  • Builder: Dockerfile (utilizes clickhouse/clickhouse-server:24.3)
  • Start Command: Uses default Docker entrypoint
  • Target Port: 8123 (HTTP) and 9000 (Native)
  • Private Internal DNS: deml-clickhouse.internal
  • Public URL: None (Strictly an internal database)
  • Compute Limits: 24 vCPU / 24 GB Memory
  • Persistent Storage: You MUST attach a Cloud Run Persistent Volume to /var/lib/clickhouse.
  • Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
  • Environment Variables:
    • CLICKHOUSE_USER: Leave this variable completely unset/deleted on the ClickHouse service if you want to use the default user. If you define it as default explicitly, ClickHouse's entrypoint will skip setting the password, causing connection errors in other services.
    • CLICKHOUSE_PASSWORD: Set a secure password (e.g. for the default user).
    • CLICKHOUSE_DB: otel

Important (ClickHouse password gotcha): Do not define CLICKHOUSE_USER as default in the ClickHouse service environment variables. If you wish to use the default user, simply omit the CLICKHOUSE_USER variable entirely from the ClickHouse service. The entrypoint script will automatically apply your CLICKHOUSE_PASSWORD to the default user. Make sure CLICKHOUSE_USER is still set to default in your otel-collector and backend services so they connect correctly.
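
The gotcha above is easy to encode as a pre-deploy check. The sketch below is hypothetical (the function and the service labels are not part of the repository) and only restates the rule: the ClickHouse service itself must not set CLICKHOUSE_USER=default, while client services must set both user and password.

```python
def clickhouse_env_problems(service: str, env: dict[str, str]) -> list[str]:
    """Return configuration problems for one service's ClickHouse variables."""
    problems = []
    if service == "clickhouse":
        if env.get("CLICKHOUSE_USER") == "default":
            problems.append("remove CLICKHOUSE_USER from the ClickHouse service")
        if not env.get("CLICKHOUSE_PASSWORD"):
            problems.append("CLICKHOUSE_PASSWORD must be set")
    else:
        # Clients (otel-collector, backend) need explicit credentials.
        for name in ("CLICKHOUSE_USER", "CLICKHOUSE_PASSWORD"):
            if not env.get(name):
                problems.append(f"{name} must be set on {service}")
    return problems
```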

8. OpenTelemetry Collector (Router)

The OpenTelemetry Collector receives all spans and metrics from the frontend widget and backend, processing them securely before batch-inserting into ClickHouse.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/otel-collector
  • Builder: Dockerfile (utilizes secure otel/opentelemetry-collector-contrib distroless base)
  • Start Command: Uses default Docker entrypoint
  • Target Port: 4318 (OTLP HTTP)
  • Private Internal DNS: deml-telemetry-collector.internal
  • Public URL: https://telemetry.deml.app
  • Compute Limits: 24 vCPU / 24 GB Memory
  • Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
  • Environment Variables:
    • CLICKHOUSE_HOST: The internal TCP host of your ClickHouse service (e.g. deml-clickhouse.internal).
    • CLICKHOUSE_USER: Must match what you set in the ClickHouse service.
    • CLICKHOUSE_PASSWORD: Must match what you set in the ClickHouse service.

9. Vulnerability Scanner Engine

This microservice provides an offline, isolated environment for executing osv-scanner and cpe-guesser to enrich telemetry without bloating the main backend image.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/scanner
  • Builder: Dockerfile (utilizes python:3.11-slim with the official Google osv-scanner binary)
  • Start Command: uvicorn main:app --host 0.0.0.0 --port 8000 (Default in Dockerfile)
  • Target Port: 8000 (FastAPI)
  • Private Internal DNS: deml-scanner.internal:8000
  • Public URL: None (Strictly an internal service)
  • Compute Limits: 24 vCPU / 24 GB Memory
  • Persistent Storage: You MUST attach a Cloud Run Persistent Volume to /data/osv so the OSV database dump does not have to be repeatedly downloaded.
  • Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
  • Environment Variables:
    • OSV_DB_PATH: /data/osv (The mounted volume path)
    • CPE_GUESSER_URL: http://deml-cpe-guesser.internal:1323/unique
    • NVD_API_KEY: Your National Vulnerability Database API Key (optional but highly recommended to bypass rate limits)

Consumers (deml-backend, workers) set SCANNER_SERVICE_URL=http://deml-scanner.internal:8000.

10. CPE Guesser Service

This service converts raw technology strings into CPE 2.3 identifiers. It is required for the Vulnerability Scanner Engine to properly normalize infrastructure data.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/cpe-guesser
  • Builder: Dockerfile (Builds from source using Python 3.11 with an internal Valkey/Redis cache)
  • Start Command: /app/start.sh (Default in Dockerfile)
  • Target Port: 1323
  • Private Internal DNS: deml-cpe-guesser.internal
  • Public URL: None (Strictly an internal service)
  • Compute Limits: 1 vCPU / 1 GB Memory
  • Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
  • Environment Variables: None required by default.

(Once deployed, ensure the CPE_GUESSER_URL environment variable on the Vulnerability Scanner Engine points to this internal DNS, e.g., http://deml-cpe-guesser.internal:1323/unique)

11. Tor Proxy (Dark Web Scanner)

A lightweight proxy that allows the security worker to anonymously scrape dark web search engines (e.g., Ahmia) for brand mentions.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/tor-proxy
  • Builder: Dockerfile (Minimal alpine image running as non-root tor user)
  • Target Port: 9050
  • Private Internal DNS: deml-tor-proxy.internal
  • Environment Variables: None on the proxy itself.

Consumers (deml-backend, deml-workers) must set:

TOR_PROXY_URL=socks5h://deml-tor-proxy.internal:9050
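
The socks5h scheme matters because it asks the proxy, not the client, to resolve hostnames, which is what .onion addresses require. A minimal sketch of turning TOR_PROXY_URL into a proxies mapping of the shape accepted by the requests library (with its SOCKS extra installed) follows. The helper name is hypothetical and the worker's real HTTP client setup may differ.

```python
from urllib.parse import urlsplit


def tor_proxies(proxy_url: str) -> dict[str, str]:
    """Validate TOR_PROXY_URL and return a requests-style proxies mapping."""
    parts = urlsplit(proxy_url)
    if parts.scheme != "socks5h":
        # Plain socks5:// resolves DNS locally, which fails for .onion hosts
        # and leaks lookups outside Tor.
        raise ValueError("TOR_PROXY_URL must use the socks5h:// scheme")
    if not parts.hostname or not parts.port:
        raise ValueError("TOR_PROXY_URL needs both host and port")
    return {"http": proxy_url, "https": proxy_url}
```

Usage would look like requests.get(url, proxies=tor_proxies(os.environ["TOR_PROXY_URL"]), timeout=60), assuming requests is the client in use.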

12. Dragonfly (Redis Replacement for WebSockets)

This service provides the in-memory pub/sub message broker required by Django Channels to route real-time WebSocket traffic. We use a custom, highly secure distroless image to minimize the attack surface.

  • Source: GitHub repository (main branch)
  • Root Directory: /infrastructure/dragonfly
  • Builder: Dockerfile (Multi-stage build using Google Distroless cc-debian12:nonroot)
  • Target Port: 6379
  • Private Internal DNS: deml-dragonfly.internal
  • Public URL: None (Strictly an internal service)
  • Environment Variables: None required by default.

(Once deployed, set DRAGONFLY_HOST=deml-dragonfly.internal on deml-backend, all workers, and any service using Channels or rate limiting.)
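
A sketch of how DRAGONFLY_HOST might feed the Django Channels layer, assuming the project uses the channels_redis backend (Dragonfly speaks the Redis protocol). DRAGONFLY_PORT is a hypothetical variable added for illustration, and the real settings module may be structured differently.

```python
import os
from typing import Mapping


def channel_layers(env: Mapping[str, str] = os.environ) -> dict:
    """Build a CHANNEL_LAYERS setting that points Channels at Dragonfly."""
    host = env.get("DRAGONFLY_HOST", "localhost")
    port = int(env.get("DRAGONFLY_PORT", "6379"))  # hypothetical override
    return {
        "default": {
            "BACKEND": "channels_redis.core.RedisChannelLayer",
            "CONFIG": {"hosts": [(host, port)]},
        }
    }
```

In settings.py this would be assigned as CHANNEL_LAYERS = channel_layers(), so local and production runs differ only by environment.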

Internal Networking

All inter-service traffic uses Cloud Run private DNS (*.internal). Never route broker, database, or cache traffic over public URLs.

Service      Internal address
Backend API  deml-backend.internal:8080
Frontend     deml-frontend.internal:8080
Postgres     Via DATABASE_URL (internal connection string from Cloud Run)
Redpanda     deml-queue.internal:9092
Dragonfly    deml-dragonfly.internal:6379
ClickHouse   deml-clickhouse.internal:8123
Scanner      deml-scanner.internal:8000
CPE Guesser  deml-cpe-guesser.internal:1323
Tor proxy    deml-tor-proxy.internal:9050
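
When a service cannot reach a dependency, a quick TCP preflight from inside the private network narrows the problem down. The sketch below uses only the standard library. The endpoint map copies addresses from the table above, and the helper names are hypothetical.

```python
import socket

# Addresses taken from the internal networking table.
INTERNAL_ENDPOINTS = {
    "Redpanda": ("deml-queue.internal", 9092),
    "Dragonfly": ("deml-dragonfly.internal", 6379),
    "ClickHouse": ("deml-clickhouse.internal", 8123),
    "Scanner": ("deml-scanner.internal", 8000),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(endpoints: dict[str, tuple[str, int]]) -> list[str]:
    """List the names of endpoints that did not accept a TCP connection."""
    return [name for name, (host, port) in endpoints.items()
            if not is_reachable(host, port)]
```

A successful connect only proves the port is open; it does not validate credentials or protocol, so treat it as a first check rather than a health probe.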

Local Development (docker-compose.yml)

Local parity includes: backend, telemetry_worker, ml_worker, security_worker, outbox_relay, Postgres, Redpanda, Dragonfly, ClickHouse, Tor proxy, and supporting infra. Copy backend/.env.example to backend/.env and apply local overrides (Compose service names for infrastructure, localhost for the three sites):

REDPANDA_BROKERS=redpanda:9092
DRAGONFLY_HOST=dragonfly
FRONTEND_URL=http://localhost:4200
BACKEND_URL=http://localhost:8000
MARKETING_URL=http://localhost:4321
TOR_PROXY_URL=socks5h://tor-proxy:9050

Run docker compose up from the repo root. Frontend: cd frontend && npm start. Marketing: cd marketing && npm run dev.

Updating Environment Variables

  1. Prefer setting in GCP dashboard (Variables tab per service) or via CLI.
  2. After changing build-time vars (FRONTEND_URL, BACKEND_URL, MARKETING_URL, FIREBASE_*, SENTRY_DSN) for frontend, trigger a new deploy so set-env.js regenerates environment.ts.
  3. Keep backend/.env.example, frontend/.env.example, and marketing/.env.example in sync with reality.

Security Notes

  • Never commit real .env.
  • Secrets (Stripe, Resend, Firebase SA, KMS, HF) should use Cloud Run secret variables or Infisical integration.
  • CORS/CSRF lists are the primary control for cross-origin auth handoff.

See also: backend/.env.example, frontend/.env.example, marketing/.env.example, BOOK.md (Event Projections chapter), and AGENTS.md (CORS rule: never hardcode domains).

Cloud Run CLI Quick Reference

gcloud config set project deml
gcloud run services update deml-backend --update-env-vars "MARKETING_URL=https://dataengineeringformachinelearning.com"
gcloud run services update deml-frontend --update-env-vars "SENTRY_DSN=<your-dsn>"
gcloud run services update deml-backend

After any build-time variable change on deml-frontend, trigger a redeploy.

CI/CD Pipeline

  • All services are linked to the main branch of the dataengineeringformachinelearning repository.
  • Pushes to the main branch will automatically trigger new builds and deployments for the affected services.
  • Automated security testing via Socket.dev and Checkov pre-commit hooks runs on every push.
  • Watch Paths: You can set gitignore-style rules (e.g., /frontend/** or /backend/**) in the Cloud Run settings to ensure that a service only rebuilds when its specific directory changes.
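
To make the watch-path rule concrete, the sketch below decides whether a set of changed files should trigger a rebuild. It approximates gitignore-style matching with the standard library's fnmatch, which is looser than real gitignore semantics, so treat it as an illustration rather than the platform's matcher. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def should_rebuild(changed_files: Iterable[str], watch_patterns: Iterable[str]) -> bool:
    """Return True if any changed file matches any watch pattern."""
    patterns = list(watch_patterns)
    for path in changed_files:
        rooted = "/" + path.lstrip("/")  # patterns are written from the repo root
        if any(fnmatchcase(rooted, pattern) for pattern in patterns):
            return True
    return False
```

With the pattern /frontend/**, a commit touching only backend files would leave the frontend service alone, which is the behaviour the watch paths are meant to provide.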

Reliability and Scaling

  • Restart Policy: All services are configured to restart "On Failure" with a maximum of 10 retries, ensuring automatic recovery from temporary crashes.
  • Region: US East (Virginia, USA)
  • Replicas: 1 replica per service.