Appendix C: Cloud Run Deployment
Env templates: backend/.env.example, frontend/.env.example, marketing/.env.example.
This appendix is the complete setup checklist for deploying the DEML platform on Cloud Run (project: deml). Every hostname, broker address, and cross-site URL is env-driven — never hardcode domains in application code.
Pre-Deploy Checklist
Before creating services, prepare:
- Secrets in Cloud Run Variables or Infisical (recommended for SOC 2 / CMMC): SECRET_KEY, FIREBASE_SERVICE_ACCOUNT_JSON, GCP_SERVICE_ACCOUNT_JSON, Stripe, Resend, threat-intel API keys, HF_TOKEN, SENTRY_DSN.
- Cross-site URL trio (same names on backend, frontend, and marketing builds):
| Variable | Production value | Purpose |
|---|---|---|
| FRONTEND_URL | https://deml.app | Angular app, widgets, status |
| BACKEND_URL | https://backend.deml.app | Django API, OAuth callbacks |
| MARKETING_URL | https://dataengineeringformachinelearning.com | Astro site, auth handoff, CORS |
- CORS / CSRF must list every public origin (app + marketing + backend + local dev). Copy from backend/.env.example and extend for your domains.
- Production guards: DEBUG=False, unique SECRET_KEY. On Cloud Run, the backend fails fast if these are insecure (backend/utils/env.py).
- Privacy defaults: SENTRY_SEND_PII=false, STRUCTURED_LOGS=true (JSON logs with correlation IDs).
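The fail-fast behaviour described for the production guards can be pictured with a minimal sketch. This is not the contents of backend/utils/env.py: the function name, the exception class, the placeholder list, and the 32-character minimum are all assumptions chosen for illustration.

```python
class InsecureConfigError(RuntimeError):
    """Raised when the environment is unsafe for a production start."""


# Hypothetical placeholder values; the real guard may use a different list.
PLACEHOLDER_SECRETS = {"changeme", "secret"}
TRUTHY = {"1", "true", "yes", "on"}


def check_production_guards(env):
    """Raise InsecureConfigError if DEBUG or SECRET_KEY look insecure.

    `env` is any mapping of variable names to strings, e.g. os.environ.
    """
    problems = []

    # An unset DEBUG is treated as False here (an assumption of this sketch).
    if env.get("DEBUG", "False").strip().lower() in TRUTHY:
        problems.append("DEBUG must be False in production")

    secret = env.get("SECRET_KEY", "").strip()
    if secret.lower() in PLACEHOLDER_SECRETS or len(secret) < 32:
        problems.append("SECRET_KEY must be unique and at least 32 characters")

    if problems:
        raise InsecureConfigError("; ".join(problems))
```

Calling `check_production_guards(os.environ)` at process start would stop the container before it serves traffic, which is the effect the checklist relies on.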
How to Deploy in One Project
- Create a New Project: GCP dashboard → New Project → Empty Project (name: deml).
- Add Postgres: New → Database → PostgreSQL. Note the internal DATABASE_URL.
- Add Services: For each component below, New Service → GitHub Repo → select this repository.
- Configure each service (Settings tab):
  - Root Directory as specified below.
  - Start Command when overridden.
  - Watch Paths (e.g. /frontend/**, /backend/**) so unrelated changes do not trigger rebuilds.
- Variables tab: Set env vars per service (see per-service sections). Workers share nearly the same bundle as deml-backend.
- Redeploy frontend after changing build-time vars (FRONTEND_URL, BACKEND_URL, MARKETING_URL, FIREBASE_*, SENTRY_DSN) so set-env.js regenerates environment.ts.
- Marketing site (Astro) is hosted outside this Cloud Run project (Firebase Hosting). Set the same URL trio at build time on that host.
- Firebase Cloud Functions + Firestore rules deploy via GitHub Actions (.github/workflows/firebase-backend-deploy.yml), not Cloud Run.
Infisical Integration
To satisfy strict secret management guidelines (SOC 2 CC6.1/CC6.2, CMMC 2.0, NIST SP 800-171 Rev. 3), all secret keys, passwords, and API credentials are kept out of raw service settings and stored in Infisical.
- Set up an Infisical organization and create a project for dataengineeringformachinelearning.
- Connect your Cloud Run services to Infisical via the official Cloud Run Infisical Integration.
- For local development, run tasks using the Infisical CLI:
infisical run -- python manage.py runserver
Services Overview
1. Web Frontend (deml-frontend)
Angular SPA — dashboard, status pages, widgets.
- Root Directory: /frontend
- Builder: Dockerfile (gcr.io/distroless/nodejs22-debian12 runtime; multi-stage from node:22-alpine)
- Start Command: node dist/frontend/server/server.mjs (Angular SSR; Dockerfile default)
- Public URL: https://deml.app
- Private Internal DNS: deml-frontend.internal
- Build step: set-env.js runs at deploy and writes src/environments/environment.ts.
Required build-time variables (see frontend/.env.example):
| Variable | Example | Notes |
|---|---|---|
| FRONTEND_URL | https://deml.app | Widget + status links |
| BACKEND_URL | https://backend.deml.app | API base |
| MARKETING_URL | https://dataengineeringformachinelearning.com | Auth handoff |
| FIREBASE_API_KEY | (secret) | Web auth |
| FIREBASE_PROJECT_ID | demldotcom | |
| FIREBASE_APP_ID | (from Firebase console) | |
| FIREBASE_AUTH_DOMAIN | demldotcom.firebaseapp.com | |
| FIREBASE_STORAGE_BUCKET | demldotcom.firebasestorage.app | |
| FIREBASE_MESSAGING_SENDER_ID | (from Firebase console) | |
| SANITY_PROJECT_ID | hj5wtuct | CMS content |
| SANITY_DATASET | production | |
| SENTRY_DSN | (optional) | Client error reporting; omit to disable |
2. Web Backend (deml-backend)
Django + Ninja API — auth, outbox writes, billing, monitor.
- Root Directory: /backend
- Builder: Dockerfile (gcr.io/distroless/python3-debian12)
- Start Command: /opt/venv/bin/python start.py
- Public URL: https://backend.deml.app
- Private Internal DNS: deml-backend.internal
Required variables (see backend/.env.example):
| Category | Variables |
|---|---|
| Core | SECRET_KEY, DEBUG=False, ALLOWED_HOSTS, DATABASE_URL |
| Cross-site URLs | FRONTEND_URL, BACKEND_URL, MARKETING_URL |
| CORS / CSRF | CORS_ALLOWED_ORIGINS, CSRF_TRUSTED_ORIGINS, CORS_ALLOW_CREDENTIALS=True |
| Event bus | REDPANDA_BROKERS=deml-queue.internal:9092, DRAGONFLY_HOST=deml-dragonfly.internal |
| Firebase | FIREBASE_SERVICE_ACCOUNT_JSON, FIREBASE_PROJECT_ID, GOOGLE_CLOUD_PROJECT |
| OAuth / AI | GOOGLE_API_KEY, GOOGLE_OAUTH_CLIENT_ID, GOOGLE_OAUTH_CLIENT_SECRET, GOOGLE_OAUTH_REDIRECT_URI |
| Threat intel | ABUSEIPDB_API_KEY, IPINFO_API_KEY, OTX_API_KEY, ISAC_API_KEY, CISA_TAXII_ENDPOINT |
| Email / alerts | RESEND_API_KEY, ALERT_EMAIL_TARGET, ALERT_EMAIL_FROM, DISCORD_WEBHOOK_URL |
| Observability | SENTRY_DSN, SENTRY_SEND_PII=false, STRUCTURED_LOGS=true, GCP_LOGGING_ENABLED |
| Billing | STRIPE_PUBLIC_KEY, STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET |
| ML / encryption | HF_TOKEN, HF_REPO_ID, GCP_KMS_*, GCP_SERVICE_ACCOUNT_JSON |
| CVE pipeline | SCANNER_SERVICE_URL, CPE_GUESSER_URL, CLICKHOUSE_URI, CVE_DICT_DB_URL |
| Dark web OSINT | TOR_PROXY_URL=socks5h://deml-tor-proxy.internal:9050 |
CORS example (production):
CORS_ALLOWED_ORIGINS=https://deml.app,https://dataengineeringformachinelearning.com,https://backend.deml.app,https://backend.dataengineeringformachinelearning.com
CSRF_TRUSTED_ORIGINS=https://deml.app,https://dataengineeringformachinelearning.com,https://backend.deml.app,https://backend.dataengineeringformachinelearning.com
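Because every public origin has to appear in these lists, a small pre-deploy check can catch a URL that was added to the trio but not to CORS. The sketch below is illustrative only: the helper names are hypothetical and are not part of the backend, and it assumes the variables hold comma-separated absolute URLs as in the example above.

```python
from urllib.parse import urlsplit

URL_TRIO = ("FRONTEND_URL", "BACKEND_URL", "MARKETING_URL")


def origin_of(url):
    """Reduce a URL to scheme://host[:port], the form CORS origins take."""
    parts = urlsplit(url.strip())
    if parts.scheme not in {"http", "https"} or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


def parse_origins(raw):
    """Split a comma-separated value into an ordered, de-duplicated list."""
    origins = []
    for item in raw.split(","):
        if not item.strip():
            continue  # tolerate trailing commas and blank entries
        origin = origin_of(item)
        if origin not in origins:
            origins.append(origin)
    return origins


def missing_origins(env, list_name="CORS_ALLOWED_ORIGINS"):
    """Return URL-trio origins that are absent from the given origin list."""
    allowed = parse_origins(env.get(list_name, ""))
    trio = [origin_of(env[name]) for name in URL_TRIO]
    return [origin for origin in trio if origin not in allowed]
```

Running `missing_origins(env, "CSRF_TRUSTED_ORIGINS")` applies the same check to the CSRF list; an empty result means the trio is fully covered.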
3. Redpanda Broker (Message Queue)
This service is the Redpanda message broker itself; it stores the streaming event data.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/queue
- Builder: Dockerfile
- Start Command: Uses default Docker entrypoint
- Target Port: 9092 (Kafka API)
- Private Internal DNS: deml-queue.internal:9092
- Public URL: None (Strictly internal for security)
- Compute Limits: 24 vCPU / 24 GB Memory
- Persistent Storage: You MUST attach a Cloud Run Persistent Volume mounted to /var/lib/redpanda/data. Without it, Redpanda runs on the container's ephemeral disk and loses all topics, messages, and consumer offsets on every restart/redeploy. This silently breaks Event Projections (events produced just before a restart never reach the worker, and the in-app verification times out). Add it in Cloud Run → deml-queue → Variables/Settings → Volumes (the gcloud compute disks create CLI currently panics with a project token, so use the dashboard).
- Deployment Trigger: Scoped via infrastructure/queue/cloudbuild.yaml build.watchPatterns to only redeploy when infrastructure/queue/** changes, so unrelated merges to main don't restart (and, until a volume is attached, wipe) the broker.
- Environment Variables:
  - REDPANDA_BROKERS: Not strictly needed, but ensure port 9092 is exposed internally.
Public Authenticated Redpanda Listener (for Firebase Cloud Functions)
To achieve the fastest client command path (Angular → ingestEvent → direct to Redpanda → worker consume → Firestore projection with no polling), the queue exposes a second listener:
- Internal (9092, PLAINTEXT): used by all Cloud Run services (backend, workers, outbox_relay).
- External (9093, SASL + SCRAM-SHA-256 over plain TCP): used only by Firebase Cloud Functions.
Critical: the public endpoint must be a Cloud Run TCP Proxy (raw TCP), not an HTTP custom domain (e.g. queue.deml.app). An HTTP/HTTPS domain terminates TLS and speaks HTTP, so it cannot carry the raw Kafka protocol; the function connection is reset and ingestEvent silently falls back to the Firestore inbox (slow polled projection instead of the fast path). This was the original cause of the public path never working.
Setup on the deml-queue service (production):
- In Cloud Run → deml-queue → Settings → Networking, add a TCP Proxy targeting container port 9093. Cloud Run returns an address like xxxx.proxy.rlwy.net:NNNNN (the current production proxy is zephyr.proxy.rlwy.net:32253).
  - CLI equivalent: gcloud compute target-tcp-proxies create --port 9093 --service deml-queue
- Set service variables so the broker advertises that reachable address:
  - PUBLIC_REDPANDA_HOST=xxxx.proxy.rlwy.net (e.g. zephyr.proxy.rlwy.net)
  - PUBLIC_REDPANDA_PORT=NNNNN (the proxy's external port, e.g. 32253; the container keeps listening on 9093, which is the proxy target)
  - REDPANDA_SASL_USERNAME=admin (or a dedicated user)
  - REDPANDA_SASL_PASSWORD=...
The entrypoint (infrastructure/queue/entrypoint.sh) handles dual listeners, advertises
PUBLIC_REDPANDA_HOST:PUBLIC_REDPANDA_PORT on the external listener, and auto-creates the
SASL user.
On the Firebase side, ingestEvent is a 2nd-gen (Cloud Run) function, so its config
must be provided as environment variables (process.env) — the legacy
firebase functions:config:set does not apply to v2 functions. The deploy workflow
(.github/workflows/firebase-backend-deploy.yml) writes a functions/.env from these
GitHub repository secrets before firebase deploy:
- REDPANDA_PUBLIC_BROKERS → REDPANDA_BROKERS, e.g. zephyr.proxy.rlwy.net:32253
- REDPANDA_PUBLIC_SASL_USERNAME → REDPANDA_SASL_USERNAME, e.g. admin
- REDPANDA_PUBLIC_SASL_PASSWORD → REDPANDA_SASL_PASSWORD (same value as the deml-queue service)
- REDPANDA_PUBLIC_SSL (optional) → REDPANDA_SSL; leave unset/false for a Cloud Run TCP Proxy (plain TCP). Only true if TLS is terminated at the edge (e.g. Cloudflare Spectrum).
Set those secrets, then re-run the "Deploy Firebase Backend" workflow. (For stricter
secret handling you may instead bind REDPANDA_SASL_PASSWORD via Google Secret Manager /
firebase functions:secrets:set and defineSecret in code.)
If the public path is unavailable the system still works via the resilient fallback:
ingestEvent writes to the Firestore frontend_command_inbox and the telemetry worker's
poll_firestore_inbox task (every ~10s) projects it — slower, but the verification still
passes.
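To make the difference between the two listeners concrete, the sketch below builds client settings for each one from the variables named in this section. It is an illustration, not the code in functions/src/index.ts or the backend: the function name is hypothetical, and the dictionary uses librdkafka-style property names on the assumption that the client accepts them.

```python
def kafka_client_config(env, public=False):
    """Return client settings for the internal or the public listener.

    `env` is a mapping such as os.environ. Set `public=True` for clients
    outside Cloud Run (for example the Firebase function).
    """
    config = {"bootstrap.servers": env["REDPANDA_BROKERS"]}

    if not public:
        # Internal listener on 9092: PLAINTEXT, no authentication.
        config["security.protocol"] = "PLAINTEXT"
        return config

    # Public listener on 9093: SASL/SCRAM. TLS only applies when it is
    # terminated at the edge, signalled by REDPANDA_SSL=true.
    use_tls = env.get("REDPANDA_SSL", "").strip().lower() == "true"
    config.update(
        {
            "security.protocol": "SASL_SSL" if use_tls else "SASL_PLAINTEXT",
            "sasl.mechanism": "SCRAM-SHA-256",
            "sasl.username": env["REDPANDA_SASL_USERNAME"],
            "sasl.password": env["REDPANDA_SASL_PASSWORD"],
        }
    )
    return config
```

With a plain TCP proxy and REDPANDA_SSL unset, the public client ends up on SASL_PLAINTEXT, which matches the "SASL + SCRAM-SHA-256 over plain TCP" listener described above.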
4. Telemetry Worker (deml-telemetry-worker)
Consumes Redpanda topics (app-events, frontend-events, user-issues), projects to Postgres + Firestore deml DB, runs health pings and analytics rollups.
- Root Directory: /backend
- Start Command: python manage.py telemetry_worker
- Private Internal DNS: deml-telemetry-worker.internal
- Public URL: None (internal only)
4b. Background Daemon (deml-daemon)
The background daemon is a high-performance compiled Rust service that manages the outbox relay, health pinger cycles, and cron scheduler tasks. It connects directly to the PostgreSQL database and Redpanda event queue.
- Root Directory: /rust
- Docker File: deml-daemon/Dockerfile
- Start Command: Runs the compiled binary natively (configured as ENTRYPOINT)
- Private Internal DNS: deml-daemon.internal
- Public URL: None (internal only)
Variables:
- DATABASE_URL (standard Postgres / Neon connection string)
- REDPANDA_BROKERS (e.g. deml-queue.internal:9092)
- BACKEND_INTERNAL_URL (points to the backend container, e.g. http://deml-backend.internal:8080)
- INTERNAL_SECRET (matching backend's validation header)
- BATCH_SIZE=100, POLL_INTERVAL_SECS=5, PINGER_INTERVAL_SECS=30
5. Consolidated Background Workers (deml-workers)
Consolidates the machine learning and security execution logic into a single Python container. It spawns the ML thread, the security execution thread, and the internal-tasks Redpanda consumer concurrently.
- Root Directory: /backend
- Start Command: python deml_workers_start.py
- Private Internal DNS: deml-workers.internal
Variables:
- Core backend bundle (DATABASE_URL, SECRET_KEY, DEBUG=False)
- REDPANDA_BROKERS, DRAGONFLY_HOST
- HF_TOKEN, HF_REPO_ID
- Threat intelligence API keys, GCP_KMS_* credentials, and TOR_PROXY_URL (SOCKS5 Tor bridge)
Shared Worker Environment Bundle
All background tasks (deml-telemetry-worker, deml-workers) inherit environment configurations from the core deml-backend settings. Use Cloud Run shared variables or Infisical to avoid drift. Centralized reads go through backend/utils/env.py.
Event flow (for operators):
- API writes → OutboxEvent (Postgres) → outbox_relay → Redpanda
- Angular/Firebase → frontend-events topic → telemetry_worker → Postgres + Firestore
- Idempotency keys + DLQ handled inside telemetry_worker projectors
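For operators unfamiliar with the pattern, the idempotency-key and DLQ handling in the last step can be sketched in a few lines. This is a generic illustration, not the telemetry_worker code: the function name, the `idempotency_key` field, and the in-memory set and list are assumptions standing in for whatever storage the projectors actually use.

```python
def project_events(events, apply, seen_keys, dead_letters):
    """Apply each event at most once; park failures instead of crashing.

    events       -- iterable of dicts carrying an "idempotency_key" field
    apply        -- callable that writes one event to the projection store
    seen_keys    -- set of keys already projected (persistent in practice)
    dead_letters -- list standing in for a dead-letter queue
    Returns the number of events newly applied.
    """
    applied = 0
    for event in events:
        key = event["idempotency_key"]
        if key in seen_keys:
            continue  # redelivery after a restart or retry: skip silently
        try:
            apply(event)
        except Exception as exc:
            # Keep the failing event for inspection and move on, so one bad
            # message does not block the rest of the topic.
            dead_letters.append({"event": event, "error": str(exc)})
            continue
        seen_keys.add(key)
        applied += 1
    return applied
```

Recording the key only after `apply` succeeds means a failed event can be retried later from the dead-letter store without being mistaken for a duplicate.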
Firebase (separate from Cloud Run): Cloud Functions (ingestEvent) and Firestore rules deploy via .github/workflows/firebase-backend-deploy.yml using FIREBASE_SERVICE_ACCOUNT_DEMLDOTCOM. For the fastest path (direct publish, no polling) give the Functions a public SASL-authenticated Redpanda listener (port 9093, SCRAM-SHA-256). See infrastructure/queue/, the updated REDPANDA_* guidance, and functions/src/index.ts. The inbox fallback remains only as defence-in-depth.
Marketing Site (not a Cloud Run service)
Hosted separately (Firebase Hosting / static). Build with the same URL trio:
| Variable | Example |
|---|---|
| FRONTEND_URL | https://deml.app |
| BACKEND_URL | https://backend.deml.app |
| MARKETING_URL | https://dataengineeringformachinelearning.com |
See marketing/.env.example. Legacy PUBLIC_MAIN_APP_URL / PUBLIC_API_BASE still work but are deprecated.
7. ClickHouse Database (Telemetry Storage)
ClickHouse is used to securely store all high-volume OpenTelemetry data from the widget and backend services.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/clickhouse
- Builder: Dockerfile (utilizes clickhouse/clickhouse-server:24.3)
- Start Command: Uses default Docker entrypoint
- Target Port: 8123 (HTTP) and 9000 (Native)
- Private Internal DNS: deml-clickhouse.internal
- Public URL: None (Strictly an internal database)
- Compute Limits: 24 vCPU / 24 GB Memory
- Persistent Storage: You MUST attach a Cloud Run Persistent Volume to /var/lib/clickhouse.
- Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
- Environment Variables:
  - CLICKHOUSE_USER: Leave this variable completely unset/deleted on the ClickHouse service if you want to use the default user. If you define it as default explicitly, ClickHouse's entrypoint will skip setting the password, causing connection errors in other services.
  - CLICKHOUSE_PASSWORD: Set a secure password (e.g. for the default user).
  - CLICKHOUSE_DB: otel
[!IMPORTANT] ClickHouse Password Gotcha: Do not define CLICKHOUSE_USER as default in the ClickHouse service environment variables. If you wish to use the default user, simply omit the CLICKHOUSE_USER variable entirely from the ClickHouse service. The entrypoint script will automatically apply your CLICKHOUSE_PASSWORD to the default user. Make sure CLICKHOUSE_USER is still set to default in your otel-collector and backend services so they connect correctly.
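The asymmetry in this gotcha (unset on the server, set on the clients) is easy to get backwards, so a small lint over each service's variables can help. The sketch below is a hypothetical helper, not something shipped in the repository; the role names "server" and "client" are labels invented for the example.

```python
def clickhouse_env_problems(role, env):
    """List ClickHouse variable mistakes for one service.

    role -- "server" for the ClickHouse service itself, "client" for
            services that connect to it (otel-collector, backend)
    env  -- mapping of that service's environment variables
    """
    problems = []
    user = env.get("CLICKHOUSE_USER")

    if not env.get("CLICKHOUSE_PASSWORD"):
        problems.append("CLICKHOUSE_PASSWORD is not set")

    if role == "server" and user == "default":
        # Explicit "default" makes the entrypoint skip setting the password.
        problems.append("remove CLICKHOUSE_USER from the ClickHouse service")

    if role == "client" and not user:
        problems.append("set CLICKHOUSE_USER (e.g. default) on client services")

    return problems
```

An empty list for every service means the server will apply the password to the default user and the clients will present matching credentials.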
8. OpenTelemetry Collector (Router)
The OpenTelemetry Collector receives all spans and metrics from the frontend widget and backend, processing them securely before batch-inserting into ClickHouse.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/otel-collector
- Builder: Dockerfile (utilizes secure otel/opentelemetry-collector-contrib distroless base)
- Start Command: Uses default Docker entrypoint
- Target Port: 4318 (OTLP HTTP)
- Private Internal DNS: deml-telemetry-collector.internal
- Public URL: https://telemetry.deml.app
- Compute Limits: 24 vCPU / 24 GB Memory
- Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
- Environment Variables:
  - CLICKHOUSE_HOST: The internal TCP host of your ClickHouse service (e.g. deml-clickhouse.internal).
  - CLICKHOUSE_USER: Must match the user the ClickHouse service runs with (set it to default here when CLICKHOUSE_USER is left unset on the ClickHouse service).
  - CLICKHOUSE_PASSWORD: Must match what you set in the ClickHouse service.
9. Vulnerability Scanner Engine
This microservice provides an offline, isolated environment for executing osv-scanner and cpe-guesser to enrich telemetry without bloating the main backend image.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/scanner
- Builder: Dockerfile (utilizes python:3.11-slim with the official Google osv-scanner binary)
- Start Command: uvicorn main:app --host 0.0.0.0 --port 8000 (Default in Dockerfile)
- Target Port: 8000 (FastAPI)
- Private Internal DNS: deml-scanner.internal:8000
- Public URL: None (Strictly an internal service)
- Compute Limits: 24 vCPU / 24 GB Memory
- Persistent Storage: You MUST attach a Cloud Run Persistent Volume to /data/osv so the OSV database dump does not have to be repeatedly downloaded.
- Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
- Environment Variables:
  - OSV_DB_PATH: /data/osv (The mounted volume path)
  - CPE_GUESSER_URL: http://deml-cpe-guesser.internal:1323/unique
  - NVD_API_KEY: Your National Vulnerability Database API Key (optional but highly recommended to bypass rate limits)
Consumers (deml-backend, workers) set SCANNER_SERVICE_URL=http://deml-scanner.internal:8000.
10. CPE Guesser Service
This service converts raw technology strings into CPE 2.3 identifiers. It is required for the Vulnerability Scanner Engine to properly normalize infrastructure data.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/cpe-guesser
- Builder: Dockerfile (Builds from source using Python 3.11 with an internal Valkey/Redis cache)
- Start Command: /app/start.sh (Default in Dockerfile)
- Target Port: 1323
- Private Internal DNS: deml-cpe-guesser.internal
- Public URL: None (Strictly an internal service)
- Compute Limits: 1 vCPU / 1 GB Memory
- Deployment Trigger: Auto-deploys when changes are pushed to GitHub.
- Environment Variables: None required by default.
(Once deployed, ensure the CPE_GUESSER_URL environment variable on the Vulnerability Scanner Engine points to this internal DNS, e.g., http://deml-cpe-guesser.internal:1323/unique)
11. Tor Proxy (Dark Web Scanner)
A lightweight proxy that allows the security worker to anonymously scrape dark web search engines (e.g., Ahmia) for brand mentions.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/tor-proxy
- Builder: Dockerfile (Minimal alpine image running as non-root tor user)
- Target Port: 9050
- Private Internal DNS: deml-tor-proxy.internal
- Environment Variables: None on the proxy itself.
Consumers (deml-backend, deml-security-worker) must set:
TOR_PROXY_URL=socks5h://deml-tor-proxy.internal:9050
13. Dragonfly (Redis Replacement for WebSockets)
This service provides the in-memory pub/sub message broker required by Django Channels to route real-time WebSocket traffic. We use a custom, highly secure distroless image to minimize the attack surface.
- Source: GitHub repository (main branch)
- Root Directory: /infrastructure/dragonfly
- Builder: Dockerfile (Multi-stage build using Google Distroless cc-debian12:nonroot)
- Target Port: 6379
- Private Internal DNS: deml-dragonfly.internal
- Public URL: None (Strictly an internal service)
- Environment Variables: None required by default.
(Once deployed, set DRAGONFLY_HOST=deml-dragonfly.internal on deml-backend, all workers, and any service using Channels or rate limiting.)
Internal Networking
All inter-service traffic uses Cloud Run private DNS (*.internal). Never route broker, database, or cache traffic over public URLs.
| Service | Internal address |
|---|---|
| Backend API | deml-backend.internal:8080 |
| Frontend | deml-frontend.internal:8080 |
| Postgres | Via DATABASE_URL (internal connection string from Cloud Run) |
| Redpanda | deml-queue.internal:9092 |
| Dragonfly | deml-dragonfly.internal:6379 |
| ClickHouse | deml-clickhouse.internal:8123 |
| Scanner | deml-scanner.internal:8000 |
| CPE Guesser | deml-cpe-guesser.internal:1323 |
| Tor proxy | deml-tor-proxy.internal:9050 |
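The rule that broker, cache, and helper traffic stays on *.internal can be checked mechanically before a deploy. The sketch below is a hypothetical lint, not part of the repository; the list of variables it inspects is an assumption drawn from the names used in this appendix, and it applies only to Cloud Run services (local docker-compose hostnames such as redpanda would be flagged).

```python
from urllib.parse import urlsplit

# Variables that should only ever point at private DNS names on Cloud Run.
INTERNAL_ONLY_VARS = (
    "REDPANDA_BROKERS",
    "DRAGONFLY_HOST",
    "SCANNER_SERVICE_URL",
    "CPE_GUESSER_URL",
    "TOR_PROXY_URL",
)


def host_of(value):
    """Extract the hostname from a URL, a host:port pair, or a bare host."""
    candidate = value if "://" in value else f"//{value}"
    return urlsplit(candidate).hostname or ""


def public_hosts(env):
    """Map each variable to the entries that do not end in .internal."""
    offenders = {}
    for name in INTERNAL_ONLY_VARS:
        for item in env.get(name, "").split(","):
            item = item.strip()
            if item and not host_of(item).endswith(".internal"):
                offenders.setdefault(name, []).append(item)
    return offenders
```

The Firebase function is the one deliberate exception: it sits outside Cloud Run and must use the public SASL listener, so its variables should not be run through this check.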
Local Development (docker-compose.yml)
Local parity includes: backend, telemetry_worker, ml_worker, security_worker, outbox_relay, Postgres, Redpanda, Dragonfly, ClickHouse, Tor proxy, and supporting infra. Copy backend/.env.example to backend/.env and use localhost overrides:
REDPANDA_BROKERS=redpanda:9092
DRAGONFLY_HOST=dragonfly
FRONTEND_URL=http://localhost:4200
BACKEND_URL=http://localhost:8000
MARKETING_URL=http://localhost:4321
TOR_PROXY_URL=socks5h://tor-proxy:9050
Run docker compose up from the repo root. Frontend: cd frontend && npm start. Marketing: cd marketing && npm run dev.
Updating Environment Variables
- Prefer setting in GCP dashboard (Variables tab per service) or via CLI.
- After changing build-time vars (MARKETING_URL, BACKEND_URL, FIREBASE_*) for frontend, trigger a new deploy so set-env.js runs.
- Keep backend/.env.example, frontend/.env.example, and marketing/.env.example in sync with reality.
Security Notes
- Never commit real .env.
- Secrets (Stripe, Resend, Firebase SA, KMS, HF) should use Cloud Run secret variables or Infisical integration.
- CORS/CSRF lists are the primary control for cross-origin auth handoff.
See also: backend/.env.example, frontend/.env.example, marketing/.env.example, BOOK.md (Event Projections chapter), and AGENTS.md (CORS rule: never hardcode domains).
Cloud Run CLI Quick Reference
gcloud config set project <PROJECT_ID>
gcloud run services update deml-backend --update-env-vars "MARKETING_URL=https://dataengineeringformachinelearning.com"
gcloud run services update deml-frontend --update-env-vars "SENTRY_DSN=<your-dsn>"
gcloud run services update deml-backend
After any build-time variable change on deml-frontend, trigger a redeploy.
CI/CD Pipeline
- All services are linked to the main branch of the dataengineeringformachinelearning repository.
- Pushes to the main branch will automatically trigger new builds and deployments for the affected services.
- Automated security testing via Socket.dev and Checkov pre-commit hooks runs on every push.
- Watch Paths: You can set gitignore-style rules (e.g., /frontend/** or /backend/**) in the Cloud Run settings to ensure that a service only rebuilds when its specific directory changes.
Reliability and Scaling
- Restart Policy: All services are configured to restart "On Failure" with a maximum of 10 retries, ensuring automatic recovery from temporary crashes.
- Region: US East (Virginia, USA)
- Replicas: 1 replica per service.