
Production Deployment on Cloud Run

Name: Chapter 22: Production Deployment on Cloud Run - The Book - DEML Platform
Author: Joe Alongi


Chapter 22: Production Deployment on Cloud Run

The ultimate crucible for any software architecture is its transition from a controlled local development environment into the hostile, chaotic reality of the public internet. The "it works on my machine" paradigm is an unacceptable failure of engineering discipline. To guarantee that my platform performs with absolute consistency and resilience, deployment cannot be treated as a discrete, manual event. It must be codified, automated, and treated as a seamless extension of my Continuous Integration pipeline. To achieve this modern deployment topology, I host my entire infrastructure on Cloud Run.

Google Cloud provides the declarative infrastructure-as-code capabilities required to orchestrate my complex, multi-service architecture without the immense cognitive overhead of manually configuring Kubernetes clusters. I deploy the Angular Web Frontend, the Django REST API, my persistent PostgreSQL databases, the Redpanda event bus, and my specialized asynchronous Telemetry Workers as distinct, independently scalable services within a unified Google Cloud environment.

This topology allows me to scale my infrastructure surgically. If a massive influx of external traffic threatens to overwhelm the platform, Cloud Run automatically provisions additional container instances for the Django API edge, while the Telemetry Workers continue to process the Redpanda queue at their own deliberate, uncompromised pace. The frontend static assets are distributed globally to edge nodes, ensuring rapid time-to-interactive for users regardless of their geographic location.

Crucially, the entire deployment lifecycle is governed by automated CI/CD triggers. When a developer merges a feature branch into main after passing the rigorous suite of automated tests and accessibility audits, the merge fires the deployment trigger. The pipeline pulls the latest repository commit, runs the multi-stage Docker builds, executes the database migrations, and rolls the new revision out with zero downtime. The complete per-service setup checklist (env vars, workers, outbox relay, cross-site URL trio, and local docker-compose parity) lives in Appendix C. This architecture ensures that my platform is not just ready for production release; it actively thrives in production, providing an unyielding foundation for my machine learning and telemetry operations.
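The ordering of those stages matters: a failed build or migration should halt the release before any traffic moves. A minimal sketch of that fail-fast sequencing follows. The stage names and the `run_pipeline` helper are hypothetical illustrations, not the platform's actual pipeline code.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            break  # later stages (e.g. the rollout) never run
        completed.append(name)
    return completed


# Hypothetical stage list mirroring the lifecycle described above.
stages: List[Stage] = [
    ("pull_commit", lambda: True),
    ("docker_build", lambda: True),
    ("migrate_database", lambda: False),  # simulate a failed migration
    ("shift_traffic", lambda: True),
]

completed = run_pipeline(stages)
print(completed)
```

In this simulated run the migration fails, so the traffic shift is never attempted and the previous revision keeps serving requests.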

Infrastructure & Compute Resource Allocation

To keep the deployment footprint efficient and cost-optimized (particularly on platforms like Cloud Run or Kubernetes), the platform is designed to run lean. By intentionally constraining CPU and memory limits, I force aggressive garbage collection and prevent unbounded caching.

The recommended per-replica limits for a production-grade deployment are:

Service	CPU Limit	RAM Limit	Justification
deml-backend (Django API)	4 vCPU	4 GB	Maximum concurrent worker capacity.
deml-frontend (Angular)	4 vCPU	4 GB	Rapid SSR and robust production build capacity.
deml-postgres	4 vCPU	4 GB	High-throughput transactional data store.
deml-clickhouse	4 vCPU	4 GB	Memory-intensive OLAP analytical queries.
deml-queue (Redpanda)	4 vCPU	4 GB	Pre-allocates memory for the Seastar framework.
deml-dragonfly	4 vCPU	4 GB	Ultra-fast in-memory cache operations.
deml-telemetry-worker	4 vCPU	4 GB	High-speed Polars batch processing.
deml-workers	4 vCPU	4 GB	PyTorch ML training + OSINT intel gathering.
deml-scanner	4 vCPU	4 GB	Heavy vulnerability database parsing.
deml-cpe-guesser	4 vCPU	4 GB	CPU-intensive NLP heuristics at scale.
deml-tor-proxy	4 vCPU	4 GB	High-bandwidth, encrypted network routing.

This complete 11-service architecture peaks at a combined maximum footprint of 44 vCPU and 44 GB RAM.
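The combined footprint can be checked by summing the table directly. The sketch below copies the per-service limits from the table. The `LIMITS` dictionary is only an illustrative representation, not a configuration file the platform reads.

```python
# Per-service limits copied from the table above: (vCPU, RAM in GB).
LIMITS = {
    "deml-backend": (4, 4),
    "deml-frontend": (4, 4),
    "deml-postgres": (4, 4),
    "deml-clickhouse": (4, 4),
    "deml-queue": (4, 4),
    "deml-dragonfly": (4, 4),
    "deml-telemetry-worker": (4, 4),
    "deml-workers": (4, 4),
    "deml-scanner": (4, 4),
    "deml-cpe-guesser": (4, 4),
    "deml-tor-proxy": (4, 4),
}

service_count = len(LIMITS)
total_vcpu = sum(cpu for cpu, _ in LIMITS.values())
total_ram_gb = sum(ram for _, ram in LIMITS.values())

print(f"{service_count} services: {total_vcpu} vCPU, {total_ram_gb} GB RAM")
```

Keeping the limits in one structure like this makes it easy to re-derive the totals whenever a single service's limit changes.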

Estimated Monthly Infrastructure Costs

Assuming 24/7 continuous utilization at the maximum 4 vCPU / 4 GB limits across all services:

CPU Compute: 44 vCPUs × $20 per vCPU-month = $880 / month
RAM Compute: 44 GB × $10 per GB-month = $440 / month
Total Compute (Theoretical Maximum): $880 + $440 = $1,320 / month
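The same ceiling can be expressed as a small function, which is handy for testing what happens when a limit is lowered. The rates are the ones quoted above, and the function name `max_monthly_cost` is an illustrative assumption.

```python
# Monthly rates as quoted in the estimate above (USD).
VCPU_RATE = 20.0  # per vCPU per month
RAM_RATE = 10.0   # per GB of RAM per month


def max_monthly_cost(vcpus: float, ram_gb: float,
                     vcpu_rate: float = VCPU_RATE,
                     ram_rate: float = RAM_RATE) -> float:
    """Cost if every service ran at its full limit for the whole month."""
    return vcpus * vcpu_rate + ram_gb * ram_rate


ceiling = max_monthly_cost(44, 44)
print(f"Theoretical maximum: ${ceiling:,.2f} / month")

# Example: the ceiling if every service were capped at 2 vCPU / 2 GB.
halved = max_monthly_cost(22, 22)
print(f"With halved limits: ${halved:,.2f} / month")
```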

Actual Baseline Usage (Estimated): Because Cloud Run bills for the resources actually consumed rather than for provisioned limits, the actual monthly operational cost is drastically lower than the theoretical maximum. Current active development and testing telemetry shows roughly $27 over 24 days, or about $1.13 per day. Extrapolated to a 30- or 31-day month, that comes to roughly $34 to $35, so a realistic baseline estimate is approximately $35 per month.
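The extrapolation is a simple linear projection of the observed spend. A minimal sketch, assuming spend accrues evenly per day (the helper name `extrapolate_monthly` is hypothetical):

```python
def extrapolate_monthly(spend: float, days_observed: int,
                        days_in_month: int = 30) -> float:
    """Project observed spend linearly onto a full month."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return spend / days_observed * days_in_month


# Observed figure from the text: roughly $27 over 24 days.
estimate_30 = extrapolate_monthly(27.0, 24, days_in_month=30)
estimate_31 = extrapolate_monthly(27.0, 24, days_in_month=31)

print(f"30-day month: ${estimate_30:.2f}")
print(f"31-day month: ${estimate_31:.2f}")
```

Both projections land just under $35, so the rounded baseline is slightly conservative. A linear projection also ignores bursty workloads such as periodic training runs, so it is best read as a rough guide.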

The primary drivers of this ~$35 baseline cost are the heavily utilized core services:

deml-backend: ~$11.50/mo (High memory utilization)
deml-clickhouse: ~$8.00/mo (Heavy memory and volume I/O)
deml-workers: ~$6.00/mo (Spikes during periodic ML training and OSINT sync)
deml-telemetry-worker: ~$3.00/mo (Intermittent Polars processing)
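Summing the four listed drivers shows how much of the baseline they account for. The sketch below uses only the approximate figures listed above. The remainder attributed to the other seven services is derived arithmetic, not a measured value.

```python
# Approximate monthly cost of the listed drivers (USD), from the list above.
DRIVERS = {
    "deml-backend": 11.50,
    "deml-clickhouse": 8.00,
    "deml-workers": 6.00,
    "deml-telemetry-worker": 3.00,
}
BASELINE = 35.00  # estimated full-month baseline

drivers_total = sum(DRIVERS.values())
share = drivers_total / BASELINE
remainder = BASELINE - drivers_total

print(f"Listed drivers: ${drivers_total:.2f} ({share:.0%} of baseline)")
print(f"All other services combined: ${remainder:.2f}")
```

The four core services make up roughly four fifths of the baseline, leaving about $6.50 per month for the remaining seven services combined.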

Note on Persistent Volumes: In addition to standard compute, this architecture provisions persistent disk volumes for deml-postgres, deml-clickhouse, and deml-scanner. Storage on Cloud Run is billed at $0.15 per GB / month. Because these volumes grow with data ingestion, their baseline cost is small, averaging only pennies during standard baseline operations, but they can scale to hundreds of gigabytes if required (e.g., 300 GB × $0.15 = $45 / month).
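Storage cost scales linearly with volume size at the quoted rate. A minimal sketch, where the `storage_cost` helper is an illustrative assumption and the rate is the $0.15 per GB per month stated above:

```python
STORAGE_RATE = 0.15  # USD per GB per month, as quoted above


def storage_cost(total_gb: float, rate: float = STORAGE_RATE) -> float:
    """Monthly cost of persistent volumes holding total_gb of data."""
    if total_gb < 0:
        raise ValueError("total_gb must be non-negative")
    return total_gb * rate


small = storage_cost(1)    # early-stage volumes
large = storage_cost(300)  # the 300 GB example from the note

print(f"1 GB:   ${small:.2f} / month")
print(f"300 GB: ${large:.2f} / month")
```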