Chapter 23: Production Deployment on AWS — Lightsail Container Services and Fargate
The decision to evaluate alternative hosting providers is never taken lightly in a zero-compromise engineering culture. After operating the platform on Railway with its convenient multi-service container model at approximately fifty dollars per month, it became clear that predictable costs and operational simplicity could be achieved on AWS without sacrificing the architectural invariants that define DEML: identical Docker images across environments, the Event Projections pipeline (Firebase ingestEvent and Django Outbox to Redpanda, idempotent telemetry_worker projections to Firestore), symmetrical processing loops over every Tenant including the platform Tenant0, distroless containers, and unprivileged execution. AWS offers a pragmatic middle path between the high-level PaaS experience of Railway and the full control of raw ECS or EKS. The recommended configuration uses Amazon Lightsail Container Services for the majority of the application and worker fleet, ECR as the image registry, and a minimal number of dedicated compute resources or Fargate tasks for the stateful data plane components that require persistent volumes and stable networking.
Lightsail Container Services deliver the closest operational experience to the existing Railway deployment. A single Container Service can host up to ten containers within one deployment, with automatic internal service discovery and load balancing across nodes. This allows the Angular SSR frontend, the Django backend API, the consolidated deml-workers (ML + security + internal tasks), the Rust deml-daemon (outbox relay, health pinger, cron publisher), the telemetry worker paths, the vulnerability scanner, and lighter sidecars such as the CPE guesser and Tor proxy to coexist inside the same managed service boundary. Networking between these containers uses simple container names and ports, exactly as docker-compose provides locally and as Railway internal hostnames provide in production. Fixed monthly pricing (Nano at seven dollars, Micro at ten dollars, Small at fifteen dollars per node) eliminates surprise bills while still supporting horizontal scaling by increasing node count. Build and deploy can reference images stored in Amazon ECR, keeping the multi-stage Dockerfiles (distroless Node for the frontend server.mjs, Python slim with liboqs for the backend, and minimal Debian for the Rust daemon) completely unchanged.
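The container set above maps onto the two JSON payloads that `aws lightsail create-container-service-deployment` accepts through its `--containers` and `--public-endpoint` options. The sketch below builds those payloads from ECR image references. The registry host, container names, repository names and ports are all hypothetical placeholders, not the platform's real values. Pulling private ECR images also assumes that the service's private registry access has been configured.

```python
from __future__ import annotations

import json

# Hypothetical ECR registry host; substitute the real account ID and region.
REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
# Per-deployment container limit as stated in this chapter.
MAX_CONTAINERS_PER_DEPLOYMENT = 10


def container(repo: str, tag: str,
              ports: dict[str, str] | None = None,
              environment: dict[str, str] | None = None) -> dict:
    """Build one value of the --containers map (the map is keyed by container name)."""
    spec: dict = {"image": f"{REGISTRY}/{repo}:{tag}"}
    if ports:
        # Port number as a string -> protocol ("HTTP", "HTTPS", "TCP" or "UDP").
        spec["ports"] = ports
    if environment:
        spec["environment"] = environment
    return spec


def build_deployment(tag: str) -> tuple[dict, dict]:
    """Return (containers, public_endpoint) payloads for one deployment."""
    # Container names, repository names and ports are illustrative only.
    containers = {
        "frontend": container("deml-frontend", tag, ports={"4000": "HTTP"}),
        "backend": container("deml-backend", tag, ports={"8000": "HTTP"}),
        "workers": container("deml-workers", tag),
        "daemon": container("deml-daemon", tag),
        "scanner": container("deml-scanner", tag),
        "cpe-guesser": container("deml-cpe-guesser", tag),
        "tor-proxy": container("deml-tor-proxy", tag),
    }
    if len(containers) > MAX_CONTAINERS_PER_DEPLOYMENT:
        raise ValueError("too many containers for a single Lightsail deployment")
    # Only one container per deployment can be the public endpoint.
    public_endpoint = {
        "containerName": "frontend",
        "containerPort": 4000,
        "healthCheck": {"path": "/", "intervalSeconds": 10, "successCodes": "200-499"},
    }
    return containers, public_endpoint


if __name__ == "__main__":
    containers, endpoint = build_deployment("3f2a9c1e0b7d")
    print(json.dumps(containers, indent=2))
    print(json.dumps(endpoint, indent=2))
```

After the two payloads are saved as `containers.json` and `public-endpoint.json`, the deployment call is `aws lightsail create-container-service-deployment --service-name deml-apps --containers file://containers.json --public-endpoint file://public-endpoint.json`. The service name `deml-apps` is also a placeholder.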
Stateful infrastructure receives special but minimal treatment. Redpanda continues to serve as the high-throughput, exactly-once event bus for the frontend-events and internal-tasks topics. A single-broker deployment (matching the local compose configuration with --smp 1 and SASL) runs reliably on either a small Lightsail Linux instance or a tightly right-sized ECS Fargate task backed by EBS gp3 storage. ClickHouse, which holds OLAP analytics and long-term telemetry, follows the same pattern. Dragonfly can often run as an additional container inside one of the Lightsail services or as a tiny companion task. These components are the primary cost and operational differentiators. Fargate Spot pricing (up to a seventy percent discount) can therefore be applied wherever brief interruptions are tolerable and recovery through Redpanda's and ClickHouse's durability features is acceptable. Transactional Postgres storage uses Amazon RDS for PostgreSQL (db.t4g.micro), Aurora PostgreSQL Serverless v2, or the Lightsail managed database offering for the simplest possible integration. All application containers continue to receive the same DATABASE_URL, REDPANDA_BROKERS, CLICKHOUSE_* and DRAGONFLY_HOST environment variables. Only the concrete hostnames change.
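The claim that only hostnames change can be made checkable by rendering the environment contract from a per-environment endpoint record. The sketch below compares a local compose environment with an AWS one. It assumes that REDPANDA_BROKERS is a comma-separated broker list and that CLICKHOUSE_HOST and CLICKHOUSE_PORT are members of the CLICKHOUSE_* family; both are hypothetical. All hostnames are placeholders. Credentials are deliberately left out of the URLs, on the assumption that they come from the secret store.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoints:
    """Concrete locations of the stateful services for one environment."""
    postgres_url: str
    redpanda_brokers: tuple[str, ...]
    clickhouse_host: str
    clickhouse_port: int
    dragonfly_host: str


def app_environment(ep: Endpoints) -> dict[str, str]:
    """Render the shared environment contract that every app container receives."""
    return {
        "DATABASE_URL": ep.postgres_url,
        # Comma-separated broker list (assumed format).
        "REDPANDA_BROKERS": ",".join(ep.redpanda_brokers),
        # Hypothetical members of the CLICKHOUSE_* family.
        "CLICKHOUSE_HOST": ep.clickhouse_host,
        "CLICKHOUSE_PORT": str(ep.clickhouse_port),
        "DRAGONFLY_HOST": ep.dragonfly_host,
    }


# Placeholder hostnames; credentials are expected to come from the secret store.
LOCAL = Endpoints("postgres://deml@postgres:5432/deml", ("redpanda:9092",),
                  "clickhouse", 8123, "dragonfly")
AWS = Endpoints("postgres://deml@deml-db.example.internal:5432/deml",
                ("redpanda.example.internal:9092",),
                "clickhouse.example.internal", 8123, "dragonfly.example.internal")


def same_contract(a: Endpoints, b: Endpoints) -> bool:
    """True when both environments expose exactly the same variable names."""
    return app_environment(a).keys() == app_environment(b).keys()
```

A CI check built on `same_contract(LOCAL, AWS)` fails whenever one environment gains or loses a variable. That is exactly the drift the identical-image invariant is meant to prevent.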
The entire lifecycle remains driven by GitHub Actions and ECR. On merge to main, workflows build each Dockerfile exactly as before, push versioned images to private ECR repositories, and then either trigger a new Lightsail Container Service deployment (via the AWS CLI or Lightsail API) or update the corresponding Fargate services. Secrets are injected at runtime, either as AWS Secrets Manager references in the Fargate task definitions or as Lightsail deployment environment variables, following the same discipline applied on other platforms. Cross-service URLs (the FRONTEND_URL, BACKEND_URL, MARKETING_URL trio) and the Firebase configuration remain the single source of truth. Because the container contract is identical, local docker-compose, Railway, Cloud Run and this AWS topology can all be exercised with the same images. The result is among the cheapest practical options for the current service footprint and among the most manageable. It requires only a handful of Lightsail services and one or two persistent compute targets rather than a dozen individually configured PaaS entries.
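The merge-to-main sequence can be expressed as an ordered list of commands for a workflow step to run. The sketch below only builds the argv lists and executes nothing. It assumes three things: images are tagged with a 12-character commit SHA, the Lightsail payload files already exist, and the runner has already logged in to ECR (for example by piping `aws ecr get-login-password` into `docker login --password-stdin`). The registry, service, cluster and repository names are hypothetical.

```python
from __future__ import annotations

import shlex


def image_uri(registry: str, repo: str, git_sha: str) -> str:
    """Versioned image reference; tagging by short commit SHA is an assumed convention."""
    return f"{registry}/{repo}:{git_sha[:12]}"


def deploy_plan(registry: str, region: str, lightsail_service: str, git_sha: str,
                dockerfiles: dict[str, str],
                fargate_services: tuple[tuple[str, str], ...] = ()) -> list[list[str]]:
    """Return the argv lists a CI job would run, in order (nothing is executed here)."""
    plan: list[list[str]] = []
    for repo, dockerfile in dockerfiles.items():
        uri = image_uri(registry, repo, git_sha)
        plan.append(["docker", "build", "--file", dockerfile, "--tag", uri, "."])
        plan.append(["docker", "push", uri])
    # Roll the Lightsail service to the payload files generated for this commit.
    plan.append(["aws", "lightsail", "create-container-service-deployment",
                 "--region", region, "--service-name", lightsail_service,
                 "--containers", "file://containers.json",
                 "--public-endpoint", "file://public-endpoint.json"])
    # Redeploy each (cluster, service) pair with its current task definition.
    for cluster, service in fargate_services:
        plan.append(["aws", "ecs", "update-service", "--region", region,
                     "--cluster", cluster, "--service", service,
                     "--force-new-deployment"])
    return plan


if __name__ == "__main__":
    steps = deploy_plan(
        registry="123456789012.dkr.ecr.us-east-1.amazonaws.com",
        region="us-east-1",
        lightsail_service="deml-apps",
        git_sha="3f2a9c1e0b7d5a6c",
        dockerfiles={"deml-backend": "backend/Dockerfile",
                     "deml-daemon": "daemon/Dockerfile"},
        fargate_services=(("deml-data", "redpanda"),),
    )
    for argv in steps:
        print(shlex.join(argv))
```

Keeping the plan as data rather than inline shell makes it easy to unit-test the ordering: images are pushed before any deployment references them.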
Infrastructure & Compute Resource Allocation
Lightsail and Fargate encourage aggressive right-sizing because billing is tied directly to allocated vCPU and memory. The platform's existing lean design (small control loops, Polars for batch work, native SVG rendering, and event-driven rather than polling-heavy workers) translates directly. Recommended allocations for a cost-conscious AWS deployment are:
| Service / Group | vCPU | RAM | Notes / Justification |
|---|---|---|---|
| Lightsail Container Service (apps) | 0.25–0.5 per node | 1–2 GB per node | Micro (0.25 vCPU / 1 GB) or Small (0.5 vCPU / 2 GB) power; hosts frontend SSR, backend, deml-workers, daemon, scanner and sidecars in one service |
| Redpanda (single broker) | 0.5 | 2 GB | EBS-backed; matches local compose; Spot-eligible |
| ClickHouse | 0.5–1 | 2–4 GB | Persistent volume for OLAP data; Spot or small instance |
| RDS / Lightsail DB (Postgres) | — | 1 GB | db.t4g.micro or equivalent Lightsail plan |
| Dragonfly (cache) | 0.25 | 0.5 GB | Can share a task or run inside the main container service |
This configuration keeps the total active footprint dramatically smaller than the theoretical Cloud Run maximum while preserving headroom for the periodic ML training and threat intel jobs.
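Fargate does not accept arbitrary vCPU and memory pairs. Each CPU size allows only a fixed set of memory values, so every Fargate-hosted row in the allocation table should map to a valid task size. The sketch below encodes the Linux combinations up to 4 vCPU as documented for ECS at the time of writing (larger sizes are omitted) and checks the planned rows against them. The Lightsail row is excluded because Lightsail bills by fixed power tiers rather than by task size.

```python
from __future__ import annotations

# Fargate Linux task sizes up to 4 vCPU: CPU units -> allowed memory (MiB).
# Based on the ECS task-size table at the time of writing; larger sizes omitted.
FARGATE_SIZES: dict[int, tuple[int, ...]] = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),
    1024: tuple(range(2048, 8192 + 1, 1024)),
    2048: tuple(range(4096, 16384 + 1, 1024)),
    4096: tuple(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(vcpu: float, memory_gb: float) -> bool:
    """Check a (vCPU, GB) pair from the allocation table against Fargate's combinations."""
    cpu_units = round(vcpu * 1024)
    memory_mib = round(memory_gb * 1024)
    return memory_mib in FARGATE_SIZES.get(cpu_units, ())


# Fargate-hosted rows of the allocation table, at both ends of each range.
PLANNED = {
    "redpanda": (0.5, 2),
    "clickhouse-min": (0.5, 2),
    "clickhouse-max": (1, 4),
    "dragonfly": (0.25, 0.5),
}

if __name__ == "__main__":
    for name, (vcpu, gb) in PLANNED.items():
        print(f"{name}: {vcpu} vCPU / {gb} GB -> valid={is_valid_fargate_size(vcpu, gb)}")
```

All four planned sizes are valid. A pairing such as 0.5 vCPU with 0.5 GB is rejected, because a 0.5 vCPU task needs at least 1 GB.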
Estimated Monthly Infrastructure Costs
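Using 2026 US East (N. Virginia) pricing, a realistic steady-state deployment lands in the same range as the prior Railway spend of roughly fifty dollars per month, and below it when the stateful components sit at the low end of their ranges: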
- Lightsail Container Service (Micro node): ~$10/mo (one node sufficient for low-to-moderate traffic; add nodes for scale).
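- Redpanda on a small Fargate task or Lightsail instance (0.5 vCPU / 2 GB + EBS): $8–15/mo (Fargate Spot gives the low end, a 2 GB Lightsail instance the middle; an always-on, on-demand Fargate task of this size costs closer to $21/mo at published us-east-1 Linux/x86 rates).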
- ClickHouse similar: $8–15/mo.
- RDS db.t4g.micro Postgres (or Lightsail DB): ~$12–15/mo.
- ECR storage + data transfer + minimal EBS + Route 53: $3–6/mo.
- Projected total baseline: roughly $41–61 per month (the sum of the line items above), depending on utilization spikes during training windows and on data volume.
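The arithmetic behind these figures is simple enough to keep in a script next to the deployment code. The sketch below sums the itemized ranges and computes an always-on Fargate task's monthly cost from per-hour rates. The rates are assumed us-east-1 Linux/x86 on-demand prices and should be checked against the current pricing page. The seventy percent figure is the maximum Spot discount, not a guaranteed one.

```python
from __future__ import annotations

HOURS_PER_MONTH = 730  # AWS's usual monthly-hours convention

# Assumed us-east-1 Fargate Linux/x86 on-demand rates; verify against the pricing page.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445
# "Up to seventy percent"; the realised Spot discount varies over time.
MAX_SPOT_DISCOUNT = 0.70


def fargate_monthly(vcpu: float, memory_gb: float, spot: bool = False) -> float:
    """Monthly cost of one always-on Fargate task, excluding storage and transfer."""
    hourly = vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR
    monthly = hourly * HOURS_PER_MONTH
    return monthly * (1 - MAX_SPOT_DISCOUNT) if spot else monthly


# (low, high) dollars per month, copied from the line items above.
LINE_ITEMS = {
    "lightsail_micro_node": (10, 10),
    "redpanda": (8, 15),
    "clickhouse": (8, 15),
    "postgres": (12, 15),
    "ecr_transfer_ebs_route53": (3, 6),
}


def total_range(items: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the lower and upper bounds independently."""
    return (sum(lo for lo, _ in items.values()),
            sum(hi for _, hi in items.values()))


if __name__ == "__main__":
    print("baseline range:", total_range(LINE_ITEMS))
    print(f"redpanda on-demand: ${fargate_monthly(0.5, 2):.2f}")
    print(f"redpanda spot (max discount): ${fargate_monthly(0.5, 2, spot=True):.2f}")
```

Under these assumed rates, a 0.5 vCPU / 2 GB task comes to about $21 per month on demand and about $6 at the maximum Spot discount, before EBS. The Spot figure is what puts the low end of the Redpanda and ClickHouse ranges within reach.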
The largest variables are sustained vCPU during worker jobs and egress. Because the Event Projections path is already designed for bursty rather than constant high throughput, and because the deml-daemon and workers are tightly coupled to the Outbox and Redpanda topics rather than always-on polling, real costs trend toward the lower half of the range. Persistent volumes for ClickHouse and Redpanda are the only components that grow materially with usage. All other services scale horizontally inside the Lightsail Container Service, with each additional node billed at the fixed monthly price of the chosen power.
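The two growth terms described here behave differently. App capacity grows in fixed per-node steps, while storage grows linearly with data volume. The sketch below models both. Node prices are the Lightsail powers quoted earlier in this chapter. The gp3 rate is an assumed us-east-1 storage price that excludes any provisioned IOPS or throughput above the baseline. The function names are illustrative.

```python
from __future__ import annotations

# Lightsail container power prices quoted earlier in this chapter ($/node/month).
LIGHTSAIL_POWER_MONTHLY = {"nano": 7, "micro": 10, "small": 15}
# Assumed us-east-1 EBS gp3 storage price ($/GB-month), before extra IOPS or throughput.
GP3_GB_MONTH = 0.08


def app_tier_monthly(power: str, nodes: int) -> int:
    """Horizontal scaling: every node of a given power costs the same fixed amount."""
    if nodes < 1:
        raise ValueError("a container service needs at least one node")
    return LIGHTSAIL_POWER_MONTHLY[power] * nodes


def volume_monthly(gb: float) -> float:
    """Persistent volumes are the usage-driven term: cost grows linearly with stored GB."""
    return gb * GP3_GB_MONTH


def growth_delta(power: str, nodes_before: int, nodes_after: int,
                 gb_before: float, gb_after: float) -> float:
    """Extra monthly spend from adding nodes and growing ClickHouse/Redpanda volumes."""
    return (app_tier_monthly(power, nodes_after) - app_tier_monthly(power, nodes_before)
            + volume_monthly(gb_after) - volume_monthly(gb_before))
```

For example, `growth_delta("micro", 1, 3, 20, 100)` models going from one to three Micro nodes while the volumes grow from 20 GB to 100 GB. The result is $26.40 per month: $20 from the nodes and $6.40 from storage.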
Migration from Railway (or any Docker-based host) is straightforward: push the existing images to ECR, provision the Lightsail service and stateful targets, point the environment variables at the new internal or public endpoints, migrate Postgres via dump/restore or AWS DMS, and replay or mirror Redpanda topics as needed. The deml-daemon outbox relay and telemetry_worker remain authoritative. All Firebase surfaces (ingestEvent callables, Firestore projections, Auth) continue unchanged. The AWS path therefore provides predictable costs and immediate operational relief, and it is also a fully supported alternative deployment topology that honors every principle in the platform's architecture.
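The Postgres and environment-repointing steps of the migration can be sketched as follows. The sketch assumes a short write freeze on the old host while the dump is taken. It uses a custom-format `pg_dump` archive, which lets `pg_restore` restore in parallel with `--jobs`. The dump file name and job count are arbitrary choices. The helper `changed_variables` is hypothetical: it confirms that only values, not variable names, change during the cut-over.

```python
from __future__ import annotations


def postgres_migration(source_url: str, target_url: str,
                       dump_path: str = "deml.pgdump", jobs: int = 4) -> list[list[str]]:
    """argv lists for a dump/restore cut-over (assumes writes are paused on the source)."""
    dump = ["pg_dump", "--format=custom", "--no-owner", "--no-privileges",
            f"--dbname={source_url}", f"--file={dump_path}"]
    # Custom-format archives allow parallel restore with --jobs.
    restore = ["pg_restore", "--no-owner", "--no-privileges", f"--jobs={jobs}",
               f"--dbname={target_url}", dump_path]
    return [dump, restore]


def changed_variables(old_env: dict[str, str], new_env: dict[str, str]) -> list[str]:
    """Variables whose values differ after repointing; the key set itself must not change."""
    if old_env.keys() != new_env.keys():
        drift = sorted(old_env.keys() ^ new_env.keys())
        raise ValueError("environment contract changed: " + ", ".join(drift))
    return sorted(k for k in old_env if old_env[k] != new_env[k])
```

`--no-owner` and `--no-privileges` avoid carrying Railway-specific roles and grants into RDS or the Lightsail database. Ownership is then taken by the role named in the target connection string.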