Appendix E: AWS Deployment
Env templates: backend/.env.example, frontend/.env.example, marketing/.env.example.
This appendix is the complete setup checklist for deploying the DEML platform on AWS using Lightsail Container Services (application layer), ECR (images), Fargate or Lightsail instances (stateful Redpanda / ClickHouse), and RDS or Lightsail Database (Postgres). The goal is the cheapest manageable footprint that preserves every architectural contract: the same Dockerfiles, the Event Projections loop (Outbox + Redpanda + idempotent workers), symmetrical tenant processing, distroless images, and the unchanged Firebase / Firestore surfaces. Every hostname and cross-site URL remains env-driven.
Pre-Deploy Checklist
Before provisioning resources:
- AWS Account & IAM: Create an IAM user or role with least-privilege access for ECR, Lightsail, ECS/Fargate, RDS, Secrets Manager, and Route 53. Prefer OIDC from GitHub Actions for CI.
- ECR Repositories: Create private repositories (e.g. `deml-frontend`, `deml-backend`, `deml-workers`, `deml-daemon`, `deml-scanner`). Enable image scanning.
- Secrets: Store in AWS Secrets Manager or Lightsail environment variables: `SECRET_KEY`, Firebase service account JSON, Stripe, Resend, threat intel keys, `HF_TOKEN`, `SENTRY_DSN`, Redpanda SASL credentials. Never commit secrets.
- Cross-site URL trio (identical to other environments):

| Variable | Production value | Purpose |
|---|---|---|
| `FRONTEND_URL` | `https://deml.app` | Angular app, widgets, status |
| `BACKEND_URL` | `https://backend.deml.app` | Django API, OAuth callbacks |
| `MARKETING_URL` | `https://dataengineeringformachinelearning.com` | Astro site, auth handoff, CORS |

- CORS / CSRF: Extend `CORS_ALLOWED_ORIGINS` and `CSRF_TRUSTED_ORIGINS` with your production domains. Use `backend/monitor/cors_utils.py` dynamic registration for customer sites.
- Production guards: `DEBUG=False`, strong `SECRET_KEY`. The backend fails fast on insecure settings via `backend/utils/env.py`.
- Postgres migration: Export from the current host (`pg_dump`), import into RDS or Lightsail DB. Test connectivity from a temporary task before cutover.
- Domain & DNS: Use Route 53 (or keep existing). Lightsail can issue and attach ACM certificates for custom domains on Container Services.
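The production guards and the cross-site URL trio above can be checked with a small pre-flight script before any resources are provisioned. The following is a minimal sketch, not the contents of `backend/utils/env.py`: the function names, the 50-character `SECRET_KEY` threshold, and the accepted `DEBUG` spellings are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# The three cross-site variables listed in the table above.
REQUIRED_URLS = ("FRONTEND_URL", "BACKEND_URL", "MARKETING_URL")


def validate_production_env(env):
    """Return a list of problems; an empty list means the settings look safe."""
    problems = []
    if env.get("DEBUG", "False").strip().lower() not in ("false", "0"):
        problems.append("DEBUG must be False in production")
    # Length threshold is an illustrative assumption, not a platform rule.
    if len(env.get("SECRET_KEY", "")) < 50:
        problems.append("SECRET_KEY is missing or too short")
    for name in REQUIRED_URLS:
        parsed = urlparse(env.get(name, ""))
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name} must be an absolute https:// URL")
    return problems


def trusted_origins(env):
    """Derive scheme://host origins from the URL trio for CORS / CSRF lists."""
    origins = []
    for name in REQUIRED_URLS:
        parsed = urlparse(env[name])
        origins.append(f"{parsed.scheme}://{parsed.netloc}")
    return origins
```

A deploy script could call `validate_production_env(os.environ)` and abort when the returned list is non-empty, which mirrors the fail-fast behaviour described above without depending on the backend code.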
Core AWS Topology
- Lightsail Container Service (one service): Deploy the multi-container application fleet (frontend SSR, backend, workers, daemon, scanner + sidecars). Choose Micro ($10/mo) or Small ($15/mo) node size. Containers inside the service discover each other by name.
- Stateful data plane: Redpanda (1-broker) and ClickHouse on either dedicated small Lightsail Linux instances (with Docker) or ECS Fargate tasks (0.5–1 vCPU, EBS volumes). Dragonfly can be a side container.
- Database: Amazon RDS for PostgreSQL (db.t4g.micro recommended) or Lightsail Database for simplest management.
- Registry & CI: ECR + GitHub Actions (build → push → deploy).
- Networking: Internal discovery inside the Lightsail service; public or peered hostnames for Redpanda/ClickHouse/RDS. Use security groups and Lightsail networking features. Never expose Redpanda or ClickHouse publicly without strong auth.
GitHub Actions CI/CD Pattern (ECR + Lightsail / Fargate)
```yaml
# .github/workflows/aws-deploy.yml (example)
name: AWS Deploy
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::ACCOUNT:role/github-actions-deploy
          aws-region: us-east-1
      - name: Login to ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push frontend
        env:
          # The registry hostname comes from the login step's output.
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        run: |
          docker build -t "$ECR_REGISTRY/deml-frontend:$GITHUB_SHA" -f frontend/Dockerfile frontend
          docker push "$ECR_REGISTRY/deml-frontend:$GITHUB_SHA"
      # Repeat for backend, rust/deml-daemon (context ./rust), infrastructure/scanner, etc.
      - name: Deploy to Lightsail
        run: |
          aws lightsail create-container-service-deployment \
            --service-name deml-apps \
            --containers file://deploy/lightsail-containers.json \
            --public-endpoint '{"containerName":"frontend","containerPort":8080}'
```
Update the containers JSON (or use the Lightsail console) to reference the new image tags and set environment variables. Redeploy only affected containers when possible.
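One way to keep the containers JSON in step with the pushed image tags is to generate it in CI. The sketch below assumes the container names (`frontend`, `backend`, `workers`, `daemon`), the ports, and the helper names; only the overall shape (a map of container name to `image`, `environment`, and `ports`) follows the Lightsail `--containers` format. Real deployments would also pass per-container environment rather than one shared bundle.

```python
import json


def build_containers(registry, tag, shared_env):
    """Build the mapping passed to `--containers` for one deployment."""

    def image(name):
        # e.g. <registry>/deml-frontend:<git sha>
        return f"{registry}/{name}:{tag}"

    return {
        "frontend": {
            "image": image("deml-frontend"),
            "environment": dict(shared_env),
            "ports": {"8080": "HTTP"},  # matches the public endpoint port
        },
        "backend": {
            "image": image("deml-backend"),
            "environment": dict(shared_env),
            "ports": {"8000": "HTTP"},  # assumed internal port
        },
        "workers": {"image": image("deml-workers"), "environment": dict(shared_env)},
        "daemon": {"image": image("deml-daemon"), "environment": dict(shared_env)},
    }


def render(registry, tag, shared_env):
    """Serialize the spec; write the result to deploy/lightsail-containers.json."""
    return json.dumps(build_containers(registry, tag, shared_env), indent=2, sort_keys=True)
```

Writing the output of `render(...)` to `deploy/lightsail-containers.json` before the deploy step means every deployment references the exact commit that was built.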
Services Overview (Lightsail Container Service)
Frontend (deml-frontend)
- Root / context: `frontend/`
- Image: `gcr.io/distroless/nodejs22-debian12` runtime (SSR via `dist/frontend/server/server.mjs`)
- Public endpoint on the Lightsail service.
- Required build-time + runtime vars: `FRONTEND_URL`, `BACKEND_URL`, `MARKETING_URL`, all `FIREBASE_*`, `SANITY_*`, `SENTRY_DSN`.
Backend (deml-backend)
- Root: `backend/`
- Image: distroless Python (with liboqs built in Dockerfile).
- Start: `python start.py` (or equivalent Daphne/ASGI for production).
- Internal port typically 8000 or 8080.
- Full env bundle: `DATABASE_URL`, `REDPANDA_BROKERS`, `CLICKHOUSE_*`, `DRAGONFLY_HOST`, Firebase service account, threat intel keys, etc.
Workers & Daemon
- `deml-workers`: Consolidated ML + security + cron consumer (`deml_workers_start.py`).
- `deml-daemon`: Rust binary (outbox relay primary path, pinger, cron publisher). Built from `rust/deml-daemon/Dockerfile`.
- Telemetry worker logic can run inside the workers container or as a separate container definition in the same Lightsail deployment.
- Pass `SCANNER_SERVICE_URL`, `CPE_GUESSER_URL`, `TOR_PROXY_URL` using the internal container hostnames.
Light sidecars (scanner, cpe-guesser, tor-proxy) run as additional containers in the same service with read_only, security_opt: no-new-privileges, tmpfs where appropriate.
Stateful Components (Separate Targets)
Redpanda (deml-queue)
- Run the official `docker.redpanda.com/redpandadata/redpanda` image (or the `infrastructure/queue` variant).
- Single broker configuration mirroring `docker-compose.yml` (internal listener, optional external SASL listener on 19092).
- Persistent EBS / Lightsail block storage for data and logs.
- Expose via private hostname or authenticated public endpoint only as needed for Firebase Functions.
- Set `REDPANDA_BROKERS` and SASL variables on consumers.
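How consumers turn `REDPANDA_BROKERS` and the SASL variables into client settings depends on the Kafka client in use, which this appendix does not name. The sketch below assumes librdkafka-style keys (as used by `confluent-kafka`) and hypothetical variable names `REDPANDA_SASL_USERNAME`, `REDPANDA_SASL_PASSWORD`, `REDPANDA_SASL_MECHANISM`, and `REDPANDA_SECURITY_PROTOCOL`; substitute the names your environment actually defines.

```python
def redpanda_client_config(env):
    """Build a librdkafka-style config dict from environment variables."""
    brokers = [b.strip() for b in env.get("REDPANDA_BROKERS", "").split(",") if b.strip()]
    if not brokers:
        raise ValueError("REDPANDA_BROKERS is empty")
    config = {"bootstrap.servers": ",".join(brokers)}

    # SASL settings are added only when credentials are present, so the same
    # code works against the internal (unauthenticated) listener.
    username = env.get("REDPANDA_SASL_USERNAME")  # hypothetical variable name
    if username:
        config.update({
            "security.protocol": env.get("REDPANDA_SECURITY_PROTOCOL", "SASL_SSL"),
            "sasl.mechanism": env.get("REDPANDA_SASL_MECHANISM", "SCRAM-SHA-256"),
            "sasl.username": username,
            "sasl.password": env["REDPANDA_SASL_PASSWORD"],
        })
    return config
```

Keeping the SASL branch optional lets internal consumers and external callers (for example Firebase Functions on the 19092 listener) share one configuration helper.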
ClickHouse
- Use the `infrastructure/clickhouse` Dockerfile.
- Mount persistent volume at `/var/lib/clickhouse`.
- Configure via `clickhouse-config.xml` and environment (`CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`).
- Point consumers at this instance via `CLICKHOUSE_HOST` and port 8123.
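Before pointing consumers at the new instance, reachability on port 8123 can be confirmed through ClickHouse's HTTP `/ping` endpoint, which answers `Ok.` when the server is up. This is a minimal sketch; the function names and the injectable `opener` parameter (used only to make the check testable without a network) are assumptions.

```python
import urllib.request


def clickhouse_ping_url(host, port=8123, secure=False):
    """Return the URL of the ClickHouse HTTP ping endpoint."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/ping"


def clickhouse_is_up(host, port=8123, timeout=3.0, opener=urllib.request.urlopen):
    """True when /ping answers 200 with the body 'Ok.'; False on any network error."""
    try:
        with opener(clickhouse_ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"Ok."
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False
```

Running `clickhouse_is_up(os.environ["CLICKHOUSE_HOST"])` from a one-off task in the application network is a quick way to catch security-group or hostname mistakes before workers start.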
Postgres
- RDS: Create an instance and set `DATABASE_URL=postgres://...`.
- Or create a Lightsail Database and obtain its connection string.
- Run migrations via a one-off task or the backend start command on first deploy.
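A connectivity test from a temporary task can be as small as parsing `DATABASE_URL` and opening a TCP connection to the endpoint. The sketch below uses only the standard library; the function names are illustrative, and a TCP check confirms network reachability only, not credentials or TLS settings.

```python
import socket
from urllib.parse import unquote, urlparse


def parse_database_url(url):
    """Split a postgres:// URL into its connection components."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # PostgreSQL default port
        "user": unquote(parsed.username or ""),
        "password": unquote(parsed.password or ""),  # decode %-escaped characters
        "dbname": parsed.path.lstrip("/"),
    }


def can_connect(host, port, timeout=3.0):
    """True when a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `cfg = parse_database_url(os.environ["DATABASE_URL"])` followed by `can_connect(cfg["host"], cfg["port"])` distinguishes a networking problem from an authentication problem before migrations run.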
Dragonfly
- Can be a container inside the main Lightsail service or a separate minimal Fargate task / instance.
- Set `DRAGONFLY_HOST` accordingly.
Networking & Internal DNS
Inside a Lightsail Container Service, containers resolve each other by the names you assign in the deployment specification. For components outside that service:
- Use the public DNS of the Lightsail instance or the Fargate service discovery / load balancer target.
- For RDS, use the RDS endpoint.
- Always prefer private networking where AWS makes it available (Lightsail peering or VPC).
- Update health checks (`/api/v1/system-status/health`) and the Event Projections synthetic probe after cutover.
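After cutover, the health endpoint can be polled until it answers consistently. The following sketch assumes that a healthy backend returns HTTP 200 on `/api/v1/system-status/health`; the retry counts, the consecutive-success rule, and the function names are illustrative choices rather than platform behaviour.

```python
import time
import urllib.request

HEALTH_PATH = "/api/v1/system-status/health"


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code, or None when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError:  # covers URLError, HTTPError (4xx/5xx) and timeouts
        return None


def wait_until_healthy(base_url, attempts=10, delay=3.0, required_successes=3,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll the health endpoint until it returns 200 several times in a row."""
    url = base_url.rstrip("/") + HEALTH_PATH
    streak = 0
    for _ in range(attempts):
        # Any non-200 result resets the streak, so a flapping service fails.
        streak = streak + 1 if fetch(url) == 200 else 0
        if streak >= required_successes:
            return True
        sleep(delay)
    return False
```

Requiring consecutive successes avoids declaring the cutover complete on a single lucky response while DNS or the load balancer is still converging.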
Migration Considerations from Railway (or Cloud Run)
- Build and push all images to ECR from the existing Dockerfiles.
- Provision Postgres target and restore data.
- Stand up Redpanda and ClickHouse with volumes; use `rpk` to recreate topics or mirror from the old cluster.
- Create the Lightsail Container Service deployment referencing ECR images and the full environment variable set.
- Point DNS / load balancer at the new public endpoint(s).
- Verify: platform-status Event Projections check, worker logs, outbox relay, ML inference, and a full tenant data path.
- Decommission old services only after a stable observation period.
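For the topic-recreation step, the `rpk topic create` commands can be generated from a topic inventory so the new single-broker cluster matches the old one. This is a sketch only: the topic names and partition counts in the usage note are hypothetical, and the real list should come from the old cluster (for example via `rpk topic list`).

```python
import shlex


def rpk_create_commands(topics, replicas=1):
    """Return one `rpk topic create` command per topic.

    `topics` maps topic name -> partition count. `replicas` defaults to 1
    because the target described here is a single-broker cluster.
    """
    commands = []
    for name, partitions in sorted(topics.items()):
        args = ["rpk", "topic", "create", name,
                "-p", str(partitions), "-r", str(replicas)]
        commands.append(shlex.join(args))  # shell-quote each argument safely
    return commands
```

Calling `rpk_create_commands({"outbox-events": 3, "telemetry": 6})` yields two commands that can be reviewed and then run against the new broker.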
All operational modes (normal, broker degraded with Firestore fallback, worker restart, etc.) and the symmetrical multi-tenant loops remain identical because the code paths are unchanged.
Cost Controls & Right-Sizing
- Start with the smallest viable node (Micro) in Lightsail and one replica set.
- Use Fargate Spot for Redpanda and ClickHouse where recovery semantics are acceptable.
- Monitor with CloudWatch or existing Sentry + CES gauges.
- Scale horizontally by adding Lightsail nodes only when CPU/memory or request latency demands it.
- Persistent storage for ClickHouse and Redpanda is the primary growth vector; set retention policies aggressively in the workers (`db_cleanup`, ClickHouse TTL).
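A ClickHouse TTL is applied per table with an `ALTER TABLE ... MODIFY TTL` statement. The helper below is a minimal sketch that builds such a statement; the table and column names in the usage note are hypothetical, and it does not describe what `db_cleanup` actually does.

```python
import re

# Accept only plain identifiers so the generated SQL cannot be injected into.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ttl_statement(table, timestamp_column, days):
    """Build an ALTER TABLE statement that expires rows after `days` days."""
    for ident in (table, timestamp_column):
        if not _IDENT.match(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        f"ALTER TABLE {table} "
        f"MODIFY TTL {timestamp_column} + INTERVAL {days} DAY"
    )
```

For instance, `ttl_statement("telemetry_events", "created_at", 30)` produces a statement that keeps roughly one month of rows, which bounds volume growth on the ClickHouse disk.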
This AWS topology keeps the deployment observable and secure with a smaller operational surface than a dozen individually managed services, while preserving the platform's architectural contracts.