AWS Deployment


Appendix E: AWS Deployment

Env templates: backend/.env.example, frontend/.env.example, marketing/.env.example.

This appendix is the complete setup checklist for deploying the DEML platform on AWS using Lightsail Container Services (application layer), ECR (images), Fargate or Lightsail instances (stateful Redpanda / ClickHouse), and RDS or Lightsail Database (Postgres). The goal is the cheapest manageable footprint that preserves every architectural contract: the same Dockerfiles, the Event Projections loop (Outbox + Redpanda + idempotent workers), symmetrical tenant processing, distroless images, and the unchanged Firebase / Firestore surfaces. Every hostname and cross-site URL remains env-driven.

Pre-Deploy Checklist

Before provisioning resources:

  1. AWS Account & IAM: Create an IAM user or role with least-privilege access for ECR, Lightsail, ECS/Fargate, RDS, Secrets Manager, and Route 53. Prefer OIDC from GitHub Actions for CI.
  2. ECR Repositories: Create private repositories (e.g. deml-frontend, deml-backend, deml-workers, deml-daemon, deml-scanner). Enable image scanning.
  3. Secrets: Store in AWS Secrets Manager or Lightsail environment variables: SECRET_KEY, Firebase service account JSON, Stripe, Resend, threat intel keys, HF_TOKEN, SENTRY_DSN, Redpanda SASL credentials. Never commit secrets.
  4. Cross-site URL trio (identical to other environments):
     Variable      | Production value                              | Purpose
     FRONTEND_URL  | https://deml.app                              | Angular app, widgets, status
     BACKEND_URL   | https://backend.deml.app                      | Django API, OAuth callbacks
     MARKETING_URL | https://dataengineeringformachinelearning.com | Astro site, auth handoff, CORS
  5. CORS / CSRF: Extend CORS_ALLOWED_ORIGINS and CSRF_TRUSTED_ORIGINS with your production domains. Use backend/monitor/cors_utils.py dynamic registration for customer sites.
  6. Production guards: DEBUG=False, strong SECRET_KEY. The backend fails fast on insecure settings via backend/utils/env.py.
  7. Postgres migration: Export from the current host (pg_dump), then import into RDS or Lightsail DB. Test connectivity from a temporary task before cutover.
  8. Domain & DNS: Use Route 53 (or keep your existing DNS provider). Lightsail can issue and attach Lightsail-managed SSL/TLS certificates for custom domains on Container Services.
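The URL trio and production guards from the checklist above can be checked in CI before any deploy. The following is a minimal sketch under stated assumptions: `check_production_env`, `trusted_origins`, and the 50-character SECRET_KEY threshold are hypothetical and do not reproduce the actual logic in backend/utils/env.py or backend/monitor/cors_utils.py.

```python
from urllib.parse import urlparse

URL_VARS = ("FRONTEND_URL", "BACKEND_URL", "MARKETING_URL")
MIN_SECRET_LENGTH = 50  # assumed threshold, not taken from the platform code


def check_production_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in URL_VARS:
        parsed = urlparse(env.get(name, ""))
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name} must be an absolute https:// URL")
    # A missing DEBUG variable is treated as False in this sketch.
    if env.get("DEBUG", "False").strip().lower() not in ("false", "0"):
        problems.append("DEBUG must be False in production")
    if len(env.get("SECRET_KEY", "")) < MIN_SECRET_LENGTH:
        problems.append(
            f"SECRET_KEY must be at least {MIN_SECRET_LENGTH} characters"
        )
    return problems


def trusted_origins(env):
    """Derive scheme://host origins for CORS/CSRF from the URL trio."""
    origins = []
    for name in URL_VARS:
        parsed = urlparse(env[name])
        origins.append(f"{parsed.scheme}://{parsed.netloc}")
    return origins
```

In a pipeline, pass `dict(os.environ)` to `check_production_env` and fail the job when the returned list is non-empty.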

Core AWS Topology

  • Lightsail Container Service (one service): Deploy the multi-container application fleet (frontend SSR, backend, workers, daemon, scanner + sidecars). Choose Micro ($10/mo) or Small ($15/mo) node size. Containers inside the service discover each other by name.
  • Stateful data plane: Redpanda (1-broker) and ClickHouse on either dedicated small Lightsail Linux instances (with Docker) or ECS Fargate tasks (0.5–1 vCPU, EBS volumes). Dragonfly can be a side container.
  • Database: Amazon RDS for PostgreSQL (db.t4g.micro recommended) or Lightsail Database for simplest management.
  • Registry & CI: ECR + GitHub Actions (build → push → deploy).
  • Networking: Internal discovery inside the Lightsail service; public or peered hostnames for Redpanda/ClickHouse/RDS. Use security groups and Lightsail networking features. Never expose Redpanda or ClickHouse publicly without strong auth.

GitHub Actions CI/CD Pattern (ECR + Lightsail / Fargate)

# .github/workflows/aws-deploy.yml (example)
name: AWS Deploy

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::ACCOUNT:role/github-actions-deploy
          aws-region: us-east-1
      - name: Login to ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push frontend
        env:
          # Registry hostname exported by the ECR login step
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        run: |
          docker build -t "$ECR_REGISTRY/deml-frontend:$GITHUB_SHA" -f frontend/Dockerfile frontend
          docker push "$ECR_REGISTRY/deml-frontend:$GITHUB_SHA"
      # Repeat for backend, rust/deml-daemon (context ./rust), infrastructure/scanner, etc.
      - name: Deploy to Lightsail
        run: |
          aws lightsail create-container-service-deployment \
            --service-name deml-apps \
            --containers file://deploy/lightsail-containers.json \
            --public-endpoint '{"containerName":"frontend","containerPort":8080}'

Update the containers JSON (or use the Lightsail console) to reference the new image tags and set environment variables. Redeploy only affected containers when possible.
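One way to keep the containers JSON in step with new image tags is to generate it at deploy time. The sketch below builds the mapping of container name to `image`, `environment`, and `ports` that the `--containers` option accepts. The container names other than frontend, the backend port 8000, and the helper names `render_containers` and `to_json` are assumptions for illustration.

```python
import json


def render_containers(registry, tag, shared_env):
    """Build the container mapping for one Lightsail deployment."""

    def image(repo):
        # e.g. <registry>/deml-frontend:<git sha>
        return f"{registry}/{repo}:{tag}"

    return {
        "frontend": {
            "image": image("deml-frontend"),
            "environment": dict(shared_env),
            "ports": {"8080": "HTTP"},  # matches the public endpoint port
        },
        "backend": {
            "image": image("deml-backend"),
            "environment": dict(shared_env),
            "ports": {"8000": "HTTP"},  # assumed internal port
        },
        "workers": {
            "image": image("deml-workers"),
            "environment": dict(shared_env),
        },
    }


def to_json(spec):
    """Serialize deterministically so diffs between deploys stay small."""
    return json.dumps(spec, indent=2, sort_keys=True)
```

Writing the output of `to_json(...)` to deploy/lightsail-containers.json before the deploy step keeps the workflow command unchanged.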

Services Overview (Lightsail Container Service)

Frontend (deml-frontend)

  • Root / context: frontend/
  • Image: gcr.io/distroless/nodejs22-debian12 runtime (SSR via dist/frontend/server/server.mjs)
  • Public endpoint on the Lightsail service.
  • Required build-time + runtime vars: FRONTEND_URL, BACKEND_URL, MARKETING_URL, all FIREBASE_*, SANITY_*, SENTRY_DSN.

Backend (deml-backend)

  • Root: backend/
  • Image: distroless Python (with liboqs built in Dockerfile).
  • Start: python start.py (or equivalent Daphne/ASGI for production).
  • Internal port typically 8000 or 8080.
  • Full env bundle: DATABASE_URL, REDPANDA_BROKERS, CLICKHOUSE_*, DRAGONFLY_HOST, Firebase service account, threat intel keys, etc.

Workers & Daemon

  • deml-workers: Consolidated ML + security + cron consumer (deml_workers_start.py).
  • deml-daemon: Rust binary (outbox relay primary path, pinger, cron publisher). Built from rust/deml-daemon/Dockerfile.
  • Telemetry worker logic can run inside the workers container or as a separate container definition in the same Lightsail deployment.
  • Pass SCANNER_SERVICE_URL, CPE_GUESSER_URL, TOR_PROXY_URL using the internal container hostnames.

Light sidecars (scanner, cpe-guesser, tor-proxy) run as additional containers in the same service, with read_only, security_opt: no-new-privileges, and tmpfs applied wherever the target runtime exposes those options.

Stateful Components (Separate Targets)

Redpanda (deml-queue)

  • Run the official docker.redpanda.com/redpandadata/redpanda image (or the infrastructure/queue variant).
  • Single broker configuration mirroring docker-compose.yml (internal listener, optional external SASL listener on 19092).
  • Persistent EBS / Lightsail block storage for data and logs.
  • Expose via private hostname or authenticated public endpoint only as needed for Firebase Functions.
  • Set REDPANDA_BROKERS and SASL variables on consumers.
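As an illustration of the last point, a consumer can translate its environment into Kafka client settings. This is a sketch, not the platform's actual configuration code: only REDPANDA_BROKERS comes from this appendix, while the REDPANDA_SASL_* and REDPANDA_SECURITY_PROTOCOL variable names and the function name are hypothetical. The dictionary keys are standard librdkafka configuration properties.

```python
def redpanda_client_config(env):
    """Translate environment variables into librdkafka-style settings."""
    config = {"bootstrap.servers": env["REDPANDA_BROKERS"]}
    username = env.get("REDPANDA_SASL_USERNAME")  # hypothetical name
    password = env.get("REDPANDA_SASL_PASSWORD")  # hypothetical name
    if username and password:
        config.update({
            "security.protocol": env.get(
                "REDPANDA_SECURITY_PROTOCOL", "SASL_SSL"
            ),
            "sasl.mechanism": env.get(
                "REDPANDA_SASL_MECHANISM", "SCRAM-SHA-256"
            ),
            "sasl.username": username,
            "sasl.password": password,
        })
    elif username or password:
        # Half-configured credentials are almost always a deploy mistake.
        raise ValueError("set both SASL username and password, or neither")
    return config
```

Containers that reach the broker over the internal listener would omit the SASL variables and receive a config with only `bootstrap.servers`.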

ClickHouse

  • Use the infrastructure/clickhouse Dockerfile.
  • Mount persistent volume at /var/lib/clickhouse.
  • Configure via clickhouse-config.xml and environment (CLICKHOUSE_USER, CLICKHOUSE_PASSWORD).
  • Consumers set CLICKHOUSE_HOST to the ClickHouse hostname and connect on port 8123 (the HTTP interface).

Postgres

  • RDS: Create instance, set DATABASE_URL=postgres://....
  • Or create a Lightsail Database and copy its connection string into DATABASE_URL.
  • Run migrations via a one-off task or the backend start command on first deploy.
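Before running migrations, it can help to confirm that the new database endpoint is reachable from inside the deployment network. The sketch below parses DATABASE_URL and attempts a plain TCP connection; the function names are hypothetical, and a successful connection shows only network reachability, not valid credentials.

```python
import socket
from urllib.parse import urlparse


def database_endpoint(database_url):
    """Extract (host, port) from a Postgres URL; port defaults to 5432."""
    parsed = urlparse(database_url)
    if parsed.scheme not in ("postgres", "postgresql") or not parsed.hostname:
        raise ValueError("expected a postgres:// or postgresql:// URL")
    return parsed.hostname, parsed.port or 5432


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A one-off task could call `can_connect(*database_endpoint(os.environ["DATABASE_URL"]))` and exit non-zero on failure, which covers the connectivity test recommended in the pre-deploy checklist.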

Dragonfly

  • Can be a container inside the main Lightsail service or a separate minimal Fargate task / instance.
  • Set DRAGONFLY_HOST accordingly.

Networking & Internal DNS

Inside a Lightsail Container Service, containers resolve each other by the names you assign in the deployment specification. For components outside that service:

  • Use the public DNS of the Lightsail instance or the Fargate service discovery / load balancer target.
  • For RDS, use the RDS endpoint.
  • Always prefer private networking where AWS makes it available (Lightsail peering or VPC).
  • Update health checks (/api/v1/system-status/health) and the Event Projections synthetic probe after cutover.
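After cutover, the health endpoint can be polled until the new deployment answers. The following is a minimal sketch using only the standard library. The path comes from this appendix, while the retry counts, delays, and function names are assumptions.

```python
import time
import urllib.error
import urllib.request

HEALTH_PATH = "/api/v1/system-status/health"


def health_url(base_url):
    """Join the backend base URL and the health path without a double slash."""
    return base_url.rstrip("/") + HEALTH_PATH


def wait_until_healthy(base_url, attempts=10, delay=3.0, timeout=5.0):
    """Poll the health endpoint; return True on the first HTTP 200."""
    url = health_url(base_url)
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # unreachable or non-2xx response; retry after the delay
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy(os.environ["BACKEND_URL"])` at the end of a deploy job gives a simple gate before DNS is switched or old services are removed.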

Migration Considerations from Railway (or Cloud Run)

  1. Build and push all images to ECR from the existing Dockerfiles.
  2. Provision Postgres target and restore data.
  3. Stand up Redpanda and ClickHouse with volumes; use rpk to recreate topics or mirror from the old cluster.
  4. Create the Lightsail Container Service deployment referencing ECR images and the full environment variable set.
  5. Point DNS / load balancer at the new public endpoint(s).
  6. Verify: platform-status Event Projections check, worker logs, outbox relay, ML inference, and a full tenant data path.
  7. Decommission old services only after a stable observation period.

All operational modes (normal, broker degraded with Firestore fallback, worker restart, etc.) and the symmetrical multi-tenant loops remain identical because the code paths are unchanged.

Cost Controls & Right-Sizing

  • Start with the smallest viable node (Micro) in Lightsail and one replica set.
  • Use Fargate Spot for Redpanda and ClickHouse where recovery semantics are acceptable.
  • Monitor with CloudWatch or existing Sentry + CES gauges.
  • Scale horizontally by adding Lightsail nodes only when CPU/memory or request latency demands it.
  • Persistent storage for ClickHouse and Redpanda is the primary growth vector; set retention policies aggressively in the workers (db_cleanup, ClickHouse TTL).
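Because retention drives disk growth, a rough steady-state estimate helps when sizing volumes for Redpanda and ClickHouse. The sketch below is simple arithmetic under stated assumptions: the function name and the 30% headroom default are hypothetical, and the estimate ignores compression and replication.

```python
import math


def required_disk_gb(daily_ingest_gb, retention_days, headroom=0.3):
    """Steady-state disk need: retained data plus fractional headroom."""
    if daily_ingest_gb < 0 or retention_days < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    raw = daily_ingest_gb * retention_days * (1 + headroom)
    # Round first so floating-point noise does not add a spurious gigabyte.
    return math.ceil(round(raw, 6))
```

For example, 0.5 GB per day retained for 30 days with the default headroom needs roughly 20 GB; halving the retention window halves the retained-data term.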

This AWS topology delivers a production-grade, observable, secure deployment with a much smaller operational surface than a dozen individually managed services, while keeping the platform's architectural contracts intact.