Chapter 32: Daily Platform Operations Audit
Operating a high-throughput event platform is not a one-time architecture exercise — it is a recurring discipline. Every production day, I run a structured audit across layout cohesion, accessibility, container security, tenant isolation, scheduler health, and documentation fidelity. The objective is not checkbox compliance; it is to catch drift before it becomes an incident or an audit finding. This chapter documents the operational playbook I execute on DEML and the invariants it enforces.
Responsiveness, layout harmony, and viewport height
Mobile-first is non-negotiable. Every stylesheet defaults to a single-column layout and scales up with @media (min-width: …), never the reverse. The enforcement script, run as node scripts/check_mobile_first.js, scans the Angular, Viking-UI, marketing, and Django static surfaces for forbidden desktop-first max-width breakpoints and fails CI when it finds any. Dashboard pages share one shell: .dashboard-page-container → .dashboard-content-area → .page-inner-wrapper, capped at --viking-container-max-width (1260px). The public /status route uses the same container classes, so navigating between Command Center, Analytics, Settings, and System Status produces zero horizontal layout shift. Full viewport height is managed deliberately: .dashboard-wrapper and .dashboard-page-container chain min-height: calc(100vh - var(--navbar-height)), and their flex children grow (flex: 1), so sidebars and main content start at full height without cropping scroll regions.
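The mobile-first rule can be illustrated with a minimal sketch. The real gate is the Node script named above; this Python version only mirrors the idea, and the regular expression, the function name find_desktop_first_breakpoints, and the sample stylesheet are assumptions made for illustration, not the script's actual logic.

```python
import re

# Hypothetical rule: flag any @media condition that uses max-width.
# The real check_mobile_first.js may be stricter or looser than this.
MAX_WIDTH_QUERY = re.compile(r"@media[^{]*\(\s*max-width\s*:", re.IGNORECASE)


def find_desktop_first_breakpoints(css_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each max-width media query."""
    violations = []
    for line_number, line in enumerate(css_text.splitlines(), start=1):
        if MAX_WIDTH_QUERY.search(line):
            violations.append((line_number, line.strip()))
    return violations


# Illustrative stylesheet: one mobile-first query, one desktop-first query.
sample = """\
.grid { display: block; }
@media (min-width: 768px) { .grid { display: grid; } }
@media screen and (max-width: 600px) { .grid { display: none; } }
"""

violations = find_desktop_first_breakpoints(sample)
for line_number, line in violations:
    print(f"line {line_number}: desktop-first breakpoint: {line}")
```

A CI wrapper would exit non-zero whenever the list is non-empty. This line-based scan misses @media conditions that wrap across several lines, which a production check would need to handle.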
Framework cohesion and light/dark modes
Angular deml.app, Astro marketing, Django templates, and Swagger UI all load the same compiled design-tokens.css and viking-ui.css bundle synced from frontend/projects/viking-ui/src/styles/ via scripts/sync_design_system.py. Static CSS is built by viking-ui-docs/scripts/build-static-css.mjs — outside the Railway frontend Docker context — while production Angular images compile live SCSS with includePaths pointed at the canonical token directory. Light mode shifts lightness only; semantic aliases (--viking-bg, --viking-surface, --viking-accent) preserve WCAG 2.1 AA contrast in both themes. node scripts/enforce-theme.js blocks hardcoded hex drift before merge.
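The AA contrast claim can be checked numerically. The sketch below implements the WCAG 2.1 relative-luminance and contrast-ratio definitions; the function names are mine, and the hex values in the usage lines are placeholder colors, not the actual values behind --viking-bg, --viking-surface, or --viking-accent.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Placeholder colors for illustration only.
print(round(contrast_ratio("#000000", "#ffffff"), 2))
print(passes_aa("#767676", "#ffffff"), passes_aa("#777777", "#ffffff"))
```

Running a check like this over every foreground/background token pair in both themes is one way to confirm that a lightness-only shift has not pushed a pair below the threshold.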
Accessibility and Section 508
Keyboard navigation, visible :focus-visible rings (--viking-ring), semantic headings, explicit image alts, and aria-* on dynamic widgets are enforced mechanically. node scripts/run_axe.js targets marketing HTML and Astro output for WCAG 2.1 AA violations. Viking-UI form stacks compose through viking-field so labels, errors, and required states remain screen-reader coherent. Status alone is insufficient — automated gates run on every pre-commit pass.
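To make one of these mechanical checks concrete, here is a small sketch of the "explicit image alts" rule using only the Python standard library. It is not how run_axe.js or axe itself works; the class name MissingAltFinder and the sample markup are hypothetical, and the sketch covers a single rule out of the many that axe evaluates.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # alt="" is allowed: it marks a decorative image explicitly.
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src") or "<no src>")


def find_images_missing_alt(html: str) -> list[str]:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Illustrative markup: the second image has no alt attribute.
page = (
    '<img src="logo.svg" alt="DEML logo">'
    '<img src="chart.png">'
    '<img src="divider.png" alt="">'
)
print(find_images_missing_alt(page))
```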
Railway deployment and distroless runtimes
Production services deploy on Railway using per-service manifests under infrastructure/railway/. The Angular SSR frontend builds with node:24-alpine, then copies artifacts into gcr.io/distroless/nodejs22-debian12. The Django API compiles dependencies and runs collectstatic in python:3.11-slim-bookworm, then runs from gcr.io/distroless/python3-debian12 as the nonroot user. There is no shell, no package manager, and no opportunistic curl inside the runtime; the attack surface is the binary and its shared libraries, nothing else. Health checks hit /api/v1/system-status/health on the API service; workers restart on failure with bounded retries.
Database ingestion, retention, and scheduler efficiency
Telemetry enters through /api/v1/ingest, Firebase ingestEvent, and OpenTelemetry collectors. Commands land in Postgres via a transactional outbox, and outbox_relay publishes every five seconds. telemetry_worker projects idempotently into Firestore and enrichment tables. Retention constants live in backend/utils/retention.py: raw telemetry and audit logs purge at 30 days, published outbox rows at 30 days, and DLQ candidates at 7 days; DEK rotation triggers at 30 days. security_worker runs an hourly threat-intel fetch, a daily db_cleanup, Stripe sync_subscriptions, and OSINT passes. ml_worker retrains SLA and threat models daily on anonymized aggregates. Email dispatches route through Resend with queued outbox semantics, with no fire-and-forget sends from request threads.
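The retention windows above reduce to simple cutoff arithmetic. The following is a sketch under stated assumptions: the day counts come from the text, but the constant names, dictionary keys, and helper functions are hypothetical and do not reproduce the contents of backend/utils/retention.py.

```python
from datetime import datetime, timedelta, timezone

# Day counts are taken from the chapter; every name here is an assumption.
RETENTION_DAYS = {
    "raw_telemetry": 30,
    "audit_log": 30,
    "published_outbox": 30,
    "dlq_candidate": 7,
}
DEK_ROTATION_DAYS = 30


def purge_cutoff(kind: str, now: datetime) -> datetime:
    """Rows of this kind created before the returned instant are purgeable."""
    return now - timedelta(days=RETENTION_DAYS[kind])


def is_expired(kind: str, created_at: datetime, now: datetime) -> bool:
    return created_at < purge_cutoff(kind, now)


def dek_needs_rotation(rotated_at: datetime, now: datetime) -> bool:
    """True once the key has been in use for the full rotation window."""
    return now - rotated_at >= timedelta(days=DEK_ROTATION_DAYS)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
eight_days_ago = now - timedelta(days=8)
print(is_expired("dlq_candidate", eight_days_ago, now))   # past the 7-day window
print(is_expired("raw_telemetry", eight_days_ago, now))   # inside the 30-day window
```

Passing now in explicitly, rather than reading the clock inside each helper, keeps a daily cleanup job deterministic under test.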
Multi-tenant security and regulatory alignment
Every ORM query that touches tenant-owned rows filters by the authenticated tenant UUID or by the explicit account_id resolved from the API key. Background workers iterate Tenant.objects.all() and treat every tenant the same way: Tenant0 is bootstrapped with is_platform_tenant=True rather than hardcoded as a string FK. Field-level AES-256-GCM with GCP KMS envelope rotation protects integration secrets. Immutable Cloud Logging satisfies SIEM non-repudiation for SOC 2 CC7 and CMMC AU controls. CES aggregation is the sole exception to strict per-tenant siloing: ClickHouse rollups feed Threat, SLA, and Stableness gauges with mathematically anonymized outputs, and no raw user identifiers cross into the CES engine.
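The tenant-filter invariant can be sketched without Django. The example below is a plain-Python stand-in, not the platform's ORM code: Event, TenantScopedStore, and for_tenant are hypothetical names, and the point is only that the sole read path requires a tenant identifier.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Event:
    tenant_id: UUID
    name: str


class TenantScopedStore:
    """Hypothetical store whose only read path demands a tenant id."""

    def __init__(self, rows: list[Event]) -> None:
        self._rows = list(rows)

    def for_tenant(self, tenant_id: UUID) -> list[Event]:
        # Refuse unscoped reads instead of silently returning every row.
        if tenant_id is None:
            raise ValueError("tenant_id is required for tenant-owned rows")
        return [row for row in self._rows if row.tenant_id == tenant_id]


tenant_a, tenant_b = uuid4(), uuid4()
store = TenantScopedStore([
    Event(tenant_a, "login"),
    Event(tenant_b, "login"),
    Event(tenant_a, "export"),
])
print([event.name for event in store.for_tenant(tenant_a)])
```

In a Django codebase the same idea is often expressed as a custom manager or queryset method, so that call sites cannot reach tenant-owned rows without supplying the scope.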
Code quality and test synchronization
Before any merge, I run uvx pre-commit run --all-files, npm run test:viking-ui, backend pytest on touched modules, and rebuild static CSS when tokens change. Dead dependencies are pruned from package.json and requirements.txt when enforcement proves them unused. Tests mirror security boundaries: mocked tenant contexts in API tests, idempotency keys in event contract tests, and Vitest coverage on chart math, auth signals, and icon resolution in Viking-UI.
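An idempotency-key contract test can be as small as the sketch below. It assumes events carry an idempotency_key field and uses a hypothetical in-memory projector in place of the real worker; neither the field name nor the class comes from the platform's code.

```python
class InMemoryProjector:
    """Hypothetical projector that applies each idempotency key at most once."""

    def __init__(self) -> None:
        self.seen_keys: set[str] = set()
        self.counts: dict[str, int] = {}

    def apply(self, event: dict) -> bool:
        """Apply the event; return False if its key was already processed."""
        key = event["idempotency_key"]
        if key in self.seen_keys:
            return False
        self.seen_keys.add(key)
        self.counts[event["type"]] = self.counts.get(event["type"], 0) + 1
        return True


def test_replayed_event_is_applied_once() -> None:
    projector = InMemoryProjector()
    event = {"idempotency_key": "evt-001", "type": "page_view"}

    assert projector.apply(event) is True    # first delivery is applied
    assert projector.apply(event) is False   # redelivery is a no-op
    assert projector.counts == {"page_view": 1}


def test_distinct_keys_are_both_applied() -> None:
    projector = InMemoryProjector()
    projector.apply({"idempotency_key": "evt-001", "type": "page_view"})
    projector.apply({"idempotency_key": "evt-002", "type": "page_view"})
    assert projector.counts == {"page_view": 2}


test_replayed_event_is_applied_once()
test_distinct_keys_are_both_applied()
```

Under pytest the two trailing calls are unnecessary, since functions prefixed with test_ are collected automatically; they are included so the sketch also runs as a plain script.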
Documentation as operational truth
When infrastructure or compliance posture changes, I update BOOK.md first, sync WHITEPAPER.md milestones, refresh the live /documentation Developer Portal on the marketing site, and run python3 scripts/sync_content.py so deml.app routes and search indexes stay aligned. Documentation is not marketing copy — it is the contract operators and auditors read when production behavior must be verified under stress.