Chapter 26: Building an Asynchronous Asset Inventory

As the platform scales, manually tracking third-party dependencies and infrastructure components quickly becomes unmanageable. To solve this, I designed a dual-stream Asset Inventory and Vulnerability Scanner engine that runs asynchronously, outside the core application, so it does not bloat it.

Instead of embedding heavy security scanning tools directly in the main Django backend, I created an isolated microservice (scanner/) built on FastAPI. For application dependencies, it runs the official osv-scanner binary to parse lockfiles (such as requirements.txt or package-lock.json) against a locally mounted OSV database, so the manifests themselves are never transmitted over the public internet. For infrastructure, it uses cpe-guesser to normalize raw strings (e.g., "nginx 1.21") into standardized CPE 2.3 identifiers.
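A minimal Python sketch of the two normalization steps, under stated assumptions. `to_cpe23` only illustrates the target CPE 2.3 formatted-string layout; it uses a hypothetical `VENDOR_HINTS` table, whereas the real pipeline relies on cpe-guesser's dictionary lookup. `cves_from_osv_report` assumes the layout of osv-scanner's JSON output (results, then packages, then vulnerabilities, each with an `id` and optional `aliases`). Both function names are illustrative, not part of the scanner/ service.

```python
import re

# Hypothetical vendor table; in the real pipeline cpe-guesser resolves
# vendor and product from its own CPE dictionary index.
VENDOR_HINTS = {"nginx": "nginx", "openssl": "openssl"}

# "<product><space or slash>[v]<version>", e.g. "nginx 1.21" or "nginx/1.21.6".
SIGNATURE = re.compile(r"\s*([A-Za-z0-9_.\-]+)[\s/]+v?([0-9][\w.\-]*)\s*")


def to_cpe23(raw: str, part: str = "a") -> str:
    """Turn a banner such as 'nginx 1.21' into a CPE 2.3 formatted string.

    Only '<product> <version>' banners are handled, and CPE escaping is skipped.
    """
    match = SIGNATURE.fullmatch(raw)
    if match is None:
        raise ValueError(f"unrecognised signature: {raw!r}")
    product, version = match.group(1).lower(), match.group(2)
    vendor = VENDOR_HINTS.get(product, product)  # naive fallback: vendor == product
    # part, vendor, product, version, then update, edition, language,
    # sw_edition, target_sw, target_hw and other all left as "*" (ANY).
    return ":".join(["cpe", "2.3", part, vendor, product, version] + ["*"] * 7)


def cves_from_osv_report(report: dict) -> list[tuple[str, str, str]]:
    """Flatten osv-scanner JSON output into (package, version, vuln_id) rows.

    A CVE alias is preferred when the advisory (e.g. GHSA or PYSEC) has one.
    """
    rows = []
    for result in report.get("results", []):
        for pkg in result.get("packages", []):
            name = pkg["package"]["name"]
            version = pkg["package"].get("version", "")
            for vuln in pkg.get("vulnerabilities", []):
                ids = [vuln["id"], *vuln.get("aliases", [])]
                vuln_id = next((i for i in ids if i.startswith("CVE-")), vuln["id"])
                rows.append((name, version, vuln_id))
    # Advisories that alias the same CVE collapse into a single row.
    return list(dict.fromkeys(rows))
```

Preferring the CVE alias lets several advisories for the same flaw collapse into one row, which keeps lockfile findings joinable with the CPE-based infrastructure findings.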

The ingestion pipeline in the core backend (backend/telemetry/vulnerability_ledger.py) exposes a single /api/telemetry/technology endpoint that accepts both infrastructure signatures and application dependency manifests. To keep throughput high, the backend processes these payloads in batches with the Polars DataFrame library. Once the raw telemetry is normalized, the backend delegates all scanning logic to the isolated scanner microservice, which queries the National Vulnerability Database (NVD) REST API and the OSV.dev REST API to retrieve the matching CVEs and their CVSS metrics. The result is one pipeline that covers both hardware and infrastructure reporting and modern application lockfile scanning.

Finally, the fully enriched vulnerability ledger is written to ClickHouse via the ADBC driver for fast analytical queries, while only critical vulnerabilities are synchronized into the operational Django database, where they alert administrators through the real-time Security dashboard.
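A sketch of that final split, assuming the NVD CVE API 2.0 response shape (each item in `vulnerabilities` holds a `cve` with `metrics.cvssMetricV31` or `cvssMetricV30` entries, each carrying `cvssData.baseScore` and a `type` of `Primary` or `Secondary`). The 9.0 cut-off is the CVSS v3.x "Critical" band. The function names and row fields are hypothetical, and the actual ClickHouse (ADBC) write and Django sync are left out.

```python
from typing import Optional

# CVSS v3.x qualitative rating: scores from 9.0 to 10.0 are "Critical".
CRITICAL_THRESHOLD = 9.0


def base_score(metrics: dict) -> Optional[float]:
    """Return the NVD (Primary) CVSS v3.1/v3.0 base score, if any."""
    entries = metrics.get("cvssMetricV31") or metrics.get("cvssMetricV30") or []
    if not entries:
        return None
    # NVD may list a CNA score ("Secondary") next to its own ("Primary").
    chosen = next((e for e in entries if e.get("type") == "Primary"), entries[0])
    return chosen["cvssData"]["baseScore"]


def ledger_rows(nvd_response: dict, cpe: str) -> list[dict]:
    """Flatten an NVD CVE API 2.0 response into hypothetical ledger rows."""
    return [
        {
            "cve_id": item["cve"]["id"],
            "cpe": cpe,
            "cvss": base_score(item["cve"].get("metrics", {})),
        }
        for item in nvd_response.get("vulnerabilities", [])
    ]


def split_for_sync(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Every row goes to ClickHouse; only critical rows are synced to Django."""
    critical = [
        r for r in rows if r["cvss"] is not None and r["cvss"] >= CRITICAL_THRESHOLD
    ]
    return rows, critical
```

Keeping the full ledger in ClickHouse while copying only the critical subset into Django limits the transactional database to actionable records, while analytical queries still run over every finding.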