The tools we need to get the job done and their roles
The data plane is what serves objects to users. There are two critical components to its operation, both third-party services.
Fastly is our CDN provider and sponsors the binary cache, providing us with a programmable HTTP caching layer for storing content and helping us save costs. The Fastly configuration for the NixOS cache is an advanced setup written primarily in Varnish Configuration Language (VCL) for all delivery logic, with other components (like logging) built directly on Fastly's platform as well.
We push terabytes of data a month through Fastly, graciously sponsored by their Open-Source and Non-profit program.
S3 needs no introduction. It's a petabyte-scale object storage service, and currently the home of our sacred Nix cache (totaling over 150 TB of data). Hydra uploads files into S3 (almost continuously, 24/7/365) so they can be served by Fastly.
fastly-purged is a daemon developed for our "purge pipeline": it takes requests to purge stale 404s from the cache, batches them up, and then executes them upstream in bulk to limit the number of Fastly API requests. A Lambda function triggered by S3 uploads pushes data into this service so it can do its work.
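The batching step described above can be sketched roughly as follows. This is a minimal illustration, not the actual fastly-purged implementation: the `PurgeBatcher` class, its parameters (`max_batch`, `max_wait_s`), and the example keys are all hypothetical, and the call to Fastly's API is deliberately left to a caller-supplied callback.

```python
import time
from typing import Callable, Dict, List, Optional


class PurgeBatcher:
    """Collects cache keys and hands them off in batches (hypothetical design)."""

    def __init__(
        self,
        flush: Callable[[List[str]], None],
        max_batch: int = 256,       # assumed limit, not a Fastly-documented value
        max_wait_s: float = 5.0,    # assumed maximum age of a partial batch
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._clock = clock
        # A dict is used as an ordered set so repeated keys are purged only once.
        self._pending: Dict[str, None] = {}
        self._first_seen: Optional[float] = None

    def add(self, key: str) -> None:
        """Queue a key; flush immediately once the batch is full."""
        if not self._pending:
            self._first_seen = self._clock()
        self._pending[key] = None
        if len(self._pending) >= self._max_batch:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it is old enough."""
        if self._pending and self._clock() - self._first_seen >= self._max_wait_s:
            self.flush()

    def flush(self) -> None:
        """Send everything queued so far as one batch."""
        if not self._pending:
            return
        batch = list(self._pending)
        self._pending.clear()
        self._first_seen = None
        # A real daemon would issue one upstream purge request here.
        self._flush(batch)
```

With `max_batch=3`, adding the keys `a.narinfo`, `b.narinfo`, `a.narinfo`, `c.narinfo` results in a single callback invocation with three unique keys, so four incoming purge requests become one upstream call. The size and time limits trade purge latency against API request volume.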
The logging infrastructure is used by cache operators to diagnose issues and watch for performance problems. It has a few home-grown tools, but also relies significantly on third-party tools.
Traefik is a "cloud native" HTTP and TCP router, load balancer, and proxy. Traefik handles all incoming HTTP and TCP traffic for the logging infrastructure, and also handles TLS termination & ACME registration, client authentication, and middleware concerns. It load balances HTTP requests for fastly-purged and Vector, and at the same time also exposes metric endpoints for Prometheus to consume.
Whereas Traefik is an HTTP/TCP router, Vector is a log router. We use Vector to ingest real-time logs directly from Fastly using rsyslog, buffering events and routing them to various backends. At the same time, other tooling also logs into Vector from various sources (TCP, statsd, etc.) so that events can be piped around. Vector is responsible for loading all data into ClickHouse, and it also helpfully consumes statsd events and redirects them to an HTTP endpoint for Prometheus.
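To make the statsd-to-Prometheus hop more concrete, here is a small sketch of the kind of translation involved: folding statsd lines of the form `name:value|type` into the Prometheus text exposition format. This is only a conceptual illustration and not how Vector is implemented or configured; the function names and metric names are hypothetical, and sample rates, timers, sets, and signed gauge deltas are ignored for brevity.

```python
from typing import Dict, Iterable, Tuple


def aggregate_statsd(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Fold statsd lines ("name:value|type") into {name: (prom_type, value)}."""
    metrics: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        name, _, rest = line.strip().partition(":")
        parts = rest.split("|")
        if not name or len(parts) < 2:
            continue  # skip malformed lines
        try:
            value = float(parts[0])
        except ValueError:
            continue  # skip non-numeric values
        kind = parts[1]
        # Prometheus metric names cannot contain dots or dashes.
        prom_name = name.replace(".", "_").replace("-", "_")
        if kind == "c":
            # Counters accumulate across events.
            _, total = metrics.get(prom_name, ("counter", 0.0))
            metrics[prom_name] = ("counter", total + value)
        elif kind == "g":
            # Gauges keep only the most recent value.
            metrics[prom_name] = ("gauge", value)
        # Other statsd types are ignored in this sketch.
    return metrics


def render_prometheus(metrics: Dict[str, Tuple[str, float]]) -> str:
    """Render aggregated metrics in the Prometheus text exposition format."""
    out = []
    for name in sorted(metrics):
        prom_type, value = metrics[name]
        out.append(f"# TYPE {name} {prom_type}")
        out.append(f"{name} {value:g}")
    return "\n".join(out) + "\n"
```

The push-to-pull conversion is the important part: statsd clients push individual events, while Prometheus scrapes an aggregated snapshot over HTTP, so something in the middle has to hold running state for each metric.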