How our logging and analytics pipeline is designed
One of Fastly's most distinguishing features is its Real-time Log Streaming platform, which lets you receive logs for every request across the whole network (on a best-effort basis). A variety of backend "log sinks" are supported, ranging from S3 uploads (batched, long-term cold storage) to real-time streams using protocols such as syslog (see RFC 5424).
Insight into usage of the Nix cache has historically been extremely limited, but this gives us a path to real-time investigation of user issues, something that was previously impossible, as well as tracking things like bandwidth growth over time. The log format we configured emits one JSON object per request:
{ "start" : %{time.start.sec}V
, "ttfb" : %{time.to_first_byte}V
, "elapsed" : %{time.elapsed.usec}V
, "time_begin" : "%{begin:%Y-%m-%dT%H:%M:%S}t"
, "time_end" : "%{end:%Y-%m-%dT%H:%M:%S}t"
, "state" : "%{regsub(fastly_info.state, "^(HIT-(SYNTH)|(HITPASS|HIT|MISS|PASS|ERROR|PIPE)).*", "\\\\2\\\\3") }V"
, "datacenter" : "%{server.datacenter}V"
, "hostname" : "%{server.hostname}V"
, "dcregion": "%{server.region}V"
, "continent_code" : "%{client.geo.continent_code}V"
, "country_code" : "%{client.geo.country_code}V"
, "region" : "%{client.geo.region}V"
, "ua" : "%{User-Agent}i"
, "ipv6" : %{if(req.is_ipv6, "true", "false")}V
, "http2" : %{if(fastly_info.is_h2, "true", "false")}V
, "tls" : %{if(req.is_ssl, "true", "false")}V
, "protocol": "%H"
, "method": "%m"
, "path": "%U"
, "status": %>s
, "response_size": %B
, "as_number": %{client.as.number}V
, "as_name": "%{client.as.name}V"
}
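Rendered for a single request, the template above produces one JSON object. The record below is purely illustrative (every value, including the hostname, path, and AS number, is made up rather than taken from real traffic):

{ "start" : 1672531200
, "ttfb" : 0.012
, "elapsed" : 45210
, "time_begin" : "2023-01-01T00:00:00"
, "time_end" : "2023-01-01T00:00:00"
, "state" : "HIT"
, "datacenter" : "FRA"
, "hostname" : "cache-fra-example"
, "dcregion" : "EU"
, "continent_code" : "EU"
, "country_code" : "DE"
, "region" : "BE"
, "ua" : "Nix/2.11.0"
, "ipv6" : false
, "http2" : true
, "tls" : true
, "protocol" : "HTTP/2"
, "method" : "GET"
, "path" : "/example.narinfo"
, "status" : 200
, "response_size" : 4096
, "as_number" : 64496
, "as_name" : "EXAMPLE-AS"
}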
Once Fastly is configured to log to an S3 bucket, those records can be incorporated into any number of data pipelines. However, since we're primarily using a single platform for handling our logs (see below), we really only need cold storage, and that's exactly what S3 provides.
Aside from the cold S3 storage we push the logs into, we also have a real-time pipeline that batches and inserts logs directly into our analytics tools as requests arrive at Fastly. This pipeline is composed of two primary components: Vector, a log router, and ClickHouse, a distributed, columnar SQL database that handles our OLAP queries.
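Vector's ClickHouse sink delivers batches over ClickHouse's HTTP interface using the JSONEachRow input format, so each hand-off is roughly equivalent to an insert like the sketch below against the fastly_logs.nix_cache table defined next (the row reuses the illustrative record from above, not real data):

-- Sketch of what a Vector batch boils down to; not literal production config.
-- Depending on server settings, parsing the ISO-8601 timestamps may need
-- date_time_input_format = 'best_effort', and mapping the JSON booleans onto
-- UInt8 columns relies on input_format_json_read_bools_as_numbers.
INSERT INTO fastly_logs.nix_cache FORMAT JSONEachRow
{"start":1672531200,"ttfb":0.012,"elapsed":45210,"time_begin":"2023-01-01T00:00:00","time_end":"2023-01-01T00:00:00","state":"HIT","datacenter":"FRA","hostname":"cache-fra-example","dcregion":"EU","continent_code":"EU","country_code":"DE","region":"BE","ua":"Nix/2.11.0","ipv6":false,"http2":true,"tls":true,"protocol":"HTTP/2","method":"GET","path":"/example.narinfo","status":200,"response_size":4096,"as_number":64496,"as_name":"EXAMPLE-AS"}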
-- raw table that contains all incoming data directly from the logging pipeline.
-- in general, it's expected MATERIALIZED VIEWs will be used to slice things up
-- appropriately for reporting, but querying this table directly will likely be
-- useful as well
CREATE TABLE IF NOT EXISTS fastly_logs.nix_cache
( `start` UInt64 COMMENT 'Start time (unix epoch, seconds)' CODEC(Delta)
, `ttfb` Float32 COMMENT 'Time-to-first-byte, seconds (client)' CODEC(ZSTD)
, `elapsed` UInt32 COMMENT 'Total transfer time, microseconds' CODEC(ZSTD)
, `time_begin` DateTime COMMENT 'Timestamp of request start' CODEC(ZSTD)
, `time_end` DateTime COMMENT 'Timestamp of response end' CODEC(ZSTD)
, `state` String COMMENT 'Cache state' CODEC(ZSTD)
, `datacenter` String COMMENT 'Fastly POP location' CODEC(ZSTD)
, `hostname` String COMMENT 'Hostname of cache server' CODEC(ZSTD)
, `dcregion` String COMMENT 'POP region' CODEC(ZSTD)
, `continent_code` String COMMENT 'Continent code' CODEC(ZSTD)
, `country_code` String COMMENT 'Country code' CODEC(ZSTD)
, `region` String COMMENT 'User-agent region' CODEC(ZSTD)
, `ua` String COMMENT 'User-Agent (effectively the Nix version)' CODEC(ZSTD)
, `ipv6` UInt8 COMMENT 'IPv6 client?' CODEC(NONE)
, `http2` UInt8 COMMENT 'HTTP/2 request?' CODEC(NONE)
, `tls` UInt8 COMMENT 'TLS enabled?' CODEC(NONE)
, `protocol` String COMMENT 'Request protocol' CODEC(NONE)
, `method` String COMMENT 'HTTP method (GET, etc)' CODEC(NONE)
, `path` String COMMENT 'URL path for request' CODEC(NONE)
, `status` UInt16 COMMENT 'HTTP response status code' CODEC(NONE)
, `response_size` UInt64 COMMENT 'HTTP response body size' CODEC(NONE)
, `as_number` UInt64 COMMENT 'Autonomous system (AS) ID' CODEC(NONE)
, `as_name` String COMMENT 'AS Organization Name' CODEC(ZSTD)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(time_begin)
ORDER BY (time_begin)
SETTINGS index_granularity=8192
This is the base table the JSON records are written into; outside of S3, it is the "authoritative" source of truth for the logs.
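As the inline comments suggest, most reporting is expected to go through MATERIALIZED VIEWs layered on top of this table. As an illustrative sketch (the view name, target table, and aggregation choices here are ours, not the deployed schema), a rollup of daily request counts and bytes served per country and cache state might look like:

-- Hypothetical rollup table; SummingMergeTree sums `requests` and `bytes`
-- across merges for rows sharing the same (day, country_code, state) key.
CREATE TABLE IF NOT EXISTS fastly_logs.daily_state_by_country
( `day` Date
, `country_code` String
, `state` String
, `requests` UInt64
, `bytes` UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day, country_code, state);

-- Populate the rollup incrementally as new rows land in the base table.
CREATE MATERIALIZED VIEW IF NOT EXISTS fastly_logs.daily_state_by_country_mv
TO fastly_logs.daily_state_by_country AS
SELECT toDate(time_begin) AS day
     , country_code
     , state
     , count() AS requests
     , sum(response_size) AS bytes
FROM fastly_logs.nix_cache
GROUP BY day, country_code, state;

Ad-hoc questions can also be answered straight off the base table; for example, the cache hit ratio over the last 24 hours:

SELECT countIf(state = 'HIT') / count() AS hit_ratio
FROM fastly_logs.nix_cache
WHERE time_begin >= now() - INTERVAL 1 DAY;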