How our logging and analytics pipeline is designed
One of Fastly's most distinguishing features is its Real-time Log Streaming platform, which lets you receive logs for every request across the whole network (on a best-effort basis). A variety of backend "log sinks" are supported, ranging from S3 uploads (batched, long-term cold storage) to real-time streams using protocols such as syslog (see RFC 5424).
Insight into usage of the Nix cache has historically been extremely limited, but this gives us a path to real-time investigation of user issues, something that was previously impossible, as well as tracking things like bandwidth growth over time. The log format we configured emits one JSON object per request:
{ "start" : %{time.start.sec}V
, "ttfb" : %{time.to_first_byte}V
, "elapsed" : %{time.elapsed.usec}V
, "time_begin" : "%{begin:%Y-%m-%dT%H:%M:%S}t"
, "time_end" : "%{end:%Y-%m-%dT%H:%M:%S}t"
, "state" : "%{regsub(fastly_info.state, "^(HIT-(SYNTH)|(HITPASS|HIT|MISS|PASS|ERROR|PIPE)).*", "\\\\2\\\\3") }V"
, "datacenter" : "%{server.datacenter}V"
, "hostname" : "%{server.hostname}V"
, "dcregion": "%{server.region}V"
, "continent_code" : "%{client.geo.continent_code}V"
, "country_code" : "%{client.geo.country_code}V"
, "region" : "%{client.geo.region}V"
, "ua" : "%{User-Agent}i"
, "ipv6" : %{if(req.is_ipv6, "true", "false")}V
, "http2" : %{if(fastly_info.is_h2, "true", "false")}V
, "tls" : %{if(req.is_ssl, "true", "false")}V
, "protocol": "%H"
, "method": "%m"
, "path": "%U"
, "status": %>s
, "response_size": %B
, "as_number": %{client.as.number}V
, "as_name": "%{client.as.name}V"
}
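Rendered for a single request, the template above produces one JSON object. The record below is purely illustrative (every value, including the hostname, path, and AS number, is made up rather than taken from real traffic):

{ "start" : 1672531200
, "ttfb" : 0.012
, "elapsed" : 45210
, "time_begin" : "2023-01-01T00:00:00"
, "time_end" : "2023-01-01T00:00:00"
, "state" : "HIT"
, "datacenter" : "FRA"
, "hostname" : "cache-fra-example"
, "dcregion" : "EU"
, "continent_code" : "EU"
, "country_code" : "DE"
, "region" : "BE"
, "ua" : "Nix/2.11.0"
, "ipv6" : false
, "http2" : true
, "tls" : true
, "protocol" : "HTTP/2"
, "method" : "GET"
, "path" : "/example.narinfo"
, "status" : 200
, "response_size" : 4096
, "as_number" : 64496
, "as_name" : "EXAMPLE-AS"
}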
Once Fastly is configured to log to an S3 bucket, those records can be incorporated into any number of data pipelines. However, since we're primarily using a single platform for handling our logs (see below), we really only need cold storage, and that's exactly what S3 provides.
Aside from the cold S3 storage we push the logs into, we also have a real-time pipeline that batches and inserts logs directly into our analytics tools as requests arrive at Fastly. This pipeline is composed of two primary components: Vector, a log router, and ClickHouse, a distributed, columnar SQL database that handles our OLAP queries.
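Vector's ClickHouse sink delivers batches over ClickHouse's HTTP interface using the JSONEachRow input format, so each hand-off is roughly equivalent to an insert like the sketch below against the fastly_logs.nix_cache table defined next (the row reuses the illustrative record from above, not real data):

-- Sketch of what a Vector batch boils down to; not literal production config.
-- Depending on server settings, parsing the ISO-8601 timestamps may need
-- date_time_input_format = 'best_effort', and mapping the JSON booleans onto
-- UInt8 columns relies on input_format_json_read_bools_as_numbers.
INSERT INTO fastly_logs.nix_cache FORMAT JSONEachRow
{"start":1672531200,"ttfb":0.012,"elapsed":45210,"time_begin":"2023-01-01T00:00:00","time_end":"2023-01-01T00:00:00","state":"HIT","datacenter":"FRA","hostname":"cache-fra-example","dcregion":"EU","continent_code":"EU","country_code":"DE","region":"BE","ua":"Nix/2.11.0","ipv6":false,"http2":true,"tls":true,"protocol":"HTTP/2","method":"GET","path":"/example.narinfo","status":200,"response_size":4096,"as_number":64496,"as_name":"EXAMPLE-AS"}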
-- raw table that contains all incoming data directly from the logging pipeline.
-- in general, it's expected MATERIALIZED VIEWs will be used to slice things up
-- appropriately for reporting, but querying this table directly will likely be
-- useful as well
CREATE TABLE IF NOT EXISTS fastly_logs.nix_cache
( `start` UInt64 COMMENT 'Start time (unix epoch, seconds)' CODEC(Delta)
, `ttfb` Float32 COMMENT 'Time-to-first-byte, seconds (client)' CODEC(ZSTD)
, `elapsed` UInt32 COMMENT 'Total transfer time, microseconds' CODEC(ZSTD)
, `time_begin` DateTime COMMENT 'Timestamp of request start' CODEC(ZSTD)
, `time_end` DateTime COMMENT 'Timestamp of response end' CODEC(ZSTD)
, `state` String COMMENT 'Cache state' CODEC(ZSTD)
, `datacenter` String COMMENT 'Fastly POP location' CODEC(ZSTD)
, `hostname` String COMMENT 'Hostname of cache server' CODEC(ZSTD)
, `dcregion` String COMMENT 'POP region' CODEC(ZSTD)
, `continent_code` String COMMENT 'Continent code' CODEC(ZSTD)
, `country_code` String COMMENT 'Country code' CODEC(ZSTD)
, `region` String COMMENT 'User-agent region' CODEC(ZSTD)
, `ua` String COMMENT 'User-Agent (effectively the Nix version)' CODEC(ZSTD)
, `ipv6` UInt8 COMMENT 'IPv6 client?' CODEC(NONE)
, `http2` UInt8 COMMENT 'HTTP/2 request?' CODEC(NONE)
, `tls` UInt8 COMMENT 'TLS enabled?' CODEC(NONE)
, `protocol` String COMMENT 'Request protocol' CODEC(NONE)
, `method` String COMMENT 'HTTP method (GET, etc)' CODEC(NONE)
, `path` String COMMENT 'URL path for request' CODEC(NONE)
, `status` UInt16 COMMENT 'HTTP response status code' CODEC(NONE)
, `response_size` UInt64 COMMENT 'HTTP response body size' CODEC(NONE)
, `as_number` UInt64 COMMENT 'Autonomous system (AS) ID' CODEC(NONE)
, `as_name` String COMMENT 'AS Organization Name' CODEC(ZSTD)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(time_begin)
ORDER BY (time_begin)
SETTINGS index_granularity=8192
This is the base table the JSON records are written into; outside of S3, it is the "authoritative" source of truth for the logs.
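As the inline comments suggest, most reporting is expected to go through MATERIALIZED VIEWs layered on top of this table. As an illustrative sketch (the view name, target table, and aggregation choices here are ours, not the deployed schema), a rollup of daily request counts and bytes served per country and cache state might look like:

-- Hypothetical rollup table; SummingMergeTree sums `requests` and `bytes`
-- across merges for rows sharing the same (day, country_code, state) key.
CREATE TABLE IF NOT EXISTS fastly_logs.daily_state_by_country
( `day` Date
, `country_code` String
, `state` String
, `requests` UInt64
, `bytes` UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day, country_code, state);

-- Populate the rollup incrementally as new rows land in the base table.
CREATE MATERIALIZED VIEW IF NOT EXISTS fastly_logs.daily_state_by_country_mv
TO fastly_logs.daily_state_by_country AS
SELECT toDate(time_begin) AS day
     , country_code
     , state
     , count() AS requests
     , sum(response_size) AS bytes
FROM fastly_logs.nix_cache
GROUP BY day, country_code, state;

Ad-hoc questions can also be answered straight off the base table; for example, the cache hit ratio over the last 24 hours:

SELECT countIf(state = 'HIT') / count() AS hit_ratio
FROM fastly_logs.nix_cache
WHERE time_begin >= now() - INTERVAL 1 DAY;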