How to cache 404s — and purge them — efficiently



Introduction

Hydra currently does not purge any cached objects when it uploads them to the binary cache. If a client requests an object and gets a 404, and Hydra uploads that object 10 minutes later, the client still cannot fetch it until the cached 404's hour-long expiry passes, because the stale 404 remains in the cache. Hydra simply nix copy's directly into the target.

Ideally, all objects would just get instantly purged if their contents change — we are, after all, a content-addressable system. Working this purge logic into Hydra or nix copy in some generalized way might prove difficult and lengthy, and it's unclear if that's even the right place to do it. So here's an idea: just trigger a purge from an AWS Lambda whenever an object is uploaded to the S3 bucket! This seems like it should be relatively easy to pull off (Lambda triggers for S3 PUT requests are extremely common) and "set and forget" once it's working.

There are two methods to purge via Fastly's API:

  1. Simply send an HTTP request with method PURGE to the URL in question, for example curl -X PURGE https://example.com/page.html; this can be unauthenticated (the default) or authenticated.
  2. Use the actual API located at https://api.fastly.com like any other API usage.

We go with option 2 for scalability (see below), so we can issue mass purge requests for many objects over time. Because of rate limiting concerns with the Fastly API, we need to take special care to batch purge requests, using a tool we call the draining server.
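As a sketch of what option 2 looks like, the snippet below builds a batched surrogate-key purge request against the Fastly API. The endpoint shown is Fastly's bulk surrogate-key purge route; serviceId and apiKey are placeholder names for our service ID and API token, and the helpers themselves are our own, not part of any library:

```javascript
// Build Node https.request options for one bulk purge call to Fastly.
// Authentication is via the Fastly-Key header.
const bulkPurgeOptions = (serviceId, apiKey) => ({
  host:   'api.fastly.com',
  path:   `/service/${serviceId}/purge`,
  method: 'POST',
  headers: {
    'Fastly-Key':   apiKey,
    'Content-Type': 'application/json',
    'Accept':       'application/json',
  },
});

// The JSON body lists every surrogate key to purge in this batch.
const bulkPurgeBody = (keys) => JSON.stringify({ surrogate_keys: keys });
```

One POST with a list of surrogate keys thus replaces many individual PURGE requests, which is what keeps us under the rate limit.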


Part 0: Setting object surrogate keys

Objects in the Fastly cache can have surrogate keys attached to them by VCL. A surrogate key is essentially a one-to-many mapping: it associates (potentially many) cache objects with a single key, and purging that key purges every associated object in a single call to the Fastly API. See Getting started with Surrogate Keys in the Fastly documentation.

Surrogate keys are useful for a number of reasons, especially due to rate limiting: we can "batch" many surrogate key purges into one API request, which is critical for the volume of objects we'll ingest. (We'll batch 256 surrogate keys into one request; see more on that below.)
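The batching itself is simple: group pending surrogate keys into chunks of 256 and issue one API request per chunk. A minimal sketch (BATCH_SIZE and chunk are our own names here, not Fastly's):

```javascript
// Maximum number of surrogate keys we pack into one purge request.
const BATCH_SIZE = 256;

// Split an array of surrogate keys into arrays of at most BATCH_SIZE
// elements; each resulting batch becomes one Fastly API call.
const chunk = (keys, size = BATCH_SIZE) => {
  const batches = [];
  for (let i = 0; i < keys.length; i += size) {
    batches.push(keys.slice(i, i + size));
  }
  return batches;
};
```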

The VCL in our cache configuration implicitly tags every object with several surrogate keys based on its type (nar file, narinfo, manifest, log file, etc.), including a special key: the full URL path of the object itself. Because cache objects are named by their derivation hash, every object therefore has a surrogate key that is unique to it. Furthermore, because this unique surrogate key is simply the path to the object, we can recover it statelessly — just look at the object that was uploaded, and you have its unique key. There is no need to maintain a database or any other state tracking keys, which is important in the next step.
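The stateless mapping described above amounts to a one-line function; surrogateKeyFor is a hypothetical helper name, and the leading slash matches how the Lambda function in Part 1 builds its Purge-Url header from the S3 object key:

```javascript
// Recover an object's unique surrogate key directly from its S3 key:
// the key is just the URL path, so no lookup table is needed.
const surrogateKeyFor = (s3Key) => '/' + s3Key;
```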


Part 1: Lambda function

Now that we have surrogate keys (derived from the URL) set on the cache objects, we need to notify the draining server when something is uploaded and give it the URL to purge. Here's an example of an AWS Lambda function that triggers on S3 PUT events and makes a POST request to the draining server, telling it that some cached object needs to be expired.

let enabled = true;

// Host we contact for purge events
const purge_server = process.env.PURGE_HOST;
if (!purge_server) enabled = false;

// Key we use to contact the purging server
const purge_auth = process.env.PURGE_AUTH_KEY;
if (!purge_auth) enabled = false;

// success object
const success = (key, code) => ({
  status: 'success',
  msg: 'purge queued',
  obj: key,
  code: code,
});

// failure object
const failure = (key, code) => ({
  status: 'failed',
  msg: 'got non-202 response',
  obj: key,
  code: code,
});

// disabled object
const disabled = () => ({
  status: 'disabled',
  msg:    'not purging because function is disabled',
});

// invalid object
const invalid = (eventName) => ({
  status: 'invalid',
  msg:    'not purging for non-PUT event',
  event:  eventName,
});

// return a success or failure object given an HTTP response
const kontinue = (res, key) => res.statusCode === 202
  ? success(key, res.statusCode)
  : failure(key, res.statusCode)
  ;

// set up options for making a purge request
const purge = (server, auth, key) => ({
  host:   server,
  path:   '/api',
  port:   443,
  method: 'POST',
  auth:   auth,
  headers: {
    'Purge-Url': '/' + key,
  },
});

const https = require('https');
exports.handler = async (event, context) => {
  const info = event.Records[0];

  if (!enabled) return disabled();
  if (info.eventName !== 'ObjectCreated:Put') return invalid(info.eventName);

  return new Promise((resolve, reject) => {
    const options = purge(purge_server, purge_auth, info.s3.object.key);

    const req = https.request(options, (res) => {
      res.resume(); // drain the response body so the socket is released
      resolve(kontinue(res, options.headers['Purge-Url']));
    });

    req.on('error', (e) => { console.warn(e.message); reject(e.message); });
    req.end();
  });
};
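For exercising the handler locally, a minimal S3 PUT event of the shape it reads looks like this — only the fields the function actually touches are shown, and the object key is a made-up example:

```javascript
// Minimal S3 event notification: the handler reads Records[0].eventName
// and Records[0].s3.object.key, and ignores everything else.
const sampleEvent = {
  Records: [
    {
      eventName: 'ObjectCreated:Put',
      s3: { object: { key: 'nar/example.nar.xz' } },
    },
  ],
};
```

Calling exports.handler(sampleEvent) with PURGE_HOST and PURGE_AUTH_KEY set should then POST a Purge-Url of /nar/example.nar.xz to the draining server.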