Four rules for lightning-fast content delivery during traffic peaks

Here are four principles we apply at Contentful to maintain a high-performance content delivery API, and make good on the services our customers rely upon.
Published: November 23, 2023
Category: Insights

Here at Contentful, we like to innovate in content delivery. We’ve been talking about cross-space references, external references, new GraphQL API features, and more exciting stuff at the recent Contentful Launch Event. In line with these announcements, the Engineering team remains dedicated to providing performance, stability, and uptime for our customers.

This goal means focusing on a lean content delivery API request path: the path a Contentful Delivery API or GraphQL API request takes on its journey from the client, through our content delivery network (CDN) and network layers, to the application servers and data storage at the CDN “origin” platform hosted in Amazon Web Services (AWS) data centers.

In this post, I’ll talk about four of the principles we apply at Contentful to maintain a lean delivery machine and consistently make good on the services our customers rely upon. This guidance isn’t just timely for seasonal shopping events like Black Friday or Cyber Monday; it applies to maintaining high performance all year round.

1. Keep responses cacheable

A major reason our customers benefit from such incredibly low response times is that we understand their requirements for content delivery and have built accordingly.

We know that most of our customers don’t need to update their content very often, or that only a small proportion of their overall repository is regularly refreshed. It’s far more efficient to serve content from a stored cache, which is how our customers benefit from really fast responses. Our availability, in effect, becomes the availability of the CDN.

At Contentful, one of the CDN providers we use offers a globally distributed network of points of presence (PoPs), fast cache purging, and very flexible configuration. Building on top of that platform, we have made it possible to purge the cache in targeted ways, to serve stale content where it makes sense, and to correctly cache certain kinds of API errors, and we have also developed the ability to enact a rolling cache purge of the CDN without overloading our systems.
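To make this concrete, here is a minimal sketch, in TypeScript with Express, of how an origin service might signal this kind of behavior to a CDN. It relies on the standard Cache-Control extensions from RFC 5861 (stale-while-revalidate and stale-if-error) and on a surrogate-key header, which some CDNs support for targeted purging. The route, field names, and numbers are invented for illustration; this is not our actual configuration.

```typescript
// A minimal sketch (not Contentful's actual configuration) of an origin
// making its responses cache-friendly: long edge TTLs, permission to serve
// stale content, and surrogate keys for targeted purging.
import express from "express";

const app = express();

app.get("/spaces/:spaceId/entries/:entryId", (req, res) => {
  // Hypothetical lookup; imagine this comes from the data store.
  const entry = { id: req.params.entryId, fields: { title: "Hello" } };

  res.set({
    // Edge caches may keep this for an hour, serve it stale for a minute
    // while revalidating in the background, and fall back to a stale copy
    // for a day if the origin is returning errors (RFC 5861 extensions).
    "Cache-Control":
      "public, max-age=3600, stale-while-revalidate=60, stale-if-error=86400",
    // Surrogate keys let a CDN that supports them purge everything tagged
    // with this space or entry in one targeted call, instead of flushing
    // the whole cache.
    "Surrogate-Key": `space-${req.params.spaceId} entry-${entry.id}`,
  });

  res.json(entry);
});

app.listen(3000);
```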

It could be argued that the best kind of origin request — the request made to our platform when there is no up-to-date cached copy — is the one you never have to make at all. Depending on the customer, our cache hit ratio rarely drops below the mid-90s in percentage terms; across all customers, more than nine out of every 10 requests are served from cache. So, for example, if we’re handling 35,000 requests per second at the edge, a 95% hit ratio means only around 1,750 of those requests ever reach our platform.

When innovating on the content delivery path, then, the key consideration for any new functionality is how best to implement caching and expiry. Which brings us to…

2. Be ready for the “thundering herd”

The second principle is about the thundering herd problem: this is what happens when an avalanche of requests hits the origin servers. When you purge the cache, for example, you get a ton of requests. This might happen because you’re publishing large chunks of content in preparation for a campaign, or because you’re updating live during a seasonal shopping event like Black Friday or Cyber Monday. In other situations, it might simply be a piece of content that goes viral, or a new product announcement that generates a lot of interest.

Contentful is more than capable of handling these traffic spikes — the moments when our customers rely on us most — at the busiest and most critical times of the year. With our CDN provider, we benefit from features like clustering (cache sharing within a PoP), request collapsing, and shielding. All of these reduce the number of requests that reach the origin, so the origin servers have less work to do.
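Request collapsing deserves a closer look, because it blunts the thundering herd so effectively: many concurrent requests for the same uncached resource are coalesced into a single origin fetch, and every caller shares the result. Our CDN does this at the edge; the sketch below is only a simplified, in-process illustration of the concept, with hypothetical names, not our implementation.

```typescript
// Simplified request collapsing: concurrent requests for the same key share
// one in-flight fetch instead of each hitting the backend.
const inFlight = new Map<string, Promise<unknown>>();

async function collapsedFetch<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) {
    // Another caller is already fetching this key; wait for its result.
    return existing as Promise<T>;
  }

  const promise = fetcher().finally(() => {
    // Clear the slot once the fetch settles so later requests fetch fresh data.
    inFlight.delete(key);
  });

  inFlight.set(key, promise);
  return promise;
}

// Usage: a thousand concurrent calls for the same entry trigger one backend
// fetch, and all of them resolve with the same response.
// const entry = await collapsedFetch(`entry:${id}`, () => fetchEntryFromBackend(id));
```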

Requests that make it to our origin servers are handled by our Kubernetes platform in AWS. We use AWS’s highly scalable Application Load Balancers as our traffic ingress. Our applications run on elastic Kubernetes infrastructure that auto-scales to quickly handle extra traffic, and the applications themselves also scale horizontally, adding or removing replicas based on the current load. We also prepare for key events and busy times of the year by adding extra capacity ahead of time where necessary.

3. Choose your dependencies carefully

The third principle is managing your dependencies, and by this I’m not referring to JavaScript libraries, but to the other systems your application makes use of — internal microservices, databases, or external APIs. This principle is about service level agreements: if you’re using third-party services as part of your offering, you’re only as reliable as your least reliable component. That’s how the dependency calculus works. It’s that much harder to be lean and mean when you are consuming from less lean sources.

Just like any other business, we offer paying customers a service level agreement (SLA) that is specific to their plan. These SLAs include guarantees around uptime, among other things. Internally, we define Service Tiers for services depending on the availability targets of the SLAs that they underpin; a Tier 1 service, such as our Delivery API, must not have a dependency with a lower availability guarantee, be it internal or external — if that were to happen, we’d be exposing ourselves to missing our own commitments. Using a non-HA (highly available) managed cloud database instance or other third-party service might seem inconsequential, but if it only offers a 99% SLA, there will be consequences when it goes down.
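A quick back-of-the-envelope example shows why. When a request has to touch every dependency in series, the best case for overall availability is the product of the individual availabilities. The figures below are illustrative only, not our SLAs.

```typescript
// Dependency calculus for serial dependencies: overall availability can be
// no better than the product of the parts. Figures are illustrative.
function serialAvailability(availabilities: number[]): number {
  return availabilities.reduce((acc, a) => acc * a, 1);
}

const ourService = 0.9995; // what we engineered our own service to achieve
const database = 0.99;     // a non-HA managed database with a 99% SLA

// 0.9995 * 0.99 ≈ 0.9895: the 99% dependency drags the whole request path
// below 99%, no matter how solid our own code is.
console.log(serialAvailability([ourService, database]));
```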

So, to maintain our lean content delivery path, we’re always asking ourselves whether we really need an extra dependency. Is it truly necessary? Could the data it provides perhaps be pre-computed and stored in a low-latency, cached service instead?

4. Build for failure

In distributed systems, it is good practice to assume that calls made across the network between systems can and will fail in various ways: connections can be refused, and responses may take much longer than expected or fail to arrive at all. Our systems on the content delivery path employ established best practices to mitigate this, for example: retries, timeouts, failing open on feature flags when possible, and circuit breakers.
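As a rough illustration of the first two of those mitigations, here is a small TypeScript sketch of a network call wrapped in a timeout and a bounded number of retries with exponential backoff. The timeout and retry budgets are invented for the example, not the values our services use.

```typescript
// A hedged sketch of timeouts plus retries around a network call.
async function fetchWithTimeout(url: string, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // If the upstream hangs, abort instead of waiting indefinitely.
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

async function fetchWithRetry(url: string, attempts = 3, timeoutMs = 500): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetchWithTimeout(url, timeoutMs);
      // Only retry server-side errors; client errors won't improve on retry.
      if (response.status < 500) return response;
      lastError = new Error(`HTTP ${response.status}`);
    } catch (error) {
      lastError = error; // timeout or connection failure
    }
    // Exponential backoff between attempts: 100ms, 200ms, 400ms, ...
    await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** (attempt - 1)));
  }
  throw lastError;
}
```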

Managing failure domains is also very important. In systems architecture speak, this is the term for a “slice” or section of computing infrastructure that can be negatively impacted by a given type of fault. For example, AWS Regions (e.g., the venerable Northern Virginia “us-east-1” Region) are composed of multiple Availability Zones (AZ), each consisting of a group of computing resources that could potentially fail at the same time due to a power cut, bad code deployment, or networking issue. 

Therefore, it is a good practice to build computing infrastructure that spans at least two AZs to ensure your platform stays up should an entire AZ fail. At Contentful, we deploy our Kubernetes clusters across at least three AZs, and database clusters across at least two AZs.

Kubernetes clusters and database clusters are themselves failure domains, impacting all customers whose traffic goes through them or whose data they store. Therefore, we split customer data and traffic across multiple instances of each in a “shared-nothing” architecture, ensuring that issues with individual clusters are contained and have a limited impact, and giving us the option of routing traffic around a misbehaving cluster. On the content delivery path especially, we strive to keep all the necessary service dependencies local to the cluster dealing with the API request. We also set rate limits per space, so there’s a limit to how much each customer application can load the system.
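To illustrate the per-space rate limiting idea, here is a simplified token-bucket sketch in which every space gets its own bucket, so a single noisy tenant can’t starve everyone else. The limits and names are made up for the example and don’t reflect Contentful’s actual quotas or enforcement code.

```typescript
// Simplified per-space token bucket: each space has its own budget of tokens
// that refills at a steady rate, capping how hard one tenant can push.
interface Bucket {
  tokens: number;
  lastRefill: number; // ms timestamp of the last refill
}

const buckets = new Map<string, Bucket>();
const CAPACITY = 100;      // maximum burst per space (illustrative)
const REFILL_PER_SEC = 50; // sustained requests per second per space (illustrative)

function allowRequest(spaceId: string, now = Date.now()): boolean {
  const bucket = buckets.get(spaceId) ?? { tokens: CAPACITY, lastRefill: now };

  // Refill proportionally to the time elapsed since the last request.
  const elapsedSec = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(CAPACITY, bucket.tokens + elapsedSec * REFILL_PER_SEC);
  bucket.lastRefill = now;

  if (bucket.tokens < 1) {
    buckets.set(spaceId, bucket);
    return false; // the caller should respond with HTTP 429
  }

  bucket.tokens -= 1;
  buckets.set(spaceId, bucket);
  return true;
}
```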

But what if an entire AWS Region goes down? For customers with the highest availability requirements, we replicate data and services in a secondary AWS Region that can handle Delivery API and GraphQL API requests, allowing our CDN layer to route 100% of requests there for the duration of the outage.

The key here is to identify failure domains and avoid crossing their boundaries — avoid putting all your eggs in one basket.

Wrapping up

So, the takeaway is to keep these four principles in mind when working on the content delivery path: keep responses cacheable, be ready for the thundering herd, choose your dependencies carefully, and build for failure across availability zones and regions. To sum up, the content delivery path has to be the leanest and meanest. These concepts have enabled Contentful to keep our systems performant and available for our customers through high-traffic events like Black Friday, Cyber Monday, and beyond.
