Caching

Where to cache, how to invalidate, and the write strategies that keep your data consistent.

Why Cache?

Caching stores frequently accessed data in a faster storage layer so subsequent reads skip the slow path entirely. A database query that takes 50ms can return in under 1ms from cache. At scale, that gap separates a responsive app from one that collapses under load.

The 80/20 rule applies heavily to caching. Most applications have a highly skewed access pattern — a small percentage of data accounts for a large percentage of reads. Caching that hot subset gives you enormous returns with minimal memory.

Cache Layers

Caching isn't a single layer — it happens at every level of the stack. Each layer serves a different purpose:

Client Cache (browser, app) → CDN Edge Cache → Reverse Proxy Cache (Nginx/Varnish) → Application Cache (Redis/Memcached) → Database Query Cache → Database (source of truth)

Each layer intercepts reads before they reach the database.
  • Client cache: The browser caches static assets via Cache-Control headers (see the sketch after this list). The fastest cache — the request never leaves the device.
  • CDN cache: Edge servers cache content geographically close to users. Great for static assets and cacheable API responses.
  • Reverse proxy cache: Nginx, Varnish, or similar. Caches full HTTP responses before they hit your application servers.
  • Application cache: Redis or Memcached sitting between your app and database. You control exactly what gets cached, with what key, and for how long.
  • Database cache: Most databases have built-in query caches and buffer pools. Useful but not always tunable.
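
The client and CDN layers are driven by HTTP headers rather than application code. As a minimal sketch, here is a Python standard-library handler that serves an asset with a Cache-Control header; the handler, port, and asset body are illustrative:

```python
# Serve a static asset with a Cache-Control header so browsers and CDNs
# are allowed to cache it. Standard library only; details are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer

class AssetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"body { color: #333; }"  # stand-in for a real static asset
        self.send_response(200)
        self.send_header("Content-Type", "text/css")
        # "public" lets shared caches (CDN, proxy) store it too;
        # max-age=3600 means it may be reused for an hour without revalidation.
        self.send_header("Cache-Control", "public, max-age=3600")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), AssetHandler).serve_forever()
```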

Cache Strategies

Cache-Aside (Lazy Loading)

The application checks the cache first. On a miss, it reads from the database, writes the result to cache, and returns it. The cache only stores data that's actually been requested.

App → 1. GET key → Cache (miss)
App → 2. SELECT → Database
App → 3. SET key → Cache

Cache-aside: app manages cache reads and writes explicitly.
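
A minimal sketch of that read path in Python, assuming a Redis server on localhost via the redis-py client; get_user_from_db is a hypothetical stand-in for the real query:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 300  # bound staleness: entries expire even if never invalidated

def get_user_from_db(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada"}  # stand-in for the real SELECT

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)               # 1. check the cache first
    if cached is not None:
        return json.loads(cached)     # hit: the database is never touched
    user = get_user_from_db(user_id)  # 2. miss: read the source of truth
    r.setex(key, TTL_SECONDS, json.dumps(user))  # 3. populate for next time
    return user
```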

Pros: Only requested data gets cached. Cache failures don't break reads (you fall back to the DB). Cons: The first request is always a cache miss. Data can become stale if the DB is updated without invalidating the cache.

Write-Through

Every write goes to both cache and database synchronously. The cache always has current data.

Pros: Cache is never stale. Cons: Write latency increases (two writes per operation). You also cache data that might never be read.
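
A write-through save under the same assumptions (local Redis; db_save_user is a hypothetical stand-in for the real write):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def db_save_user(user: dict) -> None:
    pass  # stand-in for the real INSERT/UPDATE

def save_user(user: dict) -> None:
    db_save_user(user)  # 1. write the source of truth
    # 2. synchronously mirror it into the cache, so the next read is a hit
    r.set(f"user:{user['id']}", json.dumps(user))
```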

Write-Behind (Write-Back)

Writes go to the cache first, and the cache asynchronously flushes to the database in batches. The application only waits for the cache write.

Pros: Very fast writes. Batching reduces database load. Cons: Risk of data loss if the cache crashes before flushing. More complex to implement correctly.
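
A write-behind sketch using an in-memory queue and a background flush thread; db_save_batch is a hypothetical stand-in for a real bulk write, and the interval and batch size are arbitrary:

```python
import queue
import threading
import time

cache = {}               # the fast layer the application writes to
pending = queue.Queue()  # writes waiting to be flushed

def db_save_batch(rows: list) -> None:
    print(f"flushing {len(rows)} rows")  # stand-in for one bulk DB write

def save_user(user: dict) -> None:
    cache[f"user:{user['id']}"] = user  # fast path: cache write only
    pending.put(user)                   # the DB write happens later

def flush_worker(interval: float = 1.0, max_batch: int = 100) -> None:
    while True:
        time.sleep(interval)
        batch = []
        while not pending.empty() and len(batch) < max_batch:
            batch.append(pending.get())
        if batch:
            db_save_batch(batch)  # many writes collapse into one flush

threading.Thread(target=flush_worker, daemon=True).start()
```

Anything queued but not yet flushed is lost if the process dies, which is exactly the durability trade-off described above.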

Refresh-Ahead

The cache proactively refreshes entries before they expire, based on predicted access patterns. If an entry is accessed frequently and its TTL is approaching, the cache fetches a fresh copy in the background.

Pros: No cache miss latency for hot data. Cons: Wasted refreshes if access patterns change. Requires good prediction logic.
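
A refresh-ahead sketch, again assuming a local Redis server; load_from_db is hypothetical and the refresh threshold is arbitrary:

```python
import json
import threading
import redis

r = redis.Redis(host="localhost", port=6379)
TTL = 300           # seconds each entry lives
REFRESH_BELOW = 60  # refresh in the background once under a minute remains

def load_from_db(key: str) -> dict:
    return {"key": key}  # stand-in for the real query

def refresh(key: str) -> None:
    r.setex(key, TTL, json.dumps(load_from_db(key)))

def get(key: str) -> dict:
    cached = r.get(key)
    if cached is None:              # cold start: pay the miss once
        value = load_from_db(key)
        r.setex(key, TTL, json.dumps(value))
        return value
    if r.ttl(key) < REFRESH_BELOW:  # hot entry nearing expiry
        threading.Thread(target=refresh, args=(key,), daemon=True).start()
    return json.loads(cached)       # serve the current copy immediately
```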

Strategy      | Read latency               | Write latency         | Consistency
Cache-aside   | Miss on first read         | Normal                | Can go stale
Write-through | Always fast                | Slower (double write) | Always current
Write-behind  | Always fast                | Very fast             | Risk of loss
Refresh-ahead | Always fast (if predicted) | Normal                | Proactively fresh

Cache Invalidation

"There are only two hard things in computer science: cache invalidation and naming things." — Phil Karlton

When the underlying data changes, the cache needs to know. There are three main approaches:

  • TTL (Time to Live): Entries expire after a fixed duration. Simple and safe, but data can be stale for up to the TTL window.
  • Event-driven invalidation: When data changes, explicitly delete or update the cached entry. Consistent, but requires plumbing between writes and cache.
  • Version keys: Include a version number in the cache key (e.g., user:42:v3). Incrementing the version effectively invalidates old entries without explicit deletion.
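
A minimal sketch of the version-key approach with Redis, where a per-user counter (the user:<id>:ver key here is illustrative) is baked into the cache key:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def user_cache_key(user_id: int) -> str:
    # A missing counter means version 0; the version becomes part of the key.
    version = int(r.get(f"user:{user_id}:ver") or 0)
    return f"user:{user_id}:v{version}"

def invalidate_user(user_id: int) -> None:
    r.incr(f"user:{user_id}:ver")  # old user:{id}:vN keys are never read again
```

Orphaned entries are never deleted explicitly; give them a TTL and let them age out.
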
Stale cache + thundering herd = outage. When a popular cache entry expires, hundreds of concurrent requests all hit the database simultaneously. Protect against this with cache stampede prevention: use a lock so only one request fetches from the DB while the others wait for the cache to be repopulated, as sketched below.
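
One way to implement that lock is with Redis's SET NX (a sketch, not production-grade distributed locking; load_from_db is hypothetical):

```python
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def load_from_db(key: str) -> dict:
    return {"key": key}  # stand-in for the expensive query

def get(key: str, ttl: int = 300) -> dict:
    while True:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # Try to become the single rebuilder. The lock auto-expires after
        # 10 seconds in case the holder crashes mid-rebuild.
        if r.set(f"lock:{key}", "1", nx=True, ex=10):
            value = load_from_db(key)  # only one request runs this
            r.setex(key, ttl, json.dumps(value))
            r.delete(f"lock:{key}")
            return value
        time.sleep(0.05)  # someone else is rebuilding; poll until it lands
```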

Eviction Policies

Caches have finite memory. When they're full, something has to go. The eviction policy decides what:

  • LRU (Least Recently Used): Evicts the entry that hasn't been accessed for the longest time. The default choice for most caches; see the sketch after this list.
  • LFU (Least Frequently Used): Evicts the entry with the fewest total accesses. Better for workloads with stable hot sets.
  • FIFO (First In, First Out): Evicts the oldest entry regardless of access pattern. Simple but often suboptimal.
  • Random: Evicts a random entry. Surprisingly effective and very cheap to implement.
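
As a concrete example of LRU, here is a minimal cache built on the standard library's OrderedDict, which keeps keys in access order:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                # miss
        self.data.move_to_end(key)     # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```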