Distributed Caching & Eviction Policies

Why a database that handled 10K requests a minute falls over at 10:01, how cache write policies and eviction strategies actually get chosen in production, and which distributed caching failure modes you need to plan for.

You're on-call. It's launch day for a flash sale, and at 10:00am your monitoring dashboard goes from green to a wall of red in about ninety seconds. Database CPU is pinned at 100%. Response times have gone from 40ms to 8 seconds. Support tickets are piling up. Traffic is no higher than forecast; the database was just never built to answer the same "what's the price of this product" question 50,000 times a second.

This is the problem caching solves. Not "make things faster" in the abstract — specifically, stop asking the database the same question over and over when the answer hasn't changed.

Why Caching Matters

Here's the part that surprises people who haven't had to debug this: disk-backed reads (even on a well-tuned Postgres instance) usually cost single-digit milliseconds. Reading the same value from memory typically takes well under a millisecond. That gap doesn't sound huge until you multiply it by tens of thousands of requests per second. At that point, the database isn't slow because of bad queries; it's slow because it's doing redundant work that a cache layer could have absorbed.

[ Client ] ---> [ Application Server ] ---> [ Cache (Redis) ]
                                                   | (Cache Miss)
                                                   v
                                            [ Database (PostgreSQL) ]

Once you accept that you need a cache, the next question isn't "should I cache" — it's "how do I keep the cache honest." That's where write policies come in.

Cache Write Policies

Every write policy is really answering one question: when data changes, who updates the cache, and when? Get this wrong and you'll serve stale prices, stale inventory counts, or stale account balances — which is a much worse failure than being slow.

1. Cache-Aside (Lazy Loading)

The application owns both reads and writes explicitly.

Read Path: The app queries the cache. On a cache miss, it queries the database, writes the result to the cache, and returns it.
Write Path: The app updates the database directly and invalidates (deletes) the cache entry.
Pros: Resilient to cache failures — if Redis disappears, you fall back to the database and keep serving traffic. Only requested data ever gets cached.
Cons: The first request for any key is always a miss, and that miss costs a database round-trip plus a cache write.

This is the default choice for most teams, and for good reason — it fails safely. If your cache layer goes down at 3am, cache-aside degrades to "slow" instead of "broken."
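
To make the read and write paths concrete, here is a minimal sketch of cache-aside. The `cache` and `db` objects are plain Maps standing in for Redis and PostgreSQL, and the `readProduct` / `writeProduct` names and the `dbReads` counter are made up for illustration; a real version would make network calls and handle errors.

```javascript
// Hypothetical stand-ins: in a real system `cache` would be a Redis client
// and `db` a PostgreSQL table. Plain Maps keep the sketch runnable.
const cache = new Map();
const db = new Map([["product:42", { price: 1999 }]]);
let dbReads = 0; // counts how often the "database" is queried

function readProduct(key) {
  if (cache.has(key)) return cache.get(key); // cache hit
  dbReads += 1;                              // cache miss: go to the database
  const row = db.get(key) ?? null;
  if (row !== null) cache.set(key, row);     // populate the cache for next time
  return row;
}

function writeProduct(key, row) {
  db.set(key, row);  // 1. update the source of truth
  cache.delete(key); // 2. invalidate, so the next read reloads fresh data
}

readProduct("product:42"); // miss, reads the database
readProduct("product:42"); // hit, served from the cache
writeProduct("product:42", { price: 1499 });
console.log(readProduct("product:42").price, dbReads); // 1499 2
```

Note that the write path deletes the entry rather than updating it, which is what the description above calls invalidation: the next reader pays the miss, but the cache never holds a value the database didn't return.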

2. Write-Through

The cache becomes the interface your application writes to; the cache is responsible for keeping the database in sync.

Write Path: The app writes to the cache. The cache synchronously updates the database.
Pros: As long as every write goes through the cache, cached reads are not stale: a read right after a write sees the new value.
Cons: Every write now pays for two systems to acknowledge it, so write latency goes up.

Use this when correctness matters more than write speed — session state, feature flags, anything where a stale read causes a real bug.
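
A rough sketch of the write-through shape, assuming an in-memory Map as the database stand-in. The `WriteThroughCache` class and the feature-flag key are hypothetical. One choice made here that the description above doesn't dictate: the sketch writes to the database before updating its own entry, so a failed database write can't leave the cache ahead of the source of truth.

```javascript
// Hypothetical stand-in for the database; a real one would be a network call.
const db = new Map();

class WriteThroughCache {
  constructor(store) {
    this.store = store;       // backing database
    this.entries = new Map(); // cached values
  }

  set(key, value) {
    this.store.set(key, value);   // synchronous database write
    this.entries.set(key, value); // cache updated only after that succeeds
  }

  get(key) {
    if (this.entries.has(key)) return this.entries.get(key);
    const value = this.store.get(key); // fall back to the database on a miss
    if (value !== undefined) this.entries.set(key, value);
    return value;
  }
}

const flags = new WriteThroughCache(db);
flags.set("feature:new-checkout", true);
console.log(flags.get("feature:new-checkout"), db.get("feature:new-checkout")); // true true
```

The caller only ever talks to `flags`; the extra latency mentioned above is the `this.store.set` call sitting inside every `set`.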

3. Write-Back (Write-Behind)

Same idea as write-through, except the database write happens asynchronously.

Write Path: The app writes to the cache. The cache queues the write and flushes it to the database later.
Pros: Writes feel instant, which matters for write-heavy workloads like counters or activity logs.
Cons: If the cache crashes before the queued write is flushed, that data is gone. This is the trade-off you're explicitly signing up for.

I've seen teams pick write-back for view counts and like counts — losing a few seconds of counter increments during a crash is an acceptable trade for the write throughput it buys. Nobody would make that same trade for payment records.
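
Here is what that counter case might look like as a sketch. `WriteBackCounter`, its `dirty` set, and the `post:7` key are illustrative names, and the Map-based `db` is a stand-in; a production version would flush on a timer and deal with failed flushes.

```javascript
const db = new Map(); // hypothetical database stand-in

class WriteBackCounter {
  constructor(store) {
    this.store = store;
    this.counts = new Map(); // current values, served from memory
    this.dirty = new Set();  // keys changed since the last flush
  }

  increment(key) {
    const current = this.counts.get(key) ?? this.store.get(key) ?? 0;
    this.counts.set(key, current + 1);
    this.dirty.add(key); // acknowledged without touching the database
  }

  flush() {
    // One database write per dirty key, however many increments it absorbed.
    for (const key of this.dirty) this.store.set(key, this.counts.get(key));
    this.dirty.clear();
  }
}

const likes = new WriteBackCounter(db);
likes.increment("post:7");
likes.increment("post:7");
console.log(db.get("post:7")); // undefined: nothing persisted yet
likes.flush();                 // in practice this would run on a timer
console.log(db.get("post:7")); // 2
```

Everything between an `increment` and the next `flush` lives only in memory, which is exactly the window in which a crash loses data.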

Cache Eviction Policies

Memory isn't free, so eventually the cache fills up and has to decide what to throw away. This decision matters more than it sounds — pick the wrong policy and you end up with cache thrashing, where the same items get evicted and immediately re-fetched, and the cache stops helping at all.


Eviction Policy	Description	Best For
LRU (Least Recently Used)	Evicts the item that has not been accessed for the longest time.	General-purpose workloads with temporal locality.
LFU (Least Frequently Used)	Evicts the item with the lowest access count.	Static files or search results with stable popularity.
FIFO (First In First Out)	Evicts the oldest item based on insertion time.	Simple queues or time-sequenced batch data.

LRU Cache Implementation Concept

An LRU cache is typically implemented using a combination of a Hash Map (for $O(1)$ lookup) and a Doubly Linked List (for $O(1)$ updates to the access order).

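// A conceptual LRU read path. Assumes `map` is a Map from key to list node
// and `moveToHead` relinks a node at the front of the doubly linked list.
function get(key) {
  if (!map.has(key)) return null; // Cache miss
  const node = map.get(key);
  moveToHead(node); // Mark as most recently used
  return node.value;
}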
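
For a fuller picture, the hash map plus doubly linked list design can be sketched as a small class. The `LRUCache` name, the sentinel head and tail nodes, and the `put` method are choices made for this example rather than anything prescribed above, and the sketch assumes a capacity of at least 1.

```javascript
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity; // assumed to be >= 1
    this.map = new Map();     // key -> list node, for O(1) lookup
    // Sentinel nodes avoid null checks at both ends of the list.
    this.head = { key: null, value: null, prev: null, next: null };
    this.tail = { key: null, value: null, prev: this.head, next: null };
    this.head.next = this.tail;
  }

  unlink(node) {
    node.prev.next = node.next;
    node.next.prev = node.prev;
  }

  insertAfterHead(node) {
    node.prev = this.head;
    node.next = this.head.next;
    this.head.next.prev = node;
    this.head.next = node;
  }

  get(key) {
    if (!this.map.has(key)) return null;
    const node = this.map.get(key);
    this.unlink(node);
    this.insertAfterHead(node); // mark as most recently used
    return node.value;
  }

  put(key, value) {
    if (this.map.has(key)) {
      const node = this.map.get(key);
      node.value = value;
      this.unlink(node);
      this.insertAfterHead(node);
      return;
    }
    if (this.map.size >= this.capacity) {
      const lru = this.tail.prev; // least recently used sits just before the tail
      this.unlink(lru);
      this.map.delete(lru.key);
    }
    const node = { key, value, prev: null, next: null };
    this.insertAfterHead(node);
    this.map.set(key, node);
  }
}

const lru = new LRUCache(2);
lru.put("a", 1);
lru.put("b", 2);
lru.get("a");    // "a" becomes most recently used
lru.put("c", 3); // cache is full, so "b" is evicted
console.log(lru.get("b"), lru.get("a"), lru.get("c")); // null 1 3
```

Both `get` and `put` touch a constant number of pointers, which is where the $O(1)$ bound comes from.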

Distributed Caching Challenges

When scaling horizontally across multiple machines, local memory caches (in-memory maps) become problematic because each server holds its own copy of the data, so an update or invalidation on one server is invisible to the others.

To solve this, we use a distributed caching cluster like Redis or Memcached.

1. Cache Avalanche

Occurs when a large portion of the cache expires simultaneously, causing a sudden surge of traffic to hit the database.

Solution: Add a random jitter to the TTL (Time-To-Live) values so expirations are spread out.
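
A small sketch of TTL jitter. The `jitteredTtl` helper, its 10% default spread, and the injectable `random` parameter are assumptions for illustration; pick a spread that suits how bursty your expirations are.

```javascript
// Returns a TTL in seconds spread uniformly around baseTtlSeconds,
// within plus or minus (spread * baseTtlSeconds).
// `random` is injectable so the function can be tested deterministically.
function jitteredTtl(baseTtlSeconds, spread = 0.1, random = Math.random) {
  const offset = (random() * 2 - 1) * spread * baseTtlSeconds;
  return Math.max(1, Math.round(baseTtlSeconds + offset)); // never below 1 second
}

const ttls = Array.from({ length: 5 }, () => jitteredTtl(3600));
console.log(ttls); // five values between 3240 and 3960
```

With a one-hour base TTL, keys written in the same second now expire across a window of about twelve minutes instead of all at once.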

2. Cache Stampede (Dogpiling)

Occurs when a popular cached item expires and many parallel requests all miss at once, each querying the database to recompute the same value and writing it back to the cache at the same time.

Solution: Use mutual exclusion locks (mutex) to allow only one thread/process to update the cache on a miss, while the other requests wait for that result.
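
In a single Node.js process, one way to get the effect of a per-key lock is to share the in-flight promise, so only the first miss queries the database. This is a sketch under that assumption: `getWithLock`, `inFlight`, and the fake `loadFromDb` are hypothetical, and it only protects one process. Across several servers you would need a lock in the shared cache itself, which is not shown here.

```javascript
const cache = new Map();
const inFlight = new Map(); // key -> Promise of the value being recomputed
let dbCalls = 0;

// Hypothetical slow database query.
async function loadFromDb(key) {
  dbCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 20));
  return `value-for-${key}`;
}

async function getWithLock(key) {
  if (cache.has(key)) return cache.get(key);
  // Another request is already rebuilding this key: wait for its result.
  if (inFlight.has(key)) return inFlight.get(key);
  const pending = loadFromDb(key)
    .then((value) => {
      cache.set(key, value);
      return value;
    })
    .finally(() => inFlight.delete(key)); // release on success or failure
  inFlight.set(key, pending);
  return pending;
}

async function demo() {
  // 100 concurrent requests for the same expired key.
  const results = await Promise.all(
    Array.from({ length: 100 }, () => getWithLock("product:42"))
  );
  return { distinct: new Set(results).size, dbCalls };
}

demo().then(console.log); // { distinct: 1, dbCalls: 1 }
```

All 100 callers get the same value, and the database sees one query instead of 100.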

3. Cache Penetration

Occurs when requests query keys that do not exist in the database (e.g., malicious requests for non-existent IDs, or a bug that generates random IDs). Since the key is never found in the cache or the database, every single one of those requests hits the database directly — the cache provides zero protection.

Solution: Cache empty or null values with a short TTL, or use a Bloom Filter to check if the key could possibly exist before querying the database at all.
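
The empty-value approach can be sketched like this. The `lookup` function, the two TTL constants, and the explicit `now` parameter (used so the example doesn't depend on the wall clock) are assumptions for illustration; the Bloom filter variant is not shown.

```javascript
const db = new Map([["user:1", { name: "Ada" }]]); // hypothetical database
const cache = new Map(); // key -> { value, expiresAt }
let dbReads = 0;

const HIT_TTL_MS = 60_000; // assumed TTL for rows that exist
const MISS_TTL_MS = 5_000; // assumed shorter TTL for "does not exist"

function lookup(key, now = Date.now()) {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > now) return entry.value; // may be a cached null
  dbReads += 1;
  const row = db.get(key) ?? null;
  const ttl = row === null ? MISS_TTL_MS : HIT_TTL_MS;
  cache.set(key, { value: row, expiresAt: now + ttl }); // cache misses too
  return row;
}

lookup("user:999", 0);     // database says "no such row"; null is cached
lookup("user:999", 1_000); // answered from the cached null
console.log(dbReads);      // 1
lookup("user:999", 6_000); // negative entry expired, database is asked again
console.log(dbReads);      // 2
```

The short TTL on the null entry matters: if `user:999` is created later, the cache stops claiming it doesn't exist within a few seconds.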

Key Takeaways

Caching exists to eliminate redundant work, not just "make things faster" — the database was slow because it kept re-answering the same question, not because the query itself was inefficient.
Cache-aside is the safest default because it fails open: if the cache disappears, you degrade to "slow" instead of "wrong" or "down."
Write-through and write-back trade write latency against data-loss risk — the right choice depends on whether a stale or lost write is acceptable for that specific data (a like count vs. a payment record are not the same decision).
Cache avalanche, stampede, and penetration are three distinct failure modes with three distinct fixes: TTL jitter, mutex locking, and empty-value caching (or Bloom filters) respectively. Diagnosing which one you're hitting is the first step to fixing it.

Next Steps

Caching solves the "don't ask twice" problem for reads. But once you have multiple services that need to react to the same event — an order being placed, a payment clearing — polling a cache doesn't scale. That's where event streaming systems like Kafka come in, which is the next chapter.