Series: System Design · Data & Storage — Pillar 4 of 8

Systems Design

#	Post	What it covers
00	Data & Storage: Where Everything Lives	Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath. (161 chars)
01	SQL vs NoSQL: Choosing the Right Database	SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call.
02	Database Indexing: The Highest-Leverage Performance Tool	Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one.
03	B-Trees & B+ Trees: The Data Structure Behind Database Indexes	Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries.
04	LSM Trees: Why Some Databases Are Built for Writes	LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it.
05	Denormalisation: Trading Storage for Speed	Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely.
06	Database Sharding: Scaling Beyond a Single Node	Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces.
07	Data Partitioning: Choosing How to Divide Your Data	Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale.
08	Consistent Hashing: Minimising Resharding Pain ← you are here	Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems.
09	Replication & Read Replicas: Scaling Reads and Surviving Failures	Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them.
10	Object Storage: Unlimited Scale for Large Binary Data	Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it.
11	Block vs File vs Object Storage: Three Models, Three Use Cases	Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them.
12	Distributed File Systems: File Storage Across Many Machines	Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them.
13	Time Series Databases: Built for Metrics and Events	Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB.
14	Vector Databases: Semantic Search and AI Memory	Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one.
15	Full-Text Search Engines: Beyond SQL LIKE	Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful.
16	Materialized Views: Pre-Computing Expensive Queries	Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches.
17	Query Optimisation: From Slow to Fast	Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast.
18	Connection Pooling: Managing the Hidden Bottleneck	Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly.
19	Data & Storage: Wrap-Up	A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system.

Consistent Hashing: Minimising Resharding Pain

The problem

Your URL shortener's Redis cache uses hash-based routing: node = hash(link_id) % 3. Three cache nodes, perfectly distributed, working well.

Traffic doubles. You need a fourth cache node. You add it. Now the formula is hash(link_id) % 4. You do the math: for most values of link_id, hash(x) % 3 and hash(x) % 4 produce different results. Over 75% of all cached entries now map to a different node than where they're stored. The cache is effectively empty until it refills.

Every request hits the database while the cache warms up. Your database gets hammered. Response times spike. You've experienced a rehashing storm — the unavoidable consequence of modulo-based hash routing when the cluster size changes.

The same problem exists at larger scale with database shards: adding a shard to a three-node cluster means moving most of the data. On a petabyte-scale system, that migration takes days and carries significant risk.

Consistent hashing was designed to solve exactly this problem.

The core idea

Consistent hashing arranges both nodes and keys on a conceptual ring (the hash ring). Each key is assigned to the first node clockwise from its position on the ring. When a node is added or removed, only the keys between it and its predecessor on the ring are reassigned — typically 1/n of all keys, where n is the number of nodes.

The analogy: a clock face with restaurants

Imagine a circular clock face representing delivery zones. Five restaurants are placed at positions on the clock face. Each delivery address hashes to a position on the clock, and orders go to the first restaurant clockwise from that position.

A new restaurant opens and is placed between the 3 o'clock and 6 o'clock restaurants. Only delivery addresses that hash to positions between the 3 o'clock restaurant and the new restaurant change their assignment — they now go to the new restaurant instead of the 6 o'clock one. Every other delivery address is unchanged.

If the 6 o'clock restaurant closes, only its delivery addresses (the ones that were going there) move to the 9 o'clock restaurant. Everyone else's orders are unaffected.

This is the key property: cluster changes affect only the minimal set of keys necessary. Contrast with modulo hashing, where a single node addition causes a cascade that reassigns most keys.

How it works

Building the hash ring

Choose a large hash space — a ring of values from 0 to 2³² − 1 (or 2¹²⁸ for larger systems)
Hash each node's identifier (IP address, hostname) to a position on the ring
Each key is hashed to a position on the ring; its assigned node is the first node clockwise

Ring (simplified, positions 0–100):

Node A → position 15
Node B → position 45
Node C → position 72

key_x7Kp2 → hashes to position 35 → assigned to Node B (first node clockwise from 35)
key_pQ9m  → hashes to position 60 → assigned to Node C (first node clockwise from 60)
key_yz9A1 → hashes to position 80 → assigned to Node A (first node clockwise from 80, wrapping around)

Adding a node

New Node D is added at position 55:

Ring: ... Node B (45) ... Node D (55) ... Node C (72) ...

Keys that previously mapped to Node C and hashed to positions between 45 and 55 now map to Node D. Keys hashing to positions 55–72 still go to Node C. Everything else is unchanged.

Fraction of keys that move: approximately 1/n where n is the new total number of nodes. With 3 nodes, adding a 4th moves ~25% of keys. With 10 nodes, adding an 11th moves ~9% of keys.

Compare to modulo hashing: adding a 4th node to a 3-node cluster reassigns keys for ~75% of positions.

Removing a node

Node B (position 45) is removed. Keys that were assigned to Node B now assign to Node C (the next node clockwise). Everything else is unchanged. Only ~25% of keys move — exactly the keys that were on Node B.

Virtual nodes (vnodes)

Naive consistent hashing has a problem: if nodes hash to positions that aren't evenly spaced, some nodes get more of the key space than others. Node A at position 15 and Node B at position 16 means Node A handles almost the entire ring; Node B handles almost nothing.

Virtual nodes solve this by assigning each physical node multiple positions on the ring. Node A might have 150 virtual nodes distributed across the ring; Node B might have 150 more; Node C another 150. The key space is divided among 450 virtual nodes, each mapping to one of three physical nodes.

With enough virtual nodes, the key distribution is approximately uniform. When Node D is added, it takes some virtual nodes from each existing physical node — distributing the migration load rather than concentrating it on one node's neighbours.

Cassandra uses 256 virtual nodes per physical node by default. Amazon Dynamo (the paper that popularised consistent hashing in distributed systems) uses 100–200.

Physical node A → virtual nodes at positions: 3, 19, 42, 67, 91, ...
Physical node B → virtual nodes at positions: 11, 28, 55, 74, 88, ...
Physical node C → virtual nodes at positions: 7, 34, 61, 82, 95, ...

Replication

Consistent hashing integrates naturally with replication. Rather than assigning each key to one node, assign it to the next k nodes clockwise. For a replication factor of 3:

key_x7Kp2 at position 35:
- Primary replica → Node B (position 45)
- Second replica  → Node C (position 72)
- Third replica   → Node A (position 15, wrapping)

When Node B fails, Node C already has the data (as a replica) and can serve reads. This is how Cassandra, Dynamo, and Riak handle both data distribution and fault tolerance in a unified mechanism.

Real-world usage

Cassandra: consistent hashing is the foundation of Cassandra's data distribution. The partition key is hashed to a token position on the ring; virtual nodes ensure even distribution; the replication factor determines how many nodes hold each partition.

Caching layers: distributed caches (Memcached clusters, Redis Cluster) use consistent hashing for client-side routing so cache node additions don't invalidate the entire cache.

Load balancers: some load balancers use consistent hashing to route requests for the same session to the same backend server (sticky sessions), with minimal disruption when backend instances scale up or down.

Amazon DynamoDB: Dynamo's original paper described consistent hashing with virtual nodes as its core distribution mechanism, influencing a generation of distributed databases.

The tradeoffs

Uneven distribution without virtual nodes: naive consistent hashing can produce highly uneven load if physical nodes hash to clustered ring positions. Virtual nodes mitigate this but add complexity to the routing table.

Hot spots from skewed data: consistent hashing distributes keys uniformly by hash value, not by access frequency. A single extremely popular key (a viral link's click events) still all land on one node. Handling hot spots requires application-level strategies (local caching, read replicas for specific keys, key sharding with a suffix).

Routing complexity: clients (or a proxy layer) need to know the current ring state to route requests correctly. When nodes join or leave, the ring state must be updated atomically. Stale ring state routes requests to wrong nodes.

Not a universal solution: consistent hashing solves the resharding problem for key-value access patterns. It doesn't help with cross-shard range queries, cross-shard joins, or distributed transactions.

When to use consistent hashing

Consistent hashing is the right choice when:

You're distributing data across nodes and expect the cluster size to change (scale up/down)
Cache invalidation on node changes would be disruptive (cache warming is expensive or slow)
You're building a distributed key-value store, cache, or object store
The access pattern is key-based lookup (not range scans or joins)

It's less relevant when:

Your cluster is static and rarely changes
You're using a managed database service that handles distribution for you
Your access pattern is primarily range-based (consistent hashing doesn't preserve key ordering)

The one thing to remember

Consistent hashing places nodes and keys on a ring, assigning each key to the first node clockwise from its hash position. Adding or removing a node only redistributes the keys between it and its predecessor — roughly 1/n of all keys. Compare this to modulo hashing, where any cluster size change invalidates the vast majority of assignments. If your cluster changes size and you care about disruption, consistent hashing is the right tool.

← Previous: Data Partitioning: Choosing How to Divide Your Data — Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, m...

→ Next: Replication & Read Replicas: Scaling Reads and Surviving Failures — Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and...

Consistent Hashing: Minimising Resharding Pain

Systems Design

Consistent Hashing: Minimising Resharding Pain

The problem

The core idea

The analogy: a clock face with restaurants

How it works

Building the hash ring

Adding a node

Removing a node

Virtual nodes (vnodes)

Replication

Real-world usage

The tradeoffs

When to use consistent hashing

The one thing to remember

Comments

Systems Design

More from this blog

Distributed Systems: Wrap-Up

Observability: Understanding Your System at Runtime

Merkle Trees: Efficiently Finding What's Different

Erasure Coding: Fault Tolerance Without Full Replication

Change Data Capture: Streaming Your Database in Real Time

Command Palette

Systems Design

Consistent Hashing: Minimising Resharding Pain

The problem

The core idea

The analogy: a clock face with restaurants

How it works

Building the hash ring

Adding a node

Removing a node

Virtual nodes (vnodes)

Replication

Real-world usage

The tradeoffs

When to use consistent hashing

The one thing to remember

Comments

Systems Design

More from this blog