Quorum: How Many Nodes Must Agree?

Series: System Design · Distributed Systems — Pillar 8 of 8
Systems Design
| # | Post | What it covers |
|---|---|---|
| 00 | Distributed Systems: What Happens When Machines Disagree | Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar. |
| 01 | Network Partitions: The Failure Mode You Can't Design Away | Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice. |
| 02 | Split-Brain: When Two Nodes Both Think They're the Leader | Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it. |
| 03 | Heartbeats: How Nodes Know Their Peers Are Alive | Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work. |
| 04 | Leader Election: Agreeing on Who's in Charge | Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve. |
| 05 | Consensus Algorithms: Agreeing on a Value Across Failures | Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used. |
| 06 | Quorum: How Many Nodes Must Agree? ← you are here | Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB. |
| 07 | Paxos: The Algorithm That Started It All | Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production. |
| 08 | Raft: Consensus Made Understandable | Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV. |
| 09 | Gossip Protocol: Decentralised Cluster Communication | Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production. |
| 10 | Logical Clocks: When Physical Time Isn't Enough | Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works. |
| 11 | Lamport Timestamps: Ordering Events Without a Global Clock | Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you. |
| 12 | Vector Clocks: Knowing When Events Are Truly Concurrent | Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations. |
| 13 | Distributed Transactions: When One Machine Isn't Enough | Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative. |
| 14 | Two-Phase Commit: Coordinating a Distributed Decision | 2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems. |
| 15 | Three-Phase Commit: Solving 2PC's Blocking Problem | 3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production. |
| 16 | Delivery Semantics: What Does "Delivered" Actually Mean? | Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate. |
| 17 | Change Data Capture: Streaming Your Database in Real Time | CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it. |
| 18 | Erasure Coding: Fault Tolerance Without Full Replication | Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication. |
| 19 | Merkle Trees: Efficiently Finding What's Different | Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy. |
| 20 | Observability: Understanding Your System at Runtime | Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together. |
| 21 | Distributed Systems: Wrap-Up | A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series. |
Quorum: How Many Nodes Must Agree?
The problem
Your Cassandra cluster has five nodes. You write a record with replication factor 3 (the record is stored on 3 of the 5 nodes). A network partition isolates one of those three replica nodes. Another replica is temporarily overloaded and slow.
How many replicas must acknowledge the write before you return success to the client? And how many replicas must agree on a read before you return the result?
If you wait for all three replicas: high consistency, but the write fails if any replica is unavailable. At three nines availability, that's about 8 hours of downtime per year per write operation.
If you wait for just one replica: high availability, but you might read from a replica that hasn't received the latest write. Data is inconsistent.
If you wait for two of three: a quorum. A majority. The write succeeds if any two replicas respond. Reads can check two replicas and return the latest value. And — critically — the read quorum and write quorum overlap: any two nodes that acknowledge a write will include at least one node that participates in any read of two nodes.
This overlap is the key insight behind quorum-based consistency.
The core idea
A quorum is the minimum number of nodes that must participate in an operation for it to be considered valid. By ensuring that the read quorum and write quorum overlap, a distributed system can guarantee that a read will always include at least one node that has the latest write — without requiring all replicas to be available.
The analogy: a jury that requires more than half
A criminal verdict requires the agreement of at least 7 of 12 jurors (more than half) in many jurisdictions. For an acquittal to be valid, it also requires majority agreement. This ensures that no valid conviction and valid acquittal can exist simultaneously — the two quorums (conviction and acquittal) must overlap, so a majority agreeing to one precludes a majority agreeing to the other.
In distributed systems: the write quorum and read quorum must overlap, ensuring every valid read includes at least one node that participated in the last valid write.
How quorum works
The R + W > N condition
Given:
- N = replication factor (number of replicas storing the data)
- W = write quorum (number of replicas that must acknowledge a write)
- R = read quorum (number of replicas that must respond to a read)
Consistency is guaranteed when R + W > N.
This ensures the read set and write set overlap by at least one node. That overlapping node has the latest written value.
N=3, W=2, R=2: R+W = 4 > 3 ✓ — consistent
N=3, W=1, R=3: R+W = 4 > 3 ✓ — consistent (but R=3 means all replicas must be available)
N=3, W=1, R=1: R+W = 2 ≤ 3 ✗ — not consistently guaranteed
N=5, W=3, R=3: R+W = 6 > 5 ✓ — consistent
Common quorum configurations
Quorum reads and writes (W = ⌊N/2⌋ + 1, R = ⌊N/2⌋ + 1): The most common choice. For N=3: W=2, R=2. R+W=4 > 3. Consistent. Tolerates 1 failure.
Write-heavy workload (W=1, R=N): Writes are fast (one ack). Reads require all replicas. Not suitable for availability — any replica failure makes reads fail.
Read-heavy workload (W=N, R=1): Writes require all replicas (durable, synchronous). Reads are fast. Writes fail if any replica is down. Better for read-heavy workloads where write durability is paramount.
Eventual consistency (W=1, R=1): Maximum availability. R+W=2 ≤ N=3. No consistency guarantee — reads may return stale data. Used when availability is paramount and staleness is acceptable (DNS, shopping carts, real-time counters).
Cassandra's consistency levels
Cassandra expresses quorum in named consistency levels per operation:
| Level | W or R count | Use case |
|---|---|---|
ONE |
1 | Maximum availability, eventual consistency |
QUORUM |
⌊N/2⌋ + 1 | Strong consistency, tolerates minority failures |
ALL |
N | Maximum durability, lowest availability |
LOCAL_QUORUM |
Majority in local DC | Multi-datacenter: consistency within DC |
EACH_QUORUM |
Majority in each DC | Multi-datacenter: consistency across DCs |
The URL shortener's click events use CONSISTENCY = ONE for writes (every write must succeed; durability comes from Cassandra's replication factor, not write acknowledgement) and LOCAL_QUORUM for reads that drive analytics dashboards.
Quorum in consensus vs quorum in databases
Consensus quorum (Raft, Paxos): a majority of the total cluster must acknowledge each log entry before it's committed. This is a strict quorum that cannot be adjusted. With 5 nodes, 3 must respond for every write.
Database quorum (Cassandra, DynamoDB): tunable per operation. The application can choose weak consistency for writes and strong consistency for reads, or vice versa. This flexibility is the leaderless database's key advantage over consensus-based systems.
Sloppy quorum
Cassandra and Dynamo use sloppy quorum during failures. If the nodes responsible for a key are unavailable, the write is temporarily accepted by other available nodes (hinted handoff). The write satisfies quorum numerically but not strictly on the designated replicas.
When the failed nodes recover, the hints are replayed — the temporarily stored data is sent to the correct nodes. This improves availability at the cost of brief inconsistency until hints are delivered.
Tradeoffs
Availability vs consistency. Increasing W or R improves consistency but reduces availability (more nodes must respond). Decreasing W or R improves availability but allows stale reads. The R+W>N formula is the mathematical boundary: cross it for consistency, fall below it for availability.
Latency. Quorum reads and writes must wait for the slowest of W (or R) responding nodes. ALL consistency is bounded by the slowest replica in the cluster. ONE returns as soon as the first replica responds. The availability/consistency tradeoff is also a latency tradeoff.
Stale reads at quorum. Even at QUORUM consistency, if the latest write hasn't propagated to the quorum before the read, the read may return a slightly older value. Read repair (Cassandra updates stale replicas after a quorum read detects version differences) mitigates but doesn't eliminate this.
The one thing to remember
Quorum consistency is guaranteed when R + W > N: the read quorum and write quorum overlap, ensuring every read includes at least one node that participated in the latest write. This is the mathematical basis for tunable consistency in distributed databases: increase W and R for stronger consistency (at the cost of availability); decrease them for higher availability (at the cost of possible stale reads). The right quorum settings depend entirely on the consistency requirements and availability targets of the specific operation.
← Previous: Consensus Algorithms — the broader category of algorithms (Paxos, Raft, Zab) that allow distributed nodes to agree on a value despite failures.
→ Next: Paxos — the foundational consensus algorithm, notoriously difficult to understand but the basis of many production systems.




