Foundations Series

#	Post	What it covers
00	Intro	What the Foundations pillar covers and why it matters
01	Availability	Uptime, the nines, and why 99% isn't good enough
02	Reliability	Correctness over time — when uptime isn't enough
03	Latency vs Throughput vs Bandwidth	The three numbers that define system performance
04	ACID vs BASE	Two philosophies for handling data under pressure
05	CAP Theorem ← you are here	The impossibility result every distributed system runs into
06	PACELC Theorem	What CAP doesn't tell you about latency
07	Consistency Models	The spectrum from "always correct" to "eventually correct"
08	Single Point of Failure	Why one weak link breaks the whole chain
09	High Availability vs Fault Tolerance	Similar goals, very different strategies
10	Wrap-up	How all nine concepts connect

CAP Theorem: The Impossibility Result Every Distributed System Hits

The problem

You've built a distributed database. Two nodes — call them Node A in Sydney and Node B in Singapore — both hold copies of your data and serve reads and writes. Most of the time, everything works: a write to Node A is replicated to Node B within milliseconds, and every client gets a consistent view of the world.

Then the network link between Sydney and Singapore goes down. Clients can still reach both nodes — they just can't talk to each other. A write comes in to Node A. Another write for the same record comes in to Node B. The nodes can't coordinate.

What does your system do?

Option one: both nodes stop accepting writes until the link recovers. Your system is consistent — no client will ever see conflicting data — but it's also unavailable to anyone trying to write.

Option two: both nodes keep accepting writes and reconcile when the link recovers. Your system stays available — clients can keep working — but two clients may now read different values for the same record.

There is no option three. This is CAP Theorem.

The core idea

CAP Theorem, proved by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, states that a distributed system can only guarantee two of the following three properties simultaneously:

Consistency — every read returns the most recent write, or an error
Availability — every request receives a response (not an error)
Partition tolerance — the system continues operating when network partitions occur

The critical nuance that most explanations skip: partition tolerance is not optional. Networks partition. This is not a theoretical edge case — it happens in production, between data centres, across availability zones, even within a single data centre during maintenance. A distributed system that cannot tolerate partitions is not a distributed system; it's a single node pretending to be one.

Which means the real choice in CAP is not "pick two of three." It's: when a partition occurs, do you sacrifice consistency or availability?

The analogy: two bank branches that lose their phone line

Two branches of the same bank share a ledger. Usually, they can call each other to confirm balances before processing withdrawals. Then the phone line goes down.

A customer walks into the Sydney branch and asks to withdraw £1,000. The teller can't verify with Singapore whether this customer has already withdrawn money there. Two choices:

Refuse the transaction until the phone line is restored — the branches stay consistent, but neither can serve customers requiring cross-branch verification. Consistent, unavailable.

Process the transaction anyway and reconcile when the line comes back — customers can be served, but for a brief window, both branches might believe the account has £1,000 in it and honour separate withdrawals. Available, inconsistent.

There is no third option that allows both branches to operate independently and guarantee that every customer sees the same balance. The partition forces the choice.

How it works

The three properties unpacked

Consistency (C) in CAP means linearisability — the strongest consistency model. Every read reflects the most recent write across all nodes. If you write a value and immediately read it (even from a different node), you get that value back. There is one globally agreed version of the truth at all times.

This is stricter than the "C" in ACID. ACID's consistency means constraints are enforced. CAP's consistency means all nodes agree on the current state.

Availability (A) means every request to a non-failing node returns a response — not an error, not a timeout, a real response. Note the qualifier: non-failing. A node that has crashed can return errors. But a healthy node must always respond.

Partition Tolerance (P) means the system continues operating when messages between nodes are lost or delayed. The system doesn't require all nodes to be connected to function.

CP systems: choose consistency over availability

When a partition occurs, a CP system stops accepting writes (or reads) on the partitioned side rather than risk returning inconsistent data. It sacrifices availability to preserve correctness.

In practice: HBase, Zookeeper, and Redis in certain configurations behave as CP systems. During a partition, the minority partition will refuse requests rather than return potentially stale data. The system is unavailable for some clients during the partition window, but when it comes back, everyone agrees on the state.

Right for: anything where inconsistent data causes more harm than an outage — financial ledgers, inventory systems, anything requiring strong coordination.

AP systems: choose availability over consistency

When a partition occurs, an AP system keeps all nodes responding, accepting that they may temporarily diverge. It sacrifices consistency to preserve availability.

In practice: CouchDB, Cassandra, and DynamoDB (in its default configuration) behave as AP systems. During a partition, all nodes keep responding. After the partition heals, the system reconciles — using strategies like last-write-wins, vector clocks, or application-level merge logic.

Right for: anything where staying available is more important than immediate correctness — social feeds, shopping carts, user preference data, anything where brief staleness is tolerable.

CA systems: a category that doesn't exist in distributed systems

You'll sometimes see databases described as "CA" — consistent and available, sacrificing partition tolerance. This is only coherent for a single-node system. The moment you have two nodes connected by a network, partitions are possible. You cannot choose to be partition-intolerant any more than you can choose for your network to never fail.

CA in practice means: "We're a single node with no replication." That's a valid architectural choice for some workloads. It's not a distributed system.

The partition is temporary — the choice is permanent

A key practical point: network partitions are usually brief. The Sydney–Singapore link goes down for 30 seconds, then recovers. During those 30 seconds your system had to choose CP or AP. After recovery, a CP system resumes normal operation and an AP system reconciles its diverged state.

The choice between CP and AP isn't just about those 30 seconds — it's about the reconciliation strategy you need to build for AP systems. Last-write-wins is simple but loses data. Operational transformation (used by Google Docs) is powerful but complex. Custom merge logic is flexible but requires careful design. Choosing AP is not just a switch you flip — it's a commitment to building and maintaining a conflict resolution strategy.

Common misreadings of CAP

"CAP means you pick two properties and ignore the third." No. You always need partition tolerance. The choice is C vs A during a partition, not across all three properties equally.

"My system is CP, so it's always consistent." A CP system is consistent during normal operation and during partitions. It achieves this by sacrificing availability when a partition occurs. If no partition is happening, a well-designed CP system is both consistent and available.

"CAP covers all distributed system tradeoffs." CAP is specifically about what happens during a network partition. It says nothing about performance, latency, or the tradeoffs present during normal operation. That's where PACELC (the next post) picks up.

"NoSQL = AP, SQL = CP." Roughly directional, often wrong. PostgreSQL with synchronous replication is CP. DynamoDB can be configured for strong consistency. CockroachDB and Google Spanner are distributed CP systems. The label on the box is less informative than the documentation on replication and consistency guarantees.

The tradeoffs

Choosing CP means designing for graceful degradation during partitions — users get clear errors rather than wrong data, and your on-call team knows the system is in a degraded state.

Choosing AP means designing for conflict resolution after partitions heal — you need a strategy for every data type that could diverge, tested before it matters in production at 3am.

Neither choice eliminates complexity. CP moves the complexity into availability engineering. AP moves it into reconciliation logic. The question is which kind of complexity your team is better equipped to handle.

The one thing to remember

CAP is not "pick two of three" — partition tolerance is mandatory in any real distributed system. The actual choice is simpler and harder: when your network fails and your nodes can't coordinate, do you want your system to stop responding, or to respond with potentially stale data? Pick the failure mode your use case can tolerate. Build the reconciliation or degradation strategy before you need it.

← Previous: ACID vs BASE — what a single database prioritises when things go wrong

→ Next: PACELC Theorem — CAP describes the tradeoff during a partition; PACELC asks the harder question: what are you trading off on a perfectly normal day when the network is healthy?

CAP Theorem: The Impossibility Result Every Distributed System Hits

Foundations Series

CAP Theorem: The Impossibility Result Every Distributed System Hits

The problem

The core idea

The analogy: two bank branches that lose their phone line

How it works

The three properties unpacked

CP systems: choose consistency over availability

AP systems: choose availability over consistency

CA systems: a category that doesn't exist in distributed systems

The partition is temporary — the choice is permanent

Common misreadings of CAP

The tradeoffs

The one thing to remember

Comments

Systems Design

PACELC Theorem: The Tradeoff CAP Doesn't Cover

More from this blog

Networking & Protocols: Wrap-Up

CDN: Moving Content Closer to the People Who Need It

Anycast Routing: One Address, Everywhere at Once

DNS Load Balancing: Traffic Distribution at the Name Layer

DNS: The Phone Book That Runs the Internet

Command Palette

Foundations Series

CAP Theorem: The Impossibility Result Every Distributed System Hits

The problem

The core idea

The analogy: two bank branches that lose their phone line

How it works

The three properties unpacked

CP systems: choose consistency over availability

AP systems: choose availability over consistency

CA systems: a category that doesn't exist in distributed systems

The partition is temporary — the choice is permanent

Common misreadings of CAP

The tradeoffs

The one thing to remember

Comments

Systems Design

PACELC Theorem: The Tradeoff CAP Doesn't Cover

More from this blog