Series: System Design · Distributed Systems — Pillar 8 of 8

Systems Design

#	Post	What it covers
00	Distributed Systems: What Happens When Machines Disagree	Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar.
01	Network Partitions: The Failure Mode You Can't Design Away	Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice.
02	Split-Brain: When Two Nodes Both Think They're the Leader	Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it.
03	Heartbeats: How Nodes Know Their Peers Are Alive	Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work.
04	Leader Election: Agreeing on Who's in Charge	Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve.
05	Consensus Algorithms: Agreeing on a Value Across Failures	Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used.
06	Quorum: How Many Nodes Must Agree?	Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB.
07	Paxos: The Algorithm That Started It All	Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production.
08	Raft: Consensus Made Understandable	Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV.
09	Gossip Protocol: Decentralised Cluster Communication	Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production.
10	Logical Clocks: When Physical Time Isn't Enough	Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works.
11	Lamport Timestamps: Ordering Events Without a Global Clock	Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you.
12	Vector Clocks: Knowing When Events Are Truly Concurrent ← you are here	Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations.
13	Distributed Transactions: When One Machine Isn't Enough	Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative.
14	Two-Phase Commit: Coordinating a Distributed Decision	2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems.
15	Three-Phase Commit: Solving 2PC's Blocking Problem	3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production.
16	Delivery Semantics: What Does "Delivered" Actually Mean?	Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate.
17	Change Data Capture: Streaming Your Database in Real Time	CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it.
18	Erasure Coding: Fault Tolerance Without Full Replication	Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication.
19	Merkle Trees: Efficiently Finding What's Different	Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy.
20	Observability: Understanding Your System at Runtime	Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together.
21	Distributed Systems: Wrap-Up	A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series.

Vector Clocks: Knowing When Events Are Truly Concurrent

The problem

Lamport timestamps tell you: if A → B, then ts(A) < ts(B). What they don't tell you: if ts(A) < ts(B), does that mean A caused B, or were they concurrent?

Consider two users simultaneously editing the same link's destination URL on different servers. Lamport timestamps will order them (one will have a lower timestamp than the other), but this ordering is arbitrary — neither write caused the other. They're concurrent. Silently picking the "higher timestamp" write and discarding the other loses data without the application knowing a conflict occurred.

Vector clocks solve this: they can distinguish between "A happened-before B" and "A and B are concurrent and both happened independently." This distinction is the basis of conflict detection in distributed databases.

The core idea

A vector clock is a list of counters, one per process, that tracks each process's knowledge of every other process's logical time. Comparing two vector clocks tells you definitively: A happened-before B, B happened-before A, or A and B are concurrent.

The analogy: tracking who heard what from whom

Three gossips — Alice, Bob, and Carol — each keep a tally of how many things each person has told them:

Alice's tally: [Alice: 3, Bob: 2, Carol: 1] — "I've made 3 statements, heard 2 from Bob, 1 from Carol"
Bob's tally: [Alice: 2, Bob: 4, Carol: 2]

When Alice shares her tally with Bob, Bob updates each entry to the maximum of their two tallies. Bob now knows everything Alice knows.

If Bob's tally for a dimension is higher than Alice's for the same dimension, Bob has information Alice doesn't. If both are lower in some dimensions, their states are incomparable — they've heard different things and both have newer information in some respects. That's concurrency.

How vector clocks work

Each process i maintains a vector V of length N (one entry per process). V[j] = the number of events process i knows about from process j.

On a local event at process i: V[i] += 1

On sending a message from process i: V[i] += 1, send message with current V

On receiving a message at process i with vector W: V[j] = max(V[j], W[j]) for all j, then V[i] += 1

Comparing vector clocks

Vector clock A happened-before B (A → B) iff: A[i] ≤ B[i] for all i, and A[j] < B[j] for at least one j.

Vectors A and B are concurrent (A ‖ B) iff: neither A → B nor B → A. This means A has a higher counter in some dimension and B has a higher counter in another.

Process A    Process B    Process C
V_A=[0,0,0]  V_B=[0,0,0]  V_C=[0,0,0]

A has event: V_A=[1,0,0]
A sends to B with V=[1,0,0]

B receives:  V_B = max([0,0,0],[1,0,0]) + B++ = [1,1,0]
B has event: V_B=[1,2,0]

A has event (concurrent, no communication): V_A=[2,0,0]

Now compare V_A=[2,0,0] and V_B=[1,2,0]:
  V_A[0]=2 > V_B[0]=1 → A has something B doesn't
  V_A[1]=0 < V_B[1]=2 → B has something A doesn't
→ CONCURRENT: neither happened-before the other ✓

Application: Dynamo and conflict detection

Amazon Dynamo (and Riak, which is an open-source Dynamo implementation) uses version vectors (a variant of vector clocks) for conflict detection on writes.

When a client reads a key, Dynamo returns the value along with its version vector (a "context"). When the client writes back, it includes this context. Dynamo uses the context to determine if the write is a successor to the current value (no conflict) or concurrent with it (conflict).

Initial: key="x7Kp2", value="https://old.com", VC=[A:1, B:0]

Client 1 reads (gets VC=[A:1, B:0]), updates to "https://v2.com"
Writes with context VC=[A:1, B:0] → stored on Server A
  Server A: value="https://v2.com", VC=[A:2, B:0]

Client 2 (concurrently, read the old value) updates to "https://v3.com"
Writes with context VC=[A:1, B:0] → stored on Server B
  Server B: value="https://v3.com", VC=[A:1, B:1]

Reconciliation:
  [A:2, B:0] vs [A:1, B:1]: concurrent (A has A:2 > B:0, B has B:1 > A:0)
  → CONFLICT: surface both values to the application for resolution

In Dynamo, conflicting versions are returned to the next reader as a list ("siblings"). The application (or the client library) resolves the conflict and writes back a merged version.

The vector clock size problem

A vector clock has one entry per process. With 100 processes, each vector clock is 100 entries. At scale, this becomes expensive to store and transmit.

Dynamo addresses this with dotted version vectors (a more compact representation) or version vectors (track client IDs rather than server IDs). Riak and newer Dynamo implementations use these to avoid unbounded clock growth.

Vector clocks vs Lamport timestamps

	Lamport Timestamps	Vector Clocks
Detects A → B	Yes	Yes
Detects concurrency	No	Yes
Size	1 integer	N integers (one per process)
Use case	Total order, simple causality tracking	Conflict detection, concurrent write detection

The one thing to remember

Vector clocks track per-process knowledge so that comparing two events tells you definitively whether one happened-before the other or they're concurrent. A happened-before B means A's every counter ≤ B's corresponding counter (and at least one is strictly less). Concurrent means each has a higher counter in at least one dimension — neither subsumes the other. This is the mechanism behind conflict detection in Dynamo and Riak: concurrent writes are surfaced to the application for resolution, not silently discarded.

← Previous: Lamport Timestamps — the simplest logical clock: assigning monotonic integers to events to capture causal order.

→ Next: Distributed Transactions — ensuring atomicity across multiple nodes or services when a single ACID transaction isn't possible.

Vector Clocks: Knowing When Events Are Truly Concurrent

Systems Design

Vector Clocks: Knowing When Events Are Truly Concurrent

The problem

The core idea

The analogy: tracking who heard what from whom

How vector clocks work

Comparing vector clocks

Application: Dynamo and conflict detection

The vector clock size problem

Vector clocks vs Lamport timestamps

The one thing to remember

Comments

Systems Design

More from this blog

Distributed Systems: Wrap-Up

Observability: Understanding Your System at Runtime

Merkle Trees: Efficiently Finding What's Different

Erasure Coding: Fault Tolerance Without Full Replication

Change Data Capture: Streaming Your Database in Real Time

Command Palette

Systems Design

Vector Clocks: Knowing When Events Are Truly Concurrent

The problem

The core idea

The analogy: tracking who heard what from whom

How vector clocks work

Comparing vector clocks

Application: Dynamo and conflict detection

The vector clock size problem

Vector clocks vs Lamport timestamps

The one thing to remember

Comments

Systems Design

More from this blog