Series: System Design · Distributed Systems — Pillar 8 of 8

Systems Design

#	Post	What it covers
00	Distributed Systems: What Happens When Machines Disagree	Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar.
01	Network Partitions: The Failure Mode You Can't Design Away	Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice.
02	Split-Brain: When Two Nodes Both Think They're the Leader	Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it.
03	Heartbeats: How Nodes Know Their Peers Are Alive	Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work.
04	Leader Election: Agreeing on Who's in Charge	Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve.
05	Consensus Algorithms: Agreeing on a Value Across Failures	Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used.
06	Quorum: How Many Nodes Must Agree?	Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB.
07	Paxos: The Algorithm That Started It All	Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production.
08	Raft: Consensus Made Understandable	Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV.
09	Gossip Protocol: Decentralised Cluster Communication	Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production.
10	Logical Clocks: When Physical Time Isn't Enough ← you are here	Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works.
11	Lamport Timestamps: Ordering Events Without a Global Clock	Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you.
12	Vector Clocks: Knowing When Events Are Truly Concurrent	Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations.
13	Distributed Transactions: When One Machine Isn't Enough	Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative.
14	Two-Phase Commit: Coordinating a Distributed Decision	2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems.
15	Three-Phase Commit: Solving 2PC's Blocking Problem	3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production.
16	Delivery Semantics: What Does "Delivered" Actually Mean?	Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate.
17	Change Data Capture: Streaming Your Database in Real Time	CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it.
18	Erasure Coding: Fault Tolerance Without Full Replication	Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication.
19	Merkle Trees: Efficiently Finding What's Different	Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy.
20	Observability: Understanding Your System at Runtime	Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together.
21	Distributed Systems: Wrap-Up	A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series.

Logical Clocks: When Physical Time Isn't Enough

The problem

Your URL shortener's Analytics Service runs on three servers. Each server has a real-time clock. Server A says it's 14:00:00.000. Server B says it's 14:00:00.003. Server C says it's 13:59:59.997.

These three clocks disagree by a few milliseconds. That seems fine — until you need to order events:

Server A records a click at 14:00:00.001
Server C records the same link's metadata update at 13:59:59.998

Based on physical timestamps, the metadata update (C, 13:59:59.998) happened before the click (A, 14:00:00.001). But in reality, the metadata update was triggered by a notification that included information from the click — the click caused the update, not the other way around.

Physical clocks in distributed systems drift. NTP can synchronise them to within a few milliseconds, but "a few milliseconds" matters when events happen microseconds apart on different machines. You cannot rely on physical timestamps to establish causal order between events on different machines.

This is the problem logical clocks solve.

The core idea

A logical clock doesn't measure real time — it measures causal order: whether one event happened-before another. The fundamental insight from Leslie Lamport's 1978 paper: if event A caused event B (A's message was received by the process that executed B), then A happened-before B. If A and B are on different machines with no causal connection, they are concurrent — neither happened before the other, regardless of their physical timestamps.

The analogy: chapter numbers in a book exchange

Two authors are collaborating on a book, each writing different chapters and emailing drafts to each other. The emails take random amounts of time to arrive.

Author A writes Chapter 3, then emails it to Author B. Author B reads it and responds with Chapter 4, which incorporates Chapter 3's ideas. A simple fact: Chapter 4 happened-after Chapter 3 — not because the clock says so, but because Chapter 4 causally depends on Chapter 3.

If Author C (who wasn't in the exchange) writes Chapter 5 independently at the same time, Chapter 5 is concurrent with Chapter 3 and Chapter 4 — neither happened-before the other in the causal sense.

Logical clocks are chapter numbers: they track the causal chain of revisions, not wall-clock time.

The happened-before relation

Lamport defined the happened-before relation (→) with three rules:

Within one process: if event A occurs before event B in the same process, then A → B
Message passing: if A is the sending of a message and B is the receipt of that message, then A → B
Transitivity: if A → B and B → C, then A → C

Events that are neither A → B nor B → A are concurrent (written A ‖ B).

This relation is a partial order — not every pair of events is ordered. Concurrent events have no meaningful "before/after" relationship.

Why this matters

Conflict resolution: in Cassandra, if two writes to the same key are concurrent (neither happened-before the other), it's a conflict. Last-write-wins (using timestamps) may silently discard one. Vector clocks can detect this and surface the conflict for application-level resolution.

Debugging distributed systems: reconstructing the causal chain of events in a distributed incident requires happened-before order, not just physical timestamps. "This cache miss happened before that database query" is a causal statement, not a temporal one.

Consistency in replicated systems: when a client reads its own writes, the system must guarantee it sees events that happened-after its writes — not just events that have a later physical timestamp.

Partial vs total order

Physical clocks give a total order: every event has a wall-clock time, and times can always be compared. The problem: the comparison is unreliable (clocks drift) and doesn't reflect causality.

Happened-before gives a partial order: some events are ordered (causally related), others are concurrent (no causal relationship). More accurate but doesn't order everything.

Logical clock implementations (Lamport timestamps, vector clocks) bridge the gap:

Lamport timestamps: extend partial order to total order by breaking ties arbitrarily. Simple, but can't detect concurrency.
Vector clocks: preserve partial order, detect concurrency explicitly. More complex, but more informative.

The one thing to remember

Physical clocks can't reliably establish causal order in distributed systems — they drift, and milliseconds of disagreement matter. Logical clocks track causal relationships instead: if A caused B (via message passing or within a process), A happened-before B. This partial order is what distributed systems actually need for correctness: detecting conflicts, establishing read-your-writes guarantees, and reconstructing event causality in debugging. Lamport timestamps and vector clocks (the next two posts) are the concrete implementations of this idea.

← Previous: Gossip Protocol — decentralised cluster membership and state propagation without a leader or consensus.

→ Next: Lamport Timestamps — the simplest logical clock: assigning monotonic integers to events to capture causal order.

Logical Clocks: When Physical Time Isn't Enough

Systems Design

Logical Clocks: When Physical Time Isn't Enough

The problem

The core idea

The analogy: chapter numbers in a book exchange

The happened-before relation

Why this matters

Partial vs total order

The one thing to remember

Comments

Systems Design

More from this blog

Distributed Systems: Wrap-Up

Observability: Understanding Your System at Runtime

Merkle Trees: Efficiently Finding What's Different

Erasure Coding: Fault Tolerance Without Full Replication

Change Data Capture: Streaming Your Database in Real Time

Command Palette

Systems Design

Logical Clocks: When Physical Time Isn't Enough

The problem

The core idea

The analogy: chapter numbers in a book exchange

The happened-before relation

Why this matters

Partial vs total order

The one thing to remember

Comments

Systems Design

More from this blog