Series: System Design · Distributed Systems — Pillar 8 of 8

Systems Design

#	Post	What it covers
00	Distributed Systems: What Happens When Machines Disagree	Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar.
01	Network Partitions: The Failure Mode You Can't Design Away	Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice.
02	Split-Brain: When Two Nodes Both Think They're the Leader	Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it.
03	Heartbeats: How Nodes Know Their Peers Are Alive	Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work.
04	Leader Election: Agreeing on Who's in Charge	Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve.
05	Consensus Algorithms: Agreeing on a Value Across Failures	Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used.
06	Quorum: How Many Nodes Must Agree?	Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB.
07	Paxos: The Algorithm That Started It All	Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production.
08	Raft: Consensus Made Understandable	Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV.
09	Gossip Protocol: Decentralised Cluster Communication	Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production.
10	Logical Clocks: When Physical Time Isn't Enough	Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works.
11	Lamport Timestamps: Ordering Events Without a Global Clock	Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you.
12	Vector Clocks: Knowing When Events Are Truly Concurrent	Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations.
13	Distributed Transactions: When One Machine Isn't Enough	Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative.
14	Two-Phase Commit: Coordinating a Distributed Decision	2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems.
15	Three-Phase Commit: Solving 2PC's Blocking Problem	3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production.
16	Delivery Semantics: What Does "Delivered" Actually Mean?	Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate.
17	Change Data Capture: Streaming Your Database in Real Time ← you are here	CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it.
18	Erasure Coding: Fault Tolerance Without Full Replication	Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication.
19	Merkle Trees: Efficiently Finding What's Different	Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy.
20	Observability: Understanding Your System at Runtime	Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together.
21	Distributed Systems: Wrap-Up	A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series.

Change Data Capture: Streaming Your Database in Real Time

The problem

Your URL shortener's links table in PostgreSQL is the source of truth. Multiple downstream systems need to stay current:

Elasticsearch must index new and updated links for search
The analytics data warehouse must receive link metadata changes
The Outbox pattern relay must publish events from the outbox table
A Redis cache must be invalidated when link destinations change

The naive approach: the application that writes to PostgreSQL also writes to each downstream system. This is dual-write — we've seen its problems (post 10, Outbox pattern). The write to PostgreSQL and the writes to downstream systems are not atomic; if any downstream write fails, systems diverge.

The better approach: instead of the application writing to multiple places, let the database itself be the single source of truth and stream every change to an event bus. Downstream systems consume from the event bus. This is Change Data Capture.

The core idea

Change Data Capture (CDC) reads the database's internal change log (the Write-Ahead Log for PostgreSQL, the binary log for MySQL) and streams every INSERT, UPDATE, and DELETE as an event to downstream consumers. The application writes only to the database. The CDC pipeline handles the fan-out.

The analogy: a stenographer recording every decision

A committee meeting makes decisions verbally. Rather than have the chairperson personally notify every stakeholder, a stenographer records every statement in real time. Stakeholders who need to know about decisions subscribe to the stenographer's transcript.

The committee (application) speaks once. The stenographer (CDC) captures it all. Stakeholders (downstream systems) get the full record without the committee knowing they exist.

How CDC works

The Write-Ahead Log (WAL)

PostgreSQL (and most relational databases) maintain a WAL — a sequential, append-only log of every change committed to the database. The WAL's primary purpose is durability: if the database crashes, it replays the WAL on recovery.

CDC reads the WAL and converts its binary records into structured change events:

WAL entry: INSERT into links (id='x7Kp2', dest='https://example.com', user_id=123)
CDC event:
  {
    "op": "c",              // c=create, u=update, d=delete
    "source": {"table": "links", "lsn": 291840},
    "before": null,         // no previous state on insert
    "after": {
      "id": "x7Kp2",
      "dest": "https://example.com",
      "user_id": 123,
      "created_at": "2025-06-01T14:00:00Z"
    }
  }

For an UPDATE, both before and after are populated. For a DELETE, before is the deleted row, after is null.

Debezium

Debezium is the dominant open-source CDC platform. It runs as a Kafka Connect connector: it reads the PostgreSQL WAL (or MySQL binlog, MongoDB oplog, etc.) and publishes change events to Kafka topics.

PostgreSQL WAL
  → Debezium PostgreSQL connector (Kafka Connect)
  → Kafka topic: postgres.public.links
  → Consumers:
      Elasticsearch connector (indexes new/changed links)
      Analytics connector (writes to data warehouse)
      Outbox relay connector (publishes Outbox messages)
      Cache invalidation service (deletes Redis keys on change)

Logical replication slots (PostgreSQL)

For CDC to work, PostgreSQL must preserve WAL entries until the CDC consumer has read them. This is done via a logical replication slot: a named cursor in the WAL that tracks how far the CDC consumer has read. PostgreSQL retains WAL up to the oldest slot's position.

-- Create a replication slot for Debezium
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');

Important: an unused replication slot causes WAL to accumulate indefinitely — PostgreSQL won't clean it up. A stalled CDC pipeline with a replication slot can fill the database's disk. Monitor replication slot lag.

What CDC enables

Real-time search index updates. Instead of a nightly Elasticsearch reindex, every link INSERT or UPDATE propagates to Elasticsearch within seconds. Search results reflect the latest data.

Cache invalidation. When a link's destination is updated, a CDC consumer deletes the corresponding Redis cache entry. The next redirect hits the database for the fresh value — no TTL wait for stale data.

Event sourcing without application changes. The database's change log becomes an event stream. Downstream systems can replay from any point in history.

Data warehouse synchronisation. The ELT pipeline (Pillar 7, post 18) can use CDC for near-real-time data warehouse updates instead of nightly batch ETL jobs.

Outbox pattern relay. The most reliable Outbox implementation uses CDC: Debezium watches the outbox table and publishes messages to Kafka as rows are inserted. No polling loop, no additional database queries.

Tradeoffs

WAL format coupling. Debezium reads the WAL directly. Major schema changes (dropping a column, changing a column type) can break the CDC pipeline if the connector's schema configuration isn't updated simultaneously.

Replication slot lag. A slow or stalled CDC consumer causes WAL retention to grow. PostgreSQL must keep WAL segments for the stalled consumer, potentially filling disk. Monitor pg_replication_slots and alert on lag.

Ordering and partitioning. Debezium publishes events in WAL order within a table, but across tables, ordering is not guaranteed. For consumers that depend on cross-table ordering (applying a parent row insert before a child row insert), careful partitioning of Kafka topics is required.

Not for all databases. CDC via WAL requires the database to support logical replication (PostgreSQL 10+, MySQL 5.5+ binlog, MongoDB 3.6+ oplog). Older databases or some cloud-managed databases may not expose the WAL for CDC.

The one thing to remember

CDC turns your database into an event source by reading its internal change log — the WAL — and streaming every commit as a structured event. The application writes only to the database; CDC handles fan-out to all downstream consumers (search, cache, analytics, other services). This solves the dual-write problem: the database write and the downstream event are guaranteed to be consistent because the event is derived from the already-committed WAL, not written separately. Debezium is the standard tool for this in PostgreSQL/MySQL/MongoDB environments.

← Previous: Delivery Semantics — at-most-once, at-least-once, and exactly-once: what each guarantees and what it costs.

→ Next: Erasure Coding — storing data redundantly using mathematics rather than full replication.

Change Data Capture: Streaming Your Database in Real Time

Systems Design

Change Data Capture: Streaming Your Database in Real Time

The problem

The core idea

The analogy: a stenographer recording every decision

How CDC works

The Write-Ahead Log (WAL)

Debezium

Logical replication slots (PostgreSQL)

What CDC enables

Tradeoffs

The one thing to remember

Comments

Systems Design

More from this blog

Distributed Systems: Wrap-Up

Observability: Understanding Your System at Runtime

Merkle Trees: Efficiently Finding What's Different

Erasure Coding: Fault Tolerance Without Full Replication

Command Palette

Systems Design

Change Data Capture: Streaming Your Database in Real Time

The problem

The core idea

The analogy: a stenographer recording every decision

How CDC works

The Write-Ahead Log (WAL)

Debezium

Logical replication slots (PostgreSQL)

What CDC enables

Tradeoffs

The one thing to remember

Comments

Systems Design

More from this blog