The Outbox Pattern: Atomic Writes and Event Publishing

Series: System Design · Architecture Patterns — Pillar 7 of 8

Systems Design

# Post What it covers
00 Architecture Patterns: How Systems Are Structured Twenty patterns covering monoliths, microservices, events, resilience, deployment, and data processing. How to structure systems that survive growth.
01 Monolithic Architecture: The Default That Gets Abandoned Too Early Monoliths are fast to build and easy to operate. Learn when they're the right choice, when they break down, and how to know the difference.
02 Microservices: The Architecture You Earn, Not Choose Microservices enable independent scaling and team autonomy — but at significant cost. Learn what you actually get, what you pay, and when it's worth it.
03 Serverless: Pay for What You Use, Not What You Provision Serverless scales to zero and charges per invocation. Learn where it shines, where it fails, and how to design around cold starts and vendor lock-in.
04 Event-Driven Architecture: Decoupling Through Events Event-driven systems communicate via events rather than direct calls. Learn how producers, consumers, and event brokers work — and the consistency tradeoffs involved.
05 Message Queues: Decoupling Produce from Consume Message queues decouple producers and consumers, enable load levelling, and provide durability. Learn how they work and when to use Kafka vs SQS vs RabbitMQ.
06 Pub/Sub: Broadcasting Events to Multiple Consumers Pub/sub decouples publishers from subscribers through topics. Learn how it differs from message queues and when to use Kafka, SNS, or Google Pub/Sub.
07 CQRS: When Reads and Writes Need Different Models CQRS separates writes from reads so each can be optimised independently. Learn how it works, when it's worth the complexity, and when it isn't.
08 Event Sourcing: The Ledger, Not the Balance Event sourcing stores state as a sequence of events. Learn how it works, what you get (audit log, time travel), and what it costs (complexity, schema evolution).
09 The Saga Pattern: Distributed Transactions Without Locks The Saga pattern manages distributed transactions across services using compensating transactions. Learn choreography vs orchestration and when to use each.
10 The Outbox Pattern: Atomic Writes and Event Publishing ← you are here The Outbox pattern solves the dual-write problem — publishing an event and writing to a database atomically. Learn how it works using CDC or polling.
11 The Circuit Breaker: Stopping Cascading Failures Circuit breakers prevent cascading failures by fast-failing calls to unhealthy dependencies. Learn the three states, how to configure them, and where to apply them.
12 The Bulkhead Pattern: Containing Failures Through Resource Isolation Bulkheads isolate thread pools and connections per dependency so one failure can't exhaust resources needed by others. Learn how to apply them in practice.
13 The Sidecar Pattern: Cross-Cutting Concerns Without Code Changes The sidecar pattern deploys a helper process alongside each service for logging, metrics, TLS, and service discovery — without modifying the service itself.
14 Service Mesh: A Programmable Network for Microservices A service mesh handles service-to-service traffic, mTLS, circuit breaking, and observability via a fleet of sidecar proxies. Learn how it works and when to use it.
15 Service Discovery: Finding Services in a Dynamic Environment Service discovery lets services find each other in dynamic environments. Learn client-side vs server-side discovery, health checks, and DNS vs registry approaches.
16 The Strangler Fig: Replacing a Legacy System Without Burning It Down The Strangler Fig replaces a legacy system incrementally by routing specific functionality to new implementations while the old system keeps running.
17 Backend for Frontend: One API Per Client Type BFF creates dedicated API backends per client type. Learn why one general API struggles to serve mobile and web well, and how BFF solves it.
18 ETL Pipelines: Moving Data from Operations to Analytics ETL moves data from operational systems into analytical stores. Learn how pipelines work, what ELT is, and how to design reliable data movement at scale.
19 Batch vs Stream Processing: How Fresh Do Your Answers Need to Be? Batch processing accumulates data, then processes it in bulk; stream processing handles each event as it arrives. Learn the tradeoffs and when each is right.
20 MapReduce: Processing Petabytes in Parallel MapReduce processes massive datasets in parallel by splitting work into map and reduce phases. Learn how it works and why Spark has largely replaced it.
21 Architecture Patterns: Wrap-Up A recap of all 20 architecture patterns across decomposition, async communication, data patterns, resilience, and data processing. How they connect.

The problem

When a new link is created in your URL shortener, two things must happen:

  1. Insert the link into the PostgreSQL links table

  2. Publish a LinkCreated event to Kafka so downstream services can react

The naive implementation does both sequentially:

db.insert("links", link_data)          # (1) write to DB
kafka.publish("link.created", event)   # (2) publish to Kafka

This is a dual write — two separate systems updated independently without a distributed transaction. It has a critical failure mode:

db.insert succeeds ✓
kafka.publish fails ✗  (Kafka is briefly unavailable)

Result:
  - Link exists in the database
  - Downstream services never receive the event
  - Analytics, QR generation, webhooks — all miss this link
  - System is silently inconsistent

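The reverse failure is equally bad. It happens if you publish first, or if the database transaction rolls back after the event has already gone out: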

db.insert fails ✗
kafka.publish succeeds ✓

Result:
  - Event published for a link that doesn't exist in the database
  - Consumers try to look up a link that isn't there

You cannot atomically write to two different systems without a distributed transaction, and two-phase commit (2PC) across a database and a message broker is fragile, slow, and often not even available: Kafka, for example, does not participate in XA transactions.
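To make the first failure mode concrete, here is a small, self-contained simulation of the naive dual write. It is a sketch under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, and the hypothetical `FlakyBroker` class stands in for Kafka.

```python
import sqlite3


class FlakyBroker:
    """Hypothetical stand-in for Kafka that can be switched off."""

    def __init__(self):
        self.available = True
        self.messages = []

    def publish(self, topic, event):
        if not self.available:
            raise ConnectionError("broker unavailable")
        self.messages.append((topic, event))


def create_link_naive(conn, broker, link_id, destination):
    # (1) Write to the database and commit on its own.
    with conn:
        conn.execute(
            "INSERT INTO links (id, destination) VALUES (?, ?)",
            (link_id, destination),
        )
    # (2) Publish separately; nothing ties this step to step (1).
    broker.publish("link.created", {"id": link_id, "destination": destination})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id TEXT PRIMARY KEY, destination TEXT NOT NULL)")
broker = FlakyBroker()
broker.available = False  # Kafka is briefly unavailable

try:
    create_link_naive(conn, broker, "abc123", "https://example.com")
except ConnectionError:
    pass  # the caller sees an error, but the link row is already committed

links_in_db = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
events_sent = len(broker.messages)
print(links_in_db, events_sent)  # 1 0: the link exists, but no event was published
```

Retrying the publish in the caller does not fix this in general: if the process itself dies between steps (1) and (2), there is nothing left to retry.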


The core idea

The Outbox pattern ensures that a database write and a message publish are effectively atomic by writing the message to a dedicated table (the "outbox") in the same database transaction as the main write. A separate process reads from the outbox and publishes to the message broker. The database transaction guarantees that either both the entity and the outbox message are committed, or neither is.


The analogy: leaving a note for the postal service

You need to both file a document and send a letter about it. You can't guarantee both happen simultaneously. But you can:

  1. File the document AND leave a note for the postal service — both in the same action

  2. The postal service checks for notes and sends them out when it's available

The note stays in your possession until the postal service confirms receipt. If the postal service is temporarily unavailable, the note waits — you don't lose the intent to send the letter just because the postal service is briefly down.


How it works

The outbox table

CREATE TABLE outbox (
  id          UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  event_type  TEXT NOT NULL,         -- "LinkCreated"
  payload     JSONB NOT NULL,        -- the event body
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  published_at TIMESTAMPTZ,          -- NULL until successfully published
  status      TEXT DEFAULT 'pending' -- pending | published | failed
);

The atomic write

import json

with db.transaction():
    # Main write
    link = db.insert("links", link_data)

    # Outbox entry — in the same transaction
    db.insert("outbox", {
        "event_type": "LinkCreated",
        "payload": json.dumps({
            "id": link.id,
            "user_id": link.user_id,
            "destination": link.destination,
            "created_at": link.created_at.isoformat()
        })
    })
    # Both committed or both rolled back — no partial state
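The same atomic write can be exercised end to end with Python's built-in `sqlite3` module, used here as a stand-in for PostgreSQL (so `TEXT` replaces `UUID` and `JSONB`, and IDs are generated in Python with `uuid4`). The `create_link` function and its `fail_before_commit` flag are hypothetical, added only to show that a failure after the main insert rolls back the outbox row too, and vice versa.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (
        id          TEXT PRIMARY KEY,
        user_id     TEXT NOT NULL,
        destination TEXT NOT NULL,
        created_at  TEXT NOT NULL
    );
    CREATE TABLE outbox (
        id           TEXT PRIMARY KEY,
        event_type   TEXT NOT NULL,
        payload      TEXT NOT NULL,            -- JSON text instead of JSONB
        created_at   TEXT NOT NULL,
        published_at TEXT,                     -- NULL until published
        status       TEXT NOT NULL DEFAULT 'pending'
    );
""")


def create_link(conn, user_id, destination, fail_before_commit=False):
    link_id = uuid.uuid4().hex[:8]
    now = datetime.now(timezone.utc).isoformat()
    # `with conn` wraps both inserts in one transaction:
    # commit if the block succeeds, roll back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO links VALUES (?, ?, ?, ?)",
            (link_id, user_id, destination, now),
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash between the two inserts")
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload, created_at) VALUES (?, ?, ?, ?)",
            (
                str(uuid.uuid4()),
                "LinkCreated",
                json.dumps({"id": link_id, "user_id": user_id,
                            "destination": destination, "created_at": now}),
                now,
            ),
        )
    return link_id


create_link(conn, "user-1", "https://example.com")
try:
    create_link(conn, "user-2", "https://example.org", fail_before_commit=True)
except RuntimeError:
    pass

link_count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
outbox_count = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(link_count, outbox_count)  # 1 1: the failed call left neither row behind
```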

The outbox relay

A separate process polls the outbox and publishes pending messages:

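import time
from datetime import datetime, timezone

# db, kafka and KafkaException stand for your database and Kafka client wrappers.

def relay_loop():
    while True:
        # Row locks taken by FOR UPDATE SKIP LOCKED last only as long as the
        # transaction, so the SELECT and the status updates must share one.
        with db.transaction():
            pending = db.query("""
                SELECT * FROM outbox
                WHERE status = 'pending'
                ORDER BY created_at
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            """)

            for entry in pending:
                try:
                    kafka.publish(entry.event_type, entry.payload)
                except KafkaException:
                    # Leave the row 'pending' so a later pass retries it, and
                    # stop so newer events don't overtake the failed one.
                    break
                db.update("outbox", entry.id, {
                    "status": "published",
                    "published_at": datetime.now(timezone.utc),
                })

        time.sleep(0.1)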

FOR UPDATE SKIP LOCKED ensures that if multiple relay instances are running, they don't compete for the same rows — each picks up a distinct batch.
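The relay's core behaviour can be tested without Kafka or PostgreSQL. The sketch below runs a single pass (`relay_once`, a hypothetical name) over an SQLite outbox with a hypothetical `FakeBroker`. SQLite has no `FOR UPDATE SKIP LOCKED`, so this version assumes a single relay instance; what it demonstrates is the retry policy: a failed publish leaves the row `pending` so a later pass retries it, and the pass stops there so newer events don't overtake it.

```python
import sqlite3
from datetime import datetime, timezone


class FakeBroker:
    """Hypothetical broker stand-in; fails the next `fail_next` publishes."""

    def __init__(self, fail_next=0):
        self.fail_next = fail_next
        self.sent = []

    def publish(self, topic, payload):
        if self.fail_next > 0:
            self.fail_next -= 1
            raise ConnectionError("broker unavailable")
        self.sent.append((topic, payload))


def relay_once(conn, broker, batch_size=100):
    """Publish pending outbox rows in creation order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE status = 'pending' ORDER BY created_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        try:
            broker.publish(event_type, payload)
        except ConnectionError:
            break  # this row and the rest stay 'pending' for the next pass
        with conn:  # mark the row published right after a successful publish
            conn.execute(
                "UPDATE outbox SET status = 'published', published_at = ? WHERE id = ?",
                (datetime.now(timezone.utc).isoformat(), row_id),
            )
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, event_type TEXT, payload TEXT, "
    "created_at TEXT, published_at TEXT, status TEXT DEFAULT 'pending')"
)
with conn:
    for i in range(3):
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload, created_at) VALUES (?, ?, ?, ?)",
            (f"evt-{i}", "LinkCreated", f'{{"n": {i}}}', f"2024-01-01T00:00:0{i}"),
        )

broker = FakeBroker(fail_next=1)
first_pass = relay_once(conn, broker)   # broker down: nothing is published
second_pass = relay_once(conn, broker)  # broker back: all three go out, in order
print(first_pass, second_pass, [payload for _, payload in broker.sent])
```

Note the gap between `broker.publish` and the `UPDATE`: a crash there means the row is still `pending` and gets published again on the next pass. That is exactly the at-least-once behaviour consumers have to tolerate.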

CDC-based relay (preferred at scale)

Polling the outbox adds database load and has inherent latency (the sleep interval). A more elegant approach uses Change Data Capture (CDC): a tool like Debezium watches the database's write-ahead log and streams new outbox rows to Kafka automatically.

PostgreSQL WAL
  → Debezium (CDC connector)
  → Kafka topic: outbox.events
  → Consumers downstream

No polling loop and no additional queries against the outbox table. New outbox rows typically reach Kafka within milliseconds of commit. Debezium handles retries and provides at-least-once delivery.


Idempotency: handling duplicate publishes

The outbox relay provides at-least-once delivery. If the relay crashes after publishing an event but before marking its entry as published, it publishes the same event again when it restarts.

Consumers must be idempotent. Include a unique event ID in every outbox entry:

{
  "event_id": "01944b7c-9f3a-7e12-bb00-cd0e4a4e37b2",
  "event_type": "LinkCreated",
  "payload": { ... }
}

Consumers deduplicate by event_id — if an event with this ID has already been processed, skip it. A simple Redis SET or a deduplication table in PostgreSQL handles this.
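With a deduplication table, a primary-key constraint can do the work: the "have I seen this?" check and the "remember it" step become one atomic insert. The sketch below uses SQLite's `INSERT OR IGNORE` (PostgreSQL's equivalent is `INSERT ... ON CONFLICT DO NOTHING`); the `handle_event` function and the `processed_events` and `seen_links` tables are hypothetical. It assumes the consumer's own state lives in the same database as the dedup table, so recording the `event_id` and applying the side effect commit together.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE seen_links (link_id TEXT NOT NULL, user_id TEXT NOT NULL);
""")


def handle_event(conn, message):
    """Apply a LinkCreated event at most once per event_id; return True if applied."""
    event = json.loads(message)
    with conn:  # the dedup record and the side effect commit or roll back together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: this event_id was already processed
        conn.execute(
            "INSERT INTO seen_links (link_id, user_id) VALUES (?, ?)",
            (event["payload"]["id"], event["payload"]["user_id"]),
        )
    return True


message = json.dumps({
    "event_id": "01944b7c-9f3a-7e12-bb00-cd0e4a4e37b2",
    "event_type": "LinkCreated",
    "payload": {"id": "abc123", "user_id": "user-1"},
})
first = handle_event(conn, message)
second = handle_event(conn, message)  # the relay delivered the same event twice
rows = conn.execute("SELECT COUNT(*) FROM seen_links").fetchone()[0]
print(first, second, rows)
```

The second call returns `False` and the side effect runs only once, however many times the relay redelivers the event.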


Tradeoffs

At-least-once delivery. The outbox relay guarantees every event is eventually delivered — but may deliver it more than once. Consumers must be idempotent.

Relay latency. With polling, an event can wait up to one polling interval (typically 100 ms to 1 s) before the relay picks it up. With CDC, latency is typically a few milliseconds. For most event-driven workloads, either is acceptable.

Database outbox table size. The outbox table grows with every event written. Prune published entries periodically (for example, delete rows with status = 'published' that are older than 7 days). Also monitor the backlog of pending entries: if it keeps growing, the relay is failing or falling behind.

Operational overhead. You're maintaining either a polling service or a Debezium connector. Both require monitoring and management. The CDC approach also requires access to the database's write-ahead log through logical replication. The major cloud-managed PostgreSQL offerings support this, though it often has to be enabled explicitly.
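The pruning and backlog checks from the table-size tradeoff each come down to a single query. The sketch below runs them against an SQLite outbox. The function names, the 7-day default, and the choice of returning the oldest pending timestamp as the alert signal are all assumptions for illustration. It also assumes timestamps are stored as UTC ISO-8601 strings in one consistent format, so string comparison matches time order. In PostgreSQL, the cutoff would be written as `published_at < NOW() - INTERVAL '7 days'`.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_published(conn, now, retention=timedelta(days=7)):
    """Delete published rows older than the retention window; return rows deleted."""
    cutoff = (now - retention).isoformat()
    with conn:
        cur = conn.execute(
            "DELETE FROM outbox WHERE status = 'published' AND published_at < ?",
            (cutoff,),
        )
    return cur.rowcount


def pending_backlog(conn):
    """Return (number of pending rows, oldest pending created_at) for alerting."""
    return conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE status = 'pending'"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, created_at TEXT, "
    "published_at TEXT, status TEXT)"
)
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
old = (now - timedelta(days=8)).isoformat()
recent = (now - timedelta(days=1)).isoformat()
with conn:
    conn.executemany(
        "INSERT INTO outbox VALUES (?, ?, ?, ?)",
        [
            ("a", old, old, "published"),        # past retention: pruned
            ("b", recent, recent, "published"),  # inside retention: kept
            ("c", old, None, "pending"),         # stuck for 8 days: worth an alert
        ],
    )

deleted = prune_published(conn, now)
count, oldest = pending_backlog(conn)
print(deleted, count, oldest)
```

Alerting on the age of the oldest pending row, rather than on the count alone, catches a stuck relay even when traffic is low.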


When to use it / when not to

Use the Outbox pattern when:

  • Your service writes to a database and must publish events to a message broker

  • You've experienced or are at risk of dual-write inconsistency

  • Event delivery must be reliable (missing events cause downstream data loss)

You don't need it when:

  • Event delivery is best-effort (fire-and-forget telemetry where occasional loss is acceptable)

  • Your ORM or framework already provides a transactional outbox (Axon, Eventuate Tram), so there is nothing to build yourself


The one thing to remember

The Outbox pattern solves the dual-write problem by turning the message into an outbox table row, written in the same database transaction as the entity it describes. The relay (polling or CDC) publishes the message only after it has been safely committed. Either the entity and its outbox row are both committed, or neither is; once they are, the relay delivers the event at least once. The guarantee holds because everything is coordinated through one transactional database rather than two independent systems.


← Previous: Saga — managing multi-step distributed transactions where each step touches a different service, and failure of any step requires compensating the previous ones.

→ Next: Circuit Breaker — when a downstream service starts failing, the circuit breaker prevents a cascade by fast-failing calls rather than waiting for timeouts.

More from this blog

Cloud Tuned

729 posts

Your starting point for anything cloud: AWS, Azure, GCP, Serverless, Architecture, Hybrid Cloud, Systems Design and other Information Technology topics.