Message Queues: Decoupling Produce from Consume

Series: System Design · Architecture Patterns — Pillar 7 of 8
| # | Post | What it covers |
|---|---|---|
| 00 | Architecture Patterns: How Systems Are Structured | Twenty patterns covering monoliths, microservices, events, resilience, deployment, and data processing. How to structure systems that survive growth. |
| 01 | Monolithic Architecture: The Default That Gets Abandoned Too Early | Monoliths are fast to build and easy to operate. Learn when they're the right choice, when they break down, and how to know the difference. |
| 02 | Microservices: The Architecture You Earn, Not Choose | Microservices enable independent scaling and team autonomy — but at significant cost. Learn what you actually get, what you pay, and when it's worth it. |
| 03 | Serverless: Pay for What You Use, Not What You Provision | Serverless scales to zero and charges per invocation. Learn where it shines, where it fails, and how to design around cold starts and vendor lock-in. |
| 04 | Event-Driven Architecture: Decoupling Through Events | Event-driven systems communicate via events rather than direct calls. Learn how producers, consumers, and event brokers work — and the consistency tradeoffs involved. |
| 05 | Message Queues: Decoupling Produce from Consume ← you are here | Message queues decouple producers and consumers, enable load levelling, and provide durability. Learn how they work and when to use Kafka vs SQS vs RabbitMQ. |
| 06 | Pub/Sub: Broadcasting Events to Multiple Consumers | Pub/sub decouples publishers from subscribers through topics. Learn how it differs from message queues and when to use Kafka, SNS, or Google Pub/Sub. |
| 07 | CQRS: When Reads and Writes Need Different Models | CQRS separates writes from reads so each can be optimised independently. Learn how it works, when it's worth the complexity, and when it isn't. |
| 08 | Event Sourcing: The Ledger, Not the Balance | Event sourcing stores state as a sequence of events. Learn how it works, what you get (audit log, time travel), and what it costs (complexity, schema evolution). |
| 09 | The Saga Pattern: Distributed Transactions Without Locks | The Saga pattern manages distributed transactions across services using compensating transactions. Learn choreography vs orchestration and when to use each. |
| 10 | The Outbox Pattern: Atomic Writes and Event Publishing | The Outbox pattern solves the dual-write problem — publishing an event and writing to a database atomically. Learn how it works using CDC or polling. |
| 11 | The Circuit Breaker: Stopping Cascading Failures | Circuit breakers prevent cascading failures by fast-failing calls to unhealthy dependencies. Learn the three states, how to configure them, and where to apply them. |
| 12 | The Bulkhead Pattern: Containing Failures Through Resource Isolation | Bulkheads isolate thread pools and connections per dependency so one failure can't exhaust resources needed by others. Learn how to apply them in practice. |
| 13 | The Sidecar Pattern: Cross-Cutting Concerns Without Code Changes | The sidecar pattern deploys a helper process alongside each service for logging, metrics, TLS, and service discovery — without modifying the service itself. |
| 14 | Service Mesh: A Programmable Network for Microservices | A service mesh handles service-to-service traffic, mTLS, circuit breaking, and observability via a fleet of sidecar proxies. Learn how it works and when to use it. |
| 15 | Service Discovery: Finding Services in a Dynamic Environment | Service discovery lets services find each other in dynamic environments. Learn client-side vs server-side discovery, health checks, and DNS vs registry approaches. |
| 16 | The Strangler Fig: Replacing a Legacy System Without Burning It Down | The Strangler Fig replaces a legacy system incrementally by routing specific functionality to new implementations while the old system keeps running. |
| 17 | Backend for Frontend: One API Per Client Type | BFF creates dedicated API backends per client type. Learn why one general API struggles to serve mobile and web well, and how BFF solves it. |
| 18 | ETL Pipelines: Moving Data from Operations to Analytics | ETL moves data from operational systems into analytical stores. Learn how pipelines work, what ELT is, and how to design reliable data movement at scale. |
| 19 | Batch vs Stream Processing: How Fresh Do Your Answers Need to Be? | Batch processing accumulates data and then processes it in bulk; streaming processes each event as it arrives. Learn the tradeoffs and when each is right. |
| 20 | MapReduce: Processing Petabytes in Parallel | MapReduce processes massive datasets in parallel by splitting work into map and reduce phases. Learn how it works and why Spark has largely replaced it. |
| 21 | Architecture Patterns: Wrap-Up | A recap of all 20 architecture patterns across decomposition, async communication, data patterns, resilience, and data processing. How they connect. |
The problem
Your URL shortener sends email notifications: welcome emails, weekly analytics digests, link expiry warnings. The email sending code lives in the API server — when a user signs up, the API handler calls the email service inline.
During a marketing campaign, signups spike to five hundred per second. Your email provider rate-limits you to a hundred emails per second. The API handler blocks waiting for the email service. Request latency balloons. The signup flow starts timing out. Users can't sign up because the email system can't keep up.
Two separate problems got tangled:
- The user signup operation (fast, must succeed)
- The welcome email delivery (slow, can be delayed)
A message queue separates them. The signup handler writes to a queue, which is fast and does not depend on the email provider being available. An email worker reads from the queue at the rate the email provider allows. The queue absorbs the burst, so signup latency stays low regardless of email throughput.
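To make the load-levelling arithmetic concrete, here is a minimal Python sketch. It assumes a hypothetical 10-second burst at the 500 signups per second from the example, followed by a quiet period, and a worker capped at the provider's 100 emails per second. The name `simulate_backlog` and the burst length are illustrative assumptions, not part of any queue's API.

```python
def simulate_backlog(arrivals_per_sec, drain_per_sec):
    """Return the queue depth at the end of each one-second tick.

    arrivals_per_sec: messages enqueued during each second.
    drain_per_sec: the most messages the worker may send per second.
    """
    depth = 0
    history = []
    for arrived in arrivals_per_sec:
        depth += arrived                    # producer enqueues at its own rate
        depth -= min(depth, drain_per_sec)  # consumer drains at its capped rate
        history.append(depth)
    return history


# Assumed traffic shape: 10 s burst at 500 signups/s, then 50 quiet seconds.
arrivals = [500] * 10 + [0] * 50
depths = simulate_backlog(arrivals, drain_per_sec=100)

peak = max(depths)                # deepest the backlog gets
drained_at = depths.index(0) + 1  # first second at which the queue is empty
```

Under these assumptions the backlog peaks at 4,000 messages and is empty 50 seconds after the burst began. The signup handler never waited on the email provider during that time.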
The core idea
A message queue is a durable buffer that sits between a producer (the component that generates work) and a consumer (the component that processes it). Messages usually come out in roughly the order they went in, though how strict that ordering is depends on the queue (see Tradeoffs). The producer enqueues messages at its own rate; the consumer dequeues and processes them at its own rate. The queue absorbs the difference.
The analogy: a postal service
A company drops outgoing mail in a collection box (the queue). The postal worker picks it up twice a day (the consumer) and delivers it at the postal service's pace. The company doesn't hand each letter directly to the recipient — it deposits the letter and moves on. The postal system guarantees delivery; timing is the postal service's responsibility.
If the company is busy and produces fifty letters in an hour, the collection box holds them until pickup. The postal worker never receives fifty simultaneous deliveries to process at once. The queue levels the load.
How message queues work
Core semantics
Enqueue: a producer sends a message to the queue. The queue durably stores it (typically to disk) and acknowledges receipt.
Dequeue: a consumer polls the queue or receives a push delivery. It processes the message and acknowledges completion.
Acknowledgement (ack): the consumer tells the queue "I've processed this message successfully." Only then does the queue remove it. If the consumer crashes before acking, the queue re-delivers the message, either to the same consumer after it restarts or to another consumer instance.
Visibility timeout: after delivering a message, the queue hides it from other consumers for a configurable period (the visibility timeout). If the consumer acks within the timeout, the message is deleted. If the consumer crashes (no ack), the timeout expires and the message becomes visible again — automatically retried.
Queue: [ msg1, msg2, msg3, msg4, msg5 ]
Consumer A polls:
    Receives msg1 (hidden from others for 30s)
    Processes msg1 (3s) → acks → msg1 deleted
Consumer A crashes while processing msg2:
    msg2's visibility timeout expires (30s) → msg2 becomes visible again
    Consumer B polls → receives msg2 → processes it
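The receive, ack, and visibility-timeout cycle can be sketched as a small in-memory queue. This is a simplified illustration, not how SQS or any real broker is implemented: the class name `VisibilityQueue` and its methods are hypothetical, and the current time is passed in explicitly so the behaviour is deterministic.

```python
import itertools
from collections import OrderedDict


class VisibilityQueue:
    """Toy in-memory queue with ack and visibility timeout (illustrative only)."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = OrderedDict()  # msg_id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0]  # visible immediately
        return msg_id

    def receive(self, now):
        """Return the oldest visible message and hide it until the timeout."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None  # nothing visible right now

    def ack(self, msg_id):
        """Delete the message; only an ack removes it from the queue."""
        self._messages.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30.0)
q.send("msg1")
q.send("msg2")

first_id, first_body = q.receive(now=0.0)  # Consumer A receives msg1
q.ack(first_id)                            # processed, so msg1 is deleted
crashed_id, _ = q.receive(now=3.0)         # Consumer A receives msg2, then crashes
hidden = q.receive(now=10.0)               # Consumer B polls: msg2 is still hidden
redelivered = q.receive(now=33.0)          # timeout expired: msg2 is visible again
```

One design consequence worth noting: the timeout has to be longer than the slowest expected processing time, otherwise a healthy but slow consumer will see its message handed to someone else while it is still working.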
Point-to-point delivery
Message queues are point-to-point by default: each message is delivered to exactly one consumer. If you run three instances of the email worker, each message goes to one worker — not all three. This is load balancing across consumers.
This contrasts with pub/sub (post 06), where each message goes to every subscriber.
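A minimal sketch of competing consumers, using Python's standard `queue.Queue` and threads as a stand-in for a real broker. The helper `run_workers` is a hypothetical name for illustration. Each message is taken by exactly one worker, so the work is split rather than duplicated.

```python
import queue
import threading


def run_workers(messages, worker_count=3):
    """Share one queue among several workers; return what each one handled."""
    q = queue.Queue()
    for m in messages:
        q.put(m)

    handled = {i: [] for i in range(worker_count)}

    def worker(worker_id):
        while True:
            try:
                msg = q.get_nowait()  # atomically claims one message
            except queue.Empty:
                return                # queue was fully loaded up front, so we are done
            handled[worker_id].append(msg)
            q.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


handled = run_workers(list(range(100)), worker_count=3)
all_handled = sorted(m for msgs in handled.values() for m in msgs)
```

How the 100 messages are split among the three workers varies from run to run, but every message appears exactly once across all of them.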
Dead letter queues (DLQ)
If a message fails processing repeatedly (a consumer crashes, a message is malformed, a downstream dependency is unavailable), the queue moves it to a dead letter queue after N failed attempts. The DLQ holds failed messages for manual inspection, alerting, or reprocessing — without blocking the main queue.
SQS Main Queue → Consumer fails 3x → SQS DLQ
    → Alert fires
    → Engineer inspects, fixes, replays
DLQs are essential for preventing poison messages (messages that can never be processed successfully) from blocking the entire queue indefinitely.
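The retry-then-dead-letter flow can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `drain_with_dlq`, `send_email`, and the limit of three receives are hypothetical, and a real broker tracks the receive count for you.

```python
from collections import deque


def drain_with_dlq(messages, handler, max_receives=3):
    """Process messages; move any that fail max_receives times to a DLQ."""
    main = deque((m, 0) for m in messages)  # (body, receive_count)
    processed, dead_letters = [], []
    while main:
        body, receives = main.popleft()
        receives += 1
        try:
            handler(body)
            processed.append(body)
        except Exception:
            if receives >= max_receives:
                dead_letters.append(body)       # give up: park it for inspection
            else:
                main.append((body, receives))   # retry later, behind other messages
    return processed, dead_letters


def send_email(address):
    """Hypothetical handler that rejects malformed addresses."""
    if "@" not in address:
        raise ValueError(f"malformed address: {address!r}")


processed, dead_letters = drain_with_dlq(
    ["a@example.com", "not-an-address", "b@example.com"], send_email
)
```

The malformed message is attempted three times and then parked, while the two valid messages go through without waiting on it.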
Kafka vs traditional queues
Traditional queues (SQS, RabbitMQ): messages are deleted after acknowledgement. The queue is a buffer, so messages exist only until consumed. There is no built-in way to replay messages that have already been consumed.
Kafka: messages are stored in a log for a configurable retention period (e.g., 7 days). Consumers track their own offset (position in the log). Multiple consumer groups can read the same log independently, each at its own pace. Reprocessing from the beginning is possible by resetting an offset.
Kafka log: [msg1, msg2, msg3, msg4, msg5, ...]
    Analytics consumer: offset=3 (next to read: msg4)
    Email consumer: offset=5 (up to date)
New consumer group (audit logger):
    Starts at the earliest offset (0), reads entire history
    Doesn't affect other consumers
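The log-plus-offsets model can be sketched as follows. This is a deliberately simplified, single-partition illustration and not the Kafka client API: the class `Log` and its methods are hypothetical, and it advances a group's offset on every poll, whereas real Kafka consumers commit offsets as a separate step.

```python
class Log:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # group name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offsets are 0-based

    def poll(self, group, max_records=10):
        """Return the next records for this group and advance its offset."""
        start = self._offsets.get(group, 0)  # new groups start at the beginning
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def position(self, group):
        return self._offsets.get(group, 0)

    def seek(self, group, offset):
        """Reset a group's offset, e.g. to 0 to replay the whole log."""
        self._offsets[group] = offset


log = Log()
for i in range(1, 6):
    log.append(f"msg{i}")

email_batch = log.poll("email")                      # reads all five, now up to date
analytics_batch = log.poll("analytics", max_records=3)  # slower group, now at offset 3
audit_batch = log.poll("audit")                      # new group reads entire history

log.seek("analytics", 0)                             # replay from the beginning
replayed = log.poll("analytics")
```

Reading never removes anything from the log, which is why a new group or a reset offset can see the full history without affecting the other groups.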
When Kafka is better: high throughput, multiple consumer groups needing the same event stream, event replay for recovery or new consumer bootstrap, event sourcing.
When SQS/RabbitMQ is better: simple task queues, per-message acknowledgement and retry with at-least-once delivery, lower operational overhead, point-to-point message delivery with no need for replay.
Delivery semantics
A critical design decision: what does the queue guarantee about delivery?
At-most-once: the message is delivered zero or one times. If the consumer crashes before processing, the message is lost. Fast but lossy. Appropriate for metrics, telemetry — losing an event occasionally is acceptable.
At-least-once: the message is delivered one or more times. If the consumer crashes before acking, the message is re-delivered. The consumer may process the same message twice. The consumer must be idempotent — processing the same message twice produces the same result. The most common production choice.
Exactly-once: the message is delivered exactly once, with no duplicates and no loss. Extremely hard to guarantee across distributed systems. Kafka 0.11+ supports exactly-once processing for pipelines that read from and write back to Kafka, using idempotent producers and transactions. Across external systems (Kafka → database), it requires distributed transactions or idempotent writes.
Most production systems use at-least-once with idempotent consumers.
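A minimal sketch of an idempotent consumer, assuming each message carries a unique `id` field (an assumption for illustration; the class and field names are hypothetical). Duplicates delivered by an at-least-once queue are detected and skipped.

```python
class IdempotentConsumer:
    """Skips messages whose id has already been handled (illustrative only)."""

    def __init__(self):
        # In production this set would live in a durable store, ideally
        # updated in the same transaction as the side effect itself.
        self.seen_ids = set()
        self.emails_sent = []

    def handle(self, message):
        """Return True if the message was processed, False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.seen_ids:
            return False
        self.emails_sent.append(message["to"])  # stand-in for the real side effect
        self.seen_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "m1", "to": "a@example.com"},
    {"id": "m1", "to": "a@example.com"},  # re-delivery after a missed ack
    {"id": "m2", "to": "b@example.com"},
]
results = [consumer.handle(m) for m in deliveries]
```

The queue delivered three times, but only two emails went out. That is the practical meaning of at-least-once with idempotent consumers: duplicates still arrive, they just have no effect.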
Tradeoffs
Durability vs latency. Queues that write to disk on every message have milliseconds of write latency but survive crashes. In-memory queues are faster but lose messages on restart. Choose based on message importance.
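As a rough illustration of where the durability cost comes from, the sketch below appends each message to a file and calls `os.fsync` before returning, so the enqueue is only acknowledged once the data has been handed to the disk. The function names and the JSON-lines file format are assumptions for illustration; real brokers use more elaborate log formats and batching.

```python
import json
import os
import tempfile


def durable_enqueue(path, message):
    """Append one message and force it to disk before acknowledging."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it to the device (the slow part)


def recover(path):
    """Rebuild the queue contents after a restart."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "queue.log")
    durable_enqueue(log_path, {"to": "a@example.com"})
    durable_enqueue(log_path, {"to": "b@example.com"})
    recovered = recover(log_path)  # what a restarted process would see
```

Dropping the `fsync` call makes each enqueue faster but leaves a window in which an acknowledged message exists only in memory.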
Fan-out needs pub/sub. A message queue sends each message to one consumer. If you need all of Analytics, Email, and Dashboard services to receive every click event, a queue doesn't work — you need pub/sub or Kafka with multiple consumer groups.
Ordering guarantees. SQS standard queues offer best-effort ordering (messages can arrive out of order). SQS FIFO queues guarantee order within a message group but have lower throughput. Kafka guarantees order within a partition. If order matters for your processing logic, choose accordingly.
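Per-partition ordering is usually exploited by routing every event with the same key to the same partition. A minimal sketch follows; it uses `zlib.crc32` as a stable hash purely for illustration (Kafka's default partitioner uses a different hash function), and `partition_for` and `route` are hypothetical names.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=3):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Events for two short links, interleaved as they might arrive.
events = [
    ("abc", "created"),
    ("xyz", "created"),
    ("abc", "clicked"),
    ("xyz", "clicked"),
    ("abc", "expired"),
]
partitions = route(events)
abc_partition = partitions[partition_for("abc", 3)]
abc_order = [payload for key, payload in abc_partition if key == "abc"]
```

Events for link `abc` stay in their original order because they all land in one partition. Events for different keys may sit in different partitions, and no order is guaranteed between those.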
Queue depth as a health metric. A growing queue depth means consumers can't keep up with producers. Monitor it. Alert on it. A queue that stays deep points to a consumer bug or insufficient consumer capacity; a queue that spikes and then drains is usually a traffic burst the system absorbed as designed.
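One way to turn queue depth into something actionable is to estimate how long the current backlog will take to drain. The sketch below is a back-of-the-envelope calculation assuming steady produce and consume rates; `seconds_to_drain` is a hypothetical helper, not a metric any broker exposes directly.

```python
def seconds_to_drain(depth, produce_rate, consume_rate):
    """Estimate seconds until the backlog is empty at steady rates.

    Returns None when consumers are not outpacing producers, which means
    the backlog will not drain without more capacity or less traffic.
    """
    if depth == 0:
        return 0.0
    net_rate = consume_rate - produce_rate  # messages removed per second, net
    if net_rate <= 0:
        return None
    return depth / net_rate


after_burst = seconds_to_drain(depth=4000, produce_rate=0, consume_rate=100)
during_burst = seconds_to_drain(depth=4000, produce_rate=500, consume_rate=100)
steady_load = seconds_to_drain(depth=6000, produce_rate=80, consume_rate=100)
```

A `None` result is the case to alert on: the queue is deep and, at current rates, it is only going to get deeper.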
The one thing to remember
A message queue decouples the rate at which work is produced from the rate at which it's processed. Producers enqueue fast; consumers dequeue at whatever rate they can sustain. The queue buffers the difference and provides durability — if a consumer crashes, the message is not lost, just re-delivered. The most important design decision is delivery semantics: at-least-once with idempotent consumers is the practical choice for almost everything; exactly-once is a desirable fiction that's achievable only in narrow circumstances.
← Previous: Event-Driven Architecture — instead of services calling each other, they publish events and subscribe to events; a model that decouples producers from consumers at the cost of eventual consistency.
→ Next: Pub/Sub — message queues route to one consumer; pub/sub broadcasts to all subscribers. Here's how the fan-out model works and when you need it.




