Series: System Design · Data & Storage — Pillar 4 of 8

Systems Design

#	Post	What it covers
00	Data & Storage: Where Everything Lives	Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath. (161 chars)
01	SQL vs NoSQL: Choosing the Right Database	SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call.
02	Database Indexing: The Highest-Leverage Performance Tool	Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one.
03	B-Trees & B+ Trees: The Data Structure Behind Database Indexes	Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries.
04	LSM Trees: Why Some Databases Are Built for Writes	LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it.
05	Denormalisation: Trading Storage for Speed	Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely.
06	Database Sharding: Scaling Beyond a Single Node	Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces.
07	Data Partitioning: Choosing How to Divide Your Data	Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale.
08	Consistent Hashing: Minimising Resharding Pain	Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems.
09	Replication & Read Replicas: Scaling Reads and Surviving Failures	Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them.
10	Object Storage: Unlimited Scale for Large Binary Data	Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it.
11	Block vs File vs Object Storage: Three Models, Three Use Cases	Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them.
12	Distributed File Systems: File Storage Across Many Machines	Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them.
13	Time Series Databases: Built for Metrics and Events	Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB.
14	Vector Databases: Semantic Search and AI Memory	Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one.
15	Full-Text Search Engines: Beyond SQL LIKE	Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful.
16	Materialized Views: Pre-Computing Expensive Queries	Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches.
17	Query Optimisation: From Slow to Fast	Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast.
18	Connection Pooling: Managing the Hidden Bottleneck ← you are here	Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly.
19	Data & Storage: Wrap-Up	A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system.

Connection Pooling: Managing the Hidden Bottleneck

The problem

Your URL shortener scales to 50 application servers to handle peak redirect traffic. Each server runs 20 worker threads. Each thread opens a database connection when processing a request.

50 servers × 20 threads = 1,000 concurrent database connections.

PostgreSQL's default max_connections is 100. You increase it to 1,000. PostgreSQL allocates ~5–10MB of memory per connection — 1,000 connections consumes 5–10GB just for connection state, before storing any actual data. More importantly, PostgreSQL uses one OS process per connection. The kernel scheduler is now managing 1,000 processes, most of which are idle most of the time, waiting for requests. PostgreSQL's performance degrades — not because queries are slow, but because the overhead of managing 1,000 processes drowns the actual query work.

At peak, your redirect endpoint latency is 200ms. Database connections account for 150ms of it — not query execution, but the overhead of each request establishing, authenticating, and tearing down a connection.

This is the connection scaling problem. Connection pooling solves it.

The core idea

A connection pool maintains a fixed set of pre-established database connections. Application threads borrow a connection from the pool, execute their query, and return the connection. The pool reuses connections across requests — eliminating per-request connection setup overhead and capping the total number of connections to the database at a manageable number regardless of application server count.

The analogy: a taxi rank vs a car dealership

Without connection pooling, every query requires opening a new database connection:

Equivalent to buying a new car every time you need to make a trip
Expensive (authentication, process creation, memory allocation)
Wasteful (car is scrapped after one trip)
Doesn't scale (a city can't afford 10,000 new cars per hour)

With connection pooling, connections are shared and reused:

Equivalent to a taxi rank
The rank has 20 taxis (the pool size)
You need a taxi, you wait if all 20 are busy (queue), otherwise you get one immediately
After your trip, the taxi returns to the rank for the next passenger
The city only ever needs 20 taxis, even if 1,000 people want rides per hour

The pooler is the taxi rank manager.

How it works

What happens without a pool (per-request connections)

Request arrives
  → TCP connection to PostgreSQL (3-way handshake: ~1ms)
  → TLS handshake (~2ms)
  → PostgreSQL authentication (username/password/pg_hba check: ~3ms)
  → Session initialisation (SET search_path, client encoding, etc.: ~2ms)
  → Execute query (~5ms)
  → Close connection
Total: ~13ms, of which 8ms is connection overhead for a 5ms query

At high request rates, this overhead compounds. 1,000 requests/second × 8ms connection overhead = 8 seconds of CPU time per second wasted on connection management alone.

Connection pool mechanics

Application workers (many)
       ↓  borrow/return
  [ Connection Pool ]    ← pool manages N persistent connections
       ↓  1 persistent connection
  PostgreSQL server

Connection pool initialisation:

Pool creates N connections to PostgreSQL at startup
All connections stay open (persistent)
Connections are maintained with periodic keep-alive pings

Connection lifecycle (single request):

Worker requests a connection from the pool
If a connection is free: borrow it immediately (no TCP/auth overhead)
Execute query
Return connection to the pool (connection stays open, not closed)
Next request borrows the same connection

Under high load:

All N connections are in use
New requests queue and wait (configurable timeout)
If the wait exceeds the timeout, return an error ("connection pool exhausted")

Pool sizing

The right pool size is not "as large as possible." More connections than PostgreSQL can efficiently serve causes performance degradation.

PostgreSQL's effective concurrency limit is approximately equal to the number of CPU cores on the database server. A 16-core database server handles roughly 16–32 simultaneous queries efficiently; beyond that, query throughput doesn't increase (it may decrease as the kernel scheduler overhead grows).

Empirical guidance (from PgBouncer documentation and the HikariCP connection pool guide):

pool_size = (num_db_cores × 2) + num_effective_spindle_count

For an RDS db.r5.4xlarge (16 vCPUs, SSD): pool size ≈ 32–48.

This seems surprisingly small. The intuition: if the database has 16 cores and 100 connections all try to run queries simultaneously, only 16 can actually execute. The other 84 are queued. A larger pool doesn't make queries run faster — it just means more workers queue at the database rather than at the pool. Queueing at the pool is better (it's faster and doesn't consume database resources).

PgBouncer: the standard PostgreSQL connection pooler

PgBouncer is a lightweight proxy that sits between your application and PostgreSQL, managing a connection pool.

Application servers (many) → PgBouncer (one pool per database) → PostgreSQL

PgBouncer offers three pooling modes:

Session pooling: one server connection per client session. The client holds the connection for its entire duration. Similar to no pooling but adds a fixed connection endpoint. Lowest benefit; highest compatibility.

Transaction pooling: the client holds the connection only for the duration of a transaction (or a single statement if autocommit). The most common and effective mode. One server connection serves many client sessions as long as they're not all in a transaction simultaneously.

Statement pooling: the connection is released between individual statements within a multi-statement transaction. Rarely used because it breaks prepared statements and multi-statement transactions.

Transaction pooling example:

100 application workers, each potentially executing queries
Pool size: 20 server connections to PostgreSQL
Each worker holds a connection only for the duration of its query (~5ms)
20 connections serve 100+ workers because workers are idle most of the time

At 50ms average request handling time with 5ms of database time, a single connection can serve ~10 requests per second. 20 connections serve ~200 requests/second — enough for most production workloads.

Application-side connection pools

Many frameworks and ORMs include built-in connection pools:

SQLAlchemy (Python): configurable pool with create_engine(pool_size=20, max_overflow=10)
HikariCP (Java): high-performance connection pool; default in Spring Boot
node-postgres (Node.js): Pool class with configurable size
GORM (Go): SetMaxOpenConns(), SetMaxIdleConns()

For most applications, an application-side pool (no PgBouncer) is sufficient. PgBouncer is needed when:

Many application servers share one PostgreSQL instance (connection count would exceed safe limits without a centralised pooler)
You're on serverless or short-lived containers that can't maintain persistent connections

Pooling challenges

Prepared statements in transaction mode. PostgreSQL prepared statements are per-session. In PgBouncer transaction mode, a client's next query may run on a different server connection — where the prepared statement doesn't exist. Fix: use DISCARD ALL at transaction end (PgBouncer does this automatically with server_reset_query), or use protocol-level prepared statements carefully.

Session-level state. SET commands, temporary tables, and advisory locks are session-scoped. In transaction pooling mode, these don't persist between transactions. Applications that rely on session state won't work correctly in transaction pooling mode.

Connection starvation under long-running queries. If some connections are tied up by long-running analytics queries, short redirect lookups may queue waiting for a connection. Separate pools for OLTP (transactional) and analytics workloads.

Monitoring pool health. PgBouncer's SHOW POOLS command shows active clients, waiting clients, and server connections in use. Pool exhaustion (all connections in use, client queue growing) is an early warning of a scaling problem.

SHOW POOLS;
 database | user       | cl_active | cl_waiting | sv_active | sv_idle | maxwait
----------+------------+-----------+------------+-----------+---------+--------
 shorturl | app_user   | 18        | 5          | 20        | 0       | 0.12

cl_waiting=5 means 5 client requests are queued. maxwait=0.12 means they've waited up to 120ms. Alert if either grows.

The tradeoffs

Pool too small: connection wait times increase. Requests queue at the pool. Latency tail grows. At extreme shortage, requests time out.

Pool too large: PostgreSQL is overwhelmed with concurrent connections. Query throughput drops. Memory consumed by connection state increases. Kernel scheduler overhead grows.

Connection overhead eliminated: per-request TCP handshake and authentication (~8ms) disappears. For high-throughput services, this is often the most impactful latency optimisation after indexes.

Application complexity: session state, prepared statements, and long-running transactions require care in transaction pooling mode. Most OLTP applications work correctly with minimal changes.

The one thing to remember

Database connections are expensive to establish and consume server resources when idle. A connection pool keeps a fixed number of connections open and shares them across many application workers. Pool size should be tuned to the database server's CPU count — not the number of application workers. Counterintuitively, a pool of 20 connections usually outperforms a pool of 200: more connections than the database can efficiently serve creates scheduling overhead, not additional throughput.

← Previous: Query Optimisation: From Slow to Fast — Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and sys...

→ Next: Data & Storage: Wrap-Up — A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how...

Connection Pooling: Managing the Hidden Bottleneck

Systems Design

Connection Pooling: Managing the Hidden Bottleneck

The problem

The core idea

The analogy: a taxi rank vs a car dealership

How it works

What happens without a pool (per-request connections)

Connection pool mechanics

Pool sizing

PgBouncer: the standard PostgreSQL connection pooler

Application-side connection pools

Pooling challenges

The tradeoffs

The one thing to remember

Comments

Systems Design

More from this blog

Architecture Patterns: Wrap-Up

MapReduce: Processing Petabytes in Parallel

Batch vs Stream Processing: How Fresh Do Your Answers Need to Be?

ETL Pipelines: Moving Data from Operations to Analytics

Backend for Frontend: One API Per Client Type

Command Palette

Systems Design

Connection Pooling: Managing the Hidden Bottleneck

The problem

The core idea

The analogy: a taxi rank vs a car dealership

How it works

What happens without a pool (per-request connections)

Connection pool mechanics

Pool sizing

PgBouncer: the standard PostgreSQL connection pooler

Application-side connection pools

Pooling challenges

The tradeoffs

The one thing to remember

Comments

Systems Design

More from this blog