Skip to main content

Command Palette

Search for a command to run...

Connection Pooling: Managing the Hidden Bottleneck

Updated
12 min read
Connection Pooling: Managing the Hidden Bottleneck

Series: System Design · Data & Storage — Pillar 4 of 8

Systems Design

# Post What it covers
00 Data & Storage: Where Everything Lives Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath. (161 chars)
01 SQL vs NoSQL: Choosing the Right Database SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call.
02 Database Indexing: The Highest-Leverage Performance Tool Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one.
03 B-Trees & B+ Trees: The Data Structure Behind Database Indexes Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries.
04 LSM Trees: Why Some Databases Are Built for Writes LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it.
05 Denormalisation: Trading Storage for Speed Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely.
06 Database Sharding: Scaling Beyond a Single Node Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces.
07 Data Partitioning: Choosing How to Divide Your Data Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale.
08 Consistent Hashing: Minimising Resharding Pain Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems.
09 Replication & Read Replicas: Scaling Reads and Surviving Failures Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them.
10 Object Storage: Unlimited Scale for Large Binary Data Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it.
11 Block vs File vs Object Storage: Three Models, Three Use Cases Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them.
12 Distributed File Systems: File Storage Across Many Machines Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them.
13 Time Series Databases: Built for Metrics and Events Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB.
14 Vector Databases: Semantic Search and AI Memory Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one.
15 Full-Text Search Engines: Beyond SQL LIKE Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful.
16 Materialized Views: Pre-Computing Expensive Queries Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches.
17 Query Optimisation: From Slow to Fast Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast.
18 Connection Pooling: Managing the Hidden Bottleneck ← you are here Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly.
19 Data & Storage: Wrap-Up A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system.

Connection Pooling: Managing the Hidden Bottleneck

The problem

Your URL shortener scales to 50 application servers to handle peak redirect traffic. Each server runs 20 worker threads. Each thread opens a database connection when processing a request.

50 servers × 20 threads = 1,000 concurrent database connections.

PostgreSQL's default max_connections is 100. You increase it to 1,000. PostgreSQL allocates ~5–10MB of memory per connection — 1,000 connections consumes 5–10GB just for connection state, before storing any actual data. More importantly, PostgreSQL uses one OS process per connection. The kernel scheduler is now managing 1,000 processes, most of which are idle most of the time, waiting for requests. PostgreSQL's performance degrades — not because queries are slow, but because the overhead of managing 1,000 processes drowns the actual query work.

At peak, your redirect endpoint latency is 200ms. Database connections account for 150ms of it — not query execution, but the overhead of each request establishing, authenticating, and tearing down a connection.

This is the connection scaling problem. Connection pooling solves it.


The core idea

A connection pool maintains a fixed set of pre-established database connections. Application threads borrow a connection from the pool, execute their query, and return the connection. The pool reuses connections across requests — eliminating per-request connection setup overhead and capping the total number of connections to the database at a manageable number regardless of application server count.


The analogy: a taxi rank vs a car dealership

Without connection pooling, every query requires opening a new database connection:

  • Equivalent to buying a new car every time you need to make a trip
  • Expensive (authentication, process creation, memory allocation)
  • Wasteful (car is scrapped after one trip)
  • Doesn't scale (a city can't afford 10,000 new cars per hour)

With connection pooling, connections are shared and reused:

  • Equivalent to a taxi rank
  • The rank has 20 taxis (the pool size)
  • You need a taxi, you wait if all 20 are busy (queue), otherwise you get one immediately
  • After your trip, the taxi returns to the rank for the next passenger
  • The city only ever needs 20 taxis, even if 1,000 people want rides per hour

The pooler is the taxi rank manager.


How it works

What happens without a pool (per-request connections)

Request arrives
  → TCP connection to PostgreSQL (3-way handshake: ~1ms)
  → TLS handshake (~2ms)
  → PostgreSQL authentication (username/password/pg_hba check: ~3ms)
  → Session initialisation (SET search_path, client encoding, etc.: ~2ms)
  → Execute query (~5ms)
  → Close connection
Total: ~13ms, of which 8ms is connection overhead for a 5ms query

At high request rates, this overhead compounds. 1,000 requests/second × 8ms connection overhead = 8 seconds of CPU time per second wasted on connection management alone.

Connection pool mechanics

Application workers (many)
       ↓  borrow/return
  [ Connection Pool ]    ← pool manages N persistent connections
       ↓  1 persistent connection
  PostgreSQL server

Connection pool initialisation:

  1. Pool creates N connections to PostgreSQL at startup
  2. All connections stay open (persistent)
  3. Connections are maintained with periodic keep-alive pings

Connection lifecycle (single request):

  1. Worker requests a connection from the pool
  2. If a connection is free: borrow it immediately (no TCP/auth overhead)
  3. Execute query
  4. Return connection to the pool (connection stays open, not closed)
  5. Next request borrows the same connection

Under high load:

  • All N connections are in use
  • New requests queue and wait (configurable timeout)
  • If the wait exceeds the timeout, return an error ("connection pool exhausted")

Pool sizing

The right pool size is not "as large as possible." More connections than PostgreSQL can efficiently serve causes performance degradation.

PostgreSQL's effective concurrency limit is approximately equal to the number of CPU cores on the database server. A 16-core database server handles roughly 16–32 simultaneous queries efficiently; beyond that, query throughput doesn't increase (it may decrease as the kernel scheduler overhead grows).

Empirical guidance (from PgBouncer documentation and the HikariCP connection pool guide):

pool_size = (num_db_cores × 2) + num_effective_spindle_count

For an RDS db.r5.4xlarge (16 vCPUs, SSD): pool size ≈ 32–48.

This seems surprisingly small. The intuition: if the database has 16 cores and 100 connections all try to run queries simultaneously, only 16 can actually execute. The other 84 are queued. A larger pool doesn't make queries run faster — it just means more workers queue at the database rather than at the pool. Queueing at the pool is better (it's faster and doesn't consume database resources).

PgBouncer: the standard PostgreSQL connection pooler

PgBouncer is a lightweight proxy that sits between your application and PostgreSQL, managing a connection pool.

Application servers (many) → PgBouncer (one pool per database) → PostgreSQL

PgBouncer offers three pooling modes:

Session pooling: one server connection per client session. The client holds the connection for its entire duration. Similar to no pooling but adds a fixed connection endpoint. Lowest benefit; highest compatibility.

Transaction pooling: the client holds the connection only for the duration of a transaction (or a single statement if autocommit). The most common and effective mode. One server connection serves many client sessions as long as they're not all in a transaction simultaneously.

Statement pooling: the connection is released between individual statements within a multi-statement transaction. Rarely used because it breaks prepared statements and multi-statement transactions.

Transaction pooling example:

  • 100 application workers, each potentially executing queries
  • Pool size: 20 server connections to PostgreSQL
  • Each worker holds a connection only for the duration of its query (~5ms)
  • 20 connections serve 100+ workers because workers are idle most of the time

At 50ms average request handling time with 5ms of database time, a single connection can serve ~10 requests per second. 20 connections serve ~200 requests/second — enough for most production workloads.

Application-side connection pools

Many frameworks and ORMs include built-in connection pools:

  • SQLAlchemy (Python): configurable pool with create_engine(pool_size=20, max_overflow=10)
  • HikariCP (Java): high-performance connection pool; default in Spring Boot
  • node-postgres (Node.js): Pool class with configurable size
  • GORM (Go): SetMaxOpenConns(), SetMaxIdleConns()

For most applications, an application-side pool (no PgBouncer) is sufficient. PgBouncer is needed when:

  • Many application servers share one PostgreSQL instance (connection count would exceed safe limits without a centralised pooler)
  • You're on serverless or short-lived containers that can't maintain persistent connections

Pooling challenges

Prepared statements in transaction mode. PostgreSQL prepared statements are per-session. In PgBouncer transaction mode, a client's next query may run on a different server connection — where the prepared statement doesn't exist. Fix: use DISCARD ALL at transaction end (PgBouncer does this automatically with server_reset_query), or use protocol-level prepared statements carefully.

Session-level state. SET commands, temporary tables, and advisory locks are session-scoped. In transaction pooling mode, these don't persist between transactions. Applications that rely on session state won't work correctly in transaction pooling mode.

Connection starvation under long-running queries. If some connections are tied up by long-running analytics queries, short redirect lookups may queue waiting for a connection. Separate pools for OLTP (transactional) and analytics workloads.

Monitoring pool health. PgBouncer's SHOW POOLS command shows active clients, waiting clients, and server connections in use. Pool exhaustion (all connections in use, client queue growing) is an early warning of a scaling problem.

SHOW POOLS;
 database | user       | cl_active | cl_waiting | sv_active | sv_idle | maxwait
----------+------------+-----------+------------+-----------+---------+--------
 shorturl | app_user   | 18        | 5          | 20        | 0       | 0.12

cl_waiting=5 means 5 client requests are queued. maxwait=0.12 means they've waited up to 120ms. Alert if either grows.


The tradeoffs

Pool too small: connection wait times increase. Requests queue at the pool. Latency tail grows. At extreme shortage, requests time out.

Pool too large: PostgreSQL is overwhelmed with concurrent connections. Query throughput drops. Memory consumed by connection state increases. Kernel scheduler overhead grows.

Connection overhead eliminated: per-request TCP handshake and authentication (~8ms) disappears. For high-throughput services, this is often the most impactful latency optimisation after indexes.

Application complexity: session state, prepared statements, and long-running transactions require care in transaction pooling mode. Most OLTP applications work correctly with minimal changes.


The one thing to remember

Database connections are expensive to establish and consume server resources when idle. A connection pool keeps a fixed number of connections open and shares them across many application workers. Pool size should be tuned to the database server's CPU count — not the number of application workers. Counterintuitively, a pool of 20 connections usually outperforms a pool of 200: more connections than the database can efficiently serve creates scheduling overhead, not additional throughput.


← Previous: Query Optimisation: From Slow to Fast — Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and sys...

→ Next: Data & Storage: Wrap-Up — A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how...

Systems Design

Part 1 of 50

Understanding these system design concepts is essential for architects, developers, and engineers to create scalable, reliable, and maintainable software systems that meet the needs of businesses.

More from this blog

Cloud Tuned

729 posts

Your starting point for anything cloud: AWS, Azure, GCP, Serverless, Architecture, Hybrid Cloud, Systems Design and other Information Technology topics.