Data & Storage: Where Everything Lives

Series: System Design · Data & Storage — Pillar 4 of 8
Systems Design
| # | Post | What it covers |
|---|---|---|
| 00 | Data & Storage: Where Everything Lives ← you are here | Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath. (161 chars) |
| 01 | SQL vs NoSQL: Choosing the Right Database | SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call. |
| 02 | Database Indexing: The Highest-Leverage Performance Tool | Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one. |
| 03 | B-Trees & B+ Trees: The Data Structure Behind Database Indexes | Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries. |
| 04 | LSM Trees: Why Some Databases Are Built for Writes | LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it. |
| 05 | Denormalisation: Trading Storage for Speed | Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely. |
| 06 | Database Sharding: Scaling Beyond a Single Node | Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces. |
| 07 | Data Partitioning: Choosing How to Divide Your Data | Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale. |
| 08 | Consistent Hashing: Minimising Resharding Pain | Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems. |
| 09 | Replication & Read Replicas: Scaling Reads and Surviving Failures | Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them. |
| 10 | Object Storage: Unlimited Scale for Large Binary Data | Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it. |
| 11 | Block vs File vs Object Storage: Three Models, Three Use Cases | Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them. |
| 12 | Distributed File Systems: File Storage Across Many Machines | Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them. |
| 13 | Time Series Databases: Built for Metrics and Events | Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB. |
| 14 | Vector Databases: Semantic Search and AI Memory | Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one. |
| 15 | Full-Text Search Engines: Beyond SQL LIKE | Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful. |
| 16 | Materialized Views: Pre-Computing Expensive Queries | Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches. |
| 17 | Query Optimisation: From Slow to Fast | Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast. |
| 18 | Connection Pooling: Managing the Hidden Bottleneck | Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly. |
| 19 | Data & Storage: Wrap-Up | A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system. |
Data & Storage: Where Everything Lives
The scenario
Your URL shortener is now a serious platform. Hundreds of millions of short links have been created. Billions of redirect events are recorded every month. The analytics dashboard needs sub-second query response across months of click data. The search feature needs full-text search across link titles and destinations. The recommendation engine needs to find semantically similar content. The ops team is hitting connection limits on the database at peak load.
One database — a single PostgreSQL instance — served you well until now. But a single general-purpose relational database is not the right answer to all of these needs simultaneously. The analytics workload (write-heavy, time-structured, aggregate queries over billions of rows) wants a time series database. The search feature wants a full-text search engine. The recommendation engine wants a vector database. The core link data wants a relational database with careful indexing.
This is the central insight of the Data & Storage pillar: different data has different access patterns, and different access patterns warrant different storage engines. The engineer who knows only relational databases will force every problem into a relational mould. The engineer who understands the full landscape can match the store to the need.
Nineteen concepts across five themes:
- Choosing the right database — SQL vs NoSQL, and the specialised stores (time series, vector, full-text search) built for specific workloads
- Making reads fast — indexing, query optimisation, materialized views, denormalisation
- Scaling data horizontally — sharding, partitioning, consistent hashing
- Surviving failures — replication, read replicas
- Everything else — storage types, connection pooling, the data structures that make databases work
TL;DR: SQL databases are the right default for structured relational data with ACID guarantees. NoSQL trades some of those guarantees for scale, flexibility, or access pattern fit. Indexes make reads fast at the cost of write overhead. Sharding and partitioning distribute data across nodes. Replication keeps data safe and scales reads. Specialised databases (time series, vector, search) outperform general-purpose databases for their target workloads by orders of magnitude. And underneath all of it, B-trees, LSM trees, and bloom filters are the data structures that make the whole stack work.
What this pillar covers
SQL vs NoSQL — the foundational database choice
The choice between relational (SQL) and non-relational (NoSQL) databases is one of the most consequential — and most cargo-culted — decisions in system design. SQL databases offer schemas, joins, transactions, and decades of operational maturity. NoSQL databases offer flexible schemas, horizontal scalability, and data models better suited to specific access patterns.
The key insight: NoSQL is not a single thing. Document stores, key-value stores, wide-column stores, and graph databases each have distinct data models and performance characteristics. "Use NoSQL for scale" is not engineering — it's fashion. Know what each type optimises for.
Best mental model: SQL is a spreadsheet with enforced relationships — every row follows the same schema, relationships are explicit, and you can query across any column. NoSQL is a filing cabinet optimised for how you retrieve things — fast if you access data the way it was organised, slow if you don't.
Indexing — the most impactful performance tool
An index is a data structure that allows the database to find rows matching a query without scanning every row in the table. A table with a million rows and no index on the query column scans all million rows. The same table with a proper index finds matching rows in logarithmic time.
Indexes are the highest-leverage performance tool available to most engineers — a single index addition can turn a query that takes 10 seconds into one that takes 10 milliseconds. They also have costs: write overhead, storage, and maintenance. Understanding when to add an index, which type to add, and when indexes hurt rather than help is fundamental database knowledge.
Best mental model: an index is the index at the back of a textbook. Without it, finding all references to "consistent hashing" means reading every page. With it, you go directly to the right pages. Adding a new entry to the index takes extra work when the book is published — but the lookup savings over the book's lifetime are enormous.
B-trees & B+ trees — how most indexes work
Almost every relational database index is built on a B-tree or B+ tree. Understanding the data structure behind the index explains why indexes work the way they do: why range queries on indexed columns are fast, why the leftmost column rule exists for composite indexes, and why indexes have write overhead.
Best mental model: a B-tree is a sorted, balanced, multi-level filing system. Each level narrows the search — the top level divides the entire keyspace into large ranges, each subsequent level narrows further until you reach the actual data. Finding a specific key requires traversing from top to bottom — a small number of comparisons regardless of the total data size.
LSM Trees — how write-optimised databases work
B-trees optimise for reads. LSM (Log-Structured Merge) trees optimise for writes — used by Cassandra, RocksDB, LevelDB, and many modern NoSQL systems. Understanding LSM trees explains why certain databases handle high write throughput so effectively, and what the compaction process is doing in the background.
Best mental model: an LSM tree is like taking notes in a journal rather than updating a filing cabinet. Every new entry goes at the end of the journal (fast writes). Periodically, the journal is sorted and merged into the filing cabinet (compaction). Reading requires checking the journal and the cabinet — slightly slower than a pure filing cabinet, but writes are dramatically faster.
Denormalisation — trading storage for read performance
Database normalisation eliminates redundancy and ensures data consistency. Denormalisation deliberately reintroduces redundancy to make reads faster — storing pre-computed joins, aggregates, or duplicated fields so that common queries don't require expensive computation at read time.
Best mental model: normalisation is keeping the master copy of each piece of information in one place. Denormalisation is making photocopies and distributing them — reads are instant because the data is right there, but updates require updating every copy, and copies can drift out of sync if updates aren't managed carefully.
Sharding — distributing data horizontally
Sharding splits a database's data across multiple physical nodes, with each node (shard) holding a subset of the data. When a single database node can no longer hold all the data or handle all the write load, sharding is how you grow horizontally.
Best mental model: a library that's run out of shelf space. Rather than one impossibly large building, you open multiple branches — each holding books by a different category. Finding a book requires knowing which branch has that category. The total collection is distributed; each branch is manageable.
Data Partitioning — the mechanics of dividing data
Partitioning is the general concept of dividing data into subsets. Sharding is one form (distributing across separate nodes). Understanding range partitioning, hash partitioning, and list partitioning — and the tradeoffs each makes — is what lets you choose the right division strategy for a specific dataset and access pattern.
Best mental model: partitioning is deciding which drawer of a filing cabinet to put a document in. Range partitioning: drawer 1 for A–F, drawer 2 for G–M, etc. Hash partitioning: hash the document name, put it in the drawer the hash points to. The strategy determines how easy it is to find documents and how evenly the drawers fill up.
Consistent Hashing — partitioning for dynamic clusters
Standard hash partitioning breaks when you add or remove nodes — the hash space changes and nearly every key maps to a different node, requiring a massive redistribution. Consistent hashing solves this: adding or removing a node requires moving only a small fraction of keys.
Best mental model: consistent hashing arranges nodes on a ring. Each key is assigned to the first node clockwise from its position on the ring. Adding a node only redistributes the keys between it and its predecessor. A library adding a new branch only needs to transfer books from the nearest existing branch — not reorganise the entire collection.
Read Replicas & Replication — scaling reads and surviving failures
Database replication copies data from a primary node to one or more replicas. Read replicas serve read queries, offloading the primary. Replicas in separate availability zones or regions provide resilience — if the primary fails, a replica can be promoted.
Best mental model: replication is like making multiple copies of a library's collection. The primary library accepts new books and updates. The branch libraries (replicas) carry identical collections and handle most reading requests. If the primary library burns down, a branch can become the new primary.
Object, Block & File Storage — three storage models
Not all storage is a database. Large binary objects (images, videos, backups, logs) belong in object storage. Operating system disks and database volumes belong in block storage. Shared file systems belong in file storage. Using the wrong storage model for a workload is like using a filing cabinet to store pallets of goods — technically possible, deeply inefficient.
Best mental model: block storage is a blank hard drive — low-level, fast, structured however the OS wants. File storage is a shared network drive — familiar hierarchy, accessible from multiple machines. Object storage is a postal warehouse — you give it a package (object), it gives you a tracking number (key), you retrieve the package by number. No hierarchy, infinite scale, retrieval by key only.
Distributed File Systems — file storage at scale
When a single file server isn't enough, distributed file systems (HDFS, GlusterFS, Ceph) spread files across many machines, providing a unified namespace while storing data redundantly across nodes.
Best mental model: a distributed file system is a library where books are secretly stored in multiple warehouses — you check out a book from the front desk as normal, but the actual retrieval is coordinated across multiple locations you never see.
Specialised Databases — right tool, right workload
Three specialised stores are covered in depth:
Time series databases (InfluxDB, TimescaleDB, Prometheus) are optimised for append-heavy, time-ordered data with frequent range queries and aggregation over time windows. Storing metrics in a relational database is technically possible and operationally painful at scale.
Vector databases (Pinecone, Weaviate, pgvector) store high-dimensional vectors and support similarity search — finding the most similar items to a query vector. The foundation of semantic search, recommendation engines, and LLM-powered features.
Full-text search engines (Elasticsearch, OpenSearch, Typesense) index text with inverted indexes, supporting relevance-ranked search, fuzzy matching, and faceting — capabilities that SQL LIKE queries cannot match at scale.
Materialized Views — pre-computed query results
A materialized view stores the result of a query as a physical table, updated periodically or on demand. Complex aggregations that take seconds to compute are pre-computed and served in milliseconds.
Best mental model: a materialized view is a summary report that's been printed and filed. Reading the summary is instant. Regenerating it requires rerunning the analysis. The tradeoff is freshness vs read speed.
Query Optimisation — making existing queries faster
Understanding how a database's query planner works — how it chooses between index scans and table scans, how join order affects performance, how statistics influence planning — is what separates engineers who add indexes and hope from engineers who understand why a query is slow and how to make it fast.
Connection Pooling — managing database connections efficiently
Opening a new database connection is expensive — authentication, session setup, memory allocation. Connection pooling maintains a pool of open connections that queries reuse, dramatically reducing per-query overhead.
Best mental model: a connection pool is a taxi rank rather than a car dealership. You need a car (connection) for a short trip (query). Buying a new car for each trip (opening a new connection) and scrapping it after is wasteful. The taxi rank has cars ready and waiting — you use one, it returns to the queue when you're done.
The decision framework
What is the data's structure?
Relational, with joins and transactions?
└─ SQL (PostgreSQL, MySQL)
Document-oriented (flexible schema, nested objects)?
└─ Document store (MongoDB, DynamoDB)
Simple key → value lookups, extreme speed?
└─ Key-value store (Redis, DynamoDB)
High write throughput, wide columns, eventual consistency OK?
└─ Wide-column store (Cassandra, HBase)
Highly connected data, graph traversal queries?
└─ Graph database (Neo4j, Neptune)
Time-ordered metrics, frequent range queries and aggregations?
└─ Time series database (InfluxDB, TimescaleDB)
High-dimensional similarity search?
└─ Vector database (Pinecone, Weaviate, pgvector)
Full-text search with relevance ranking?
└─ Search engine (Elasticsearch, Typesense)
Large binary objects (images, video, backups)?
└─ Object storage (S3, GCS)
Is a single node insufficient?
Write load exceeds one node?
└─ Shard (horizontal partitioning)
Read load exceeds one node?
└─ Read replicas
Data exceeds one node's capacity?
└─ Shard or distribute (consistent hashing)
Common traps at this stage
Defaulting to PostgreSQL for everything. PostgreSQL is excellent — use it as your default for relational data. But forcing time series data into PostgreSQL when InfluxDB would serve it orders of magnitude better, or doing full-text search with LIKE '%query%' when Elasticsearch would give you relevance ranking, facets, and 100x better performance, is a failure of engineering judgment.
Sharding prematurely. Sharding adds enormous operational complexity. A single PostgreSQL instance can handle hundreds of millions of rows with proper indexing. Read replicas handle most read-scale problems before sharding is necessary. Shard when you genuinely need to, not because the data "feels big."
Under-indexing on write-heavy tables. The write overhead of an index is real but usually worth it. The read performance without an index on a frequently queried column is almost always worse than the write overhead of maintaining the index.
Over-indexing on write-heavy tables. Every index on a table is updated on every write. A table with 15 indexes on it will have dramatically degraded write performance. Add indexes for your actual query patterns, not defensively for every column.
Treating connection count as free. Database connections are expensive resources. An application that opens a new connection per request will exhaust the database's connection limit and fall over. Connection pooling is non-optional for any production application under real load.
Key takeaways
- SQL is the right default for structured relational data; NoSQL is the right choice for specific access patterns, scale requirements, or data models that don't fit the relational model
- Indexes are the highest-leverage read performance tool; they have write costs that must be managed
- B-trees power most relational indexes; LSM trees power most write-optimised NoSQL stores
- Denormalisation trades storage and write complexity for read performance
- Sharding scales writes and storage horizontally; requires careful key design
- Consistent hashing makes sharding resilient to cluster topology changes
- Read replicas scale reads and provide failover without the complexity of sharding
- Object, block, and file storage are distinct models for distinct workloads
- Specialised databases (time series, vector, search) outperform general-purpose databases for their target workloads
- Materialized views and query optimisation make existing data retrieval faster
- Connection pooling is non-optional for production applications under load
Up next
Part 1 → SQL vs NoSQL: Choosing the Right Database
We start with the foundational choice — relational vs non-relational — covering not just when to use each, but what the NoSQL category actually contains and what each type within it optimises for.
Part of the System Design series. Tags: #systemdesign #databases #distributedsystems #backend #softwarearchitecture #engineering
→ Next: SQL vs NoSQL: Choosing the Right Database — SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when No...




