Series: System Design · Data & Storage — Pillar 4 of 8

#	Post	What it covers
00	Data & Storage: Where Everything Lives	Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath.
01	SQL vs NoSQL: Choosing the Right Database	SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call.
02	Database Indexing: The Highest-Leverage Performance Tool	Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one.
03	B-Trees & B+ Trees: The Data Structure Behind Database Indexes	Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries.
04	LSM Trees: Why Some Databases Are Built for Writes	LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it.
05	Denormalisation: Trading Storage for Speed	Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely.
06	Database Sharding: Scaling Beyond a Single Node	Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces.
07	Data Partitioning: Choosing How to Divide Your Data	Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale.
08	Consistent Hashing: Minimising Resharding Pain	Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems.
09	Replication & Read Replicas: Scaling Reads and Surviving Failures	Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them.
10	Object Storage: Unlimited Scale for Large Binary Data ← you are here	Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it.
11	Block vs File vs Object Storage: Three Models, Three Use Cases	Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them.
12	Distributed File Systems: File Storage Across Many Machines	Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them.
13	Time Series Databases: Built for Metrics and Events	Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB.
14	Vector Databases: Semantic Search and AI Memory	Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one.
15	Full-Text Search Engines: Beyond SQL LIKE	Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful.
16	Materialized Views: Pre-Computing Expensive Queries	Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches.
17	Query Optimisation: From Slow to Fast	Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast.
18	Connection Pooling: Managing the Hidden Bottleneck	Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly.
19	Data & Storage: Wrap-Up	A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system.

Object Storage: Unlimited Scale for Large Binary Data

The problem

Your URL shortener now lets users upload custom QR codes and thumbnail images for their links. You store them on the web server's local filesystem under /var/uploads/. Simple, fast, works fine.

Then you add a second web server for load balancing. Now uploads on server A aren't visible to requests routed to server B. You add a shared NFS mount. Under load, NFS latency spikes and file operations time out. You add more application servers. Now you're managing distributed filesystem mounts across a growing fleet.

None of this is the actual problem you were trying to solve. The actual problem is: where does binary data live in a system with multiple application servers?

The answer that the industry converged on twenty years ago is object storage. Amazon S3 launched in 2006, and AWS has since reported that it stores over 350 trillion objects. The model it introduced became the de facto standard for storing any large binary data at scale.

The core idea

Object storage treats every file as a discrete, immutable object identified by a unique key. Objects are stored in flat namespaces called buckets. There's no directory hierarchy, no in-place editing, no filesystem semantics. You put an object (upload), you get an object (download), you delete an object — that's the entire interface.

In exchange for this simplicity, you get effectively unlimited scale, built-in redundancy, global accessibility via HTTP, and a pricing model based on storage used rather than server capacity provisioned.

The analogy: a postal warehouse

A traditional file server is like a filing cabinet — you know which drawer holds which folder, and you update files in place. If the cabinet is full, you buy a bigger one. If it's in one office, people in another office can't reach it directly.

Object storage is like a postal warehouse. You hand over a package (the object). The warehouse gives you a tracking number (the key). The warehouse stores the package however it likes across thousands of shelves and warehouses — you never know or care where. When you need the package, you give the tracking number and receive it. There's no "update package in place" — you send a new package and get a new tracking number.

The warehouse can hold as many packages as needed. Adding packages doesn't require you to buy more cabinets. Packages are accessible from anywhere in the world. You pay only for how much space your packages occupy.

How it works

The data model

Object storage has three concepts:

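Object: a file plus its metadata. The object's content is immutable after upload: you cannot append to a standard S3 object or modify it in place. To update, you upload a new version and (optionally) delete the old one. S3 has historically allowed objects from 0 bytes up to 5 TB; a single PUT request is limited to 5 GB, so larger objects are uploaded in parts (multipart upload).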

Key: the unique identifier for an object within a bucket. Keys look like paths — user_uploads/user_123/qr-x7Kp2.png — but they're strings, not filesystem paths. Object storage has no real directories; the / in keys is convention, and most UIs display keys with slashes as if they were directories.

Bucket: a flat container for objects. Buckets have names that are globally unique within the storage provider. A bucket holds an unlimited number of objects.
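To make the flat namespace concrete, here is a toy in-memory sketch. It is not how any real object store is implemented, and the ToyBucket class and its method names are hypothetical. The point it illustrates is that keys are plain strings, and "folders" only appear when a listing groups keys by a delimiter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """Hypothetical in-memory model of a bucket: a flat map from key to bytes."""
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # A PUT replaces the whole object; there is no partial update.
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)

    def list(self, prefix: str = "", delimiter: str = "") -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes) for keys that start with prefix.

        With a delimiter, keys that contain it after the prefix are rolled up
        into one "common prefix", which is how UIs present fake directories.
        """
        keys, prefixes = [], set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)


bucket = ToyBucket("my-bucket")
bucket.put("user_uploads/user_123/qr-x7Kp2.png", b"png-bytes")
bucket.put("user_uploads/user_123/thumb.png", b"png-bytes")
bucket.put("user_uploads/user_456/qr-a1B2c3.png", b"png-bytes")

# Everything under one user's prefix, as a flat list of keys.
flat_keys, _ = bucket.list(prefix="user_uploads/user_123/")

# "Folders" directly under user_uploads/, derived purely from the key strings.
_, folders = bucket.list(prefix="user_uploads/", delimiter="/")
```

Nothing in the store knows about directories: deleting the last key under user_uploads/user_456/ makes that "folder" disappear from the listing.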

The HTTP API

Every major object storage system exposes an HTTP API. S3's API became the de facto standard — MinIO, Cloudflare R2, Google Cloud Storage, and most other object stores implement S3-compatible APIs.

# Upload an object
PUT https://bucket-name.s3.amazonaws.com/user_uploads/user_123/qr-x7Kp2.png
Content-Type: image/png
[binary data]

# Download an object
GET https://bucket-name.s3.amazonaws.com/user_uploads/user_123/qr-x7Kp2.png

# Delete an object
DELETE https://bucket-name.s3.amazonaws.com/user_uploads/user_123/qr-x7Kp2.png

# List objects with a prefix (ListObjectsV2)
GET https://bucket-name.s3.amazonaws.com/?list-type=2&prefix=user_uploads/user_123/

Applications upload directly to S3; objects are served to end users either directly from S3 URLs or via a CDN configured to use S3 as an origin.
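Because keys are arbitrary strings, they need percent-encoding before they go into a URL. The sketch below builds object and listing URLs with the standard library only. The helper names object_url and list_url are hypothetical, the default host is the generic endpoint used in the requests above, and real requests to a private bucket would also need to be signed.

```python
from urllib.parse import quote, urlencode


def object_url(bucket: str, key: str, host: str = "s3.amazonaws.com") -> str:
    """Build a virtual-hosted-style URL for one object key."""
    # quote() leaves "/" unescaped by default, so the key keeps its path-like look.
    return f"https://{bucket}.{host}/{quote(key)}"


def list_url(bucket: str, prefix: str, host: str = "s3.amazonaws.com") -> str:
    """Build a ListObjectsV2-style URL that filters keys by prefix."""
    # urlencode() escapes "/" in query values as %2F.
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{bucket}.{host}/?{query}"


plain = object_url("bucket-name", "user_uploads/user_123/qr-x7Kp2.png")
spaced = object_url("bucket-name", "user_uploads/user 123/my qr.png")
listing = list_url("bucket-name", "user_uploads/user_123/")
```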

Durability and availability

S3 stores each object redundantly across multiple availability zones. S3 Standard is designed for 99.999999999% (eleven nines) durability: losing an object requires multiple independent failures in multiple data centres at the same time. The probability is so low it's essentially theoretical.

Availability (can you read the object right now?) is typically 99.9% to 99.99%. There's no single node to fail — requests are distributed across a large fleet.
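The two numbers measure different things, and a little arithmetic makes the gap visible. The sketch below treats eleven nines as an annual per-object loss rate and treats availability as a fraction of the year. Both are simplifying assumptions, and the function names are made up for illustration.

```python
ANNUAL_LOSS_RATE = 1e-11   # 1 - 0.99999999999, the per-object rate implied by eleven nines
HOURS_PER_YEAR = 365 * 24


def expected_annual_losses(object_count: int) -> float:
    """Expected number of objects lost per year at eleven nines durability."""
    return object_count * ANNUAL_LOSS_RATE


def annual_downtime_hours(availability: float) -> float:
    """Hours per year a service may be unreachable at a given availability."""
    return HOURS_PER_YEAR * (1.0 - availability)


losses = expected_annual_losses(10_000_000)                  # about 0.0001 objects per year
years_per_loss = 1 / losses                                  # about 10,000 years per lost object
downtime_999 = annual_downtime_hours(0.999)                  # about 8.76 hours per year
downtime_9999_minutes = annual_downtime_hours(0.9999) * 60   # about 52.6 minutes per year
```

With ten million stored objects, you would expect to lose one roughly every 10,000 years, while the same service may still be briefly unreachable for minutes to hours each year. Durability is about losing data; availability is about reaching it right now.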

Presigned URLs

A common pattern: instead of routing user uploads through your application servers (slow, expensive, bandwidth bottleneck), generate a presigned URL and give it directly to the client. The client uploads directly to S3.

import boto3

s3_client = boto3.client('s3')  # credentials come from the environment

# Application server generates a presigned URL for a single PUT
presigned_url = s3_client.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/user_123/qr.png'},
    ExpiresIn=300,  # valid for 5 minutes
)

# Return presigned_url to the client.
# The client PUTs directly to S3, so no upload traffic passes through application servers.

Similarly, presigned GET URLs allow time-limited access to private objects without making them publicly readable.
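The idea behind a presigned URL can be sketched with a plain HMAC. This is a deliberately simplified illustration, not S3's actual signing scheme (which is considerably more involved). SECRET_KEY, presign, and verify are hypothetical names. What it shows is that the URL carries the method, key, and expiry, plus a signature only the key holder could have produced, so the storage service can validate it without calling your application.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"demo-secret"  # hypothetical; shared by the signer and the storage service


def _signature(method: str, key: str, expires_at: int) -> str:
    message = f"{method}\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, key: str, expires_in: int, now: Optional[int] = None) -> str:
    """Return a path with the expiry and signature baked into the query string."""
    issued = int(time.time()) if now is None else now
    expires_at = issued + expires_in
    return f"/{key}?expires={expires_at}&sig={_signature(method, key, expires_at)}"


def verify(method: str, key: str, expires_at: int, sig: str, now: Optional[int] = None) -> bool:
    """What the storage side does: recompute the signature and check the expiry."""
    current = int(time.time()) if now is None else now
    expected = _signature(method, key, expires_at)
    return current <= expires_at and hmac.compare_digest(expected, sig)


url = presign("PUT", "uploads/user_123/qr.png", expires_in=300, now=1_000)
sig = url.rsplit("sig=", 1)[1]

in_window = verify("PUT", "uploads/user_123/qr.png", 1_300, sig, now=1_100)
expired = verify("PUT", "uploads/user_123/qr.png", 1_300, sig, now=1_301)
wrong_key = verify("PUT", "uploads/other.png", 1_300, sig, now=1_100)
```

The signature is bound to one method, one key, and one expiry, so a leaked URL cannot be reused for a different object or after the window closes.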

Storage classes and lifecycle policies

Not all objects need the same access speed or cost profile. S3 offers tiered storage classes:

S3 Standard: high availability, low latency, highest cost. For frequently accessed objects.
S3 Infrequent Access: lower storage cost, small retrieval fee. For objects accessed monthly.
S3 Glacier: very low storage cost, minutes-to-hours retrieval. For archives and backups.

Lifecycle policies automatically transition objects between classes based on age. The rule below moves objects under logs/ to Infrequent Access after 30 days and to Glacier after 90 days:

{
  "Rules": [{
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ]
  }]
}

The URL shortener's click logs can be archived to Glacier after 90 days — paying pennies per GB for years of historical data.
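To see what such a rule implies for any given object, here is a small sketch that maps an object's age to the class it should have reached. The function name storage_class_for_age is hypothetical, and it only models the age thresholds from the rule above. Real lifecycle transitions are applied asynchronously by the storage service, so the switch does not happen at the exact moment a threshold is crossed.

```python
# (age in days, target class), mirroring the transitions in the rule above
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]


def storage_class_for_age(age_days: int, transitions=TRANSITIONS) -> str:
    """Return the storage class an object of this age should have reached."""
    current = "STANDARD"
    for threshold_days, storage_class in sorted(transitions):
        if age_days >= threshold_days:
            current = storage_class
    return current


fresh_log = storage_class_for_age(5)
month_old_log = storage_class_for_age(45)
year_old_log = storage_class_for_age(400)
```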

Versioning

Buckets can be configured to retain all versions of each object. A PUT to an existing key creates a new version; the old version is retained. DELETE creates a "delete marker" but doesn't remove previous versions. This provides protection against accidental deletion and enables point-in-time recovery.
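The behaviour described above can be modelled in a few lines. This is a toy in-memory sketch with hypothetical names (VersionedBucket, DELETE_MARKER) and made-up version IDs. It is meant to show the semantics, not how a real store implements them.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for a delete marker


class VersionedBucket:
    """Toy model: every key maps to an append-only list of (version_id, data)."""

    def __init__(self):
        self._history = {}
        self._ids = itertools.count(1)

    def _append(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._history.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key, data):
        # A PUT never overwrites; it adds a new version on top.
        return self._append(key, data)

    def delete(self, key):
        # A DELETE adds a marker; earlier versions stay in the history.
        return self._append(key, DELETE_MARKER)

    def get(self, key, version_id=None):
        history = self._history.get(key, [])
        if version_id is None:
            # A plain GET sees only the newest entry.
            if not history or history[-1][1] is DELETE_MARKER:
                raise KeyError(key)
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not DELETE_MARKER:
                return data
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("qr/logo.png", b"old")
second = bucket.put("qr/logo.png", b"new")
latest = bucket.get("qr/logo.png")

bucket.delete("qr/logo.png")
try:
    bucket.get("qr/logo.png")
    visible_after_delete = True
except KeyError:
    visible_after_delete = False

recovered = bucket.get("qr/logo.png", version_id=first)
```

After the delete, a plain GET behaves as if the object is gone, yet both earlier versions are still retrievable by version ID. That is what makes recovery from an accidental delete possible.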

The tradeoffs

No in-place updates. You cannot append to an object or modify part of it. Updating a 10GB file means re-uploading 10GB. This is fine for most large-file use cases (images, video, backups) but a poor fit for data that's frequently modified in small increments.

Consistency depends on the provider. Since December 2020, S3 has been strongly consistent for object reads, writes, and list operations: a listing issued after a successful PUT includes the new object. Not every S3-compatible store makes the same guarantee, and S3 itself offers no locking, so concurrent writes to the same key resolve as last-writer-wins. Applications that depend on precise listings or coordinated updates should check their provider's guarantees.

Retrieval cost and latency. Reads go over the network, so S3 is slower than local disk: first-byte latency is typically tens to a couple of hundred milliseconds, and a multi-GB object takes seconds or longer to transfer. Requests and data transfer out are billed separately from storage. Glacier retrieval takes minutes to hours. Applications with strict latency requirements for frequently read files usually put a CDN in front of S3.

Not a database. Object storage has no query capabilities — you cannot filter objects by content or metadata fields. Finding all QR codes created by a user requires either a predictable key structure (prefix by user ID) or a separate metadata store (a database row per object with the key stored there).
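The separate metadata store can be as small as one table. Below is a minimal sketch using SQLite in memory. The uploads table, its columns, and the helper names are assumptions made for illustration, and the actual bytes would still live in the bucket under object_key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE uploads (
        object_key TEXT PRIMARY KEY,   -- the key of the object in the bucket
        user_id    INTEGER NOT NULL,
        kind       TEXT NOT NULL,      -- e.g. 'qr' or 'thumbnail'
        size_bytes INTEGER NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_uploads_user_kind ON uploads (user_id, kind)")


def record_upload(object_key, user_id, kind, size_bytes):
    """Call this after the object has been written to the bucket."""
    conn.execute(
        "INSERT INTO uploads (object_key, user_id, kind, size_bytes) VALUES (?, ?, ?, ?)",
        (object_key, user_id, kind, size_bytes),
    )


def qr_keys_for_user(user_id):
    """Answer 'all QR codes created by this user' without listing the bucket."""
    rows = conn.execute(
        "SELECT object_key FROM uploads WHERE user_id = ? AND kind = 'qr' ORDER BY object_key",
        (user_id,),
    )
    return [row[0] for row in rows]


record_upload("user_uploads/user_123/qr-x7Kp2.png", 123, "qr", 2048)
record_upload("user_uploads/user_123/thumb.png", 123, "thumbnail", 9000)
record_upload("user_uploads/user_456/qr-a1B2c3.png", 456, "qr", 2100)

keys = qr_keys_for_user(123)
```

The database answers the query and the object store serves the bytes. The cost of this split is keeping the two in sync, for example by writing the row only after the upload succeeds.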

When to use object storage

Use object storage for:

User-uploaded files (images, documents, videos)
Application logs and audit trails
Database backups
Static web assets (JavaScript, CSS, images) — serve via CDN with S3 as origin
Artifacts (build outputs, ML model weights, data exports)
Any large binary data that doesn't need in-place modification

Don't use object storage for:

Data that needs frequent small updates (use a database or block storage)
Low-latency key-value lookups (use a key-value store)
File systems that need POSIX semantics (use block or file storage)
Structured data that needs querying (use a database)

The one thing to remember

Object storage is the right home for any large binary data your application generates or receives. It offers effectively unlimited scale, very high durability, and global access over HTTP, and it typically costs far less per GB than keeping large files in a database or on provisioned server disks. The trade is immutability: you can't modify objects in place, which is usually fine for binary content that's created once and read many times.
