Scaling

What it is

Scaling is a system’s ability to handle increasing load — more users, more requests, more data. The two primary strategies are vertical scaling (making individual machines more powerful) and horizontal scaling (adding more machines).

Vertical scaling adds CPU, RAM, or disk to existing machines. It is simple to manage and requires no architectural changes, but it hits a hard ceiling and concentrates single-point-of-failure risk in one box.

Horizontal scaling adds machines. Theoretically unbounded capacity, but introduces complexity: state coordination, network partitioning, and fault tolerance become active concerns.
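The stateless half of horizontal scaling can be sketched with round-robin routing across identical replicas. A minimal sketch, assuming hypothetical replica names; the point is only that any replica can serve any request:

```python
from itertools import cycle

# Hypothetical replica names; any number of identical stateless servers works.
replicas = ["api-1", "api-2", "api-3"]
next_server = cycle(replicas)

def route(request_id: str) -> str:
    """Send each incoming request to the next replica in rotation.

    Works only because the servers are stateless: no request needs
    to land on the same machine as a previous one.
    """
    return next(next_server)

# Ten requests spread evenly across three replicas.
assignments = [route(f"req-{i}") for i in range(10)]
print(assignments[:3])  # → ['api-1', 'api-2', 'api-3']
```

Adding capacity here is just appending to `replicas`; no coordination is needed, which is exactly what breaks down for stateful tiers.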

Why it matters

Horizontal scaling is often presented as the default correct answer, but the key constraint is that stateful components — databases and caches — do not scale the same way stateless components do. Stateless API servers can be replicated freely; stateful systems require coordination that limits throughput.

The practical implication: scale from the bottleneck up. Adding stateless API servers in front of a saturated database gains nothing; fix the constraint where it actually lives.
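The bottleneck argument is just min-of-capacities arithmetic. A minimal sketch with hypothetical per-tier throughput numbers:

```python
# Hypothetical per-tier capacities in requests/second.
def system_throughput(tier_capacity: dict) -> int:
    """End-to-end throughput is capped by the slowest tier."""
    return min(tier_capacity.values())

tiers = {"load_balancer": 50_000, "api_servers": 12_000, "database": 3_000}
print(system_throughput(tiers))  # → 3000: the DB is the bottleneck

# Doubling the API tier changes nothing while the DB stays saturated.
tiers["api_servers"] *= 2
print(system_throughput(tiers))  # → still 3000
```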

Evidence & examples

From system-design-masterclass-01: vertical scaling is the “hulk” strategy (powerful, limited); horizontal is “minions” (many, distributed, complex). The bottleneck is always the stateful layer — API servers are easy to replicate, databases and caches are not.

DDIA’s treatment of partitioning and replication is the deeper treatment of the same problem — both are mechanisms for horizontal scaling of stateful data, each with specific trade-offs around consistency and availability.

Tensions & counterarguments

  • Scaling the API tier when the DB is the bottleneck is wasted capacity; the instinct to add more servers is often wrong.
  • “Infinite horizontal scaling” is a simplification — horizontal scaling of stateful systems requires partitioning, replication, and coordination strategies, each introducing consistency trade-offs (see linearizability, eventual-consistency).
Related

  • delegation — offloading work asynchronously reduces synchronous demand; a complement to capacity scaling
  • partitioning — DDIA’s treatment of horizontal data distribution
  • replication — DDIA’s treatment of data copies across machines
  • system-design
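The coordination cost behind partitioning starts from something as simple as a deterministic key-to-shard mapping. A minimal hash-partitioning sketch (key names and shard counts hypothetical); note how changing the shard count remaps keys, which is the kind of coordination stateful systems must manage:

```python
from hashlib import sha256

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard by hashing; deterministic, so all nodes
    that agree on num_shards route reads and writes identically."""
    digest = int(sha256(key.encode()).hexdigest(), 16)
    return digest % num_shards

# Every node must agree on num_shards; changing it remaps most keys,
# forcing data movement — one concrete form of the coordination cost.
keys = ["user:1", "user:2", "user:3"]
placement_4 = {k: shard_for(k, 4) for k in keys}
placement_5 = {k: shard_for(k, 5) for k in keys}
moved = sum(placement_4[k] != placement_5[k] for k in keys)
```

Schemes like consistent hashing exist precisely to shrink `moved` when the shard count changes.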