Performance vs Scalability
These two terms get thrown around interchangeably, but they describe fundamentally different problems.
A system has a performance problem when it's slow for a single user. A system has a scalability problem when it's fast for a single user but falls apart under load. You can have one without the other.
Performance optimization makes individual operations faster. Scalability engineering makes the system handle more operations. The techniques often overlap, but the goals are distinct. A hand-tuned SQL query is a performance win. Sharding your database across nodes is a scalability play.
Latency vs Throughput
Latency is how long a single request takes — the time between "I asked" and "I got an answer." Throughput is how many requests the system handles per unit of time.
In most systems, you're aiming for maximum throughput with acceptable latency. These are often at odds. Batching requests improves throughput (you process more per cycle) but increases latency for individual requests (each one waits for the batch).
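The batching trade-off can be made concrete with a little arithmetic. The sketch below is a toy model, not a measurement: it assumes each batch pays a fixed overhead plus a per-item cost, and that every item in a batch finishes when the whole batch does. The function name `batch_stats` and the cost figures are made up for illustration.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_item_ms=1.0):
    """Return (throughput in items/sec, per-item latency in ms) for one batch.

    Assumptions (hypothetical): a fixed per-batch overhead, a constant
    per-item cost, and no time spent waiting for the batch to fill.
    """
    batch_time_ms = overhead_ms + batch_size * per_item_ms
    throughput = batch_size / batch_time_ms * 1000.0
    # Every item completes when the batch completes, so latency == batch time.
    return throughput, batch_time_ms


for size in (1, 10, 100):
    throughput, latency = batch_stats(size)
    print(f"batch={size:>3}  throughput={throughput:7.1f}/s  latency={latency:6.1f} ms")
```

With these illustrative costs, going from a batch of 1 to a batch of 100 raises throughput roughly tenfold (about 91 to about 909 items per second), while per-item latency rises from 11 ms to 110 ms.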
Where it gets tricky
Under low load, latency and throughput seem independent. Under high load, they become coupled. When throughput approaches capacity, latency spikes — requests start queuing, and each one waits longer. This is why load testing matters: your system's latency profile at 10% utilization tells you almost nothing about how it behaves at 80%.
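One way to see why latency spikes near capacity is a simple queueing model. The sketch below assumes an M/M/1 queue (one server, random arrivals, exponentially distributed service times), where the mean time a request spends in the system is the service time divided by (1 - utilization). Real systems rarely match this model exactly, so treat the numbers as the shape of the curve rather than a prediction. The function name and the 10 ms service time are illustrative.

```python
def mean_latency_ms(utilization, service_time_ms=10.0):
    """Mean time in system for an M/M/1 queue (an assumed model).

    utilization is the arrival rate divided by the service rate, in [0, 1).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


for rho in (0.10, 0.50, 0.80, 0.95):
    print(f"utilization={rho:.0%}  mean latency={mean_latency_ms(rho):6.1f} ms")
```

With a 10 ms service time, the model gives about 11 ms at 10% utilization, 50 ms at 80%, and 200 ms at 95%. The curve is nearly flat at low load and steep near capacity, which is why a low-utilization measurement says so little about behavior under pressure.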
Vertical vs Horizontal Scaling
There are two fundamental approaches to adding capacity:
- Vertical scaling: Add more power to an existing machine, such as more CPU, more RAM, or faster disks.
- Horizontal scaling: Add more machines and distribute work across them.
Vertical scaling is appealing because your code doesn't change: you just run it on a bigger box. But there's a ceiling. Hardware has physical limits, a single machine is a single point of failure, and the cost curve isn't linear (a machine with twice the capacity typically costs more than twice as much).
Horizontal scaling is harder up front. Your application needs to handle distribution: load balancing, data partitioning, consensus. In return, it removes the single-machine ceiling and improves fault tolerance, because losing one node no longer takes down the whole system. Many large production systems end up here.
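As a small illustration of the data partitioning mentioned above, here is a minimal sketch of hash-based sharding. The function name `shard_for` and the key format are hypothetical. It uses a stable hash from `hashlib` rather than Python's built-in `hash()`, which is randomized per process for strings and so would route the same key differently on different machines.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce modulo
    # the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


counts = [0, 0, 0, 0]
for user_id in range(10_000):
    counts[shard_for(f"user:{user_id}", 4)] += 1
print(counts)  # four counts, roughly equal in size
```

Plain modulo hashing is the simplest option, but changing `num_shards` remaps most keys. Schemes such as consistent hashing exist to reduce that movement.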
Where Bottlenecks Hide
Systems don't scale uniformly. You'll find constraints in specific layers:
- CPU-bound: Compute-heavy work like image processing, encryption, or ML inference. More cores or faster CPUs help.
- I/O-bound: Waiting on disk, network, or database calls. Async processing, caching, and connection pooling help.
- Memory-bound: Working set exceeds available RAM, causing cache misses or swapping to disk. More memory or data partitioning helps.
- Network-bound: Bandwidth or latency between services. CDNs, compression, and co-location help.
The first job in any scalability effort is identifying which bottleneck you're actually hitting. Optimizing CPU when you're I/O-bound is wasted work.
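A rough first check for telling CPU-bound work from I/O-bound work is to compare CPU time with wall-clock time for the same operation. The sketch below is one crude heuristic, not a substitute for a real profiler. The helper `cpu_fraction` and the two sample workloads are hypothetical, and `time.sleep` stands in for waiting on a disk or network call.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Run func and return CPU time divided by wall-clock time.

    A value near 1.0 suggests CPU-bound work; a value near 0.0 suggests
    the time was mostly spent waiting (I/O, sleep, locks).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def busy_loop(n=2_000_000):
    """Stand-in for compute-heavy work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def simulated_io(seconds=0.2):
    """Stand-in for waiting on a disk or network call."""
    time.sleep(seconds)


print(f"busy_loop:    {cpu_fraction(busy_loop):.2f}")
print(f"simulated_io: {cpu_fraction(simulated_io):.2f}")
```

The exact ratios depend on the machine and on what else is running, so only the contrast between the two matters here.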
| Approach | Best for | Watch out for |
|---|---|---|
| Vertical scaling | Quick wins, simple architectures | Hardware limits, single point of failure |
| Horizontal scaling | High availability, elastic load | Distributed complexity, data consistency |
| Caching | Read-heavy workloads | Cache invalidation, stale data |
| Async processing | Decoupling, spiky workloads | Eventual consistency, debugging difficulty |
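The caching row's trade-off (fast reads versus stale data) can be sketched with a small time-to-live cache. This is a minimal illustration under stated assumptions, not a production design: the class `TTLCache`, its `load` callback, and the injectable `clock` are all hypothetical names, and the sketch ignores concurrency and size limits.

```python
import time


class TTLCache:
    """Cache values for ttl_seconds, then reload them on the next read."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so behavior can be tested
        self._entries = {}       # key -> (value, expiry time)

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]      # may be stale relative to the source
        value = load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key explicitly, for example after the source is written."""
        self._entries.pop(key, None)
```

A longer TTL means fewer trips to the backing store but a longer window in which readers can see outdated values. Calling `invalidate` on writes narrows that window at the cost of extra coordination.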