Load Balancing

How to distribute traffic across servers, the difference between L4 and L7, and why reverse proxies exist.

What a Load Balancer Does

A load balancer sits between clients and your server fleet. Its job is simple: distribute incoming requests across multiple backend servers so no single server gets overwhelmed. It also detects unhealthy servers and stops sending them traffic.

[Diagram: three clients → load balancer → servers 1, 2, 3. The load balancer distributes requests and removes unhealthy servers from rotation.]

Layer 4 vs Layer 7

Load balancers operate at different layers of the network stack, and this distinction matters more than most people realize.

Layer 4 (Transport)

Routes based on IP address and TCP/UDP port. Doesn't inspect the request payload — just forwards the connection.

  • Faster (no payload inspection)
  • Protocol-agnostic
  • Can't route by URL, headers, or cookies
  • Example: AWS NLB, HAProxy in TCP mode

Layer 7 (Application)

Inspects HTTP request content — URL path, headers, cookies, body. Can make intelligent routing decisions.

  • Route /api/* to API servers
  • Route /static/* to CDN origin
  • Sticky sessions via cookies
  • Example: AWS ALB, Nginx, Envoy

Use L4 when you need raw throughput and don't care about request content — database connections, non-HTTP protocols, or extremely high-volume TCP traffic. Use L7 when you need content-aware routing, which is most web applications.
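The L7 decision is essentially a prefix match from URL path to a backend pool. A minimal sketch of that decision, with hypothetical pool names and the prefixes from the list above:

```python
# Sketch of L7 path-based routing: first-match lookup from URL path
# to a backend pool. Pool names are hypothetical; routes are ordered
# most-specific first so the "/" default matches last.
ROUTES = [
    ("/api/", "api-pool"),       # API servers
    ("/static/", "cdn-origin"),  # CDN origin
    ("/", "web-pool"),           # default pool
]

def route(path: str) -> str:
    """Return the backend pool for a request path (first match wins)."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    raise ValueError(f"no route for {path}")
```

An L4 balancer cannot make this choice at all: it forwards the connection before the HTTP path is ever readable.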

Routing Algorithms

| Algorithm            | How it works                                           | Best for                                        |
|----------------------|--------------------------------------------------------|-------------------------------------------------|
| Round Robin          | Cycles through servers sequentially                    | Identical servers, uniform requests             |
| Weighted Round Robin | Like round robin, but bigger servers get more requests | Mixed server sizes                              |
| Least Connections    | Sends to the server with fewest active connections     | Requests with varying processing time           |
| IP Hash              | Hashes client IP to consistently map to the same server| Session affinity without cookies                |
| Random               | Picks a server at random                               | Statistically equivalent to round robin at scale|

Least connections usually wins. Round robin assumes every request takes the same amount of time. In reality, some requests are fast (cache hit) and some are slow (complex query). Least connections naturally adapts to this variance by favoring servers that finish work faster.
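The difference is easy to see in code. A sketch of both pickers, assuming the balancer tracks in-flight connection counts per server (server names are illustrative):

```python
import itertools

servers = ["s1", "s2", "s3"]

# Round robin: cycle through servers regardless of their current load.
rr = itertools.cycle(servers)

def round_robin() -> str:
    return next(rr)

# Least connections: pick the server with the fewest in-flight requests.
active = {s: 0 for s in servers}

def least_connections() -> str:
    server = min(active, key=active.get)
    active[server] += 1   # connection opens
    return server

def finish(server: str) -> None:
    active[server] -= 1   # connection closes
```

If s1 is stuck on a slow query while s2 and s3 drain quickly, round robin keeps handing s1 one request in three; least connections routes around it until it catches up.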

Reverse Proxy

A reverse proxy sits in front of your servers and forwards client requests to them. "But that's what a load balancer does?" — yes, a load balancer is a type of reverse proxy. But reverse proxies do more:

  • SSL termination: Handle TLS encryption/decryption so your backend servers don't have to.
  • Compression: Gzip/Brotli compress responses before sending to clients.
  • Caching: Serve cached responses for repeat requests without hitting the backend.
  • Security: Hide your server topology from clients. They only see the proxy's IP.
  • Rate limiting: Throttle abusive clients before they reach your application.

[Diagram: client speaks HTTPS to the reverse proxy (Nginx/Envoy), where TLS terminates; the proxy forwards plain HTTP to the app servers. The reverse proxy handles TLS, compression, and caching — backend servers only run app logic.]

Even if you only have one backend server, a reverse proxy adds value. SSL termination alone is worth it — it keeps certificate management in one place and lets your app server focus on application logic.
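Of the features listed above, rate limiting is the easiest to sketch. A token bucket of the kind a proxy applies per client IP, with an illustrative rate and burst capacity:

```python
import time

class TokenBucket:
    """Per-client token bucket: allows bursts up to `capacity`,
    then a sustained rate of `rate` requests per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the proxy returns 429 without touching the backend
```

The point of doing this at the proxy is that an abusive client burns a dictionary lookup, not an application-server request.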

Health Checks

A load balancer is only useful if it knows which servers are healthy. Health checks are periodic probes sent to each server:

  • Active health checks: The LB pings each server (e.g., GET /health) every few seconds. If a server fails N consecutive checks, it's removed from rotation.
  • Passive health checks: The LB monitors real traffic. If a server starts returning 5xx errors or timing out, it's marked unhealthy.

Your /health endpoint should check dependencies. A server that returns 200 but can't reach its database is functionally dead. Good health checks verify the server can actually serve requests — database connectivity, disk space, memory pressure. But keep the check fast: a health check that runs slower than the probe timeout gets counted as a failure and can knock healthy servers out of rotation.
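The active-check state machine described above is small. A sketch, where recording a probe result stands in for the periodic GET /health, and the threshold of 3 consecutive failures is illustrative:

```python
class HealthChecker:
    """Remove a server from rotation after `threshold` consecutive
    failed probes; restore it on the first successful probe."""

    def __init__(self, servers, threshold: int = 3):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)
        self.threshold = threshold

    def record(self, server: str, probe_ok: bool) -> None:
        if probe_ok:
            self.failures[server] = 0
            self.healthy.add(server)      # back in rotation
        else:
            self.failures[server] += 1
            if self.failures[server] >= self.threshold:
                self.healthy.discard(server)  # out of rotation
```

A real checker calls record() from a timer per server; passive checks feed the same state machine from live request outcomes (5xx, timeouts) instead of probes.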