Load Balancing

How to distribute traffic across servers, the difference between L4 and L7, and why reverse proxies exist.

What a Load Balancer Does

A load balancer sits between clients and your server fleet. Its job is simple: distribute incoming requests across multiple backend servers so no single server gets overwhelmed. It also detects unhealthy servers and stops sending them traffic.

[Diagram: three clients → load balancer → servers 1, 2, 3. The load balancer distributes requests and removes unhealthy servers from rotation.]

Layer 4 vs Layer 7

Load balancers operate at different layers of the network stack, and this distinction matters more than most people realize.

Layer 4 (Transport)

Routes based on IP address and TCP/UDP port. Doesn't inspect the request payload — just forwards the connection.

  • Faster (no payload inspection)
  • Protocol-agnostic
  • Can't route by URL, headers, or cookies
  • Example: AWS NLB, HAProxy in TCP mode

Layer 7 (Application)

Inspects HTTP request content — URL path, headers, cookies, body. Can make intelligent routing decisions.

  • Route /api/* to API servers
  • Route /static/* to CDN origin
  • Sticky sessions via cookies
  • Example: AWS ALB, Nginx, Envoy

Use L4 when you need raw throughput and don't care about request content — database connections, non-HTTP protocols, or extremely high-volume TCP traffic. Use L7 when you need content-aware routing, which is most web applications.
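The L7 decision is essentially a prefix match from URL path to a backend pool. A minimal sketch of that decision, with hypothetical pool names and the prefixes from the list above:

```python
# Sketch of L7 path-based routing: first-match lookup from URL path
# to a backend pool. Pool names are hypothetical; routes are ordered
# most-specific first so the "/" default matches last.
ROUTES = [
    ("/api/", "api-pool"),       # API servers
    ("/static/", "cdn-origin"),  # CDN origin
    ("/", "web-pool"),           # default pool
]

def route(path: str) -> str:
    """Return the backend pool for a request path (first match wins)."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    raise ValueError(f"no route for {path}")
```

An L4 balancer cannot make this choice at all: it forwards the connection before the HTTP path is ever readable.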

Routing Algorithms

| Algorithm            | How it works                                           | Best for                                        |
|----------------------|--------------------------------------------------------|-------------------------------------------------|
| Round Robin          | Cycles through servers sequentially                    | Identical servers, uniform requests             |
| Weighted Round Robin | Like round robin, but bigger servers get more requests | Mixed server sizes                              |
| Least Connections    | Sends to the server with fewest active connections     | Requests with varying processing time           |
| IP Hash              | Hashes client IP to consistently map to the same server| Session affinity without cookies                |
| Random               | Picks a server at random                               | Statistically equivalent to round robin at scale|

Least connections usually wins. Round robin assumes every request takes the same amount of time. In reality, some requests are fast (cache hit) and some are slow (complex query). Least connections naturally adapts to this variance by favoring servers that finish work faster.
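The difference is easy to see in code. A sketch of both pickers, assuming the balancer tracks in-flight connection counts per server (server names are illustrative):

```python
import itertools

servers = ["s1", "s2", "s3"]

# Round robin: cycle through servers regardless of their current load.
rr = itertools.cycle(servers)

def round_robin() -> str:
    return next(rr)

# Least connections: pick the server with the fewest in-flight requests.
active = {s: 0 for s in servers}

def least_connections() -> str:
    server = min(active, key=active.get)
    active[server] += 1   # connection opens
    return server

def finish(server: str) -> None:
    active[server] -= 1   # connection closes
```

If s1 is stuck on a slow query while s2 and s3 drain quickly, round robin keeps handing s1 one request in three; least connections routes around it until it catches up.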

Reverse Proxy

A reverse proxy sits in front of your servers and forwards client requests to them. "But that's what a load balancer does?" — yes, a load balancer is a type of reverse proxy. But reverse proxies do more:

  • SSL termination: Handle TLS encryption/decryption so your backend servers don't have to.
  • Compression: Gzip/Brotli compress responses before sending to clients.
  • Caching: Serve cached responses for repeat requests without hitting the backend.
  • Security: Hide your server topology from clients. They only see the proxy's IP.
  • Rate limiting: Throttle abusive clients before they reach your application.

[Diagram: client speaks HTTPS to the reverse proxy (Nginx/Envoy), where TLS terminates; the proxy forwards plain HTTP to the app servers. The reverse proxy handles TLS, compression, and caching — backend servers only run app logic.]

Even if you only have one backend server, a reverse proxy adds value. SSL termination alone is worth it — it keeps certificate management in one place and lets your app server focus on application logic.
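Of the features listed above, rate limiting is the easiest to sketch. A token bucket of the kind a proxy applies per client IP, with an illustrative rate and burst capacity:

```python
import time

class TokenBucket:
    """Per-client token bucket: allows bursts up to `capacity`,
    then a sustained rate of `rate` requests per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the proxy returns 429 without touching the backend
```

The point of doing this at the proxy is that an abusive client burns a dictionary lookup, not an application-server request.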

Health Checks

A load balancer is only useful if it knows which servers are healthy. Health checks are periodic probes sent to each server:

  • Active health checks: The LB pings each server (e.g., GET /health) every few seconds. If a server fails N consecutive checks, it's removed from rotation.
  • Passive health checks: The LB monitors real traffic. If a server starts returning 5xx errors or timing out, it's marked unhealthy.

Your /health endpoint should check dependencies. A server that returns 200 but can't reach its database is functionally dead. Good health checks verify the server can actually serve requests — database connectivity, disk space, memory pressure. But keep the check fast: a health check that runs slower than the probe timeout gets counted as a failure and can knock healthy servers out of rotation.
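The active-check state machine described above is small. A sketch, where recording a probe result stands in for the periodic GET /health, and the threshold of 3 consecutive failures is illustrative:

```python
class HealthChecker:
    """Remove a server from rotation after `threshold` consecutive
    failed probes; restore it on the first successful probe."""

    def __init__(self, servers, threshold: int = 3):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)
        self.threshold = threshold

    def record(self, server: str, probe_ok: bool) -> None:
        if probe_ok:
            self.failures[server] = 0
            self.healthy.add(server)      # back in rotation
        else:
            self.failures[server] += 1
            if self.failures[server] >= self.threshold:
                self.healthy.discard(server)  # out of rotation
```

A real checker calls record() from a timer per server; passive checks feed the same state machine from live request outcomes (5xx, timeouts) instead of probes.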