A service reports healthy. The load balancer believes it. A request lands on it and times out. Another follows. Then ten more. By the time the system reacts, hundreds of requests have drained into a broken instance while users stare at spinners.
Health checking sounds simple: ask if something is alive, stop sending traffic if it isn’t. In practice, the mechanism behind that check, and who performs it, determines how fast your system detects failure, how accurately it responds, and how much of that complexity leaks into your application code.
The answers are fundamentally different depending on where load balancing lives: in a central proxy, or in the client itself.
Two Models for Distributing Traffic
Before getting into health checks, it helps to be precise about what each model looks like.
A dedicated proxy sits between clients and the backend fleet. Clients know one address: the load balancer. The load balancer knows the backend pool and decides where each request goes. It typically learns health actively, polling each backend's health endpoint on a fixed interval and removing instances that fail repeatedly.
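A minimal sketch of active checking as a central proxy might do it, in Python. The endpoint path (/healthz), the interval, and the failure threshold are illustrative assumptions, not a reference to any particular load balancer:

```python
import time
import urllib.request

HEALTH_PATH = "/healthz"        # assumed health endpoint
CHECK_INTERVAL_S = 5.0          # how often the proxy polls
UNHEALTHY_THRESHOLD = 3         # consecutive failures before removal

class Backend:
    def __init__(self, address):
        self.address = address
        self.consecutive_failures = 0
        self.healthy = True

def probe(backend, timeout_s=1.0):
    """Return True if the backend answers its health endpoint in time."""
    try:
        url = f"http://{backend.address}{HEALTH_PATH}"
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, DNS failure, timeout, and HTTP errors.
        return False

def update_health(backend, probe_ok):
    """Fold one probe result into the backend's health state."""
    if probe_ok:
        backend.consecutive_failures = 0
        backend.healthy = True
    else:
        backend.consecutive_failures += 1
        if backend.consecutive_failures >= UNHEALTHY_THRESHOLD:
            backend.healthy = False
    return backend.healthy

def check_loop(pool):
    """Poll every backend forever; routing only considers healthy ones."""
    while True:
        for backend in pool:
            update_health(backend, probe(backend))
        time.sleep(CHECK_INTERVAL_S)
```

Note the detection latency this implies: with a 5-second interval and a threshold of 3, a backend can serve failing traffic for up to 15 seconds before the proxy stops routing to it. That window is the scenario in the opening paragraph.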
In client-side load balancing there is no intermediary: each client knows the backend pool and picks an instance itself. Health is typically tracked passively, by observing the outcomes of the client's own real requests rather than polling a health endpoint. Passive checking has a meaningful advantage: failure detection is immediate. The first failed request triggers the response; there is no polling interval to wait through. The cost is that at least one real request must fail before the client reacts. In high-throughput systems this is usually acceptable; in low-traffic or bursty scenarios it can mean more user-visible errors.
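A passive check can be sketched as a thin wrapper around the request itself: the failed call is the health signal, and the failing instance is ejected for a cooldown. This is a minimal illustration under assumed names and a deliberately crude one-strike policy, not any specific client library:

```python
import random
import time

EJECTION_COOLDOWN_S = 30.0  # assumed cooldown before retrying an instance

class Instance:
    def __init__(self, address):
        self.address = address
        self.ejected_until = 0.0  # monotonic timestamp; 0 means available

def available(pool, now=None):
    """Instances whose ejection cooldown has expired."""
    now = time.monotonic() if now is None else now
    return [i for i in pool if i.ejected_until <= now]

def call(pool, send):
    """Pick an available instance, send the request, eject on failure."""
    candidates = available(pool)
    if not candidates:
        candidates = pool  # everything ejected: fail open, not refuse
    instance = random.choice(candidates)
    try:
        return send(instance)
    except Exception:
        # The failed request itself is the health signal: no polling
        # interval stands between the failure and the reaction.
        instance.ejected_until = time.monotonic() + EJECTION_COOLDOWN_S
        raise
```

The fail-open branch is a design choice worth noticing: when every instance is ejected, refusing all traffic is usually worse than retrying something, since the ejections may reflect a transient problem that has already passed.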
What Each Model Gets Right
Server-side load balancing gives you a single, consistent view of fleet health. Every client gets the same routing decisions without knowing anything about the backend topology. This is operationally simple: health check configuration lives in one place, changes take effect instantly across all callers, and the backend is completely decoupled from the routing logic. At modest scale (a few dozen services and hundreds of clients), this is almost always the right default.
Client-side load balancing trades that simplicity for scale. When you have thousands of services talking to each other at high call rates, a central proxy becomes a bottleneck and a single point of failure. Removing it from the request path reduces latency and eliminates a class of infrastructure failure. Passive health checking gives clients sub-request-latency failure detection that a polling-based central proxy simply cannot match.
The cost is real: distributed health state is harder to reason about. Two clients can disagree on whether an instance is healthy. Debugging a routing anomaly requires looking at state spread across hundreds of processes rather than one. And the health check logic itself (thresholds, backoff, jitter) needs to live in every client library, tested and maintained across every language your organization uses.
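The per-instance logic that every client library must carry can be sketched in a few lines: a consecutive-failure threshold, exponential backoff between ejections, and full jitter so recovering instances are not retried by every client in lockstep. The class name and constants here are assumptions for illustration, not any library's API:

```python
import random

FAILURE_THRESHOLD = 5   # consecutive failures before ejecting
BASE_BACKOFF_S = 1.0    # first ejection lasts up to this long
MAX_BACKOFF_S = 60.0    # cap on the backoff ceiling

class HealthState:
    """Per-instance health tracking held inside a client."""

    def __init__(self):
        self.consecutive_failures = 0
        self.ejection_count = 0

    def record(self, success):
        """Record one request outcome.

        Returns None if the instance should keep receiving traffic,
        or a backoff duration in seconds if it should be ejected.
        """
        if success:
            self.consecutive_failures = 0
            self.ejection_count = 0
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures < FAILURE_THRESHOLD:
            return None  # below threshold: keep routing to it
        # Threshold crossed: eject. The ceiling doubles with each
        # ejection, and "full jitter" draws uniformly below it so
        # thousands of clients do not retry at the same instant.
        self.consecutive_failures = 0
        self.ejection_count += 1
        ceiling = min(MAX_BACKOFF_S,
                      BASE_BACKOFF_S * 2 ** (self.ejection_count - 1))
        return random.uniform(0.0, ceiling)
```

Small as it is, this is exactly the code that must be reimplemented, tested, and kept consistent in every language the organization uses, which is the maintenance cost the paragraph above describes.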
Choosing Between Them
There is no universal answer. The right model depends on your fleet size, call rates, operational maturity, and how much complexity you can manage in client libraries.
Server-side load balancing is simpler to operate and reason about. For most teams and most services, it is the right starting point.
Client-side load balancing pays off when scale makes a central proxy genuinely painful: when the proxy itself becomes a bottleneck, when you need sub-millisecond failure detection, or when the overhead of a proxy hop is measurable and matters.
Many large systems end up using both: server-side load balancing at the ingress layer where clients are external and uncontrollable, and client-side load balancing for internal service-to-service calls where the client library can be standardized. The health checking story in each layer is different, the failure modes are different, and understanding both is what lets you reason clearly about where traffic actually goes when things go wrong.
