Why Your Load Balancer Still Sends Traffic to Dead Backends

2026-02-23 23:16 · singh-sanjay.com

Why zombie instances survive health checks, and what the choice between server-side and client-side load balancing means for how fast your system detects and reacts to failure.

A service reports healthy. The load balancer believes it. A request lands on it and times out. Another follows. Then ten more. By the time the system reacts, hundreds of requests have drained into a broken instance while users stare at spinners.

Health checking sounds simple: ask if something is alive, stop sending traffic if it isn’t. In practice, the mechanism behind that check, and who performs it, determines how fast your system detects failure, how accurately it responds, and how much of that complexity leaks into your application code.

The answers are fundamentally different depending on where load balancing lives: in a central proxy, or in the client itself.

Two Models for Distributing Traffic

Before getting into health checks, it helps to be precise about what each model looks like.

A dedicated proxy sits between clients and the backend fleet. Clients know one address: the load balancer. The load balancer knows the backend pool and decides where each request goes.

Client-side load balancing inverts this. Each client learns the backend pool from service discovery and picks an instance itself, with no proxy in the request path. The health-checking mechanics tend to follow the model: a central proxy typically probes backends actively on a polling interval, while clients typically check passively, inferring health from the outcomes of the requests they already send.

[Figure] Passive health check: Client A detects failure on first bad request; Client B still unaware.

Passive checking has a meaningful advantage: failure detection is immediate. The first failed request triggers the response; there is no polling interval to wait through. The cost is that at least one real request must fail before the client reacts. In high-throughput systems this is usually acceptable; in low-traffic or bursty scenarios it can mean more user-visible errors.
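The passive model is simple enough to sketch. The snippet below is a hypothetical illustration (the class name, threshold, and cooldown are mine, not from any particular library): the client records the outcome of every real request, and after a run of consecutive failures ejects the backend for a cooldown period.

```python
import time

class PassiveHealthTracker:
    """Client-side passive health tracking: no probes, only real request outcomes."""

    def __init__(self, failure_threshold=3, cooldown_s=10.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = {}       # backend -> consecutive failure count
        self.ejected_until = {}  # backend -> time it becomes eligible again

    def record_success(self, backend):
        self.failures[backend] = 0

    def record_failure(self, backend, now=None):
        now = time.monotonic() if now is None else now
        self.failures[backend] = self.failures.get(backend, 0) + 1
        if self.failures[backend] >= self.failure_threshold:
            # Detection is immediate: ejection happens on the failing request
            # itself, with no polling interval to wait through.
            self.ejected_until[backend] = now + self.cooldown_s

    def is_healthy(self, backend, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.ejected_until.get(backend, 0.0)
```

Note that the trade-off described above appears directly in the code: `record_failure` only ever runs because a real request has already failed.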

What Each Model Gets Right

Server-side load balancing gives you a single, consistent view of fleet health. Every client gets the same routing decisions without knowing anything about the backend topology. This is operationally simple: health check configuration lives in one place, changes take effect instantly across all callers, and the backend is completely decoupled from the routing logic. At modest scale, a few dozen services and hundreds of clients, this is almost always the right default.
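That single consistent view can be sketched roughly as follows (class names and the probe callback are assumptions for illustration, not any real proxy's API): one active prober loop updates one shared healthy set, and every routing decision reads it.

```python
import random

class CentralLoadBalancer:
    """Sketch of a server-side LB: one shared health view for all clients."""

    def __init__(self, backends, probe_fn):
        self.backends = list(backends)
        self.probe_fn = probe_fn      # returns True if the backend answers its check
        self.healthy = set(backends)  # the single, consistent view of fleet health

    def run_probe_cycle(self):
        # Active checking: poll every backend on a fixed interval.
        # Changing probe_fn here changes behavior for all callers at once.
        self.healthy = {b for b in self.backends if self.probe_fn(b)}

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        return random.choice(sorted(self.healthy))
```

Every client calling `pick()` sees the same health state, which is exactly the operational simplicity the paragraph above describes.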

Client-side load balancing trades that simplicity for scale. When you have thousands of services talking to each other at high call rates, a central proxy becomes a bottleneck and a single point of failure. Removing it from the request path reduces latency and eliminates a class of infrastructure failure. Passive health checking gives clients sub-request-latency failure detection that a polling-based central proxy simply cannot match.

The cost is real: distributed health state is harder to reason about. Two clients can disagree on whether an instance is healthy. Debugging a routing anomaly requires looking at state spread across hundreds of processes rather than one. And the health check logic itself (thresholds, backoff, jitter) needs to live in every client library, tested and maintained across every language your organization uses.
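That per-client logic is small but easy to get subtly wrong. A minimal sketch of one piece of it, exponential backoff with jitter for probe scheduling (the function and parameter names are hypothetical):

```python
import random

def next_probe_delay(base_s, failures, max_s=60.0, jitter=0.2, rng=random.random):
    """Exponential backoff with jitter for client-side health probes.

    Backoff avoids hammering a struggling backend; jitter keeps thousands
    of independent clients from probing in lockstep (a thundering herd).
    """
    delay = min(base_s * (2 ** failures), max_s)
    # Spread probes across +/- `jitter` fraction of the nominal delay.
    return delay * (1.0 + jitter * (2.0 * rng() - 1.0))
```

This handful of lines is what has to be reimplemented, tested, and tuned in every client library, in every language the organization uses.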

Choosing Between Them

There is no universal answer. The right model depends on your fleet size, call rates, operational maturity, and how much complexity you can manage in client libraries.

Server-side load balancing is simpler to operate and reason about. For most teams and most services, it is the right starting point.

Client-side load balancing pays off when scale makes a central proxy genuinely painful: when the proxy itself becomes a bottleneck, when you need sub-millisecond failure detection, or when the overhead of a proxy hop is measurable and matters.

Many large systems end up using both: server-side load balancing at the ingress layer where clients are external and uncontrollable, and client-side load balancing for internal service-to-service calls where the client library can be standardized. The health checking story in each layer is different, the failure modes are different, and understanding both is what lets you reason clearly about where traffic actually goes when things go wrong.



Comments

  • By dastbe 2026-02-24 3:42 · 1 reply

    kind of right, kind of wrong

    * for client-side load balancing, it's entirely possible to move active healthchecking into a dedicated service and have its results be vended along with discovery. In fact, more and more managed server-side load balancers are also moving healthchecking out of band so they can scale the forwarding plane independently of probes.

    * for server-side load balancing, it's entirely possible to shard forwarders to avoid SPOFs, typically by creating isolated increments and then using shuffle sharding by caller/callee to minimize overlap between workloads. I think Alibaba's canalmesh whitepaper covers such an approach.
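A minimal sketch of the shuffle-sharding idea mentioned above (names are illustrative): each caller deterministically hashes to its own small subset of the forwarder fleet, so two callers rarely share their whole shard and one noisy workload can't take down everyone's forwarding path.

```python
import hashlib

def shuffle_shard(caller, proxies, shard_size):
    """Deterministically assign `caller` a small subset of `proxies`."""
    def rank(proxy):
        # Per-(caller, proxy) hash gives each caller its own ordering of the fleet.
        h = hashlib.sha256(f"{caller}:{proxy}".encode()).hexdigest()
        return int(h, 16)
    return sorted(sorted(proxies, key=rank)[:shard_size])
```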

    As for scale, I think for almost everybody it's completely overblown to go with a p2p model. I think a reasonable estimate for a centralized proxy fleet is about 1% of infrastructure costs. If you want to save that, you need to have a team that can build/maintain your centralized proxy's capabilities in all the languages/frameworks your company uses, and you likely need to build the proxy anyways for the long-tail. Whereas you can fund a much smaller team to focus on e2e ownership of your forwarding plane.

    Add on top that you need a safe deployment strategy for updating the critical logic in all of these combinations, and continuous deployment to ensure your fixes roll out to the fleet in a timely fashion. This is itself a hard scaling problem.

    • By singhsanjay12 2026-02-24 7:22 · 2 replies

      For client-side LB, moving active healthcheck outside into dedicated service, wouldn't it create more reliability issues with one more service to worry about? Are there any examples of this approach being used in the industry?

      • By donavanm 2026-02-25 6:47 · 1 reply

        IME you end up with both; something like discrete client, LB, and controller. You can’t rely on any one component to “turn itself off.” E.g. a client or LB can easily get into a “wedged” state where it’s unable to take itself out of consideration for traffic. For example, I’ve had silly incidents based on BGP routes staying up, memory errors/pressure preventing new health check results from being parsed, the file system going read-only, SKB pressure interfering with pipes, and of course, the classic difference between a dedicated health check endpoint versus actual traffic. In all those examples, the failure prevents the client or LB from removing itself from the traffic path.

        An external controller is able to safely remove traffic from one of the other failed components. In addition, the client can still do local traffic analysis, or use in-band signaling, to identify anomalous endpoints and remove itself or them from the traffic path.

        Good active probes are actually a pretty meaningful traffic load. It was a HUGE problem for flat virtual network models like Heroku a decade ago. This is exacerbated when you have more clients and more endpoints.

        As a reference, this distributed model is what AWS moved to 15 years ago. And if you look at any of the high-throughput cloud services or CDNs, they’ll have a similar model.

        • By dastbe 2026-02-25 20:57

          one thing to add for passive healthchecking and clientside loadbalancing is that throughput and dilution of signal really matters.

          there are obviously plenty of low/sparse call volume services where passive healthchecks would take forever to get signal, or signal is so infrequently collected it's meaningless. and even with decent RPS, say 1M RPS distributed between 1000 caller replicas and 1000 callee replicas, that means that any one caller-callee pair is only seeing 1 RPS. Depending on your noise threshold, a centralized active healthcheck can respond much faster.

          There are some ways to improve signal in the latter case using subsetting and aggregating/reporting controllers, but that all comes with added complexity.

      • By dastbe 2026-02-25 0:31

        From a dataplane perspective, it does mean your healthchecks are running from a different location than your proxy. So there are risks where routability is impacted for proxy -> dest but not for healthchecker -> dest.

        For general reliability, you can create partitions of checkers and use quorum across partitions to determine what the health state is for a given dest. This also enables centralized monitoring to detect systemic issues with bad healthcheck configuration changes (i.e. are healthchecks failing because the service is unhealthy or because of a bad healthchecker?)
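The quorum idea above reduces to something like the following (hypothetical function, for illustration): a destination counts as healthy only if enough independent checker partitions agree.

```python
def quorum_health(votes, quorum):
    """Combine health verdicts from independent checker partitions.

    A destination is healthy only if at least `quorum` partitions say so,
    which guards against a single bad checker (or its network path)
    flapping the destination's state.
    """
    healthy_votes = sum(1 for v in votes if v)
    return healthy_votes >= quorum
```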

        In industry, I personally know AWS has one or two health-check-as-a-service systems that they are using internally for LBs and DNS. Uber runs its own health-check-as-a-service system which it integrates with its managed proxy fleet as well as p2p discovery. IIRC Meta also has a system like this for at least some things? But maybe I'm misremembering.

  • By dotwaffle 2026-02-24 3:01 · 3 replies

    I've never quite understood why there couldn't be a standardised "reverse" HTTP connection, from server to load balancer, over which connections are balanced. Standardised so that some kind of health signalling could be present for easy/safe draining of connections.

    • By singhsanjay12 2026-02-24 5:21

      The idea is attractive (especially for draining), but once you try to map arbitrary inbound client connections onto backend-initiated "reverse" pipes, you end up needing standardized semantics for multiplexing, backpressure, failure recovery, identity propagation, and streaming! So, you're no longer just standardizing "reverse HTTP", you’re standardizing a full proxy transport + control plane. In practice, the ecosystem standardized draining/health via readiness + LB control-plane APIs and (for HTTP/2/3) graceful shutdown signals, which solves the draining problem without flipping the fundamental accept/connect roles.

    • By bastawhiz 2026-02-24 3:46

      Whether the load balancer connects to the server or reverse, nothing changes. A modern H2 connection is pretty much just that: one persistent connection between the load balancer and server, who initiates it doesn't change much.

      The connection being active doesn't tell you that the server is healthy (it could hang, for instance, and you wouldn't know until the connection times out or a health check fails). Either way, you still have to send health checks, and either way you can't know between health checks that the server hasn't failed. Ultimately this has to work for every failure mode where the server can't respond to requests, and in any given state, you don't know what capabilities the server has.


  • By igor47 2026-02-24 5:51 · 1 reply

    Back in the day, I thought about this problem domain a lot! I even wrote and open-sourced a service discovery framework called SmartStack, an early precursor to later approaches like Envoy, described here: https://medium.com/airbnb-engineering/smartstack-service-dis...

    This was a client side framework, in the OP's parlance. What's missing in the OP is the insight that the server-side load balancer can also fail -- what will load balance the load balancers? We performed registration based on health checks from a sidecar, and then we also did client side checks which we called connectivity checks. Multiple client instances can disagree about the state of the world because network partitions actually can result in different states of the world for different clients.

    Finally, you do also still need circuit breakers. Health checks are generally pretty broad, and when a single endpoint in a service begins having high latency, you don't want to bring down the entire client service with all capacity stuck making requests to that one endpoint. This specific example is probably more relevant to the old days of thread and process pools than to modern evented/async frameworks, but the broader point still applies

    • By singhsanjay12 2026-02-24 7:16

      > when a single endpoint in a service begins having high latency

      Yes, have seen this first hand. Tracking the latency per endpoint in a sliding window helped in some way, but it created other problems for low qps services.

HackerNews