The problem with this kind of theoretical analysis is that most load balancers don't work this way, especially the typical "cloud" HTTP or TCP load balancers, which are stateless and avoid this kind of central queuing logic like the plague because it doesn't scale to their levels.
For example, most cloud load balancers I've worked with are stateless, non-queuing, and allocate work to back-ends strictly randomly.
Traditional non-cloud load balancers can implement this kind of perfect queuing, but these settings are generally off by default even when available.
Envoy, Apache, and Traefik have partial or limited support.
Conversely, most multi-threaded web server frameworks already do this by default! For example, ASP.NET has essentially an internal "load balancer" with a perfect queue if you pretend each core is a "node" and the whole server is the "scale out system".
Precisely. I was going to further add that most load balancers are network appliances and hence:
a) There must be network packet buffers on each receiver, which is a per-server "queue".
b) The optimal performance can only be achieved by pipelining requests sent to the servers, otherwise there is "dead time" caused by the network latency.
Perfect queuing to workers that act as a "slot" can only be implemented in-process on a single machine, and even there having small queues or buffers per core can improve throughput!
For example, most cloud load balancers I've worked with are stateless, non-queuing, and allocate work to back-ends strictly randomly.
Traditional non-cloud load balancers can implement this kind of perfect queuing, but these settings are generally off by default even when available.
- NetScaler: surgeProtection + maxClient=1
- F5 BIG-IP LTM: request queuing + pool/member connectionLimit=1
- HAProxy: server maxconn 1 + timeout queue
- NGINX Plus: server max_conns=1 + queue
Envoy, Apache, and Traefik have partial or limited support.
Conversely, most multi-threaded web server frameworks already do this by default! For example, ASP.NET has essentially an internal "load balancer" with a perfect queue if you pretend each core is a "node" and the whole server is the "scale out system".