BUG/MAJOR: listeners: transfer connection accounting when switching listeners

Since we made it possible for a bind_conf to listen to multiple thread
groups with shards in 2.8 with commit 9d360604bd ("MEDIUM: listener:
rework thread assignment to consider all groups"), the per-listener
connection count was not properly transferred to the target listener
with the connection when switching to another thread group. This results
in one listener possibly reaching high values and another one possibly
reaching negative values. Usually it's not visible, unless a maxconn is
set on the bind_conf, in which case comparisons will quickly put an end
to the willingness to accept new connections.

This problem only happens when thread groups are enabled, and it seems
very hard to trigger it normally, it only impacts sockets having a single
shard, hence currently the CLI (or any conf with "bind ... shards 1"),
where it can be reproduced with a config having a very low "maxconn" on
the stats socket directive (here, 4), and issuing a few tens of
socat <<< "show activity" in parallel, or sending HTTP connections to a
single-shared listener. Very quickly, haproxy stops accepting connections
and eats CPU in the poller which tries to get its connections accepted.

A BUG_ON(l->nbconn<0) after HA_ATOMIC_DEC() in listener_release() also
helps spotting them better.

Many thanks to Christian Ruppert who once again provided a very accurate
report in GH #2951 with the required data permitting this analysis.

This fix must be backported to 2.8.
This commit is contained in:
Willy Tarreau 2025-04-25 18:32:02 +02:00
parent 9240cd4a27
commit f1064c7382

View File

@ -1488,6 +1488,11 @@ void listener_accept(struct listener *l)
*/
ring = &accept_queue_rings[t];
if (accept_queue_push_mp(ring, cli_conn, bind_tid_commit)) {
if (new_li) {
_HA_ATOMIC_INC(&new_li->nbconn);
_HA_ATOMIC_DEC(&l->nbconn);
}
_HA_ATOMIC_INC(&activity[t].accq_pushed);
tasklet_wakeup(ring->tasklet);