IMPORT: plock: lower the slope of the exponential back-off

Across many tests involving both haproxy's scheduler and forwarded traffic, various exponents and algorithms were attempted for the EBO and their effects were measured. It was found that a growth in 1.25^N limited to 128k cycles consistently gives a better latency than 1.5^N limited to 256k cycles, without degrading general performance. Measurements of the time to grab a write lock on a 48-thread EPYC show that the number of occurrences of low times was roughly multiplied by 2-3 while the number of occurrences of times above 64us was reduced by similar factors, even reaching 300 at 64us, and the maximum time was reduced by a factor of 4.

The other variants that were experimented with are:

    m = ((m + (m >> 1)) + 2) & 0x3ffff;  // original
    m = ((m + (m >> 1) + (m >> 3)) + 2) & 0x3ffff;
    m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x3ffff;
    m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x1ffff;
    m = ((m + (m >> 1) + (m >> 4)) + 1) & 0x1ffff;
    m = ((m + (m >> 2) + (m >> 4)) + 1) & 0x1ffff;  // lowest CPU on pl_wr test + good perf
    m = ((m + (m >> 2)) + 1) & 0x1ffff;  // even lower CPU usage, lowest max
    m = ((m + (m >> 1) + (m >> 2)) + 1) & 0x1ffff;  // correct but slightly higher maxes
    m = ((m + (m >> 1) + (m >> 3)) + 1) & 0x1ffff;  // less good than m+m>>2
    m = ((m + (m >> 2) + (m >> 3)) + 1) & 0x1ffff;  // better but not as good as m+m>>2
    m = ((m + (m >> 3) + (m >> 4)) + 1) & 0x1ffff;  // less good, lower rates on small counts
    m = ((m + (m >> 2) + (m >> 3) + (m >> 4)) + 1) & 0x1ffff;  // less good as well
    m = ((m & 0x7fff) + (m >> 1) + (m >> 4)) + 2;
    m = ((m & 0xffff) + (m >> 1) + (m >> 4)) + 2;

This is plock commit dddd9ee01c522da33c353e2e4d4fd743d8336ec3.
2025-02-07 17:20:48 +01:00
commit 253fba01a7
parent 9dd56da730
1 changed file with 2 additions and 2 deletions
--- a/include/import/plock.h
+++ b/include/import/plock.h
@@ -107,7 +107,7 @@ static unsigned long __pl_wait_unlock_long(const unsigned long *lock, const unsi
 		 * values and still growing. This allows competing threads to
 		 * wait different times once the threshold is reached.
 		 */
-		m = ((m + (m >> 1)) + 2) & 0x3ffff;
+		m = ((m + (m >> 2)) + 1) & 0x1ffff;
 	} while (1);

 	return ret;
@@ -176,7 +176,7 @@ static unsigned int __pl_wait_unlock_int(const unsigned int *lock, const unsigne
 		 * values and still growing. This allows competing threads to
 		 * wait different times once the threshold is reached.
 		 */
-		m = ((m + (m >> 1)) + 2) & 0x3ffff;
+		m = ((m + (m >> 2)) + 1) & 0x1ffff;
 	} while (1);

 	return ret;