MINOR: http: add a function to validate characters of :authority

As discussed here:
  https://github.com/httpwg/http2-spec/pull/936
  https://github.com/haproxy/haproxy/issues/2941

It's important to take care of some special characters in the :authority
pseudo header before reassembling a complete URI, because after assembly
it's too late (e.g. the '/').

This patch adds a specific function which was checks all such characters
and their ranges on an ist, and benefits from modern compilers
optimizations that arrange the comparisons into an evaluation tree for
faster match. That's the version that gave the most consistent performance
across various compilers, though some hand-crafted versions using bitmaps
stored in register could be slightly faster but super sensitive to code
ordering, suggesting that the results might vary with future compilers.
This one takes on average 1.2ns per character at 3 GHz (3.6 cycles per
char on avg). The resulting impact on H2 request processing time (small
requests) was measured around 0.3%, from 6.60 to 6.618us per request,
which is a bit high but remains acceptable given that the test only
focused on req rate.

The code was made usable both for H2 and H3.
This commit is contained in:
Willy Tarreau 2025-05-12 17:39:08 +02:00
parent c40d6ac840
commit ebab479cdf

View File

@ -232,6 +232,52 @@ static inline int http_path_has_forbidden_char(const struct ist ist, const char
return 0;
}
/* Checks whether the :authority pseudo header contains dangerous chars that
* might affect its reassembly. We want to catch anything below 0x21, above
* 0x7e, as well as '@', '[', ']', '/','?', '#', '\', CR, LF, NUL. Then we
* fall back to the slow path and decide. Brackets are used for IP-literal and
* deserve special case, that is better handled in the slow path. The function
* returns 0 if no forbidden char is presnet, non-zero otherwise.
*/
static inline int http_authority_has_forbidden_char(const struct ist ist)
{
size_t ofs, len = istlen(ist);
const char *p = istptr(ist);
int brackets = 0;
uchar c;
/* Many attempts with various methods have shown that moderately recent
* compilers (gcc >= 9, clang >= 13) will arrange the code below as an
* evaluation tree that remains efficient at -O2 and above (~1.2ns per
* char). The immediate next efficient one is the bitmap from 64-bit
* registers but it's extremely sensitive to code arrangements and
* optimization.
*/
for (ofs = 0; ofs < len; ofs++) {
c = p[ofs];
if (unlikely(c < 0x21 || c > 0x7e ||
c == '#' || c == '/' || c == '?' || c == '@' ||
c == '[' || c == '\\' || c == ']')) {
/* all of them must be rejected, except '[' which may
* only appear at the beginning, and ']' which may
* only appear at the end or before a colon.
*/
if ((c == '[' && ofs == 0) ||
(c == ']' && (ofs == len - 1 || p[ofs + 1] == ':'))) {
/* that's an IP-literal (see RFC3986#3.2), it's
* OK for now.
*/
brackets ^= 1;
} else {
return 1;
}
}
}
/* there must be no opening bracket left nor lone closing one */
return brackets;
}
/* Checks status code array <array> for the presence of status code <status>.
* Returns non-zero if the code is present, zero otherwise. Any status code is
* permitted.