From 6a9be622707a485b1b5089e29d6902dc18ebbd3f Mon Sep 17 00:00:00 2001
From: Thiago Macieira <thiago.macieira@intel.com>
Date: Wed, 7 Feb 2024 13:12:52 -0800
Subject: [PATCH] qHash: force inlining of the hash16bytes() helper

hash16bytes() wasn't getting inlined in aeshash256_lt32_avx256() (used
by the VAES + AVX512VL variant) due to a GCC __attribute__((target()))
mismatch, causing a major loss of performance compared to the VAES +
AVX2 variant.
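
As a rough illustration of the mechanism (not the code in qhash.cpp):
GCC documents that on x86 it does not inline a callee into a caller
with different target options unless the callee's options are a subset
of the caller's. The sketch below assumes GCC or Clang on x86-64 and
spells out the attributes directly instead of using the Qt macros. The
names sketch_hash16(), sketch_caller() and sketch_run() are
hypothetical, and the target strings are illustrative rather than the
ones Qt uses.

```cpp
#include <cstdint>

#if defined(__GNUC__) && defined(__x86_64__)
#  include <immintrin.h>
#  define SKETCH_X86_TARGETS 1

// Hypothetical stand-in for hash16bytes(): XOR the data in, then run
// three AES rounds of the state on itself. always_inline asks the
// compiler to inline it into every caller with a compatible target.
static inline __attribute__((always_inline, target("aes"))) void
sketch_hash16(__m128i &state, __m128i data)
{
    state = _mm_xor_si128(state, data);
    state = _mm_aesenc_si128(state, state);
    state = _mm_aesenc_si128(state, state);
    state = _mm_aesenc_si128(state, state);
}

// Hypothetical caller built for a wider ISA. Its target list is a
// superset of the helper's, so the helper can be inlined here.
static __attribute__((target("aes,avx2"))) uint64_t
sketch_caller(uint64_t seed, uint64_t value)
{
    __m128i state = _mm_set_epi64x(0, static_cast<long long>(seed));
    __m128i data = _mm_set_epi64x(0, static_cast<long long>(value));
    sketch_hash16(state, data);
    return static_cast<uint64_t>(_mm_cvtsi128_si64(state));
}
#else
#  define SKETCH_X86_TARGETS 0
#endif

// Runs the sketch only if the CPU supports what the caller was built
// for; returns false (leaving *out untouched) everywhere else.
bool sketch_run(uint64_t seed, uint64_t value, uint64_t *out)
{
#if SKETCH_X86_TARGETS
    if (__builtin_cpu_supports("aes") && __builtin_cpu_supports("avx2")) {
        *out = sketch_caller(seed, value);
        return true;
    }
#endif
    (void)seed; (void)value; (void)out;
    return false;
}
```

Comparing the disassembly of sketch_caller() with and without the
always_inline attribute is one way to check whether the helper was
actually inlined.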

Throughput after this fix on an Intel Core i7-1165G7 (Tiger Lake)
laptop, with qHashBits modified to statically select either
[A] aeshash256() or [B] aeshash256_avx256(), over 5 runs:

            dictionary   numbers     paths-small  uuids      longstrings
A/B (avg)   103.7%       101.1%      103.5%       104.5%     100.3%
A/B (best)  103.4%       100.9%      103.2%       103.6%     100.8%

Considering that a string representation of a UUID is 37 characters (74
bytes), neither "uuids" nor "longstrings" is directly affected by this
change. However, the overhead does change, with aeshash256_avx256()
needing slightly fewer instructions to reach aeshash256_ge32().

On an Intel Xeon Scalable 4th Generation (Sapphire Rapids), the "uuids"
data set shows a 10% performance loss, for a reason not yet identified.

Pick-to: 6.6 6.5
Change-Id: I50e2158aeade4256ad1dfffd17b1b105d3cab482
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
(cherry picked from commit 6ab4623cad39bec935f76e366f3f262922bde94a)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
---
 src/corelib/tools/qhash.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/corelib/tools/qhash.cpp b/src/corelib/tools/qhash.cpp
index 67137135dce..1a0d281284e 100644
--- a/src/corelib/tools/qhash.cpp
+++ b/src/corelib/tools/qhash.cpp
@@ -526,7 +526,7 @@ namespace {
     // [1] https://en.wikipedia.org/wiki/Advanced_Encryption_Standard#High-level_description_of_the_algorithm
 
     // hash 16 bytes, running 3 scramble rounds of AES on itself (like label "final1")
-    static void QT_FUNCTION_TARGET(AES) QT_VECTORCALL
+    static void Q_ALWAYS_INLINE QT_FUNCTION_TARGET(AES) QT_VECTORCALL
     hash16bytes(__m128i &state0, __m128i data)
     {
         state0 = _mm_xor_si128(state0, data);
@@ -657,7 +657,7 @@ aeshash128_ge32(__m128i state0, __m128i state1, const __m128i *src, const __m128
 }
 
 #  if QT_COMPILER_SUPPORTS_HERE(VAES)
-static size_t QT_FUNCTION_TARGET(ARCH_ICL) QT_VECTORCALL
+static size_t QT_FUNCTION_TARGET(VAES_AVX512) QT_VECTORCALL
 aeshash256_lt32_avx256(__m256i state0, const uchar *p, size_t len)
 {
     __m128i state0_128 = _mm256_castsi256_si128(state0);