qHash: force inlining of the hash16bytes() helper

It wasn't getting inlined in aeshash256_lt32_avx256() (used by the VAES +
AVX512VL variant) due to a GCC __attribute__((target())) mismatch, causing
a major loss of performance compared to the VAES + AVX2 variant.

Comparing the throughput after this fix on an Intel Core i7-1165G7 (Tiger
Lake) laptop, with qHashBits modified to statically select either [A]
aeshash256() or [B] aeshash256_avx256(), out of 5 runs:

              dictionary  numbers  paths-small  uuids   longstrings
A/B (avg)     103.7%      101.1%   103.5%       104.5%  100.3%
A/B (best)    103.4%      100.9%   103.2%       103.6%  100.8%

Considering that a string representation of a UUID is 37 characters (74
bytes), neither "uuids" nor "longstrings" is directly affected by this
change. However, the overhead does change, with aeshash256_avx256()
needing slightly fewer instructions to reach aeshash256_ge32().

Benchmarking on an Intel Xeon Scalable 4th Generation (Sapphire Rapids),
the "uuids" data set has a 10% performance loss for some reason.

Pick-to: 6.6 6.5
Change-Id: I50e2158aeade4256ad1dfffd17b1b105d3cab482
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
(cherry picked from commit 6ab4623cad39bec935f76e366f3f262922bde94a)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
parent 7ad4db5ee9
commit 6a9be62270
@@ -526,7 +526,7 @@ namespace {
 // [1] https://en.wikipedia.org/wiki/Advanced_Encryption_Standard#High-level_description_of_the_algorithm
 
 // hash 16 bytes, running 3 scramble rounds of AES on itself (like label "final1")
-static void QT_FUNCTION_TARGET(AES) QT_VECTORCALL
+static void Q_ALWAYS_INLINE QT_FUNCTION_TARGET(AES) QT_VECTORCALL
 hash16bytes(__m128i &state0, __m128i data)
 {
     state0 = _mm_xor_si128(state0, data);
@@ -657,7 +657,7 @@ aeshash128_ge32(__m128i state0, __m128i state1, const __m128i *src, const __m128
 }
 
 # if QT_COMPILER_SUPPORTS_HERE(VAES)
-static size_t QT_FUNCTION_TARGET(ARCH_ICL) QT_VECTORCALL
+static size_t QT_FUNCTION_TARGET(VAES_AVX512) QT_VECTORCALL
 aeshash256_lt32_avx256(__m256i state0, const uchar *p, size_t len)
 {
     __m128i state0_128 = _mm256_castsi256_si128(state0);