QNumeric: make the overflowing helpers constexpr

The qAdd/Sub/MulOverflow functions are useful to have in constexpr code.

In fact, they all already have some generic and constexpr-friendly
implementations (but see below).

The only issue is that these functions form a rather convoluted overload
set, which dispatches to different implementations (builtins,
intrinsics) depending on the compiler's capabilities.

Instead, centralize the implementation of the functions, which allows us
to detect whether they are being called during constant evaluation and
fall back to the generic code in that case. (A sketch of the resulting
structure follows the list below.)

* On GCC, Clang, ICC-as-Clang: call the compiler builtin; it's constexpr
  on all the supported versions.

  In fact, make the detection more accurate: the old code excluded all
  Clang versions (incl. ICC-as-Clang) on 32-bit from using the builtins,
  including for addition and subtraction. That seems unnecessary,
  as they work just fine.

  The problem was that, for 64-bit arguments, Clang would generate a
  call to __mulodi4, and that failed to link. We can just special-case
  that; moreover, Clang 14 fixed this issue [1].

  Therefore, properly detect the compiler, including the Clang version.

* Otherwise, if we're in constant evaluation, call the generic
  implementation. This poses a problem: on platforms without native
  128-bit support, we are lacking an implementation of qMulOverflow
  when multiplying 64-bit quantities.

  In general, we did not guarantee its availability, but this regresses
  MSVC, which doesn't support __int128 and instead relied entirely
  on its intrinsics to perform 128-bit multiplications.

  To cope with this, add an implementation of a wide multiplication.

* Otherwise, use intrinsics (which are not constexpr).

* Otherwise, use the generic code.
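
Concretely, each helper now has a single, centralized body of this
shape (a minimal sketch of qAddOverflow, using the names from the
patch below; the enable_if constraints and the MSVC _addcarry_*
branch are omitted):

    template <typename T>
    constexpr bool qAddOverflow(T v1, T v2, T *r)
    {
    #if defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
        return __builtin_add_overflow(v1, v2, r); // constexpr builtin
    #else
        if (q20::is_constant_evaluated())
            return QtPrivate::qAddOverflowGeneric(v1, v2, r);
        // ... non-constexpr platform intrinsics, if available ...
        return QtPrivate::qAddOverflowGeneric(v1, v2, r);
    #endif
    }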

There's a hair in this soup: INTEGRITY seems to only have non-constexpr
builtins (intrinsics), and it's also unable to detect whether we're in
constant evaluation. This means that, during constant evaluation, we're
going to hit an error (a call to the non-constexpr intrinsic).

Getting rid of the intrinsic and forcing the usage of the generic
version is tempting, but for now, these functions will simply not be
constexpr there.

A few drive-by fixes:

* The aforementioned detection of Clang, so that it uses the builtins;
* Use Q_CC_GHS instead of Q_OS_INTEGRITY;
* A couple of codepaths were checking whether the platform uses two's
  complement, and doing different things otherwise. There is no need
  for the check; we have never supported anything but two's complement.

[1] https://godbolt.org/z/v4z6W7qbd

[ChangeLog][QtCore][QtNumeric] The q*Overflow family of functions
is now callable from constant expressions on certain compilers.

Done-with: Thiago
Change-Id: I9fa2ab50a02b31358a916656e5b3329d77344efe
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>

qnumeric.h:

@@ -1,4 +1,5 @@
// Copyright (C) 2021 The Qt Company Ltd.
// Copyright (C) 2025 Klarälvdalens Datakonsult AB, a KDAB Group company, info@kdab.com, author Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only
#ifndef QNUMERIC_H
@@ -15,7 +16,7 @@
#include <cmath>
#include <limits>
#include <type_traits>
#include <QtCore/q20type_traits.h>
// min() and max() may be #defined by windows.h if that is included before, but we need them
// for std::numeric_limits below. You should not use the min() and max() macros, so we just #undef.
@@ -37,20 +38,13 @@
# endif
# if defined(Q_PROCESSOR_X86_64) || defined(Q_PROCESSOR_ARM_64)
# define Q_INTRINSIC_MUL_OVERFLOW64
# define Q_UMULH(v1, v2) __umulh(v1, v2);
# define Q_SMULH(v1, v2) __mulh(v1, v2);
# define Q_UMULH(v1, v2) __umulh(v1, v2)
# define Q_SMULH(v1, v2) __mulh(v1, v2)
# pragma intrinsic(__umulh)
# pragma intrinsic(__mulh)
# endif
#endif
# if defined(Q_OS_INTEGRITY) && defined(Q_PROCESSOR_ARM_64)
# include <arm64_ghs.h>
# define Q_INTRINSIC_MUL_OVERFLOW64
# define Q_UMULH(v1, v2) __MULUH64(v1, v2);
# define Q_SMULH(v1, v2) __MULSH64(v1, v2);
#endif
QT_BEGIN_NAMESPACE
// To match std::is{inf,nan,finite} functions:
@@ -94,41 +88,212 @@ Q_CORE_EXPORT quint64 qFloatDistance(double a, double b);
// size_t. Implementations for 8- and 16-bit types will work but may not be as
// efficient. Implementations for 64-bit may be missing on 32-bit platforms.
#if (Q_CC_GNU >= 500 || __has_builtin(__builtin_add_overflow)) \
&& !(QT_POINTER_SIZE == 4 && defined(Q_CC_CLANG))
// GCC 5 and Clang 3.8 have builtins to detect overflows
// 32 bit Clang has the builtins but tries to link a library which hasn't
#define Q_INTRINSIC_MUL_OVERFLOW64
// All the GCC and Clang versions we support have constexpr
// builtins for overflowing arithmetic.
#if defined(Q_CC_GNU_ONLY) \
|| defined(Q_CC_CLANG_ONLY) \
|| __has_builtin(__builtin_add_overflow)
# define Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS
// On 32-bit, Clang < 14 will fail to link if multiplying 64-bit
// quantities (emits an unresolved call to __mulodi4), so we can't use
// the builtin in that case.
# if !(QT_POINTER_SIZE == 4 && defined(Q_CC_CLANG_ONLY) && Q_CC_CLANG_ONLY < 1400)
# define Q_INTRINSIC_MUL_OVERFLOW64
# endif
#endif
template <typename T> inline
typename std::enable_if_t<std::is_unsigned_v<T> || std::is_signed_v<T>, bool>
qAddOverflow(T v1, T v2, T *r)
{ return __builtin_add_overflow(v1, v2, r); }
template <typename T> inline
typename std::enable_if_t<std::is_unsigned_v<T> || std::is_signed_v<T>, bool>
qSubOverflow(T v1, T v2, T *r)
{ return __builtin_sub_overflow(v1, v2, r); }
template <typename T> inline
typename std::enable_if_t<std::is_unsigned_v<T> || std::is_signed_v<T>, bool>
qMulOverflow(T v1, T v2, T *r)
{ return __builtin_mul_overflow(v1, v2, r); }
#else
// Generic implementations
template <typename T> inline typename std::enable_if_t<std::is_unsigned_v<T>, bool>
qAddOverflow(T v1, T v2, T *r)
namespace QtPrivate {
// Generic versions of (some) overflowing math functions, private API.
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_unsigned_v<T>, bool>
qAddOverflowGeneric(T v1, T v2, T *r)
{
// unsigned additions are well-defined
*r = v1 + v2;
return v1 > T(v1 + v2);
}
template <typename T> inline typename std::enable_if_t<std::is_signed_v<T>, bool>
// Wide multiplication.
// It has been isolated in its own function so that it can be tested.
// Note that this implementation requires a T that doesn't undergo
// promotions.
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_same_v<T, decltype(+T{})>, bool>
qMulOverflowWideMultiplication(T v1, T v2, T *r)
{
// This is a glorified long/school-grade multiplication,
// that considers each input of N bits as two halves of N/2 bits:
//
// v1 = 2^(N/2) * v1_hi + v1_lo
// v2 = 2^(N/2) * v2_hi + v2_lo
//
// Therefore, v1*v2 = 2^N     * v1_hi * v2_hi +
//                    2^(N/2) * v1_hi * v2_lo +
//                    2^(N/2) * v1_lo * v2_hi +
//                              v1_lo * v2_lo
//
// Using the N bits of precision we have we can perform the hi*lo
// multiplications safely; that is never going to overflow.
//
// Then we can sum together these partial results:
//
//                  [ v1_hi | v1_lo ] *
//                  [ v2_hi | v2_lo ] =
//                  -------------------
//                  [ v1_lo * v2_lo ] +
//          [ v1_hi * v2_lo ]         + // shifted because it's * 2^(N/2)
//          [ v2_hi * v1_lo ]         + // shifted because it's * 2^(N/2)
// [ v1_hi * v2_hi ]                  = // shifted because it's * 2^N
// ------------------------------------
// [ high          ][ low           ]   // exact result (in 2^(2N) bits)
//
// ... except that this way we'll need to bring some carries, so
// we'll do a slightly smarter sum.
//
// We need high for detecting overflows, even if we are not returning it.
// Get multiplication by zero out of the way
if (v1 == 0 || v2 == 0) {
*r = T(0);
return false;
}
// Extract the absolute values as unsigned
// (will fix the sign later)
using U = std::make_unsigned_t<T>;
const U v1_abs = (v1 >= 0) ? U(v1) : (U(0) - U(v1));
const U v2_abs = (v2 >= 0) ? U(v2) : (U(0) - U(v2));
// Masks for N/2 bits
constexpr std::size_t half_width = (sizeof(U) * 8) / 2;
const U half_mask = ~U(0) >> half_width;
// Split into low and high halves
const U v1_lo = v1_abs & half_mask;
const U v1_hi = v1_abs >> half_width;
const U v2_lo = v2_abs & half_mask;
const U v2_hi = v2_abs >> half_width;
// Cross-product; this will never overflow
const U lo_lo = v1_lo * v2_lo;
const U lo_hi = v1_lo * v2_hi;
const U hi_lo = v1_hi * v2_lo;
const U hi_hi = v1_hi * v2_hi;
// We could sum directly the cross-products, but then we'd have to
// keep track of carries. This avoids it.
const U tmp = (lo_lo >> half_width) + (hi_lo & half_mask) + lo_hi;
U result_hi = (hi_lo >> half_width) + (tmp >> half_width) + hi_hi;
U result_lo = (tmp << half_width) | (lo_lo & half_mask);
if constexpr (std::is_unsigned_v<T>) {
// If the source was unsigned, we're done; a non-zero high
// signals overflow.
*r = result_lo;
return result_hi != U(0);
} else {
// We need to set the correct sign back, and check for overflow.
const bool isNegative = (v1 < T(0)) != (v2 < T(0));
if (isNegative) {
// Result is negative; calculate two's complement of the
// [high, low] pair, by inverting the bits and adding 1,
// which is equivalent to negating it in unsigned
// arithmetic.
// This operation should be done on the pair as a whole,
// but we have the individual components, so start by
// calculating two's complement of low:
result_lo = U(0) - result_lo;
// If result_lo is 0, it means that the addition of 1 into
// it has overflowed, so now we have a carry to add into the
// inverted high:
result_hi = ~result_hi;
if (result_lo == 0)
result_hi += U(1);
}
*r = result_lo;
// Overflow has happened if result_hi is not a sign extension
// of the sign bit of result_lo. Note the usage of T, not U.
return result_hi != U(*r >> std::numeric_limits<T>::digits);
}
}
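// Detects whether an integer type twice the width of T exists
// (e.g. a 128-bit type when T is 64 bits wide).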
template <typename T, typename Enable = void>
constexpr inline bool HasLargerInt = false;
template <typename T>
constexpr inline bool HasLargerInt<T, std::void_t<typename QIntegerForSize<sizeof(T) * 2>::Unsigned>> = true;
template <typename T>
constexpr inline
typename std::enable_if_t<(std::is_unsigned_v<T> || std::is_signed_v<T>), bool>
qMulOverflowGeneric(T v1, T v2, T *r)
{
// This function is a generic fallback for qMulOverflow,
// called either by constant or non-constant evaluation,
// if the compiler does not have builtins or intrinsics itself.
//
// (For instance, this is never going to be called on GCC or recent
// Clang, as their builtins will be used in all cases.)
//
// If a compiler does have builtins, please amend qMulOverflow
// directly.
if constexpr (HasLargerInt<T>) {
// Use the next biggest type if available
using LargerInt = QIntegerForSize<sizeof(T) * 2>;
using Larger = typename std::conditional_t<std::is_signed_v<T>,
typename LargerInt::Signed, typename LargerInt::Unsigned>;
Larger lr = Larger(v1) * Larger(v2);
*r = T(lr);
return lr > (std::numeric_limits<T>::max)() || lr < (std::numeric_limits<T>::min)();
} else {
// Otherwise fall back to a wide multiplication
return qMulOverflowWideMultiplication(v1, v2, r);
}
}
} // namespace QtPrivate
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_unsigned_v<T>, bool>
qAddOverflow(T v1, T v2, T *r)
{
#if defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
return __builtin_add_overflow(v1, v2, r);
#else
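// In constant evaluation we can't use the non-constexpr intrinsics
// below; take the generic constexpr path instead.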
if (q20::is_constant_evaluated())
return QtPrivate::qAddOverflowGeneric(v1, v2, r);
# if defined(Q_HAVE_ADDCARRY)
// We can use intrinsics for the unsigned operations with MSVC
if constexpr (std::is_same_v<T, unsigned>) {
return _addcarry_u32(0, v1, v2, r);
} else if constexpr (std::is_same_v<T, quint64>) {
# if defined(Q_PROCESSOR_X86_64)
return _addcarry_u64(0, v1, v2, reinterpret_cast<unsigned __int64 *>(r));
# else
uint low, high;
uchar carry = _addcarry_u32(0, unsigned(v1), unsigned(v2), &low);
carry = _addcarry_u32(carry, v1 >> 32, v2 >> 32, &high);
*r = (quint64(high) << 32) | low;
return carry;
# endif // defined(Q_PROCESSOR_X86_64)
}
# endif // defined(Q_HAVE_ADDCARRY)
return QtPrivate::qAddOverflowGeneric(v1, v2, r);
#endif // defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
}
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_signed_v<T>, bool>
qAddOverflow(T v1, T v2, T *r)
{
#if defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
return __builtin_add_overflow(v1, v2, r);
#else
// Here's how we calculate the overflow:
// 1) unsigned addition is well-defined, so we can always execute it
// 2) conversion from unsigned back to signed is implementation-
@@ -141,34 +306,37 @@ qAddOverflow(T v1, T v2, T *r)
using U = typename std::make_unsigned_t<T>;
*r = T(U(v1) + U(v2));
// If int is two's complement, assume all integer types are too.
if (std::is_same_v<int32_t, int>) {
// Two's complement equivalent (generates slightly shorter code):
// x ^ y is negative if x and y have different signs
// x & y is negative if x and y are negative
// (x ^ z) & (y ^ z) is negative if x and z have different signs
// AND y and z have different signs
return ((v1 ^ *r) & (v2 ^ *r)) < 0;
}
bool s1 = (v1 < 0);
bool s2 = (v2 < 0);
bool sr = (*r < 0);
return s1 != sr && s2 != sr;
// also: return s1 == s2 && s1 != sr;
// Two's complement equivalent (generates slightly shorter code):
// x ^ y is negative if x and y have different signs
// x & y is negative if x and y are negative
// (x ^ z) & (y ^ z) is negative if x and z have different signs
// AND y and z have different signs
return ((v1 ^ *r) & (v2 ^ *r)) < 0;
#endif // defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
}
template <typename T> inline typename std::enable_if_t<std::is_unsigned_v<T>, bool>
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_unsigned_v<T>, bool>
qSubOverflow(T v1, T v2, T *r)
{
#if defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
return __builtin_sub_overflow(v1, v2, r);
#else
// unsigned subtractions are well-defined
*r = v1 - v2;
return v1 < v2;
#endif // defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
}
template <typename T> inline typename std::enable_if_t<std::is_signed_v<T>, bool>
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_signed_v<T>, bool>
qSubOverflow(T v1, T v2, T *r)
{
#if defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
return __builtin_sub_overflow(v1, v2, r);
#else
// See above for explanation. This is the same with some signs reversed.
// We can't use qAddOverflow(v1, -v2, r) because it would be UB if
// v2 == std::numeric_limits<T>::min().
@@ -176,111 +344,80 @@ qSubOverflow(T v1, T v2, T *r)
using U = typename std::make_unsigned_t<T>;
*r = T(U(v1) - U(v2));
if (std::is_same_v<int32_t, int>)
return ((v1 ^ *r) & (~v2 ^ *r)) < 0;
bool s1 = (v1 < 0);
bool s2 = !(v2 < 0);
bool sr = (*r < 0);
return s1 != sr && s2 != sr;
// also: return s1 == s2 && s1 != sr;
return ((v1 ^ *r) & (~v2 ^ *r)) < 0;
#endif // defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
}
template <typename T> inline
template <typename T>
constexpr inline
typename std::enable_if_t<std::is_unsigned_v<T> || std::is_signed_v<T>, bool>
qMulOverflow(T v1, T v2, T *r)
{
// use the next biggest type
// Note: for 64-bit systems where __int128 isn't supported, this will cause an error.
using LargerInt = QIntegerForSize<sizeof(T) * 2>;
using Larger = typename std::conditional_t<std::is_signed_v<T>,
typename LargerInt::Signed, typename LargerInt::Unsigned>;
Larger lr = Larger(v1) * Larger(v2);
*r = T(lr);
return lr > (std::numeric_limits<T>::max)() || lr < (std::numeric_limits<T>::min)();
}
#if defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
# if defined(Q_INTRINSIC_MUL_OVERFLOW64)
return __builtin_mul_overflow(v1, v2, r);
# else
if constexpr (sizeof(T) <= 4)
return __builtin_mul_overflow(v1, v2, r);
else
return QtPrivate::qMulOverflowGeneric(v1, v2, r);
# endif
#else
if (q20::is_constant_evaluated())
return QtPrivate::qMulOverflowGeneric(v1, v2, r);
# if defined(Q_INTRINSIC_MUL_OVERFLOW64)
template <> inline bool qMulOverflow(quint64 v1, quint64 v2, quint64 *r)
{
*r = v1 * v2;
return Q_UMULH(v1, v2);
}
template <> inline bool qMulOverflow(qint64 v1, qint64 v2, qint64 *r)
{
// This is slightly more complex than the unsigned case above: the sign bit
// of 'low' must be replicated as the entire 'high', so the only valid
// values for 'high' are 0 and -1. Use unsigned multiply since it's the same
// as signed for the low bits and use a signed right shift to verify that
// 'high' is nothing but sign bits that match the sign of 'low'.
if constexpr (std::is_unsigned_v<T> && (sizeof(T) == sizeof(quint64))) {
// T is 64 bit; either unsigned long long,
// or unsigned long on LP64 platforms.
*r = v1 * v2;
return T(Q_UMULH(v1, v2));
} else if constexpr (std::is_signed_v<T> && (sizeof(T) == sizeof(qint64))) {
// This is slightly more complex than the unsigned case above: the sign bit
// of 'low' must be replicated as the entire 'high', so the only valid
// values for 'high' are 0 and -1. Use unsigned multiply since it's the same
// as signed for the low bits and use a signed right shift to verify that
// 'high' is nothing but sign bits that match the sign of 'low'.
qint64 high = Q_SMULH(v1, v2);
*r = qint64(quint64(v1) * quint64(v2));
return (*r >> 63) != high;
qint64 high = Q_SMULH(v1, v2);
*r = qint64(quint64(v1) * quint64(v2));
return (*r >> 63) != high;
}
# endif // defined(Q_INTRINSIC_MUL_OVERFLOW64)
return QtPrivate::qMulOverflowGeneric(v1, v2, r);
#endif // defined(Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS)
}
# if defined(Q_OS_INTEGRITY) && defined(Q_PROCESSOR_ARM_64)
template <> inline bool qMulOverflow(uint64_t v1, uint64_t v2, uint64_t *r)
{
return qMulOverflow<quint64>(v1,v2,reinterpret_cast<quint64*>(r));
}
template <> inline bool qMulOverflow(int64_t v1, int64_t v2, int64_t *r)
{
return qMulOverflow<qint64>(v1,v2,reinterpret_cast<qint64*>(r));
}
# endif // OS_INTEGRITY ARM64
# endif // Q_INTRINSIC_MUL_OVERFLOW64
# if defined(Q_HAVE_ADDCARRY) && defined(Q_PROCESSOR_X86)
// We can use intrinsics for the unsigned operations with MSVC
template <> inline bool qAddOverflow(unsigned v1, unsigned v2, unsigned *r)
{ return _addcarry_u32(0, v1, v2, r); }
// 32-bit qMulOverflow is fine with the generic code above
template <> inline bool qAddOverflow(quint64 v1, quint64 v2, quint64 *r)
{
# if defined(Q_PROCESSOR_X86_64)
return _addcarry_u64(0, v1, v2, reinterpret_cast<unsigned __int64 *>(r));
# else
uint low, high;
uchar carry = _addcarry_u32(0, unsigned(v1), unsigned(v2), &low);
carry = _addcarry_u32(carry, v1 >> 32, v2 >> 32, &high);
*r = (quint64(high) << 32) | low;
return carry;
# endif // !x86-64
}
# endif // HAVE ADDCARRY
#undef Q_HAVE_ADDCARRY
#endif // !GCC
#undef Q_NUMERIC_USE_GCC_OVERFLOW_BUILTINS
// Implementations for addition, subtraction or multiplication by a
// compile-time constant. For addition and subtraction, we simply call the code
// that detects overflow at runtime. For multiplication, we compare to the
// maximum possible values before multiplying to ensure no overflow happens.
template <typename T, T V2> bool qAddOverflow(T v1, std::integral_constant<T, V2>, T *r)
template <typename T, T V2> constexpr bool qAddOverflow(T v1, std::integral_constant<T, V2>, T *r)
{
return qAddOverflow(v1, V2, r);
}
template <auto V2, typename T> bool qAddOverflow(T v1, T *r)
template <auto V2, typename T> constexpr bool qAddOverflow(T v1, T *r)
{
return qAddOverflow(v1, std::integral_constant<T, V2>{}, r);
}
template <typename T, T V2> bool qSubOverflow(T v1, std::integral_constant<T, V2>, T *r)
template <typename T, T V2> constexpr bool qSubOverflow(T v1, std::integral_constant<T, V2>, T *r)
{
return qSubOverflow(v1, V2, r);
}
template <auto V2, typename T> bool qSubOverflow(T v1, T *r)
template <auto V2, typename T> constexpr bool qSubOverflow(T v1, T *r)
{
return qSubOverflow(v1, std::integral_constant<T, V2>{}, r);
}
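// Usage sketch (illustrative, not part of the patch): these overloads
// take one operand as a compile-time constant, e.g.
//   int r;
//   bool ovf = qMulOverflow<2>(21, &r); // false; r == 42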
template <typename T, T V2> bool qMulOverflow(T v1, std::integral_constant<T, V2>, T *r)
template <typename T, T V2> constexpr bool qMulOverflow(T v1, std::integral_constant<T, V2>, T *r)
{
// Runtime detection for anything smaller than or equal to a register
// width, as most architectures' multiplication instructions actually
@@ -327,7 +464,7 @@ template <typename T, T V2> bool qMulOverflow(T v1, std::integral_constant<T, V2
}
}
template <auto V2, typename T> bool qMulOverflow(T v1, T *r)
template <auto V2, typename T> constexpr bool qMulOverflow(T v1, T *r)
{
if constexpr (V2 == 2)
return qAddOverflow(v1, v1, r);

tst_qnumeric.cpp:

@@ -13,6 +13,10 @@
#include <QtCore/q26numeric.h>
#if QT_POINTER_SIZE == 8 || defined(Q_INTRINSIC_MUL_OVERFLOW64)
# define QT_HAS_128_BIT_MULTIPLICATION
#endif
namespace {
template <typename F> struct Fuzzy {};
/* Data taken from qglobal.h's implementation of qFuzzyCompare:
@@ -82,6 +86,7 @@ private slots:
void mulOverflow_data();
void mulOverflow();
void signedOverflow();
void genericWideMultiplication();
};
// Floating-point tests:
@@ -717,7 +722,7 @@ void tst_QNumeric::mulOverflow()
if (size == -32)
MulOverflowDispatch<qint32>()();
if (size == -64) {
#if QT_POINTER_SIZE == 8 || defined(Q_INTRINSIC_MUL_OVERFLOW64)
#ifdef QT_HAS_128_BIT_MULTIPLICATION
MulOverflowDispatch<qint64>()();
#else
QFAIL("128-bit multiplication not supported on this platform");
@@ -755,6 +760,292 @@ void tst_QNumeric::signedOverflow()
QCOMPARE(qMulOverflow(maxInt, maxInt, &r), true);
}
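// Cross-checks the generic wide multiplication against qMulOverflow,
// which acts as the reference implementation on this platform.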
template <typename T>
static bool genericWideMultiplication_impl_impl(T lhs, T rhs)
{
T expectedResult;
bool expectedOverflow = qMulOverflow(lhs, rhs, &expectedResult);
T actualResult;
bool actualOverflow = QtPrivate::qMulOverflowWideMultiplication(lhs, rhs, &actualResult);
if (actualResult != expectedResult || actualOverflow != expectedOverflow) {
qDebug() << "LHS" << lhs << "RHS" << rhs;
qDebug() << "Actual" << actualResult
<< "Expected" << expectedResult;
qDebug() << "Actual overflow" << actualOverflow
<< "Expected overflow" << expectedOverflow;
return false;
}
return true;
}
template <typename T>
static bool genericWideMultiplication_impl()
{
using U = std::make_unsigned_t<T>;
constexpr int SIZE_IN_BITS = sizeof(T) * CHAR_BIT;
for (int i = 0; i < SIZE_IN_BITS; ++i) {
for (int j = 0; j < SIZE_IN_BITS; ++j) {
for (int offset = -5; offset <= 5; ++offset) {
const U lhs_tmp = (U(1) << i) + offset;
const U rhs_tmp = (U(1) << j) + offset;
const T lhs = T(lhs_tmp);
const T rhs = T(rhs_tmp);
if (!genericWideMultiplication_impl_impl(lhs, rhs))
return false;
}
}
}
for (int i = -256; i <= 256; ++i) {
for (int j = -256; j <= 256; ++j) {
if (!genericWideMultiplication_impl_impl(T(i), T(j)))
return false;
}
}
constexpr T minimal = std::numeric_limits<T>::min();
constexpr T maximal = std::numeric_limits<T>::max();
for (int i = -5; i <= 5; ++i) {
for (T j = minimal; j < minimal + 10; ++j) {
if (!genericWideMultiplication_impl_impl(T(i), T(j)))
return false;
}
for (T j = maximal - 10; j < maximal; ++j) {
if (!genericWideMultiplication_impl_impl(T(i), T(j)))
return false;
}
if (!genericWideMultiplication_impl_impl(T(i), maximal))
return false;
}
return true;
}
void tst_QNumeric::genericWideMultiplication()
{
QVERIFY(genericWideMultiplication_impl<int>());
QVERIFY(genericWideMultiplication_impl<unsigned int>());
#if defined(QT_HAS_128_BIT_MULTIPLICATION)
QVERIFY(genericWideMultiplication_impl<long>());
QVERIFY(genericWideMultiplication_impl<unsigned long>());
QVERIFY(genericWideMultiplication_impl<long long>());
QVERIFY(genericWideMultiplication_impl<unsigned long long>());
#endif
}
#if !defined(Q_CC_GHS)
// Compile-time tests for overflow math
namespace OverflowTest {
template <typename T>
constexpr bool add(T a, T b, bool expOverflow, T exp = {})
{
T result{};
const bool overflowed = qAddOverflow(a, b, &result);
if (overflowed != expOverflow)
return false;
if (overflowed)
return true;
return result == exp;
}
template <typename T>
constexpr bool sub(T a, T b, bool expOverflow, T exp = {})
{
T result{};
const bool overflowed = qSubOverflow(a, b, &result);
if (overflowed != expOverflow)
return false;
if (overflowed)
return true;
return result == exp;
}
template <typename T>
constexpr bool mul(T a, T b, bool expOverflow, T exp = {})
{
T result{};
const bool overflowed = qMulOverflow(a, b, &result);
if (overflowed != expOverflow)
return false;
if (overflowed)
return true;
return result == exp;
}
// Addition
#define ADD_OVERFLOW_COMMON_TEST(type, maximal) \
static_assert(add(type(0), type(0), false, type(0))); \
static_assert(add(type(0), type(1), false, type(1))); \
static_assert(add(type(1), type(1), false, type(2))); \
static_assert(add(type(maximal), type(0), false, type(maximal))); \
static_assert(add(type(0), type(maximal), false, type(maximal))); \
static_assert(add(type(maximal), type(1), true)); \
static_assert(add(type(maximal), type(2), true)); \
static_assert(add(type(maximal), type(maximal), true)); \
static_assert(add(type(type(maximal) / type(2)), \
type(type(maximal) / type(2)), \
false, \
type(type(maximal) - type(1)))); \
#define ADD_OVERFLOW_UNSIGNED_TYPE_TEST(type, maximal) \
ADD_OVERFLOW_COMMON_TEST(type, maximal) \
#define ADD_OVERFLOW_SIGNED_TYPE_TEST(type, minimal, maximal) \
ADD_OVERFLOW_COMMON_TEST(type, maximal) \
static_assert(add(type(0), type(-1), false, type(-1))); \
static_assert(add(type(-1), type(-1), false, type(-2))); \
static_assert(add(type(-1), type(-1), false, type(-2))); \
static_assert(add(type(minimal), type(0), false, type(minimal))); \
static_assert(add(type(0), type(minimal), false, type(minimal))); \
static_assert(add(type(minimal), type(-1), true)); \
static_assert(add(type(minimal), type(minimal), true)); \
static_assert(add(type(maximal), type(minimal), false, type(-1))); \
static_assert(add(type(minimal), type(maximal), false, type(-1))); \
static_assert(add(type(maximal), type(-1), false, type(type(maximal) - type(1)))); \
// Subtraction
#define SUB_OVERFLOW_COMMON_TEST(type, minimal, maximal) \
static_assert(sub(type(minimal), type(0), false, type(minimal))); \
static_assert(sub(type(minimal), type(1), true)); \
static_assert(sub(type(minimal), type(2), true)); \
static_assert(sub(type(minimal), type(maximal), true)); \
static_assert(sub(type(0), type(0), false, type(0))); \
static_assert(sub(type(1), type(0), false, type(1))); \
static_assert(sub(type(3), type(0), false, type(3))); \
static_assert(sub(type(9), type(5), false, type(4))); \
static_assert(sub(type(9), type(9), false, type(0))); \
static_assert(sub(type(maximal), type(0), false, type(maximal))); \
static_assert(sub(type(maximal), type(maximal), false, type(0))); \
#define SUB_OVERFLOW_UNSIGNED_TYPE_TEST(type, maximal) \
SUB_OVERFLOW_COMMON_TEST(type, 0, maximal) \
static_assert(sub(type(1), type(2), true)); \
static_assert(sub(type(0), type(maximal), true)); \
static_assert(sub(type(1), type(maximal), true)); \
#define SUB_OVERFLOW_SIGNED_TYPE_TEST(type, minimal, maximal) \
SUB_OVERFLOW_COMMON_TEST(type, minimal, maximal) \
static_assert(sub(type(1), type(2), false, type(-1))); \
static_assert(sub(type(0), type(maximal), false, type(type(minimal) + type(1)))); \
static_assert(sub(type(1), type(maximal), false, type(type(minimal) + type(2)))); \
static_assert(sub(type(0), type(minimal), true)); \
static_assert(sub(type(maximal), type(-1), true)); \
static_assert(sub(type(maximal), type(minimal), true)); \
// Multiplication
#define MUL_OVERFLOW_COMMON_TEST(type, minimal, maximal) \
static_assert(mul(type(0), type(0), false, type(0))); \
static_assert(mul(type(1), type(0), false, type(0))); \
static_assert(mul(type(0), type(1), false, type(0))); \
static_assert(mul(type(1), type(1), false, type(1))); \
static_assert(mul(type(1), type(10), false, type(10))); \
static_assert(mul(type(5), type(2), false, type(10))); \
static_assert(mul(type(minimal), type(0), false, type(0))); \
static_assert(mul(type(maximal), type(0), false, type(0))); \
static_assert(mul(type(0), type(minimal), false, type(0))); \
static_assert(mul(type(0), type(maximal), false, type(0))); \
static_assert(mul(type(minimal), type(1), false, type(minimal))); \
static_assert(mul(type(maximal), type(1), false, type(maximal))); \
static_assert(mul(type(1), type(minimal), false, type(minimal))); \
static_assert(mul(type(1), type(maximal), false, type(maximal))); \
static_assert(mul(type(maximal), type(2), true)); \
static_assert(mul(type(maximal/2), type(2), false, type(type(maximal) - type(1)))); \
static_assert(mul(type(maximal/4), type(4), false, type(type(maximal) - type(3)))); \
#define MUL_OVERFLOW_UNSIGNED_TYPE_TEST(type, maximal) \
MUL_OVERFLOW_COMMON_TEST(type, 0, maximal) \
#define MUL_OVERFLOW_SIGNED_TYPE_TEST(type, minimal, maximal) \
MUL_OVERFLOW_COMMON_TEST(type, minimal, maximal) \
static_assert(mul(type(maximal), type(-1), false, type(-maximal))); \
static_assert(mul(type(minimal), type(-1), true)); \
static_assert(mul(type(minimal), type(2), true)); \
static_assert(mul(type(minimal/2), type(3), true)); \
#define UNSIGNED_TYPE_TEST(type, maximal) \
ADD_OVERFLOW_UNSIGNED_TYPE_TEST(type, maximal) \
SUB_OVERFLOW_UNSIGNED_TYPE_TEST(type, maximal) \
MUL_OVERFLOW_UNSIGNED_TYPE_TEST(type, maximal) \
#define SIGNED_TYPE_TEST(type, minimal, maximal) \
ADD_OVERFLOW_SIGNED_TYPE_TEST(type, minimal, maximal) \
SUB_OVERFLOW_SIGNED_TYPE_TEST(type, minimal, maximal) \
MUL_OVERFLOW_SIGNED_TYPE_TEST(type, minimal, maximal) \
using schar = signed char;
using uchar = unsigned char;
using ushort = unsigned short;
using uint = unsigned int;
using ulong = unsigned long;
#if CHAR_MAX == 127 // char is signed
SIGNED_TYPE_TEST(char, CHAR_MIN, CHAR_MAX)
#else
UNSIGNED_TYPE_TEST(char, CHAR_MAX)
#endif
SIGNED_TYPE_TEST(schar, SCHAR_MIN, SCHAR_MAX)
UNSIGNED_TYPE_TEST(uchar, UCHAR_MAX)
SIGNED_TYPE_TEST(short, SHRT_MIN, SHRT_MAX)
UNSIGNED_TYPE_TEST(ushort, USHRT_MAX)
SIGNED_TYPE_TEST(int, INT_MIN, INT_MAX)
UNSIGNED_TYPE_TEST(uint, UINT_MAX)
#if LONG_MAX == 2147483647 || defined(QT_HAS_128_BIT_MULTIPLICATION)
SIGNED_TYPE_TEST(long, LONG_MIN, LONG_MAX)
UNSIGNED_TYPE_TEST(ulong, ULONG_MAX)
#else
ADD_OVERFLOW_SIGNED_TYPE_TEST(long, LONG_MIN, LONG_MAX)
SUB_OVERFLOW_SIGNED_TYPE_TEST(long, LONG_MIN, LONG_MAX)
ADD_OVERFLOW_UNSIGNED_TYPE_TEST(ulong, ULONG_MAX)
SUB_OVERFLOW_UNSIGNED_TYPE_TEST(ulong, ULONG_MAX)
#endif
#if defined(QT_HAS_128_BIT_MULTIPLICATION)
// Compiling this causes an ICE in MSVC, so skipping it
#if !defined(Q_CC_MSVC) || Q_CC_MSVC > 1942
SIGNED_TYPE_TEST(qlonglong, LLONG_MIN, LLONG_MAX)
UNSIGNED_TYPE_TEST(qulonglong, ULLONG_MAX)
#endif
#else
ADD_OVERFLOW_SIGNED_TYPE_TEST(qlonglong, LLONG_MIN, LLONG_MAX)
SUB_OVERFLOW_SIGNED_TYPE_TEST(qlonglong, LLONG_MIN, LLONG_MAX)
ADD_OVERFLOW_UNSIGNED_TYPE_TEST(qulonglong, ULLONG_MAX)
SUB_OVERFLOW_UNSIGNED_TYPE_TEST(qulonglong, ULLONG_MAX)
#endif
#undef ADD_OVERFLOW_COMMON_TEST
#undef ADD_OVERFLOW_UNSIGNED_TYPE_TEST
#undef ADD_OVERFLOW_SIGNED_TYPE_TEST
#undef SUB_OVERFLOW_COMMON_TEST
#undef SUB_OVERFLOW_UNSIGNED_TYPE_TEST
#undef SUB_OVERFLOW_SIGNED_TYPE_TEST
#undef MUL_OVERFLOW_COMMON_TEST
#undef MUL_OVERFLOW_UNSIGNED_TYPE_TEST
#undef MUL_OVERFLOW_SIGNED_TYPE_TEST
#undef UNSIGNED_TYPE_TEST
#undef SIGNED_TYPE_TEST
} // namespace OverflowTest
#endif // !defined(Q_CC_GHS)
namespace SaturateCastTest {
template <typename T> static constexpr T max = std::numeric_limits<T>::max();