Milian Wolff e95fb04202 QSharedPointer: optimize casts on rvalue shared pointers
When we are casting an rvalue QSharedPointer, we do not need to
pay the cost for the atomic refcount increment / decrement. Optimize
this by adding rvalue overloads that handle this specific case
directly.

Note that this is arguably a micro-optimization, since in most cases
the cost of creating the pointer in the first place dwarfs the cost of
the atomic increment / decrement. But it starts to matter in situations
like `someConstObject.ptrGetter().dynamicCast<X>()`: in the common case
`ptrGetter()` returns by value, so the cast can operate on an rvalue.

On my system, the rvalue overloads cut the cost per iteration from
roughly 142-148 CPU cycles to between 13 and 52, depending on the cast:

```
./tests/benchmarks/corelib/tools/qsharedpointer/tst_bench_shared_ptr -perf -perfcounter cycles,instructions -iterations 100000 objectCast objectCast_rvalue
********* Start testing of tst_QSharedPointer *********
Config: Using QtTest library 6.9.0, Qt 6.9.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 14.2.1 20240805), arch unknown
PASS   : tst_QSharedPointer::initTestCase()
PASS   : tst_QSharedPointer::objectCast()
RESULT : tst_QSharedPointer::objectCast():
     147.05521 CPU cycles per iteration (total: 14,705,522, iterations: 100000)
     147.00058 instructions per iteration, 1.000 instr/cycle (total: 14,700,058, iterations: 100000)
PASS   : tst_QSharedPointer::objectCast_rvalue()
RESULT : tst_QSharedPointer::objectCast_rvalue():
     52.00227 CPU cycles per iteration (total: 5,200,227, iterations: 100000)
     110.00056 instructions per iteration, 2.115 instr/cycle (total: 11,000,057, iterations: 100000)
PASS   : tst_QSharedPointer::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped, 0 blacklisted, 45ms
********* Finished testing of tst_QSharedPointer *********

./tests/benchmarks/corelib/tools/qsharedpointer/tst_bench_shared_ptr -perf -perfcounter cycles,instructions -iterations 100000 dynamicCast dynamicCast_rvalue
********* Start testing of tst_QSharedPointer *********
Config: Using QtTest library 6.9.0, Qt 6.9.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 14.2.1 20240802), arch unknown
PASS   : tst_QSharedPointer::initTestCase()
PASS   : tst_QSharedPointer::dynamicCast()
RESULT : tst_QSharedPointer::dynamicCast():
     148.34457 CPU cycles per iteration (total: 14,834,457, iterations: 100000)
     120.00057 instructions per iteration, 0.809 instr/cycle (total: 12,000,058, iterations: 100000)
PASS   : tst_QSharedPointer::dynamicCast_rvalue()
RESULT : tst_QSharedPointer::dynamicCast_rvalue():
     25.00210 CPU cycles per iteration (total: 2,500,211, iterations: 100000)
     81.00057 instructions per iteration, 3.240 instr/cycle (total: 8,100,058, iterations: 100000)
PASS   : tst_QSharedPointer::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped, 0 blacklisted, 45ms
********* Finished testing of tst_QSharedPointer *********

./tests/benchmarks/corelib/tools/qsharedpointer/tst_bench_shared_ptr -perf -perfcounter cycles,instructions -iterations 100000 staticCast staticCast_rvalue
********* Start testing of tst_QSharedPointer *********
Config: Using QtTest library 6.9.0, Qt 6.9.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 14.2.1 20240802), arch unknown
PASS   : tst_QSharedPointer::initTestCase()
PASS   : tst_QSharedPointer::staticCast()
RESULT : tst_QSharedPointer::staticCast():
     142.95894 CPU cycles per iteration (total: 14,295,894, iterations: 100000)
     54.00057 instructions per iteration, 0.378 instr/cycle (total: 5,400,058, iterations: 100000)
PASS   : tst_QSharedPointer::staticCast_rvalue()
RESULT : tst_QSharedPointer::staticCast_rvalue():
     14.00205 CPU cycles per iteration (total: 1,400,205, iterations: 100000)
     22.00056 instructions per iteration, 1.571 instr/cycle (total: 2,200,057, iterations: 100000)
PASS   : tst_QSharedPointer::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped, 0 blacklisted, 50ms
********* Finished testing of tst_QSharedPointer *********

./tests/benchmarks/corelib/tools/qsharedpointer/tst_bench_shared_ptr -perf -perfcounter cycles,instructions -iterations 100000 constCast constCast_rvalue
********* Start testing of tst_QSharedPointer *********
Config: Using QtTest library 6.9.0, Qt 6.9.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 14.2.1 20240802), arch unknown
PASS   : tst_QSharedPointer::initTestCase()
PASS   : tst_QSharedPointer::constCast()
RESULT : tst_QSharedPointer::constCast():
     142.38115 CPU cycles per iteration (total: 14,238,116, iterations: 100000)
     54.00057 instructions per iteration, 0.379 instr/cycle (total: 5,400,058, iterations: 100000)
PASS   : tst_QSharedPointer::constCast_rvalue()
RESULT : tst_QSharedPointer::constCast_rvalue():
     13.00243 CPU cycles per iteration (total: 1,300,243, iterations: 100000)
     22.00057 instructions per iteration, 1.692 instr/cycle (total: 2,200,058, iterations: 100000)
PASS   : tst_QSharedPointer::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped, 0 blacklisted, 42ms
********* Finished testing of tst_QSharedPointer *********
```

[ChangeLog][QtCore][QSharedPointer] Optimized casts on rvalue shared
pointers.

Change-Id: I7dfb4d92253d6c60286d3903bc7aef66acab5689
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-09-05 12:44:46 +02:00