MDEV-23369 False sharing in page_hash_latch::read_lock_wait()

MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple
rw-lock implementation that avoids a spinloop for non-contended
read-lock requests, acquiring the lock with a single
std::atomic::fetch_add().
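
As an illustration, a rw-lock in this style can grant an uncontended
read lock with one atomic increment. The following is a minimal
sketch, not the actual page_hash_latch; the class name, the word
layout, and the WRITER bit are assumptions made for this example.

#include <atomic>
#include <cstdint>

/* Minimal sketch (not InnoDB code) of a rw-lock whose uncontended
read-lock path is a single fetch_add(), in the spirit of MDEV-22871. */
class rw_latch_sketch
{
  /* Assumed layout: the most significant bit signals a granted or
  pending write lock; the remaining bits count the readers. */
  std::atomic<uint32_t> word{0};
  static constexpr uint32_t WRITER= 1U << 31;
public:
  bool read_trylock()
  {
    /* Optimistically register as a reader with one atomic add... */
    if (!(word.fetch_add(1, std::memory_order_acquire) & WRITER))
      return true;
    /* ...and back out if a writer holds or is waiting for the latch. */
    word.fetch_sub(1, std::memory_order_relaxed);
    return false;
  }
  void read_unlock() { word.fetch_sub(1, std::memory_order_release); }
};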

Alas, in a write-heavy stress test on a 56-core system with 1,000
concurrent client connections, the server would intermittently stop
processing transactions. The reason turned out to be false sharing.
Attaching a debugger to the server during one such hang revealed that
22 of the 1,033 threads were polling in
page_hash_latch::read_lock_wait() on the same object, which appeared
to be in unlocked state (no readers or writers). All 22 requests were
for accessing undo log pages, each with a distinct page number.
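
The phenomenon is easy to reproduce in isolation. The toy program
below (purely illustrative; it is not InnoDB code) updates two
logically independent atomic counters from two threads. When the
counters share a cache line, the line ping-pongs between the cores,
throttling both threads even though neither ever touches the other's
counter; padding each counter to its own line removes the contention.

#include <atomic>
#include <cstdint>
#include <thread>

struct shared_line
{
  std::atomic<uint64_t> a{0};
  std::atomic<uint64_t> b{0};             /* same cache line as a */
};

struct padded_line
{
  alignas(64) std::atomic<uint64_t> a{0};
  alignas(64) std::atomic<uint64_t> b{0}; /* own cache line */
};

/* Increment the two counters concurrently from two threads. */
template<typename T> void hammer(T &counters)
{
  std::thread t1([&] { for (int i= 0; i < 10000000; i++) counters.a++; });
  std::thread t2([&] { for (int i= 0; i < 10000000; i++) counters.b++; });
  t1.join();
  t2.join();
}

int main()
{
  shared_line s;
  padded_line p;
  hammer(s); /* typically measurably slower than... */
  hammer(p); /* ...the padded version */
}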

To eliminate such false sharing, we will make buf_pool.page_hash.array
contain one page_hash_latch per CPU data cache line. On AMD64, where a
64-byte cache line holds 8 pointer-sized slots (one latch plus 7
payload elements), this grows the array by a factor of 8/7, or about
14.3%. For a 50GiB buffer pool of 16KiB pages, the
buf_pool.page_hash.array would grow from 25MiB to 28.6MiB. On other
instruction set architectures, the incurred memory overhead may be
smaller.
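
The sketch below shows the arithmetic behind that layout. CACHE_LINE
stands in for the build-time constant CPU_LEVEL1_DCACHE_LINESIZE used
by the actual patch, and the helpers padded_slot() and latch_of() are
hypothetical names introduced only for this illustration.

#include <cstddef>

static constexpr size_t CACHE_LINE= 64; /* AMD64 L1 data cache line */

/* One latch slot followed by 7 payload slots fills exactly one
64-byte cache line, so no two latches ever share a line. */
static constexpr size_t ELEMENTS_PER_LATCH= CACHE_LINE / sizeof(void*) - 1;

/* Map a logical hash cell index to its slot in the padded array:
every group of ELEMENTS_PER_LATCH cells is preceded by one latch
slot, hence the 8/7 growth mentioned above. */
constexpr size_t padded_slot(size_t cell)
{
  return cell / ELEMENTS_PER_LATCH * (ELEMENTS_PER_LATCH + 1) +
    cell % ELEMENTS_PER_LATCH + 1;
}

/* Because ELEMENTS_PER_LATCH is one less than a power of 2, the latch
guarding any padded slot is found by masking off the low bits. */
constexpr size_t latch_of(size_t slot) { return slot & ~ELEMENTS_PER_LATCH; }

static_assert(latch_of(padded_slot(0)) == 0, "cells 0..6 use latch 0");
static_assert(latch_of(padded_slot(7)) == 8, "cell 7 uses the next latch");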

Thanks to Vladislav Vaintroub for noticing this anomaly.
Author: Marko Mäkelä 2020-08-02 20:39:36 +03:00
Commit: c12d24e291 (parent: 8ddebb33c2)

@@ -1824,7 +1824,8 @@ public:
   {
     /** Number of array[] elements per page_hash_latch.
     Must be one less than a power of 2. */
-    static constexpr size_t ELEMENTS_PER_LATCH= 1023;
+    static constexpr size_t ELEMENTS_PER_LATCH= CPU_LEVEL1_DCACHE_LINESIZE /
+      sizeof(void*) - 1;
     /** number of payload elements in array[] */
     Atomic_relaxed<ulint> n_cells;