From 76d2846a71a155ee2861fd52e6635e35490a9dd1 Mon Sep 17 00:00:00 2001
From: Krunal Bauskar
Date: Tue, 30 Mar 2021 15:57:14 +0800
Subject: [PATCH] MDEV-24630: MY_RELAX_CPU assembly instruction
 upgrade/research for memory barrier on ARM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

As suggested in MDEV-24630, based on a community contribution that
attempts to optimize the spin loop, this approach was evaluated against
MariaDB Server 10.5 and found to improve throughput by 2-5%.

Note: the 10.6 timing graph and model are different because the home-brew
mutexes were replaced with pthread mutexes. The patch has a mixed impact
on 10.6, so it is not recommended for 10.6.
---
 include/my_cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/my_cpu.h b/include/my_cpu.h
index e536ff285f9..b42d62e7e82 100644
--- a/include/my_cpu.h
+++ b/include/my_cpu.h
@@ -84,7 +84,7 @@ static inline void MY_RELAX_CPU(void)
   __ppc_get_timebase();
 #elif defined __GNUC__ && (defined __arm__ || defined __aarch64__)
   /* Mainly, prevent the compiler from optimizing away delay loops */
-  __asm__ __volatile__ ("":::"memory");
+  __asm__ __volatile__ ("isb":::"memory");
 #else
   int32 var, oldval = 0;
   my_atomic_cas32_strong_explicit(&var, &oldval, 1, MY_MEMORY_ORDER_RELAXED,
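
Illustration only, not part of the patch: a minimal sketch of how a relax
hint such as MY_RELAX_CPU is typically used inside a bounded spin loop.
The names relax_cpu, spin_try_lock and spin_unlock are hypothetical and
do not exist in MariaDB; the lock is a plain C11 atomic_flag rather than
the server's own mutex code. The sketch emits "isb" only on AArch64 and
falls back to "pause" on x86 or to a compiler-only barrier elsewhere,
which is an assumption made here for portability.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for MY_RELAX_CPU; not the MariaDB definition. */
static inline void relax_cpu(void)
{
#if defined(__GNUC__) && defined(__aarch64__)
  /* Instruction barrier used as a short delay between polls; the
     "memory" clobber also stops the compiler from removing the loop. */
  __asm__ __volatile__ ("isb":::"memory");
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* x86 spin-wait hint. */
  __asm__ __volatile__ ("pause":::"memory");
#elif defined(__GNUC__)
  /* Compiler barrier only: no instruction is emitted. */
  __asm__ __volatile__ ("":::"memory");
#else
  /* Portable fallback: compiler-level fence from C11. */
  atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Try to take the lock, polling at most max_spins times.
   Returns 1 if the lock was acquired, 0 if the caller should
   fall back to something else (for example, blocking). */
static int spin_try_lock(atomic_flag *lock, unsigned max_spins)
{
  for (unsigned i = 0; i < max_spins; i++)
  {
    if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
      return 1;
    relax_cpu();  /* back off briefly before polling again */
  }
  return 0;
}

/* Release a lock taken by spin_try_lock. */
static void spin_unlock(atomic_flag *lock)
{
  atomic_flag_clear_explicit(lock, memory_order_release);
}
```

A caller would initialize the flag with ATOMIC_FLAG_INIT, call
spin_try_lock(&lock, n) with some spin budget n, and call spin_unlock
once the critical section is done. Whether the longer delay from "isb"
helps or hurts depends on the locking model, which is consistent with the
mixed results reported above for 10.6.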