Add explicit compiler fence when pushing frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

**Motivation:**

The `vm_push_frame` was specifically tweaked in
https://github.com/ruby/ruby/pull/3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
This commit is contained in:
Ivo Anjo 2024-06-21 11:48:37 +01:00 committed by Koichi Sasada
parent 4d4ac00123
commit 64fef3b870
2 changed files with 14 additions and 0 deletions

View File

@ -1440,6 +1440,7 @@ AC_CHECK_HEADERS(utime.h)
AC_CHECK_HEADERS(sys/epoll.h)
AC_CHECK_HEADERS(sys/event.h)
AC_CHECK_HEADERS(stdckdint.h)
AC_CHECK_HEADERS(stdatomic.h)
AS_CASE("$target_cpu", [x64|x86_64|i[3-6]86*], [
AC_CHECK_HEADERS(x86intrin.h)
@ -2121,6 +2122,7 @@ AC_CHECK_FUNCS(_longjmp) # used for AC_ARG_WITH(setjmp-type)
test x$ac_cv_func__longjmp = xno && ac_cv_func__setjmp=no
AC_CHECK_FUNCS(arc4random_buf)
AC_CHECK_FUNCS(atan2l atan2f)
AC_CHECK_DECLS(atomic_signal_fence, [], [], [#include <stdatomic.h>])
AC_CHECK_FUNCS(chmod)
AC_CHECK_FUNCS(chown)
AC_CHECK_FUNCS(chroot)

View File

@ -12,6 +12,10 @@
#include <math.h>
#ifdef HAVE_STDATOMIC_H
#include <stdatomic.h>
#endif
#include "constant.h"
#include "debug_counter.h"
#include "internal.h"
@ -415,6 +419,14 @@ vm_push_frame(rb_execution_context_t *ec,
.jit_return = NULL
};
/* Ensure the initialization of `*cfp` above never gets reordered with the update of `ec->cfp` below.
This is a no-op in all cases we've looked at (https://godbolt.org/z/3oxd1446K), but should guarantee it for all
future/untested compilers/platforms. */
#ifdef HAVE_DECL_ATOMIC_SIGNAL_FENCE
atomic_signal_fence(memory_order_seq_cst);
#endif
ec->cfp = cfp;
if (VMDEBUG == 2) {