This commit allows building YJIT and ZJIT simultaneously, a "combo
build". Previously, `./configure --enable-yjit --enable-zjit` failed. At
runtime, though, only one of the two can be enabled at a time.
Add a root Cargo workspace that contains both the yjit and zjit crate.
The common Rust build integration mechanisms are factored out into
defs/jit.mk.
Combo YJIT+ZJIT dev builds are supported; if either JIT uses
`--enable-*=dev`, both of them are built in dev mode.
The combo build requires Cargo, but building one JIT at a time with only
rustc in release build remains supported.
Rework ractors so that any ractor action (Ractor.receive, Ractor#send, Ractor.yield, Ractor#take,
Ractor.select) will operate on the thread that called the action. It will put that thread to sleep if
it's a blocking function and it needs to put it to sleep, and the awakening action (Ractor.yield,
Ractor#send) will wake up the blocked thread.
Before this change every blocking ractor action was associated with the ractor struct and its fields.
If a ractor called Ractor.receive, its wait status was wait_receiving, and when another ractor calls
r.send on it, it will look for that status in the ractor struct fields and wake it up. The problem was that
what if 2 threads call blocking ractor actions in the same ractor. Imagine if 1 thread has called Ractor.receive
and another r.take. Then, when a different ractor calls r.send on it, it doesn't know which ruby thread is associated
to which ractor action, so what ruby thread should it schedule? This change moves some fields onto the ruby thread
itself so that ruby threads are the ones that have ractor blocking statuses, and threads are then specifically scheduled
when unblocked.
Fixes [#17624]
Fixes [#21037]
Ivars will longer be the only thing stored inline
via shapes, so keeping the `iv_index` and `ivptr` names
would be confusing.
Instance variables won't be the only thing stored inline
via shapes, so keeping the `ivptr` name would be confusing.
`field` encompass anything that can be stored in a VALUE array.
Similarly, `gen_ivtbl` becomes `gen_fields_tbl`.
* Add --zjit-profile-interval option
* Fix min to max
* Avoid rewriting instructions for --zjit-call-threshold=1
* Rename the option to --zjit-num-profiles
* Implement FixnumAdd and stub PatchPoint/GuardType
Co-authored-by: Max Bernstein <max.bernstein@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
* Clone Target for arm64
* Use $create instead of use create
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Fix misindentation from suggested changes
* Drop an unneeded variable for mut
* Load operand into a register only if necessary
---------
Co-authored-by: Max Bernstein <max.bernstein@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Add zjit_* instructions to profile the interpreter
* Rename FixnumPlus to FixnumAdd
* Update a comment about Invalidate
* Rename Guard to GuardType
* Rename Invalidate to PatchPoint
* Drop unneeded debug!()
* Plan on profiling the types
* Use the output of GuardType as type refined outputs
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.
As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.
I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.
My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:
* We only need a hash set rather than a hash table (we only need keys,
not values), and so the full entry can be written as a single VALUE
* As a set we only need lookup/insert/delete, no update
* Delete is only run inside GC so does not need to be atomic (It could
be made concurrent)
* I use rb_vm_barrier for the (rare) table rebuilds (It could be made
concurrent) We VM lock (but don't require other threads to stop) for
table rebuilds, as those are rare
* The conservative garbage collector makes deferred replication easy,
using a T_DATA object
Another benefits of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.
This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.
Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a load factor of 2 (it is enlarged
once it is half full).
It looks like stat_insn_usage was introduced with YARV, but as far as I
can tell the field has never been used. I think we should remove the
field since we don't use it.
Previously, vm_make_env_each() (used during proc
creation and for the debug inspector C API) picked up the
non-GC-allocated iseq that rb_vm_push_frame_fname() creates,
which led to a SEGV when the GC tried to mark the non GC object.
Put a real iseq imemo instead. Speed should be about the same since
the old code also did a imemo allocation and a malloc allocation.
Real iseq allows ironing out the special-casing of dummy frames in
rb_execution_context_mark() and rb_execution_context_update(). A check
is added to RubyVM::ISeq#eval, though, to stop attempts to run dummy
iseqs.
[Bug #21180]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
We can just always return the jit_entry since it will be initialized to
NULL. There is no reason to specifically return NULL if yjit / rjit are
disabled
We shouldn't directly set the flags of an object because there could be
other flags set that would be erased. Instead, we can unset T_MASK and
set T_ICLASS isntead.
[Bug #20950]
ifunc proc has the ep allocated in the cfunc_proc_t which is the data of
the TypedData object. If an ifunc proc is duplicated, the ep points to
the ep of the source object. If the source object is freed, then the ep
of the duplicated object now points to a freed memory region. If we try
to use the ep we could crash.
For example, the following script crashes:
p = { a: 1 }.to_proc
100.times do
p = p.dup
GC.start
p.call
rescue ArgumentError
end
This commit changes ifunc proc to also duplicate the ep when it is duplicated.
The macro provided by symbol.h uses STATIC_ID2SYM
when it can which speeds up methods that declare keyword args.
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Add opt_duparray_send insn to skip the allocation on `#include?`
If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.
This adds a BOP for include? and tracks redefinition for it on Array.
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
* YJIT: Implement opt_duparray_send include_p
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
* Update opt_newarray_send to support simple forms of include?(arg)
Similar to opt_duparray_send but for non-static arrays.
* YJIT: Implement opt_newarray_send include_p
---------
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
When we run with RUBY_FREE_AT_EXIT, there's a false-positive memory leak
reported in YJIT because the METHOD_CODEGEN_TABLE is never freed. This
commit adds rb_yjit_free_at_exit that is called at shutdown when
RUBY_FREE_AT_EXIT is set.
Reported memory leak:
==699816== 1,104 bytes in 1 blocks are possibly lost in loss record 1 of 1
==699816== at 0x484680F: malloc (vg_replace_malloc.c:446)
==699816== by 0x155B3E: UnknownInlinedFun (unix.rs:14)
==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:36)
==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:27)
==699816== by 0x155B3E: alloc (alloc.rs:98)
==699816== by 0x155B3E: alloc_impl (alloc.rs:181)
==699816== by 0x155B3E: allocate (alloc.rs:241)
==699816== by 0x155B3E: do_alloc<alloc::alloc::Global> (alloc.rs:15)
==699816== by 0x155B3E: new_uninitialized<alloc::alloc::Global> (mod.rs:1750)
==699816== by 0x155B3E: fallible_with_capacity<alloc::alloc::Global> (mod.rs:1788)
==699816== by 0x155B3E: prepare_resize<alloc::alloc::Global> (mod.rs:2864)
==699816== by 0x155B3E: resize_inner<alloc::alloc::Global> (mod.rs:3060)
==699816== by 0x155B3E: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2950)
==699816== by 0x155B3E: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1231)
==699816== by 0x5BC39F: UnknownInlinedFun (mod.rs:1179)
==699816== by 0x5BC39F: find_or_find_insert_slot<(usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool), alloc::alloc::Global, hashbrown::map::equivalent_key::{closure_env#0}<usize, usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool>, hashbrown::map::make_hasher::{closure_env#0}<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std:#️⃣:random::RandomState>> (mod.rs:1413)
==699816== by 0x5BC39F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754)
==699816== by 0x57C5C6: insert<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std:#️⃣:random::RandomState> (map.rs:1104)
==699816== by 0x57C5C6: yjit::codegen::reg_method_codegen (codegen.rs:10521)
==699816== by 0x57C295: yjit::codegen::yjit_reg_method_codegen_fns (codegen.rs:10464)
==699816== by 0x5C6B07: rb_yjit_init (yjit.rs:40)
==699816== by 0x393723: ruby_opt_init (ruby.c:1820)
==699816== by 0x393723: ruby_opt_init (ruby.c:1767)
==699816== by 0x3957D4: prism_script (ruby.c:2215)
==699816== by 0x3957D4: process_options (ruby.c:2538)
==699816== by 0x396065: ruby_process_options (ruby.c:3166)
==699816== by 0x236E56: ruby_options (eval.c:117)
==699816== by 0x15BAED: rb_main (main.c:43)
==699816== by 0x15BAED: main (main.c:62)
After this patch, there are no more memory leaks reported when running
RUBY_FREE_AT_EXIT with Valgrind on an empty Ruby script:
$ RUBY_FREE_AT_EXIT=1 valgrind --leak-check=full ruby -e ""
...
==700357== HEAP SUMMARY:
==700357== in use at exit: 0 bytes in 0 blocks
==700357== total heap usage: 36,559 allocs, 36,559 frees, 6,064,783 bytes allocated
==700357==
==700357== All heap blocks were freed -- no leaks are possible
Many libraries should be loaded on the main ractor because of
setting constants with unshareable objects and so on.
This patch allows to call `requore` on non-main Ractors by
asking the main ractor to call `require` on it. The calling ractor
waits for the result of `require` from the main ractor.
If the `require` call failed with some reasons, an exception
objects will be deliverred from the main ractor to the calling ractor
if it is copy-able.
Same on `require_relative` and `require` by `autoload`.
Now `Ractor.new{pp obj}` works well (the first call of `pp` requires
`pp` library implicitly).
[Feature #20627]
to show unused block warning strictly.
```ruby
class C
def f = nil
end
class D
def f = yield
end
[C.new, D.new].each{|obj| obj.f{}}
```
In this case, `D#f` accepts a block. However `C#f` doesn't
accept a block. There are some cases passing a block with
`obj.f{}` where `obj` is `C` or `D`. To avoid warnings on
such cases, "unused block warning" will be warned only if
there is not same name which accepts a block.
On the above example, `C.new.f{}` doesn't show any warnings
because there is a same name `D#f` which accepts a block.
We call this default behavior as "relax mode".
`strict_unused_block` new warning category changes from
"relax mode" to "strict mode", we don't check same name
methods and `C.new.f{}` will be warned.
[Feature #15554]
* YJIT: Replace Array#each only when YJIT is enabled
* Add comments about BUILTIN_ATTR_C_TRACE
* Make Ruby Array#each available with --yjit as well
* Fix all paths that expect a C location
* Use method_basic_definition_p to detect patches
* Copy a comment about C_TRACE flag to compilers
* Rephrase a comment about add_yjit_hook
* Give METHOD_ENTRY_BASIC flag to Array#each
* Add --yjit-c-builtin option
* Allow inconsistent source_location in test-spec
* Refactor a check of BUILTIN_ATTR_C_TRACE
* Set METHOD_ENTRY_BASIC without touching vm->running
If a Hash which is empty or only using literals is frozen, we detect
this as a peephole optimization and change the instructions to be
`opt_hash_freeze`.
[Feature #20684]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
If an Array which is empty or only using literals is frozen, we detect
this as a peephole optimization and change the instructions to be
`opt_ary_freeze`.
[Feature #20684]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>