2668 Commits

Author SHA1 Message Date
Peter Zhu
be5450467b Fix reference updating for id2ref table
The id2ref table could contain dead entries which should not be passed
into rb_gc_location. Also, we already update references in gc_update_references
using rb_gc_vm_weak_table_foreach so we do not need to update it again.
2025-05-27 08:22:26 +02:00
John Hawthorn
f483befd90 Add shape_id to RBasic under 32 bit
This makes `RBobject` `4B` larger on 32 bit systems
but simplifies the implementation a lot.

[Feature #21353]

Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2025-05-26 10:31:54 +02:00
Nobuyoshi Nakada
aad9fa2853 Use RB_VM_LOCKING 2025-05-25 15:22:43 +09:00
John Hawthorn
11ad7f5f47 Don't use namespaced classext for superclasses
Superclasses can't be modified by user code, so they do not need namespace
indirection. For example Object.superclass is always BasicObject, no
matter what modules are included onto it.
2025-05-23 10:22:24 -07:00
Nobuyoshi Nakada
7154b4208b Fix a -Wmaybe-uninitialized
lev in rb_gc_vm_lock() is uninitialized in single ractor mode.
2025-05-22 10:55:19 +09:00
John Hawthorn
6a16c3e26d Remove too_complex GC assertion
Classes from the default namespace are not writable, however they do not
transition to too_complex until they have been written to inside a user
namespace. So this assertion is invalid (as it was in its previous
location), and it doesn't seem to provide us much value.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2025-05-21 17:23:18 -07:00
Aaron Patterson
6ea893f376 Add assertion for RCLASS_SET_PRIME_CLASSEXT_WRITABLE
When classes are booted, they should all be writable unless namespaces
are enabled.  This commit adds an assertion to ensure that classes are
writable.
2025-05-21 09:51:32 -07:00
Peter Zhu
ac23fa0902 Use rb_id_table_foreach_values for mark_cc_tbl
We don't need the key, so we can improve performance by only iterating
on the value.

This will also fix the MMTk build because looking up the key in
rb_id_table_foreach requires locking the VM, which is not supported in
the MMTk worker threads.
2025-05-21 11:27:02 -04:00
Jean Boussier
31ba881684 Disable GC when building id2ref table
Building that table will likely call malloc several times, which
can trigger GC and cause a race condition by freeing objects
that were just added to the table.

Disabling GC to prevent the race condition isn't elegant,
but given this is a deprecated callpath that is executed at
most once per process, it seems acceptable.
2025-05-15 16:29:45 +02:00
Jean Boussier
60ffb714d2 Ensure shape_id is never used on T_IMEMO
It doesn't make sense to set ivars or anything shape
related on a T_IMEMO.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
2025-05-15 16:06:52 +02:00
Jean Boussier
b5575a80bc Reduce Object#object_id contention.
If the object isn't shareable and already has an object_id
we can access it without a lock.

If we need to generate an ID, we may need to lock to find
the child shape.

We also generate the next `object_id` using atomics.
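
As a rough illustration of generating IDs with atomics, here is a minimal C11 sketch. The counter name, its starting value, and the step of 1 are assumptions made for the example; the commit message does not state how the real counter is declared or incremented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter (name and initial value are assumptions). */
_Atomic uint64_t next_object_id = 1;

/* Hands out a unique ID without taking a lock: fetch-add returns the
 * previous value, so two concurrent callers can never see the same ID. */
uint64_t
generate_object_id(void)
{
    return atomic_fetch_add_explicit(&next_object_id, 1, memory_order_relaxed);
}
```

Two consecutive calls from one thread return consecutive values; calls racing on several threads return distinct values in an unspecified order.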
2025-05-14 14:41:46 +02:00
Jean Boussier
f9c3feccf4 Rename id_to_obj_tbl -> id2ref_tbl
As well as associated functions, this should make it more obvious
what the purpose is.
2025-05-14 11:41:14 +02:00
Jean Boussier
9400119702 Fix object_id for classes and modules in namespace context
Given classes and modules have a different set of fields in every
namespace, we can't store the object_id in fields for them.

Given that some space was freed in `RClass` we can store it there
instead.
2025-05-14 10:26:48 +02:00
Jean Boussier
2ca8769443 Reclaim one VALUE from rb_classext_t
The `includer` field is only used for `T_ICLASS`, so by moving
it into the existing union we can save one `VALUE` per class
and module.
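
The saving can be sketched with two simplified structs. Everything here apart from the `includer` name is hypothetical (the real `rb_classext_t` has many more fields, and the union member shown next to `includer` is a placeholder); the point is only that a field used by a single type can share storage with fields that type never uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE; /* stand-in for Ruby's VALUE */

/* Before: `includer` occupies its own slot in every class and module. */
struct classext_before {
    VALUE super;
    union {
        VALUE attached_object; /* hypothetical member used by other types */
    } as;
    VALUE includer;            /* only meaningful for T_ICLASS */
};

/* After: `includer` lives inside the existing union. */
struct classext_after {
    VALUE super;
    union {
        VALUE attached_object; /* hypothetical member used by other types */
        VALUE includer;        /* only meaningful for T_ICLASS */
    } as;
};

/* Number of bytes reclaimed per class or module in this sketch. */
size_t
classext_bytes_saved(void)
{
    return sizeof(struct classext_before) - sizeof(struct classext_after);
}
```

In this sketch `classext_bytes_saved()` equals `sizeof(VALUE)`, matching the "one `VALUE` per class and module" figure.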
2025-05-13 14:55:39 +02:00
Samuel Williams
425fa0aeb5 Make waiting_fd behaviour per-IO. (#13127)
- `rb_thread_fd_close` is deprecated and now a no-op.
- IO operations (including close) no longer take a vm-wide lock.
2025-05-13 19:02:03 +09:00
Jean Boussier
a6435befa7 variable.c: Refactor rb_obj_field_* to take shape_id_t 2025-05-13 10:35:34 +02:00
Peter Zhu
85d9ebc995 Remove duplicate asan_unpoisoning_object
It's already defined in internal/sanitizers.h.
2025-05-12 10:51:17 -04:00
Jean Boussier
8b7a4d167a Handle GC triggering while building the initial id_to_obj_tbl
GC can trigger while we build the table, and if it sweeps an object
with an ID, it may not find it in the `id_to_obj` table.
2025-05-11 22:05:06 +02:00
Jean Boussier
f2e5f6dbb6 Allow T_CLASS and generic types to be too_complex
The initial complex shape implementation never allowed objects
other than T_OBJECT to become too complex, unless we run out of
shapes.

I don't see any reason to prevent that.

Ref: https://github.com/ruby/ruby/pull/6931
2025-05-11 19:35:58 +02:00
Satoshi Tagomori
ae2d5378e8 Suppress warning about unused variable without VM_CHECK_MODE 2025-05-11 23:32:50 +09:00
Satoshi Tagomori
bbcc3782b1 Skip updating max_iv_count when the namespace cannot be determined 2025-05-11 23:32:50 +09:00
Satoshi Tagomori
294b52fb9b Follow the code style about else 2025-05-11 23:32:50 +09:00
Satoshi Tagomori
90e5ce6132 Rename RCLASS_EXT() macro to RCLASS_EXT_PRIME() to prevent using it wrongly
The macro RCLASS_EXT() accesses the prime classext directly, which is
valid only in limited situations when namespaces are enabled.
To prevent RCLASS_EXT() from being used incorrectly, rename the macro so
that developers must check whether accessing the prime classext is OK.
2025-05-11 23:32:50 +09:00
Satoshi Tagomori
382645d440 namespace on read 2025-05-11 23:32:50 +09:00
Daisuke Aritomo
98667f82d2 [DOC] Update documentation for ObjectSpace#each_object
Co-authored-by: Benoit Daloze <eregontp@gmail.com>
2025-05-10 19:32:21 +02:00
Daisuke Aritomo
29b3d683fb [DOC] Make clear that current behavior is not ideal 2025-05-10 19:32:21 +02:00
Daisuke Aritomo
a51b4a86fc [DOC] ObjectSpace#each_object behavior in multi-Ractor mode
This behavior of ObjectSpace#each_object has been around since Ruby 3.0
when Ractors were first introduced, but was never documented and has
caused some amount of confusion:

https://bugs.ruby-lang.org/issues/17360
https://bugs.ruby-lang.org/issues/19387
https://bugs.ruby-lang.org/issues/21149
2025-05-10 19:32:21 +02:00
Jean Boussier
d9502a8386 Rename rb_field_get -> rb_obj_field_get
To be consistent with `rb_obj_field_set`.
2025-05-10 15:39:33 +02:00
Jean Boussier
3135eddb4e Refactor FIRST_T_OBJECT_SHAPE_ID to not be used outside shape.c 2025-05-09 20:45:48 +02:00
Jean Boussier
ea77250847 Rename RB_OBJ_SHAPE -> rb_obj_shape
As well as `RB_OBJ_SHAPE_ID` -> `rb_obj_shape_id`
and `RSHAPE` is now a simple alias for `rb_shape_lookup`.

I tried to turn all these into `static inline` but I'm having
trouble with `RUBY_EXTERN rb_shape_tree_t *rb_shape_tree_ptr;`
not being exposed as I'd expect.
2025-05-09 10:22:51 +02:00
Jean Boussier
becc45ff4e Eliminate some rb_shape_t * usages outside of shape.c. 2025-05-09 10:22:51 +02:00
Jean Boussier
5782561fc1 Rename rb_shape_get_shape_id -> RB_OBJ_SHAPE_ID
And `rb_shape_get_shape` -> `RB_OBJ_SHAPE`.
2025-05-09 10:22:51 +02:00
Jean Boussier
3f7c0af051 Rename rb_shape_obj_too_complex -> rb_shape_obj_too_complex_p 2025-05-09 10:22:51 +02:00
Jean Boussier
334ebba221 Rename rb_shape_get_shape_by_id -> RSHAPE 2025-05-09 10:22:51 +02:00
Jean Boussier
f8b3fc520f Refactor rb_shape_traverse_from_new_root to not expose rb_shape_t 2025-05-09 10:22:51 +02:00
Jean Boussier
4de049a3f9 Deprecate ObjectSpace._id2ref
[Feature #15408]

Matz decided to deprecate it for Ruby 2.6 or 2.7 but that never
actually happened.

Given the object_id table is a contention point for Ractors
it seems sensible to finally deprecate this API so we can
generate and store object ids more efficiently in the future.
2025-05-09 09:19:25 +02:00
Jean Boussier
cf9046c00b Refactor id_to_obj_tbl compaction
Use `st_foreach_with_replace` rather than calling `st_insert`
from inside `st_foreach`; this saves us from having to disable GC.

Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
2025-05-08 07:58:05 +02:00
Jean Boussier
2d1241ba97 Get rid of RB_GC_VM_ID_TO_OBJ_TABLE_KEYS 2025-05-08 07:58:05 +02:00
Jean Boussier
f48e45d1e9 Move object_id in object fields.
And get rid of the `obj_to_id_tbl`

It's no longer needed, the `object_id` is now stored inline
in the object alongside instance variables.

We still need the inverse table in case `_id2ref` is invoked, but
we lazily build it by walking the heap if that happens.

Handling `object_id` is also no longer the responsibility of each GC
implementation; it is now implemented generically.

Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
2025-05-08 07:58:05 +02:00
Jean Boussier
6c9b3ac232 Refactor OBJ_TOO_COMPLEX_SHAPE_ID to not be referenced outside shape.h
Also refactor checks for `->type == SHAPE_OBJ_TOO_COMPLEX`.
2025-05-08 07:58:05 +02:00
Jean Boussier
0ea210d1ea Rename ivptr -> fields, next_iv_index -> next_field_index
Instance variables will no longer be the only thing stored inline
via shapes, so keeping the `iv_index` and `ivptr` names
would be confusing.

`field` encompasses anything that can be stored in a VALUE array.

Similarly, `gen_ivtbl` becomes `gen_fields_tbl`.
2025-05-08 07:58:05 +02:00
Jeremy Evans
ce51ef30df Save one VALUE per embedded RTypedData
This halves the amount of memory used for embedded RTypedData if they
are one VALUE (8 bytes on 64-bit platforms) over the slot size limit.

For Set, on 64-bit it uses an embedded 56-byte struct.  With the
previous implementation, the embedded structs starts at offset 32,
resulting in a total size of 88.  Since that is over the 80 byte
limit, it goes to the next highest bucket, 160 bytes, wasting 72
bytes.  This allows it to fit in a 80 byte bucket, which reduces
the total size for small sets from 224 bytes (160 bytes
embedded, 64 bytes malloc, 72 bytes wasted in embedding) to 144
bytes (80 bytes embedded, 64 bytes malloc, 0 bytes wasted in
embedding).
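
The arithmetic above can be checked with a small sketch. The list of slot sizes is an assumption (the message only mentions the 80 and 160 byte buckets), and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed slot sizes in bytes; only 80 and 160 appear in the message. */
static const size_t slot_sizes[] = {40, 80, 160, 320, 640};

/* Smallest slot that can hold `bytes`, or 0 if it cannot be embedded. */
size_t
slot_size_for(size_t bytes)
{
    for (size_t i = 0; i < sizeof(slot_sizes) / sizeof(slot_sizes[0]); i++) {
        if (bytes <= slot_sizes[i]) return slot_sizes[i];
    }
    return 0;
}

/* Bytes left unused when a struct of `payload` bytes is embedded
 * starting at offset `header`. */
size_t
wasted_bytes(size_t header, size_t payload)
{
    size_t need = header + payload;
    size_t slot = slot_size_for(need);
    return slot ? slot - need : 0;
}
```

With a 56-byte struct, `wasted_bytes(32, 56)` gives 72 (an 88-byte object in a 160-byte slot), while `wasted_bytes(24, 56)` gives 0 (an 80-byte object in an 80-byte slot).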

Any other embedded RTypedData will see similar advantages if they
are currently one VALUE over the limit.

To implement this, remove the typed_flag from struct RTypedData.
Embed the typed_flag information in the type member, which is
now a tagged pointer using VALUE type, using the low 2 bits
as flags (1 bit for typed flag, the other for the embedded flag).
To get the actual pointer, RTYPEDDATA_TYPE masks out
the low 2 bits and then casts.  That moves the RTypedData data
pointer from offset 32 to offset 24 (on 64-bit).
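
A minimal sketch of such a tagged pointer follows. The flag names, the choice of which bit means what, and the helper functions are assumptions for illustration; only the idea of keeping two flags in the low bits and masking them out before the cast comes from the message.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t VALUE; /* stand-in for Ruby's VALUE */

/* Hypothetical stand-in for rb_data_type_t. */
typedef struct { const char *name; } data_type_t;

/* Assumed bit assignment; the message does not say which bit is which. */
#define TYPED_FLAG    ((VALUE)1)
#define EMBEDDED_FLAG ((VALUE)2)
#define FLAG_MASK     ((VALUE)3)

/* Pack the type pointer and the flags into one VALUE. */
VALUE
tag_type(const data_type_t *type, int embedded)
{
    VALUE v = (VALUE)type;
    assert((v & FLAG_MASK) == 0); /* needs at least 4-byte alignment */
    return v | TYPED_FLAG | (embedded ? EMBEDDED_FLAG : (VALUE)0);
}

/* Mask out the low 2 bits, then cast back to the real pointer. */
const data_type_t *
untag_type(VALUE tagged)
{
    return (const data_type_t *)(tagged & ~FLAG_MASK);
}

int
is_embedded(VALUE tagged)
{
    return (tagged & EMBEDDED_FLAG) != 0;
}
```

This works because the pointed-to struct is aligned to at least 4 bytes, so the low 2 bits of its address are always zero and are free to carry flags.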

A vast amount of code in the internals (and probably external C
extensions) expects the following code to work for both RData and
non-embedded RTypedData:

```c
DATA_PTR(obj) = some_pointer;
```

Allow this to work by moving the data pointer in RData between
the dmark and dfree pointers, so it is at the same offset (24
on 64-bit).
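
The layout requirement can be sketched with two simplified structs. These are not the real definitions from the headers (member types are reduced and the struct names are invented); they only show that placing `data` second in both layouts gives it the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE; /* stand-in for Ruby's VALUE */

/* Simplified object header: two VALUEs, 16 bytes on 64-bit. */
struct basic_sketch {
    VALUE flags;
    VALUE klass;
};

/* RData-like layout with the data pointer between dmark and dfree. */
struct data_sketch {
    struct basic_sketch basic;
    void (*dmark)(void *);
    void *data;
    void (*dfree)(void *);
};

/* RTypedData-like layout after the typed_flag member is removed. */
struct typed_data_sketch {
    struct basic_sketch basic;
    VALUE type; /* tagged pointer */
    void *data;
};

/* Offset of the data pointer in each layout. */
size_t data_offset_plain(void) { return offsetof(struct data_sketch, data); }
size_t data_offset_typed(void) { return offsetof(struct typed_data_sketch, data); }
```

On a typical 64-bit platform both functions return 24, which is what lets a single `DATA_PTR`-style accessor serve both kinds of object.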

Other than these changes to the include files, the only changes
needed were to gc.c, to account for the new struct layouts,
handle setting the low bits in the type member, and to use
RTYPEDDATA_TYPE(obj) instead of RTYPEDDATA(obj)->type.
2025-05-05 09:46:32 +09:00
John Hawthorn
36c64b3be8 Also prefer FL_TEST_RAW in gc.c
Similar to 4a040eeb0d880b67a5005cce382122fd5b629b99, I noticed the test
for FL_FINALIZE checking FL_ABLE in a profile, and we shouldn't need to
do that here.
2025-05-02 14:28:25 -07:00
Nobuyoshi Nakada
c218862d3c Fix style [ci skip] 2025-04-19 22:02:10 +09:00
Samuel Williams
20a1c1dc6b Ensure struct rb_io is passed through to thread.c. (#13134) 2025-04-19 09:55:16 +09:00
John Hawthorn
57b6a7503f Lock-free hash set for fstrings [Feature #21268]
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (It could be made
   concurrent) We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred replication easy,
   using a T_DATA object

Another benefit of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a load factor of 0.5 (it is enlarged
once it is half full).
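
To make the description concrete, here is a heavily simplified sketch of an open-addressing set with quadratic probing, compare-and-swap insertion, and tombstones, written with C11 atomics. It is not the code from this commit: all names are invented, keys are plain integers rather than strings (so comparing by value and by identity coincide), the capacity is fixed, and resizing is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SET_CAPACITY 64u            /* power of two; never resized here */
#define SLOT_EMPTY   ((uintptr_t)0)
#define SLOT_DELETED ((uintptr_t)1) /* tombstone */

struct lf_set {
    _Atomic uintptr_t slots[SET_CAPACITY]; /* zero-initialized means empty */
};

static size_t
hash_key(uintptr_t key)
{
    return (size_t)(((uint64_t)key * 0x9E3779B97F4A7C15ull) >> 32);
}

/* Returns `key` once it is in the set (inserted now or already present),
 * or SLOT_EMPTY if no free slot was found. Keys must be greater than 1. */
uintptr_t
lf_set_find_or_insert(struct lf_set *set, uintptr_t key)
{
    size_t mask = SET_CAPACITY - 1;
    size_t idx = hash_key(key) & mask;

    for (size_t i = 1; i <= SET_CAPACITY; i++) {
        uintptr_t cur = atomic_load_explicit(&set->slots[idx], memory_order_acquire);
        if (cur == SLOT_EMPTY) {
            /* Try to claim the slot. On failure, `cur` is updated to the
             * value another thread stored there first. */
            if (atomic_compare_exchange_strong_explicit(
                    &set->slots[idx], &cur, key,
                    memory_order_acq_rel, memory_order_acquire)) {
                return key;
            }
        }
        if (cur == key) return cur;
        idx = (idx + i) & mask; /* triangular-number quadratic probing */
    }
    return SLOT_EMPTY;
}

/* Lookup only reads slots, so it never waits on another thread. */
int
lf_set_contains(struct lf_set *set, uintptr_t key)
{
    size_t mask = SET_CAPACITY - 1;
    size_t idx = hash_key(key) & mask;

    for (size_t i = 1; i <= SET_CAPACITY; i++) {
        uintptr_t cur = atomic_load_explicit(&set->slots[idx], memory_order_acquire);
        if (cur == SLOT_EMPTY) return 0;
        if (cur == key) return 1;
        idx = (idx + i) & mask;
    }
    return 0;
}

/* Replaces the key with a tombstone; the slot is never reused by inserts. */
void
lf_set_delete(struct lf_set *set, uintptr_t key)
{
    size_t mask = SET_CAPACITY - 1;
    size_t idx = hash_key(key) & mask;

    for (size_t i = 1; i <= SET_CAPACITY; i++) {
        uintptr_t cur = atomic_load_explicit(&set->slots[idx], memory_order_acquire);
        if (cur == SLOT_EMPTY) return;
        if (cur == key) {
            atomic_store_explicit(&set->slots[idx], SLOT_DELETED, memory_order_release);
            return;
        }
        idx = (idx + i) & mask;
    }
}
```

Because a slot only ever moves from empty to a key and from a key to a tombstone, a failed compare-and-swap cannot hide a free slot earlier in the probe sequence. That is why an insert can simply keep probing after losing a race.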
2025-04-18 13:03:54 +09:00
John Hawthorn
89199a47db Extract rb_gc_free_fstring to string.c
This allows more flexibility in how we deal with the fstring table
2025-04-18 13:03:54 +09:00
Jean Boussier
0606046c1a Lazily create objspace->id_to_obj_tbl
This inverse table is only useful if `ObjectSpace._id2ref` is used,
which is extremely rare. The only notable exception is the `drb` gem
and even then it has an option not to rely on `_id2ref`.

So if we assume this table will never be looked up, we can just
not maintain it, and if it turns out `_id2ref` is called, we
can lock the VM and re-build it.

```
compare-ruby: ruby 3.5.0dev (2025-04-10T09:44:40Z master 684cfa42d7) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-04-10T10:13:43Z lazy-id-to-obj d3aa9626cc) +YJIT +PRISM [arm64-darwin24]
warming up..

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|baseline   |     26.364M|   25.974M|
|           |       1.01x|         -|
|object_id  |     10.293M|   14.202M|
|           |           -|     1.38x|
```
2025-04-15 07:57:39 +09:00
Jean Boussier
085cc6e434 Ractor: revert to moving object bytes, but size pool aware
Using `rb_obj_clone` introduces other problems, such as `initialize_*`
callbacks being invoked in the context of the parent ractor.

So we can revert to copying the content of the object slots,
but in a way that is aware of size pools.
2025-04-04 16:26:29 +02:00
Jean Boussier
7db0e07134 Don't preserve object_id when moving object to another Ractor
That seemed like the logical thing to do to me, but ko1 disagrees.
2025-03-31 12:01:55 +02:00