2298 Commits

Author SHA1 Message Date
Peter Zhu
fd0df1f8c6 Fix growth in minor GC when we have initial slots
If initial slots is set, then during a minor GC, if we have allocatable
pages but the heap is mostly full, then we will set `grow_heap` to true
since `total_slots` does not count allocatable pages so it will be less
than `init_slots`. This can cause `allocatable_pages` to grow to much
higher than desired since it will appear that the heap is mostly full.
2023-08-28 18:01:29 -04:00
Peter Zhu
5485680244 Expose RVALUE_OLD_AGE in GC::INTERNAL_CONSTANTS 2023-08-28 18:01:29 -04:00
Peter Zhu
b7237e3bbd Free all empty heap pages in Process.warmup
This commit adds `free_empty_pages` which frees all empty heap pages and
moves the number of pages freed to the allocatable pages counter. This
is used in Process.warmup to improve performance because page
invalidation from copy-on-write is slower than allocating a new page.
2023-08-27 09:39:29 -04:00
Peter Zhu
9ea9f99248 [Feature #19785] Deprecate RUBY_GC_HEAP_INIT_SLOTS
This environment variable is replaced by
`RUBY_GC_HEAP_INIT_SIZE_%d_SLOTS`, so it doesn't make sense to keep it.
2023-08-25 21:50:56 -04:00
Peter Zhu
2091bf9493 Expose stats about weak references
[Feature #19783]

This commit adds stats about weak references to `GC.latest_gc_info`.
It adds the following two keys:

- `weak_references_count`: number of weak references registered during
  the last GC.
- `retained_weak_references_count`: number of weak references that
  survived the last GC.
2023-08-25 09:01:21 -04:00
Peter Zhu
bfb395c620 Implement weak references in the GC
[Feature #19783]

This commit adds support for weak references in the GC through the
function `rb_gc_mark_weak`. Unlike strong references, weak references
does not mark the object, but rather lets the GC know that an object
refers to another one. If the child object is freed, the pointer from
the parent object is overwritten with `Qundef`.

Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
2023-08-25 09:01:21 -04:00
eileencodes
b92d599eec Fix typo in anonymous class string
If anonymous was shorted it should be `anon` not `annon`. Fixes typo in
APPEND_S for anonymous classes.
2023-08-23 13:09:18 +09:00
Peter Zhu
5db8b9b366 Move total_freed_objects to size pool
This commit moves the `total_freed_objects` statistic to the size pool
which allows for `total_freed_objects` key in `GC.stat_heap`.
2023-08-17 15:53:00 -04:00
Peter Zhu
52506cbf51 Move total_allocated_objects to size pool
This commit moves the `total_allocated_objects` statistic to the size
pool which allows for `total_allocated_objects` key in `GC.stat_heap`.
2023-08-17 15:53:00 -04:00
Takashi Kokubun
e210b899dc
Move the PC regardless of the leaf flag (#8232)
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2023-08-16 20:28:33 -07:00
Peter Zhu
0f94e65359 Add stat force_incremental_marking_finish_count
This commit adds key force_incremental_marking_finish_count to
GC.stat_heap. This statistic returns the number of times the size pool
has forced incremental marking to finish due to running out of slots.
2023-08-15 15:18:05 -04:00
Peter Zhu
300bc14589 [DOC] Improve some GC docs 2023-08-15 08:54:27 -04:00
Peter Zhu
74b9c7d207 Remove wrapper functions of RVALUE_REMEMBERED
Functions rgengc_remembered, rgengc_remembered_sweep, and
rgengc_remembersetbits_get are just wrappers of RVALUE_REMEMBERED and
doesn't do much more. We can remove all those and use RVALUE_REMEMBERED
directly instead.
2023-08-08 09:44:13 -04:00
Nobuyoshi Nakada
acd27e3ec3
Move GC_CAN_COMPILE_COMPACTION definition before used 2023-08-06 18:45:40 +09:00
Peter Zhu
4b45b2764b Don't check stack for moved after compaction
We don't need to check stack for moved objects after compaction because
the mutator cannot run between marking the stack and the end of
compaction. However, the stack may have moved objects leftover from
marking and sweeping phases. This means that their pages will be
invalidated and all objects moved back. We don't need to move these
objects back.

This also fixes the issue on Windows where some compaction tests
sometimes fail due to the page of the object being invalidated.
2023-08-04 09:13:57 -04:00
Peter Zhu
c65856d44f Remove unneeded function prototype
Function prototype for gc_mode_transition is not needed as it's not
used before the implementation.
2023-08-03 11:12:07 -04:00
Peter Zhu
c01b17f7fc Fix default value of global_init_slots
Not setting a value to global_init_slots causes get_envparam_size to
output a broken default value.
2023-07-31 15:12:20 -04:00
Peter Zhu
b98838b65c Store initial slots per size pool
This commit stores the initial slots per size pool, configured with
the environment variables `RUBY_GC_HEAP_INIT_SIZE_%d_SLOTS`. This
ensures that the configured initial slots remains a low bound for the
number of slots in the heap, which can prevent heaps from thrashing in
size.
2023-07-31 11:46:53 -04:00
Koichi Sasada
cfd7729ce7 use inline cache for refinements
From Ruby 3.0, refined method invocations are slow because
resolved methods are not cached by inline cache because of
conservertive strategy. However, `using` clears all caches
so that it seems safe to cache resolved method entries.

This patch caches resolved method entries in inline cache
and clear all of inline method caches when `using` is called.

fix [Bug #18572]

```ruby
 # without refinements

class C
  def foo = :C
end

N = 1_000_000

obj = C.new
require 'benchmark'
Benchmark.bm{|x|
  x.report{N.times{
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
  }}
}

_END__
              user     system      total        real
master    0.362859   0.002544   0.365403 (  0.365424)
modified  0.357251   0.000000   0.357251 (  0.357258)
```

```ruby
 # with refinment but without using

class C
  def foo = :C
end

module R
  refine C do
    def foo = :R
  end
end

N = 1_000_000

obj = C.new
require 'benchmark'
Benchmark.bm{|x|
  x.report{N.times{
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
  }}
}
__END__
               user     system      total        real
master     0.957182   0.000000   0.957182 (  0.957212)
modified   0.359228   0.000000   0.359228 (  0.359238)
```

```ruby
 # with using

class C
  def foo = :C
end

module R
  refine C do
    def foo = :R
  end
end

N = 1_000_000

using R

obj = C.new
require 'benchmark'
Benchmark.bm{|x|
  x.report{N.times{
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
  }}
}
2023-07-31 17:13:43 +09:00
Koichi Sasada
36023d5cb7 mark cc->cme_ if it is for super
`vm_search_super_method()` makes orphan CCs (they are not connected
from ccs) and `cc->cme_` can be collected before without marking.
2023-07-31 14:04:31 +09:00
Koichi Sasada
087a2deccf check cc->* liveness strictly
to fix SEGV like
http://ci.rvm.jp/results/trunk-repeat20-asserts@ruby-sp2-docker/4664004
```
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(sigsegv+0x4f) [0x7fcb0343e7df] /tmp/ruby/src/trunk-repeat20-asserts/signal.c:920
/lib/x86_64-linux-gnu/libc.so.6(0x7fcb02e4d520) [0x7fcb02e4d520]
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(RB_SPECIAL_CONST_P+0x13) [0x7fcb03311ea3] /tmp/ruby/src/trunk-repeat20-asserts/include/ruby/internal/special_consts.h:329
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(RB_BUILTIN_TYPE) /tmp/ruby/src/trunk-repeat20-asserts/include/ruby/internal/value_type.h:183
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_object_moved_p) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:1624
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_object_moved_p+0xe) [0x7fcb0331ed16] /tmp/ruby/src/trunk-repeat20-asserts/include/ruby/internal/special_consts.h:329
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_ref_update_imemo) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:10132
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_update_object_references) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:10411
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_ref_update+0xab) [0x7fcb0331fcbb] /tmp/ruby/src/trunk-repeat20-asserts/gc.c:10570
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_update_references) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:10604
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_compact_finish) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:5425
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_sweep_compact) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:8476
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_sweep) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:6040
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_start+0xe25) [0x7fcb03325795] /tmp/ruby/src/trunk-repeat20-asserts/gc.c:9323
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(rb_multi_ractor_p+0x0) [0x7fcb03326108] /tmp/ruby/src/trunk-repeat20-asserts/gc.c:9208
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(rb_vm_lock_leave) /tmp/ruby/src/trunk-repeat20-asserts/vm_sync.h:92
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(garbage_collect) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:9210
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(rbimpl_atomic_exchange+0x0) [0x7fcb033262b9] /tmp/ruby/src/trunk-repeat20-asserts/gc.c:9646
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_finalize_deferred) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:4345
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_start_internal) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:9647
/tmp/ruby/build/trunk-repeat20-asserts/libruby.so.3.3(gc_compact) /tmp/ruby/src/trunk-repeat20-asserts/gc.c:10748
```
2023-07-30 08:11:53 +09:00
Koichi Sasada
7a7aba755d check liveness of cc->klass and cc->cme_
`cc->klass` and `cc->cme_` can be free'ed while last marking
so that it should be checked bofore updating the pointers.

Note that `T_MOVED` is living, but `is_live_object()` returns false.
2023-07-29 14:25:15 +09:00
ko1
6dc15cc889 do not clear cme but invalidate cc
To invalidate a cc, we need to clear cc->klass by `vm_cc_invalidate()`.
I hope this patch fix the CI failures.
2023-07-29 09:06:14 +09:00
Ruby
c330037c1a cc->cme should not be marked.
cc is callcache.

cc->klass (klass) should not be marked because if the klass is
free'ed, the cc->klass will be cleared by `vm_cc_invalidate()`.

cc->cme (cme) should not be marked because if cc is invalidated
when cme is free'ed.
- klass marks cme if klass uses cme.
- caller classe's ccs->cme marks cc->cme.
- if cc is invalidated (klass doesn't refer the cc),
  cc is invalidated by `vm_cc_invalidate()` and cc->cme is
  not be accessed.
- On the multi-Ractors, cme will be collected with global GC
  so that it is safe if GC is not interleaving while accessing
  cc and cme.

fix [Bug #19436]

```ruby
10_000.times{|i|
  # p i if (i%1_000) == 0

  str = "x" * 1_000_000
  def str.foo = nil
  eval "def call#{i}(s) = s.foo"
  send "call#{i}", str
}
```

Without this patch:

```
real    1m5.639s
user    0m6.637s
sys     0m58.292s
```

and with this patch:

```
real    0m2.045s
user    0m1.627s
sys     0m0.164s
```
2023-07-28 10:51:11 +09:00
Jean Boussier
9b405a18be Process.warmup: precompute strings coderange
This both save time for when it will be eventually needed,
and avoid mutating heap pages after a potential fork.

Instrumenting some large Rails app, I've witnessed up to
58% of String instances having their coderange still unknown.
2023-07-26 11:41:23 +02:00
Kunshan Wang
639aa76e82
Embed struct rmatch into GC slot (#8097) 2023-07-20 14:17:38 -04:00
Matt Valentine-House
dd8372b3f3 cvc table entries can move 2023-07-20 13:38:58 +01:00
Peter Zhu
4c03eab1aa Lazily allocate pages at boot
We can just set alloctable pages for the first size pool rather than
eagerly allocating pages.
2023-07-18 14:52:37 -04:00
Jean Boussier
fa30b99c34 Implement Process.warmup
[Feature #18885]

For now, the optimizations performed are:

  - Run a major GC
  - Compact the heap
  - Promote all surviving objects to oldgen

Other optimizations may follow.
2023-07-17 11:20:15 +02:00
Peter Zhu
4e0b287912 Remove RGENGC_OLD_NEWOBJ_CHECK
The code doesn't compile, so probably nobody is using this.
2023-07-14 13:53:34 -04:00
Peter Zhu
914b657a2b Remove unused branch in write barrier
The branch doesn't compile, so it's probably not used.
2023-07-14 13:53:20 -04:00
Peter Zhu
3223181284 Remove RARRAY_CONST_PTR_TRANSIENT
RARRAY_CONST_PTR now does the same things as RARRAY_CONST_PTR_TRANSIENT.
2023-07-13 14:48:14 -04:00
Matt Valentine-House
6a62b9b200 Remove unused forward declarations 2023-07-13 15:30:33 +01:00
Peter Zhu
1e7b67f733 [Feature #19730] Remove transient heap 2023-07-13 09:27:33 -04:00
Matt Valentine-House
d426343418 Store object age in a bitmap
Closes [Feature #19729]

Previously 2 bits of the flags on each RVALUE are reserved to store the
number of GC cycles that each object has survived. This commit
introduces a new bit array on the heap page, called age_bits, to store
that information instead.

This patch still reserves one of the age bits in the flags (the old
FL_PROMOTED0 bit, now renamed FL_PROMOTED).

This is set to 0 for young objects and 1 for old objects, and is used as
a performance optimisation for the write barrier. Fetching the age_bits
from the heap page and doing the required math to calculate if the
object was old or not would slow down the write barrier. So we keep this
bit synced in the flags for fast access.
2023-07-13 09:21:36 +01:00
Nobuyoshi Nakada
5204ad56e1
Compile debugging code for stress to class always 2023-06-30 23:59:04 +09:00
Peter Zhu
58386814a7 Don't check for null pointer in calls to free
According to the C99 specification section 7.20.3.2 paragraph 2:

> If ptr is a null pointer, no action occurs.

So we do not need to check that the pointer is a null pointer.
2023-06-30 09:13:31 -04:00
Peter Zhu
c3dc9fcc70 Fix heap growth in GC.verify_compaction_references
We should grow by at least gc_params.heap_init_slots, but the previous
calculation was incorrect.
2023-06-06 10:18:50 -04:00
eileencodes
40f090f433 Revert "Revert "Fix cvar caching when class is cloned""
This reverts commit 10621f7cb9a0c70e568f89cce47a02e878af6778.

This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.

Original commit message:

Fix cvar caching when class is cloned

The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.

This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.

To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.

This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.

We also added a marking function which was missing.

Fixes [Bug #19379]

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2023-06-05 11:11:12 -07:00
Aaron Patterson
10621f7cb9
Revert "Fix cvar caching when class is cloned"
This reverts commit 77d1b082470790c17c24a2f406b4fec5d522636b.
2023-06-01 14:55:36 -07:00
eileencodes
77d1b08247 Fix cvar caching when class is cloned
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.

This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.

To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.

This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.

We also added a marking function which was missing.

Fixes [Bug #19379]

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2023-06-01 08:52:48 -07:00
Peter Zhu
e87f6c899e Don't immediately promote children of old objects
[Feature #19678]

References from an old object to a write barrier protected young object
will not immediately promote the young object. Instead, the young object
will age just like any other object, meaning that it has to survive
three collections before being promoted to the old generation.
References from an old object to a write barrier unprotected object will
place the parent object in the remember set for marking during minor
collections. This allows the child object to be reclaimed in minor
collections at the cost of increased time for minor collections.

On one of [Shopify's highest traffic Ruby apps, Storefront
Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this feature in
production. We compare the GC time and response time of web workers that
have the original behaviour (non-experimental group) and this new
behaviour (experimental group). We see that with this feature we spend
significantly less time in the GC, 0.81x on average, 0.88x on p99, and
0.45x on p99.9.

This translates to improvements in average response time (0.96x) and p99
response time (0.92x).
2023-05-25 08:56:22 -04:00
Peter Zhu
a23ae56c4d Add REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO
[Feature #19571]

This commit adds the environment variable
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO` which is
used to calculate the `remembered_wb_unprotected_objects_limit` using a
ratio of `old_objects`. This should improve performance by reducing
major GC because, in a major GC, we mark all of the old objects, so we
should have more uncollectible WB unprotected objects before starting a
major GC. The default has been set to 0.01 (1% of old objects).

On one of [Shopify's highest traffic Ruby apps, Storefront Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this patch in
production. In the graphs below, we have the `tuned` group which uses
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0.01` (the
default value), and an `untuned` group, which turns this feature off
with `RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0`. We
see that the tuned group spends significantly less time in GC, on
average 0.67x of the time compared to the untuned group and 0.49x for
p99. We see this improvement in GC time translate to improvements in
response times. The average response time is now 0.96x of the time
compared to the untuned group and 0.86x for p99.

https://user-images.githubusercontent.com/15860699/229559078-e23e8ce4-5f1f-4a2f-b5ef-5769f92b8c70.png
2023-05-24 12:11:48 -04:00
Jean Boussier
85b4cd7cf8
gc.c: get rid of unused objspace parameters (#7853) 2023-05-24 15:14:46 +02:00
Nobuyoshi Nakada
8d242a33af
rb_bug prints a newline after the message 2023-05-20 21:43:30 +09:00
Peter Zhu
cea9c30fa5 Move ar_hint to ar_table_struct
This allows Hashes with ST tables to fit int he 80 byte size pool.
2023-05-17 09:19:40 -04:00
Peter Zhu
0938964ba1 Implement Hash ST tables on VWA 2023-05-17 09:19:40 -04:00
Peter Zhu
5199f2aaf9 Implement Hash AR tables on VWA 2023-05-17 09:19:40 -04:00
Ian Ker-Seymer
2f9f44f077
Ensure the VM is alive before accessing objspace in C API (Feature #19627)
[Feature #19627]
2023-05-04 08:48:34 +02:00
Peter Zhu
a0d1069e03 Make classes embedded on 32 bit
Classes are now exactly 80 bytes when embedded, which perfectly fits the
3rd size pool on 32 bit systems.
2023-04-16 11:06:31 -04:00