1955 Commits

Author SHA1 Message Date
Nobuyoshi Nakada
fa85d23ff4
[Bug #21380] Prohibit modification in String#split block
Reported at https://hackerone.com/reports/3163876
2025-05-29 11:10:58 +09:00
Jean Boussier
925dec8d70 Rename rb_shape_set_shape_id to rb_obj_set_shape_id 2025-05-27 15:34:02 +02:00
BurdetteLamar
909a0daab6 [DOC] More tweaks for String#byteindex 2025-05-26 13:42:35 -04:00
John Hawthorn
f483befd90 Add shape_id to RBasic under 32 bit
This makes `RBobject` `4B` larger on 32 bit systems
but simplifies the implementation a lot.

[Feature #21353]

Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2025-05-26 10:31:54 +02:00
Nobuyoshi Nakada
aad9fa2853
Use RB_VM_LOCKING 2025-05-25 15:22:43 +09:00
BurdetteLamar
3403055d13 [DOC] Tweaks for String#byteindex 2025-05-22 10:17:46 -04:00
Burdette Lamar
cc90adb68d
[DOC] Tweaks for String#append_as_bytes 2025-05-16 12:50:55 -04:00
BurdetteLamar
a188249616 [DOC] Tweaks for String#b 2025-05-16 12:47:17 -04:00
BurdetteLamar
1f09c9fa14 [DOC] Tweaks for String#ascii_only? 2025-05-16 12:46:56 -04:00
Burdette Lamar
4fc5047af8
[DOC] Tweaks for String#=~ (#13325) 2025-05-15 11:18:49 -04:00
Burdette Lamar
7afee53fa0
[DOC] Tweaks for String#<< (#13306) 2025-05-14 15:24:30 -04:00
Burdette Lamar
10e8119cff
[DOC] Tweaks for String#== (#13323) 2025-05-14 15:24:19 -04:00
Burdette Lamar
b00a339603
[DOC] Tweaks for String#[] (#13335) 2025-05-14 14:34:09 -04:00
BurdetteLamar
1f72512b03 [DOC] Tweaks for String#[]= 2025-05-14 14:33:40 -04:00
BurdetteLamar
96b823a211 [DOC] Tweaks for String#<=> 2025-05-13 13:14:25 -04:00
Nobuyoshi Nakada
64944cf422
[DOC] Remove a garbage 2025-05-13 00:07:56 +09:00
Burdette Lamar
bc6d48bd34
[DOC] Tweak for String#+@ (#13285) 2025-05-12 10:16:37 -04:00
BurdetteLamar
7a660d7c69 [DOC] Tweaks for What's Here 2025-05-08 16:34:33 -04:00
Burdette Lamar
46a8240884
[DOC] Tweaks for String#-@ 2025-05-08 10:31:47 -04:00
Jean Boussier
f48e45d1e9 Move object_id in object fields.
And get rid of the `obj_to_id_tbl`

It's no longer needed, the `object_id` is now stored inline
in the object alongside instance variables.

We still need the inverse table in case `_id2ref` is invoked, but
we lazily build it by walking the heap if that happens.

The `object_id` concern is also no longer a GC implementation
concern, but a generic implementation.

Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
2025-05-08 07:58:05 +02:00
BurdetteLamar
35918df740 [DOC] Tweaks for String#+ 2025-05-04 17:14:44 -04:00
BurdetteLamar
d2de59798c [DOC] Tweaks for String#* 2025-05-04 17:14:17 -04:00
BurdetteLamar
d71e171464 [DOC] Tweaks for String#% 2025-05-04 17:13:50 -04:00
Burdette Lamar
79fe8aa010
[DOC] Tweaks for String.new 2025-05-01 10:51:22 -04:00
Nobuyoshi Nakada
b42afa1dbc
Suppress gcc 15 unterminated-string-initialization warnings 2025-04-30 20:04:10 +09:00
Jean Boussier
1f090403e2 Fix comparison of signed and unsigned integers
```
../string.c:660:38: warning: comparison of integers of different signs: 'rb_atomic_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
  660 |             RUBY_ASSERT(table->count < table->capacity / 2);
```
2025-04-23 18:35:00 +02:00
Nobuyoshi Nakada
c218862d3c
Fix style [ci skip] 2025-04-19 22:02:10 +09:00
Jean Boussier
0f25886fac Implement dsize function for fstring_table_type
The fstring table size used to be reported as part of the VM
size, but since it was refactored to be lock-less it was no
longer reported.

Since it's now wrapped by a `T_DATA`, we can implement its
`dsize` function and get a valuable insight into the size
of the table.

```
{"address":"0x100ebff18", "type":"DATA", "shape_id":0, "slot_size":80,
"struct":"VM/fstring_table", "memsize":131176, ...
```
2025-04-19 12:42:14 +09:00
Jean Boussier
52487705d0 Fix style of recent fstring feature 2025-04-19 11:38:22 +09:00
John Hawthorn
57b6a7503f Lock-free hash set for fstrings [Feature #21268]
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (It could be made
   concurrent) We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred replication easy,
   using a T_DATA object

Another benefit of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a maximum load factor of 1/2 (it is
enlarged once it is half full).
2025-04-18 13:03:54 +09:00
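The table layout described in 57b6a7503f can be sketched in Ruby. This is a single-threaded illustration only: the real table lives in C and uses atomic operations, and the class and method names here (`SketchFstringSet`, `find_or_insert`, `delete`) are hypothetical. It shows open addressing with quadratic (triangular) probing, by-value lookup/insert, by-identity delete with tombstones, and growth once the table would be more than half full.

```ruby
# Single-threaded sketch only; names are hypothetical, not the string.c API.
class SketchFstringSet
  TOMBSTONE = Object.new.freeze # marks a deleted slot

  attr_reader :capacity

  def initialize(capacity = 8)
    @capacity = capacity         # kept a power of two so probing visits every slot
    @slots = Array.new(capacity) # nil means "never used"
    @used = 0                    # live entries plus tombstones
  end

  # Compare by value (and encoding): return the stored equal string, or store str.
  def find_or_insert(str)
    rebuild if (@used + 1) * 2 > @capacity # keep the table at most half full
    each_slot(str) do |idx, entry|
      if entry.nil?
        @slots[idx] = str
        @used += 1
        return str
      elsif !entry.equal?(TOMBSTONE) && entry == str && entry.encoding == str.encoding
        return entry
      end
    end
  end

  # Compare by identity: only the exact object is replaced by a tombstone.
  def delete(str)
    each_slot(str) do |idx, entry|
      return false if entry.nil?
      if entry.equal?(str)
        @slots[idx] = TOMBSTONE
        return true
      end
    end
  end

  private

  # Triangular probing: offsets 0, 1, 3, 6, ... from the home slot.
  def each_slot(str)
    mask = @capacity - 1
    idx = str.hash & mask
    step = 0
    loop do
      yield idx, @slots[idx]
      step += 1
      idx = (idx + step) & mask
    end
  end

  # Double the table and re-insert live entries; tombstones are dropped here.
  def rebuild
    live = @slots.reject { |e| e.nil? || e.equal?(TOMBSTONE) }
    @capacity *= 2
    @slots = Array.new(@capacity)
    @used = 0
    live.each { |s| find_or_insert(s) }
  end
end
```

Because tombstones count toward `@used`, at least half of the slots are always `nil`, so every probe sequence in this sketch terminates.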
John Hawthorn
89199a47db Extract rb_gc_free_fstring to string.c
This allows more flexibility in how we deal with the fstring table
2025-04-18 13:03:54 +09:00
Samuel Williams
c13ac4d615 Assert the GVL is held when performing various rb_ functions.
[Feature #20877]
2025-04-14 18:28:09 +09:00
Burdette Lamar
2a55cc3fb8
[DOC] Tweaks to String::try_convert 2025-04-02 12:03:17 -04:00
Étienne Barrié
6ecfe643b5 Freeze $/ and make it ractor safe
[Feature #21109]

By always freezing when setting the global rb_rs variable, we can ensure
it is not modified and can be accessed from a ractor.

We're also making sure it's an instance of String and does not have any
instance variables.

Of course, if $/ is changed at runtime, it may cause surprising behavior
but doing so is deprecated already anyway.

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2025-03-27 17:54:56 +01:00
Jean Boussier
a14d9b8d57 string.c: Improve fstring_hash to reduce collisions
`rb_str_hash` doesn't include the encoding for ASCII only strings
because ASCII only strings are equal regardless of their encoding.

But in the case of the `fstring_table`, two identical ASCII strings
with different encodings aren't equal.

Given it's common to have both `:foo` (or `def foo`) and `"foo"`
in the same source code, this causes a lot of collisions in the
`fstring_table`.
2025-03-08 10:56:02 +01:00
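The collision described in a14d9b8d57 can be observed from Ruby: two ASCII-only strings with different encodings compare equal and share a hash value. The helper `encoding_aware_hash_sketch` is a hypothetical illustration of mixing the encoding into the hash; it is not the function used in string.c.

```ruby
utf8  = String.new("foo", encoding: Encoding::UTF_8)
ascii = String.new("foo", encoding: Encoding::US_ASCII)

p utf8 == ascii                    # => true
p utf8.hash == ascii.hash          # => true
p utf8.encoding == ascii.encoding  # => false

# Hypothetical sketch: fold the encoding into the hash so that strings which
# the fstring table treats as distinct rarely land in the same bucket.
def encoding_aware_hash_sketch(str)
  [str.hash, str.encoding].hash
end
```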
Jean Boussier
c224ca4fea Fix a race condition with interned strings sweeping.
[Bug #21172]

This fixes a rare CI failure.

The timeline of the race condition is:

- A `"foo" oid=1` string is interned.
- `"foo" oid=1` is no longer referenced and will be swept in the future.
- Another `"foo" oid=2` string is interned.
- `register_fstring` finds `"foo" oid=1`, but since it is about to be swept,
  removes it from `fstring_table` and inserts `"foo" oid=2` instead.
- `"foo" oid=1` is swept, since it has the `RSTRING_FSTR` flag,
  an `st_delete` is issued in `fstring_table` which removes `"foo" oid=2`.

I don't know how to reproduce this bug consistently in a single test
case.
2025-03-05 18:57:21 +01:00
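The timeline in c224ca4fea can be replayed with a plain Ruby Hash standing in for `fstring_table`. All helper names below are hypothetical, and this is only a model of the by-value delete problem, not the actual fix in string.c.

```ruby
# Hypothetical stand-ins: a Hash plays the role of fstring_table.
def intern_sketch(table, str)
  table[str] = str # replaces any equal entry, like the replacement step above
end

def sweep_by_value(table, str)
  table.delete(str) # removes whatever entry is equal to str
end

def sweep_by_identity(table, str)
  table.delete(str) if table[str].equal?(str) # only the exact object
end

old_foo = "foo".dup
new_foo = "foo".dup

buggy = {}
intern_sketch(buggy, old_foo)
intern_sketch(buggy, new_foo)  # old_foo was about to be swept
sweep_by_value(buggy, old_foo) # sweeping old_foo also drops new_foo
p buggy.size                   # => 0

fixed = {}
intern_sketch(fixed, old_foo)
intern_sketch(fixed, new_foo)
sweep_by_identity(fixed, old_foo) # new_foo is a different object, so it stays
p fixed.size                      # => 1
```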
Jean Boussier
87f9c3c65e String#gsub! Elide MatchData allocation when we know it can't escape
If gsub is used with a string replacement or a map that doesn't
have a default proc, we know for sure no code can cause the MatchData
to escape the `gsub` call.

In such a case, we still have to allocate a new MatchData because we
don't know the lifetime of the backref, but for any subsequent
match we can re-use the MatchData we allocated ourselves, reducing
allocations significantly.

This partially fixes [Misc #20652], except when a block is used,
and partially reduces the performance impact of
abc0304cb28cb9dcc3476993bc487884c139fd11 / [Bug #17507]

```
compare-ruby: ruby 3.5.0dev (2025-02-24T09:44:57Z master 5cf146399f) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-02-24T10:58:27Z gsub-elude-match da966636e9) +PRISM [arm64-darwin24]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.577k|    3.697k|
|                 |           -|     1.03x|
|escape_bin       |      5.869k|    6.743k|
|                 |           -|     1.15x|
|escape_utf8      |      3.448k|    3.738k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      6.361k|    7.267k|
|                 |           -|     1.14x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2025-02-24 18:32:46 +01:00
Jean Boussier
f32d5071b7 Elide string allocation when using String#gsub in MAP mode
If the provided Hash doesn't have a default proc, we know for
sure that we'll never call into user provided code, hence the
string we allocate to access the Hash can't possibly escape.

So we don't actually have to allocate it, we can use a fake_str,
AKA a stack allocated string.

```
compare-ruby: ruby 3.5.0dev (2025-02-10T13:47:44Z master 3fb455adab) +PRISM [arm64-darwin23]
built-ruby: ruby 3.5.0dev (2025-02-10T17:09:52Z opt-gsub-alloc ea5c28958f) +PRISM [arm64-darwin23]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.374k|    3.722k|
|                 |           -|     1.10x|
|escape_bin       |      5.469k|    6.587k|
|                 |           -|     1.20x|
|escape_utf8      |      3.465k|    3.734k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      5.752k|    7.283k|
|                 |           -|     1.27x|
```
2025-02-12 10:23:50 +01:00
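For context on f32d5071b7, here is what "MAP mode" looks like from Ruby, and why a default proc matters. The constant and method names are hypothetical; the second half shows a Hash with a default proc, where user code does receive the lookup key and could retain it.

```ruby
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# No default proc: gsub only performs plain Hash lookups.
def escape_html_sketch(text)
  text.gsub(/[&<>]/, HTML_ESCAPES)
end

p escape_html_sketch("a < b && c") # => "a &lt; b &amp;&amp; c"

# With a default proc, the block sees each missing key and can keep it.
seen = []
leaky = Hash.new { |_hash, key| seen << key; "?" }
p "ab".gsub(/./, leaky) # => "??"
p seen                  # => ["a", "b"]
```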
Kouhei Yanagita
99792d0634 [DOC] Fix code markup in String#match 2025-01-22 15:07:19 +09:00
Jean Boussier
e2f1f7c567 [Doc] Encourage use of encoding constants
Lots of documentation examples still use encoding APIs with encoding names
rather than encoding constants. I think it would be preferable to direct
users toward constants as it can help with auto-completion, static analysis
and such.
2025-01-12 11:48:01 +01:00
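A small illustration of the style e2f1f7c567 recommends, using encoding constants where an encoding name string would also be accepted. The sample data is made up for the example.

```ruby
bytes = "caf\xC3\xA9".b # raw bytes, tagged as binary

# Constant instead of the name "UTF-8"
text = bytes.dup.force_encoding(Encoding::UTF_8)
p text.valid_encoding? # => true
p text.length          # => 4

# Constant instead of the name "ISO-8859-1"
latin1 = text.encode(Encoding::ISO_8859_1)
p latin1.bytesize      # => 4

# Both spellings resolve to the same Encoding object.
p Encoding::UTF_8.equal?(Encoding.find("UTF-8")) # => true
```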
Nobuyoshi Nakada
e433e6515e
[DOC] Exclude 'Class' and 'Module' from RDoc's autolinking 2025-01-02 12:36:06 +09:00
Alan Wu
880a90cf2e [DOC] [Feature #20205] Document the new power of String#+@ 2024-12-13 14:25:32 -05:00
Jean Boussier
26d020cb6e Optimize rb_must_asciicompat
While profiling `strscan`, I noticed `rb_must_asciicompat` was quite
slow, as more than 5% of the benchmark was spent in it: https://share.firefox.dev/49bOcTn

By checking for the 3 common ASCII-compatible encoding indexes first,
we can skip a lot of expensive operations in the happy path.
2024-11-27 14:50:07 +01:00
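The fast-path idea in 26d020cb6e can be sketched in Ruby. The method name is hypothetical, and the choice of UTF-8, US-ASCII and binary as the three common encodings is an assumption made for illustration; the real check is done in C on encoding indexes.

```ruby
# Assumption: these are the three common ASCII-compatible encodings.
COMMON_ASCII_COMPAT_SKETCH = [
  Encoding::UTF_8, Encoding::US_ASCII, Encoding::BINARY
].freeze

def must_asciicompat_sketch(str)
  enc = str.encoding
  return enc if COMMON_ASCII_COMPAT_SKETCH.include?(enc) # happy path, cheap check

  # Slow path: ask the encoding itself.
  unless enc.ascii_compatible?
    raise Encoding::CompatibilityError, "ASCII incompatible encoding: #{enc.name}"
  end
  enc
end

p must_asciicompat_sketch(String.new("abc", encoding: Encoding::UTF_8))
# => #<Encoding:UTF-8>
```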
Nobuyoshi Nakada
6b4f8945d6 Many Oniguruma functions need valid encoding strings 2024-11-26 11:46:34 +09:00
Nobuyoshi Nakada
02b70256b5 Check negative integer underflow 2024-11-26 11:46:34 +09:00
Matt Valentine-House
551be8219e Place all non-default GC API behind USE_SHARED_GC
So that it doesn't get included in the generated binaries for builds
that don't support loading shared GC modules

Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
2024-11-25 13:05:23 +00:00
Peter Zhu
41a9460227 [DOC] Fix typo in comment for STR_PRECOMPUTED_HASH 2024-11-20 11:16:10 -05:00
Kouhei Yanagita
eb2b0c2a0d [DOC] Fix the default limit of String#split
We can't pass `nil` as the second parameter of `String#split`.
Therefore, descriptions like "if limit is nil, ..." are not appropriate.
2024-11-19 12:15:48 +09:00
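As a quick illustration of the `limit` parameter discussed in eb2b0c2a0d: omitting it behaves like passing `0`, which drops trailing empty fields, while a negative limit keeps them. The sample string is made up.

```ruby
csv = "a,b,,"

p csv.split(",")     # => ["a", "b"]
p csv.split(",", 0)  # => ["a", "b"]
p csv.split(",", -1) # => ["a", "b", "", ""]
p csv.split(",", 2)  # => ["a", "b,,"]
```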
Randy Stauner
beafae9750
YJIT: Specialize String#[] (String#slice) with fixnum arguments (#12069)
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments

String#[] is in the top few C calls of several YJIT benchmarks:
liquid-compile rubocop mail sudoku

This speeds up these benchmarks by 1-2%.

* YJIT: Try harder to get type info for `String#[]`

In the large generated code of the mail gem the context doesn't have
the type info.  In that case if we peek at the stack and add a guard
we can still apply the specialization
and it speeds up the mail benchmark by 5%.

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
2024-11-13 12:25:09 -05:00
Jean byroot Boussier
6deeec5d45
Mark strings returned by Symbol#to_s as chilled (#12065)
* Use FL_USER0 for ELTS_SHARED

This makes space in RString for two bits for chilled strings.

* Mark strings returned by `Symbol#to_s` as chilled

[Feature #20350]

`STR_CHILLED` now spans two user flags. If one bit is set it
marks a chilled string literal, if it's the other it marks a
`Symbol#to_s` chilled string.

Since it's not possible, and doesn't make much sense to include
debug info when `--debug-frozen-string-literal` is set, we can't
include allocation source, but we can safely include the symbol
name in the warning message, making it much easier to find the source
of the issue.

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>

---------

Co-authored-by: Étienne Barrié <etienne.barrie@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2024-11-13 09:20:00 -05:00
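For code affected by 6deeec5d45, one way to stay clear of the chilled-string warning is to copy the result of `Symbol#to_s` before mutating it. This is a minimal sketch with a hypothetical method name; whether a warning is actually printed for the non-copied case depends on the Ruby version and on whether deprecation warnings are enabled.

```ruby
def label_for_sketch(sym)
  label = sym.to_s.dup # explicit copy: an ordinary mutable string
  label << "_id"       # mutating the copy never touches the chilled string
  label
end

p label_for_sketch(:user) # => "user_id"
```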