1823 Commits

Author SHA1 Message Date
Nobuyoshi Nakada
b054c2fe06 [Bug #19784] Fix behaviors against prefix with broken encoding
- String#start_with?
- String#delete_prefix
- String#delete_prefix!
2023-08-26 08:58:02 +09:00
Nobuyoshi Nakada
00ac3a64ba Introduce at_char_boundary function 2023-08-26 08:58:02 +09:00
Alan Wu
2214bcb70d Fix premature string collection during append
Previously, the following crashed due to use-after-free
with AArch64 Alpine Linux 3.18.3 (aarch64-linux-musl):

```ruby
str = 'a' * (32*1024*1024)
p({z: str})
```

32 MiB is the default for `GC_MALLOC_LIMIT_MAX`, and the crash
could be dodged by setting `RUBY_GC_MALLOC_LIMIT_MAX` to large values.
Under a debugger, one can see the `str2` of rb_str_buf_append()
getting prematurely collected while str_buf_cat4() allocates capacity.

Add GC guards so the buffer of `str2` lives across the GC run
initiated in str_buf_cat4().

[Bug #19792]
2023-08-23 18:07:49 -04:00
Peter Zhu
837c12b0c8 Use STR_EMBED_P instead of testing STR_NOEMBED 2023-08-22 16:31:36 -04:00
Peter Zhu
724223b4ca Don't check for STR_NOEMBED in rb_fstring
We don't need to check for STR_NOEMBED because the check above for
STR_EMBED_P means that it can never be false.
2023-08-18 09:24:45 -04:00
Burdette Lamar
0e162457d6
[DOC] Don't suppress autolinks (#8208) 2023-08-11 19:22:21 -04:00
Kunshan Wang
132f097149 No computing embed_capa_max in str_subseq
Fix str_subseq so that it does not attempt to predict the size of the
object returned by str_alloc_heap.
2023-08-03 14:52:44 -04:00
Nobuyoshi Nakada
af04e26924
Fill terminator properly 2023-07-28 22:17:53 +09:00
alexandre184
e5825de7c9
[Bug #19769] Fix range of size 1 in String#tr 2023-07-15 16:36:53 +09:00
Nobuyoshi Nakada
9dcdffb8bf
Make the string index functions closer to symmetric
So that irregular parts may be more noticeable.
2023-07-09 18:45:51 +09:00
Nobuyoshi Nakada
5e79d5a560
Make rb_str_rindex return byte index
Leave callers to convert byte index to char index, as well as
`rb_str_index`, so that `rb_str_rpartition` does not need to
re-convert char index to byte index.
2023-07-09 16:39:28 +09:00
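The byte-index versus char-index distinction in this commit can be sketched at the Ruby level. This is only an illustration of the two kinds of index, not the C implementation of `rb_str_rindex`; the helper name `char_index_to_byte_offset` is hypothetical.

```ruby
# Hypothetical helper: convert a character index into a byte offset
# by measuring the byte size of the prefix that precedes it.
def char_index_to_byte_offset(str, char_index)
  str[0, char_index].bytesize
end

str = "あいう"                      # three characters, three bytes each in UTF-8
char_index = str.rindex("う")       # String#rindex reports a character index
byte_offset = char_index_to_byte_offset(str, char_index)

puts "char index: #{char_index}, byte offset: #{byte_offset}"
```

For multibyte strings the two values differ, which is why a caller that needs a byte position benefits from not having to convert back from a character index.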
Nobuyoshi Nakada
e2257831ab
[Bug #19763] Raise same message exception for regexp 2023-07-09 16:21:02 +09:00
Nobuyoshi Nakada
3d7a6bbc12 Ensure the byte position is a valid boundary 2023-06-28 22:42:04 +09:00
Nobuyoshi Nakada
bc3ac1872e [Bug #19748] Fix out-of-bound access in String#byteindex 2023-06-28 17:23:32 +09:00
Nobuyoshi Nakada
0cbfeb8210 [Bug #19746] String#index with regexp should clear $~ unless matched 2023-06-28 14:06:28 +09:00
Burdette Lamar
932dd9f10e
[DOC] Regexp doc (#7923) 2023-06-20 09:28:21 -04:00
Matt Valentine-House
d54f66d1b4 Assign into optimal size pools using String#split("")
When String#split is used with an empty string as the field separator it
effectively splits the original string into chars, and there is a
pre-existing fast path for this using SPLIT_TYPE_CHARS.

However this path creates an empty array in the smallest size pool and
grows from there, despite already knowing the size of the desired array.

This commit pre-allocates the correct size array in this case in order
to allow the arrays to be embedded and avoid being allocated in the
transient heap
2023-06-09 10:54:40 +01:00
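A minimal sketch of the Ruby-level behavior this commit optimizes: splitting on an empty string yields one element per character, so the result size is known before any element is produced. The helper name `split_into_chars` is hypothetical, and the sketch shows only the observable result, not the allocation strategy.

```ruby
# Hypothetical helper: split a string into its characters using an
# empty-string separator (the case handled by the chars fast path).
def split_into_chars(str)
  str.split("")
end

word = "ruby"
parts = split_into_chars(word)

# The result has exactly as many elements as the string has characters.
puts parts.inspect
puts parts.length == word.length
```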
Peter Zhu
7577c101ed
Unify length field for embedded and heap strings (#7908)
* Unify length field for embedded and heap strings

The length field is of the same type and position in RString for both
embedded and heap allocated strings, so we can unify it.

* Remove RSTRING_EMBED_LEN
2023-06-06 10:19:20 -04:00
Peter Zhu
1a7ee14578 [DOC] Update flags doc for strings
The length of an embedded string is no longer in the flags.
2023-06-05 09:49:35 -04:00
Peter Zhu
a16cffe384 Simplify duplicated code
The capacity of the string can be calculated using the str_capacity
function.
2023-06-01 08:32:29 -04:00
Peter Zhu
8a8618d4f3 Don't refetch ptr and len
The call to RSTRING_GETMEM already fetched the pointer and length, so we
don't need to fetch it again.
2023-06-01 08:32:29 -04:00
Peter Zhu
c37ebfe08f Remove dead code in string.c
The STR_DEC_LEN macro is not used.
2023-05-26 13:34:26 -04:00
Matt Valentine-House
026321c5b9 [Feature #19474] Refactor NEWOBJ macros
NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec
2023-04-06 11:07:16 +01:00
Peter Zhu
1da2e7fca3
[Feature #19579] Remove !USE_RVARGC code (#7655)
Remove !USE_RVARGC code

[Feature #19579]

The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
2023-04-04 17:30:06 -04:00
Takashi Kokubun
32e0c97dfa RJIT: Optimize String#bytesize 2023-03-18 23:35:42 -07:00
Takashi Kokubun
233ddfac54 Stop exporting symbols for MJIT 2023-03-06 21:59:23 -08:00
Takashi Kokubun
f0218303e0 Optimize String#getbyte 2023-03-05 23:28:59 -08:00
Rômulo Ceccon
d78ae78fd7 rb_str_modify_expand: clear the string coderange
[Bug #19468]

b0b9f7201acab05c2a3ad92c3043a1f01df3e17f erroneously stopped
clearing the coderange.

Since `rb_str_modify` clears it, `rb_str_modify_expand`
should too.
2023-03-03 15:32:25 +01:00
John Bampton
2f7270c681
Fix spelling (#7389) 2023-02-27 09:56:06 -08:00
Adam Daniels
2535b1819f Symbol#end_with? accepts Strings only
Regular expressions are not supported (same as String#end_with?).
2023-02-27 09:26:17 +09:00
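The documented behavior can be checked with a short sketch: string suffixes are accepted, while a Regexp argument is rejected with a `TypeError` for both Symbol and String receivers. The helper name `end_with_regexp_rejected?` is hypothetical.

```ruby
# Hypothetical helper: returns true when the receiver's end_with?
# rejects a Regexp argument by raising TypeError.
def end_with_regexp_rejected?(receiver)
  receiver.end_with?(/lo/)
  false
rescue TypeError
  true
end

puts :hello.end_with?("lo")              # string suffixes are accepted
puts end_with_regexp_rejected?(:hello)   # Symbol receiver
puts end_with_regexp_rejected?("hello")  # String receiver behaves the same
```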
BurdetteLamar
3b239d2480 Remove (newly unneeded) remarks about aliases 2023-02-19 14:26:34 -08:00
zverok
51bb5b23d4 [DOC] Small adjustment for String method docs
* Hide freeze method (no useful docs, same as Object#freeze)
* Add dedup to call-seq of str_uminus
2023-02-19 22:32:52 +02:00
Matt Valentine-House
d620855101 Rename rb_str_splice_{0,1} -> rb_str_update_{0,1} 2023-02-09 15:02:26 -05:00
Matt Valentine-House
601b83dcfc Remove alias macro rb_str_splice 2023-02-09 15:02:26 -05:00
Matt Valentine-House
72aba64fff Merge gc.h and internal/gc.h
[Feature #19425]
2023-02-09 10:32:29 -05:00
Jean Boussier
c6b90e5e9c Mark "mapping_buffer" as write barrier protected
It doesn't have any reference so it can be marked as protected.
2023-02-03 19:10:42 +01:00
Shugo Maeda
cce3960964 [Feature #19314] Add new arguments of String#bytesplice
bytesplice(index, length, str, str_index, str_length) -> string
bytesplice(range, str, str_range) -> string

In these forms, the content of +self+ is replaced by str.byteslice(str_index, str_length) or str.byteslice(str_range); however the substring of +str+ is not allocated as a new string.
2023-01-20 18:02:37 +09:00
Shugo Maeda
f7b72462aa
String#bytesplice should return self
In Feature #19314, we concluded that the return value of String#bytesplice
should be changed from the source string to the receiver, because the source
string is useless and confusing when extra arguments are added.

This change should be included in Ruby 3.2.1.
2023-01-19 17:13:07 +09:00
Matt Valentine-House
8a93e5d01b Use str_enc_copy_direct to improve performance
str_enc_copy_direct copies the string encoding over without checking the
frozen status of the string. Because we know that we're safe here (we
only use this function when interpolating strings on the stack via a
concatstrings instruction) we can safely skip this check
2023-01-13 10:31:35 -05:00
Matt Valentine-House
bb5fddd070 Remove MIN_PRE_ALLOC_SIZE from Strings.
This optimisation is no longer helpful now that we use VWA to allocate
strings in larger size pools where they can be embedded.
2023-01-13 10:31:35 -05:00
Peter Zhu
bfc887f391 Add str_enc_copy_direct
This commit adds str_enc_copy_direct, which is like str_enc_copy but
does not check the frozen status of str1 and does not check the validity
of the encoding of str2. This makes certain string operations ~5% faster.

```ruby
require "benchmark"

puts(Benchmark.measure do
  100_000_000.times do
    "a".downcase
  end
end)
```

Before this patch:

```
  7.587598   0.040858   7.628456 (  7.669022)
```

After this patch:

```
  7.133128   0.039809   7.172937 (  7.183124)
```
2023-01-12 09:06:15 -05:00
Peter Zhu
9726736006 Set STR_SHARED_ROOT flag on root of string 2023-01-09 08:49:29 -05:00
Peter Zhu
3be2acfafd Fix re-embedding of strings during compaction
The reference updating code for strings is not re-embedding strings
because the code is incorrectly wrapped inside of a
`if (STR_SHARED_P(obj))` clause. Shared strings can't be re-embedded
so this ends up being a no-op. This means that strings can be moved to a
large size pool during compaction, but won't be re-embedded, which would
waste the space.
2023-01-09 08:49:29 -05:00
Peter Zhu
d8ef0a98c6 [Bug #19319] Fix crash in rb_str_casemap
The following code crashes on my machine:

```ruby
GC.stress = true

str = "testing testing testing"

puts str.capitalize
```

We need to ensure that the object `buffer_anchor` remains on the stack
so it does not get GC'd.
2023-01-06 11:36:28 -05:00
Nobuyoshi Nakada
98fbebf110
[DOC] Fix typo 2022-12-22 00:01:18 +09:00
S-H-GAMELINKS
1a64d45c67 Introduce encoding check macro 2022-12-02 01:31:27 +09:00
Jeremy Evans
571d21fd4a Make String#rstrip{,!} raise Encoding::CompatibilityError for broken coderange
It's questionable whether we want to allow rstrip to work for strings
where the broken coderange occurs before the trailing whitespace and
not after, but this approach is probably simpler, and I don't think
users should expect string operations like rstrip to work on broken
strings.

In some cases, this changes rstrip to raise
Encoding::CompatibilityError instead of ArgumentError.  However, as
the problem is related to an encoding issue in the receiver, and
not due to an issue with an argument, I think
Encoding::CompatibilityError is the more appropriate error.

Fixes [Bug #18931]
2022-11-24 18:24:42 -08:00
S-H-GAMELINKS
1f4f6c9832 Using UNDEF_P macro 2022-11-16 18:58:33 +09:00
Takashi Kokubun
e7443dbbca
Rewrite Symbol#to_sym and #intern in Ruby (#6683) 2022-11-15 21:34:30 -08:00
Peter Zhu
710c1ada84 Use string's capacity to determine if reembeddable
During auto-compaction, using length to determine whether or not a
string can be re-embedded may be a problem for newly created strings.
This is because usually it requires a malloc before setting the length.
If the malloc triggers compaction, then the string may be re-embedded
and can cause crashes.
2022-11-14 16:59:43 -05:00