1berry/ruby - ruby - Gitea : Git Mirror

Author	SHA1	Message	Date
Étienne Barrié	12be40ae6b	Implement chilled strings [Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduces "chilled strings". From a user perspective, chilled strings pretend to be frozen, but on the first attempt to mutate them they lose their frozen status and emit a warning rather than raising a `FrozenError`. Implementation-wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explicitly enabled nor disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except that it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags. Chilled strings have the `FL_FREEZE` flag so as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions. Notes: - `String#freeze`: clears the chilled flag. - `String#-@`: acts as if the string was mutable. - `String#+@`: acts as if the string was mutable. - `String#clone`: copies the chilled flag. Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-03-19 09:26:49 +01:00
Thomas Marshall	7e4b1f8e19	[Bug #20322] Fix rb_enc_interned_str_cstr null encoding The documentation for `rb_enc_interned_str_cstr` notes that `enc` can be a null pointer, but this currently causes a segmentation fault when trying to autoload the encoding. This commit fixes the issue by checking for NULL before calling `rb_enc_autoload`.	2024-03-03 10:43:35 +00:00
Peter Zhu	ce8531fed4	Stop using rb_str_locktmp_ensure publicly rb_str_locktmp_ensure is a private API.	2024-02-23 14:08:29 -05:00
Takashi Kokubun	8a6740c70e	YJIT: Lazily push a frame for specialized C funcs (#10080) * YJIT: Lazily push a frame for specialized C funcs Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> * Fix a comment on pc_to_cfunc * Rename rb_yjit_check_pc to rb_yjit_lazy_push_frame * Rename it to jit_prepare_lazy_frame_call * Fix a typo * Optimize String#getbyte as well * Optimize String#byteslice as well --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>	2024-02-23 19:08:09 +00:00
Peter Zhu	510404f2de	Stop using rb_fstring publicly rb_fstring is a private API, so we should use rb_str_to_interned_str instead, which is a public API.	2024-02-23 13:33:46 -05:00
Peter Zhu	df5b8ea4db	Remove unneeded RUBY_FUNC_EXPORTED	2024-02-23 10:24:21 -05:00
Takashi Kokubun	d5080f6e8b	Fix -Wsign-compare on String#initialize ../string.c:1886:57: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘long int’ [-Wsign-compare] 1886 \| if (STR_EMBED_P(str)) RUBY_ASSERT(osize <= str_embed_capa(str)); \| ^~	2024-02-22 16:11:30 -08:00
Nobuyoshi Nakada	e04146129e	[Bug #20292] Truncate embedded string to new capacity	2024-02-22 22:46:18 +09:00
Nobuyoshi Nakada	b1d70e4264	[Bug #20280] Check by `rb_parser_enc_str_coderange` Co-authored-by: Yuichiro Kaneko <spiketeika@gmail.com>	2024-02-19 16:33:26 +09:00
Nobuyoshi Nakada	fcc55dc226	[Bug #20280] Raise SyntaxError on invalid encoding symbol	2024-02-19 16:33:26 +09:00
Peter Zhu	4d1b3a2bf3	Unset STR_SHARED when setting string to embed	2024-02-15 12:19:45 -05:00
Yusuke Endoh	25d74b9527	Do not include a backtick in error messages and backtraces [Feature #16495]	2024-02-15 18:42:31 +09:00
Burdette Lamar	65f5435540	[DOC] Doc compliance (#9955)	2024-02-14 10:47:42 -05:00
Alan Wu	6261d4b4d8	Fix use-after-move in Symbol#inspect The allocation could re-embed `orig_str` and invalidate the data pointer from RSTRING_GETMEM() if the string is embedded. Found on CI, where the test introduced in 7002e776944 ("Fix Symbol#inspect for GC compaction") recently failed. See: <https://github.com/ruby/ruby/actions/runs/7880657560/job/21503019659>	2024-02-13 14:49:54 -05:00
Aaron Patterson	c35fea8509	Specialize String#byteslice(a, b) (#9939) * Specialize String#byteslice(a, b) This adds a specialization for String#byteslice when there are two parameters. This makes our protobuf parser go from 5.84x slower to 5.33x slower ``` Comparison: decode upstream (53738 bytes): 7228.5 i/s decode protobuff (53738 bytes): 1236.8 i/s - 5.84x slower Comparison: decode upstream (53738 bytes): 7024.8 i/s decode protobuff (53738 bytes): 1318.5 i/s - 5.33x slower ``` * Update yjit/src/codegen.rs --------- Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>	2024-02-13 16:20:27 +00:00
Peter Zhu	ac38f259aa	Replace assert with RUBY_ASSERT in string.c assert does not print the bug report, only the file and line number of the assertion that failed. RUBY_ASSERT prints the full bug report, which makes it much easier to debug.	2024-02-12 15:07:47 -05:00
Peter Zhu	c6b391214c	[DOC] Improve flags of string	2024-02-08 10:49:38 -05:00
Peter Zhu	5e0c171451	Make io_fwrite safe for compaction [Bug #20169] Embedded strings are not safe for system calls without the GVL because compaction can cause pages to be locked causing the operation to fail with EFAULT. This commit changes io_fwrite to use rb_str_tmp_frozen_no_embed_acquire, which guarantees that the return string is not embedded.	2024-02-05 11:11:07 -05:00
Takashi Kokubun	51753ec7fa	Annotate Symbol#to_s as leaf (#9769)	2024-01-31 10:47:35 -05:00
Peter Zhu	e17c83e02c	Fix memory leak in String#tr and String#tr_s rb_enc_codepoint_len could raise, which would cause the memory in buf to leak. For example: str1 = "\xE0\xA0\xA1#{" " * 100}".force_encoding("EUC-JP") str2 = "" str3 = "a".force_encoding("Windows-31J") 10.times do 1_000_000.times do str1.tr_s(str2, str3) rescue end puts `ps -o rss= -p #{$$}` end Before: 17536 22752 28032 33312 38688 43968 49200 54432 59744 64992 After: 12176 12352 12352 12448 12448 12448 12448 12448 12448 12448	2024-01-17 08:54:25 -05:00
tompng	ade56737e2	Fix coderange of invalid_encoding_string.<<(ord) Appending valid encoding character can change coderange from invalid to valid. Example: "\x95".force_encoding('sjis')<<0x5C will be a valid string "\x{955C}"	2024-01-16 23:18:55 +09:00
Peter Zhu	b3d6128049	Fix memory leak in grapheme clusters [Bug #20150] String#grapheme_clusters and String#each_grapheme_cluster leak memory because if the string is not UTF-8, then the created regex will not be freed. For example: str = "hello world".encode(Encoding::UTF_32LE) 10.times do 1_000.times do str.grapheme_clusters end puts `ps -o rss= -p #{$$}` end Before: 26000 42256 59008 75792 92528 109232 125936 142672 159392 176160 After: 9264 9504 9808 10000 10128 10224 10352 10544 10704 10896	2024-01-08 09:14:04 -05:00
Peter Zhu	5aba5f0454	[DOC] Add parentheses in call-seq for String#include?	2024-01-02 19:19:12 -05:00
Peter Zhu	7002e77694	Fix Symbol#inspect for GC compaction The test fails when RGENGC_CHECK_MODE is turned on: 1) Failure: TestSymbol#test_inspect_under_gc_compact_stress [test/ruby/test_symbol.rb:123]: <":testing"> expected but was <":\x00\x00\x00\x00\x00\x00\x00">.	2023-12-24 21:29:40 -05:00
Peter Zhu	50bf437341	Fix String#sub for GC compaction The test fails when RGENGC_CHECK_MODE is turned on: TestString#test_sub_gc_compact_stress = 9.42 s 1) Failure: TestString#test_sub_gc_compact_stress [test/ruby/test_string.rb:2089]: <"aaa [amp] yyy"> expected but was <"aaa [] yyy">.	2023-12-23 18:00:27 -05:00
Nobuyoshi Nakada	ab7f54688b	Stir the hash value more with encoding index	2023-12-17 00:30:00 +09:00
Nobuyoshi Nakada	b710f96b5a	[Bug #20068] Encoding does not matter to empty strings	2023-12-16 16:00:12 +09:00
Jeremy Evans	0d53dba7ce	Make String#chomp! raise ArgumentError for 2+ arguments if string is empty String#chomp! returned nil without checking the number of passed arguments in this case.	2023-12-13 07:05:21 -08:00
Peter Zhu	ee0eca191f	Make String#undump compaction safe	2023-12-01 15:04:31 -05:00
Peter Zhu	80ea7fbad8	Pin embedded shared strings Embedded shared strings cannot be moved because strings point into the slot of the shared string. There may be code using the RSTRING_PTR on the stack, which would pin the string but not pin the shared string, causing it to move.	2023-12-01 15:04:31 -05:00
Peter Zhu	3d908a41ab	Guard match from GC in String#gsub We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.	2023-11-29 19:21:40 -05:00
Peter Zhu	94015e0dce	Guard match from GC when scanning string We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.	2023-11-27 16:49:52 -05:00
Jean Boussier	83c385719d	Specialize String#dup `String#+@` is 2-3 times faster than `String#dup` because it can directly go through `rb_str_dup` instead of using the generic, much slower `rb_obj_dup`. This fact led to the existence of the ugly `Performance/UnfreezeString` rubocop performance rule that encourages users to rewrite the much more readable and convenient `"foo".dup` into the ugly `(+"foo")`. Let's make that rubocop rule useless. ``` compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22] last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884) built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22] warming up.. \| \|compare-ruby\|built-ruby\| \|:------\|-----------:\|---------:\| \|uplus \| 16.312M\| 16.332M\| \| \| -\| 1.00x\| \|dup \| 5.912M\| 16.329M\| \| \| -\| 2.76x\| ```	2023-11-20 14:33:20 +01:00
Jean Boussier	ea1b1ea1aa	String#force_encoding don't clear coderange if encoding is unchanged Some code out there blindly calls `force_encoding` without checking what the original encoding was, which clears the coderange uselessly. If the String is big, it can be a rather costly mistake. For instance the `rack-utf8_sanitizer` gem does this on request bodies.	2023-11-09 12:38:10 +01:00
Nobuyoshi Nakada	1910bd4247	String for string literal is not resizable	2023-11-08 00:59:45 +09:00
Jean Boussier	ac8ec004e5	Make String.new size pools aware. If the required capacity would fit in an embedded string, returns one. This can reduce malloc churn for code that uses string buffers.	2023-11-02 23:34:58 +01:00
Nobuyoshi Nakada	50520cc193	[DOC] Missing comment markers	2023-09-27 16:18:05 +09:00
Nobuyoshi Nakada	6b66b5fded	[Bug #19902] Update the coderange regarding the changed region	2023-09-26 15:35:40 +09:00
John Hawthorn	d89b15cdce	Use end of char boundary in start_with? Previously we used the next character following the found prefix to determine if the match ended on a broken character. This had caused surprising behaviour when a valid character was followed by a UTF-8 continuation byte. This commit changes the behaviour to instead look for the end of the last character in the prefix. [Bug #19784] Co-authored-by: ywenc <ywenc@github.com> Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2023-09-01 16:23:28 -07:00
Nobuyoshi Nakada	b054c2fe06	[Bug #19784] Fix behaviors against prefix with broken encoding - String#start_with? - String#delete_prefix - String#delete_prefix!	2023-08-26 08:58:02 +09:00
Nobuyoshi Nakada	00ac3a64ba	Introduce `at_char_boundary` function	2023-08-26 08:58:02 +09:00
Alan Wu	2214bcb70d	Fix premature string collection during append Previously, the following crashed due to use-after-free with AArch64 Alpine Linux 3.18.3 (aarch64-linux-musl): ```ruby str = 'a' * (32*1024*1024) p({z: str}) ``` 32 MiB is the default for `GC_MALLOC_LIMIT_MAX`, and the crash could be dodged by setting `RUBY_GC_MALLOC_LIMIT_MAX` to large values. Under a debugger, one can see the `str2` of rb_str_buf_append() getting prematurely collected while str_buf_cat4() allocates capacity. Add GC guards so the buffer of `str2` lives across the GC run initiated in str_buf_cat4(). [Bug #19792]	2023-08-23 18:07:49 -04:00
Peter Zhu	837c12b0c8	Use STR_EMBED_P instead of testing STR_NOEMBED	2023-08-22 16:31:36 -04:00
Peter Zhu	724223b4ca	Don't check for STR_NOEMBED in rb_fstring We don't need to check for STR_NOEMBED because the check above for STR_EMBED_P means that it can never be false.	2023-08-18 09:24:45 -04:00
Burdette Lamar	0e162457d6	[DOC] Don't suppress autolinks (#8208)	2023-08-11 19:22:21 -04:00
Kunshan Wang	132f097149	No computing embed_capa_max in str_subseq Fix str_subseq so that it does not attempt to predict the size of the object returned by str_alloc_heap.	2023-08-03 14:52:44 -04:00
Nobuyoshi Nakada	af04e26924	Fill terminator properly	2023-07-28 22:17:53 +09:00
alexandre184	e5825de7c9	[Bug #19769] Fix range of size 1 in `String#tr`	2023-07-15 16:36:53 +09:00
Nobuyoshi Nakada	9dcdffb8bf	Make the string index functions closer to symmetric So that irregular parts may be more noticeable.	2023-07-09 18:45:51 +09:00
Nobuyoshi Nakada	5e79d5a560	Make `rb_str_rindex` return byte index Leave callers to convert byte index to char index, as well as `rb_str_index`, so that `rb_str_rpartition` does not need to re-convert char index to byte index.	2023-07-09 16:39:28 +09:00
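
For readers unfamiliar with the two spellings compared in commit 83c385719d ("Specialize String#dup"), here is a minimal sketch of how `String#dup` and `String#+@` behave on a frozen string. The variable names are illustrative only and not taken from the commit. The sketch shows behavior, not the performance difference the commit measures.

```ruby
# Illustrative variable names; only String#freeze, String#dup and
# String#+@ are real Ruby API here.
frozen = "foo".freeze

via_dup   = frozen.dup  # the readable form the commit wants to make fast
via_uplus = +frozen     # unary plus returns an unfrozen copy of a frozen receiver

# Both copies are mutable and independent of the frozen original.
via_dup << "bar"
via_uplus << "baz"

puts via_dup
puts via_uplus
puts frozen
```

Since both forms yield an unfrozen copy of a frozen string, choosing between them is a matter of readability once their speed is comparable, which is the point the commit message makes.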

1862 Commits