12 Commits

Author SHA1 Message Date
Jean Boussier
fae86a701e string.c: Directly create strings with the correct encoding
While profiling msgpack-ruby I noticed a very substantial amout of time
spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`.

On that benchmark, `rb_utf8_str_new` is 33% of the total runtime,
in big part because it cause GC to trigger often, but even then
`5.3%` of the total runtime is spent in `rb_enc_associate_index`
called by `rb_utf8_str_new`.

After closer inspection, it appears that it's performing a lot of
safety check we can assert we don't need, and other extra useless
operations, because strings are first created and filled as ASCII-8BIT
and then later reassociated to the desired encoding.

By directly allocating the string with the right encoding, it allow
to skip a lot of duplicated and useless operations.

After this change, the time spent in `rb_utf8_str_new` is down
to `28.4%` of total runtime, and most of that is GC.
2024-11-13 13:32:32 +01:00
Jean Boussier
3a7846b1aa Add a hint of ASCII-8BIT being BINARY
[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
2024-04-18 10:17:26 +02:00
Adam Hess
6816e8efcf Free everything at shutdown
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2023-12-07 15:52:35 -05:00
S-H-GAMELINKS
914cf26d6e parenthesize to macro 2022-12-02 01:31:27 +09:00
S-H-GAMELINKS
1a64d45c67 Introduce encoding check macro 2022-12-02 01:31:27 +09:00
Benoit Daloze
6525b6f760 Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32
* And simplify callers of get_actual_encoding().
* See [Feature #18949].
* See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474
2022-09-12 14:02:34 +02:00
卜部昌平
daf0c04a47 internal/*.h: skip doxygen
These contents are purely implementation details, not worth appearing in
CAPI documents. [ci skip]
2021-09-10 20:00:06 +09:00
Jean Boussier
7e8a9af9db rb_enc_interned_str: handle autoloaded encodings
If called with an autoloaded encoding that was not yet
initialized, `rb_enc_interned_str` would crash with
a NULL pointer exception.

See: https://github.com/ruby/ruby/pull/4119#issuecomment-800189841
2021-03-22 21:37:48 +09:00
卜部昌平
4ff3f20540 add #include guard hack
According to MSVC manual (*1), cl.exe can skip including a header file
when that:

- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.

GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).

Sun C lacked #pragma once for a looong time.  Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.

This changeset modifies header files so that each of them include
strictly one #ifndef...#endif.  I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]

*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
2020-04-13 16:06:00 +09:00
卜部昌平
9e6e39c351
Merge pull request #2991 from shyouhei/ruby.h
Split ruby.h
2020-04-08 13:28:13 +09:00
卜部昌平
bf53d6c7d1 other minior internal header tweaks
These headers need no rewrite.  Just add some minor tweaks, like
addition of #include lines.  Mainly cosmetic.

TIMET_MAX_PLUS_ONE was deleted because the macro was used from only
one place (directly write expression there).
2019-12-26 20:45:12 +09:00
卜部昌平
b739a63eb4 split internal.h into files
One day, I could not resist the way it was written.  I finally started
to make the code clean.  This changeset is the beginning of a series of
housekeeping commits.  It is a simple refactoring; split internal.h into
files, so that we can divide and concur in the upcoming commits.  No
lines of codes are either added or removed, except the obvious file
headers/footers.  The generated binary is identical to the one before.
2019-12-26 20:45:12 +09:00