130 Commits

Author SHA1 Message Date
Jean Boussier
cd7495a1d0 [ruby/json] Further improve parsing errors
Report EOF when applicable instead of an empty fragment.

Also stop fragment extraction on first whitespace.

https://github.com/ruby/json/commit/cc1daba860
2025-05-13 14:12:22 +09:00
Jean Boussier
8cc1aa82f1 [ruby/json] Add missing single quotes in error messages
https://github.com/ruby/json/commit/f3dde3cb2f
2025-05-13 14:12:22 +09:00
Jean Boussier
50ef208369 [ruby/json] parser.c: include line and column in error messages
https://github.com/ruby/json/commit/30e35b9ba5
2025-05-13 14:12:22 +09:00
Jean Boussier
8f008598c3 [ruby/json] parser.c: refactor raise_parse_error to have document start
https://github.com/ruby/json/commit/832b5b1a4c
2025-05-13 14:12:22 +09:00
Jean Boussier
ec171b4075 [ruby/json] Move create_addtions logic in Ruby.
By leveraging the `on_load` callback we can move all this logic
out of the parser. Which mean we no longer have to duplicate
that logic in both parser and that we'll later be able to extract
it entirely from the gem.

https://github.com/ruby/json/commit/f411ddf1ce
2025-03-28 12:44:53 +09:00
Jean Boussier
e8c46f4ca5 [ruby/json] JSON.load invoke the proc callback directly from the parser.
And substitute the return value like `Marshal.load` doesm
which I can only assume was the intent.

This also open the door to re-implement all the `create_addition`
logic in `json/common.rb`.

https://github.com/ruby/json/commit/73d2137fd3
2025-03-28 12:44:53 +09:00
Jean Boussier
756b75f242 [ruby/json] Remove Class#json_creatable? monkey patch.
https://github.com/ruby/json/commit/1ca7efed1f
2025-03-28 12:44:53 +09:00
Jean Boussier
e6a2cf9fd7
[ruby/json] Fix potential out of bound read in json_string_unescape.
https://github.com/ruby/json/commit/cf242d89a0
2025-03-13 10:33:25 +09:00
Jean Boussier
1d07deb422 [ruby/json] Raise a ParserError on all incomplete unicode escape sequence.
This was the behavior until `2.10.0` unadvertently changed it.

`"\u1"` would raise, but `"\u1zzz"` wouldn't.

https://github.com/ruby/json/commit/7d0637b9e6
2025-03-12 18:02:09 +09:00
Jean Boussier
0d62037fc0
[ruby/json] Ensure parser error snippets are valid UTF-8
Fix: https://github.com/ruby/json/issues/755

Error messages now include a snippet of the document
that doesn't parse to help locate the issue, however
the way it was done wasn't UTF-8 aware, and it could
result in exception messages with truncated characters.

It would be nice to go a bit farther and actually support
codepoints, but it's a lot of complexity to do it in C,
perhaps if we move that logic to Ruby given it's not a
performance sensitive codepath.

https://github.com/ruby/json/commit/e144793b72
2025-02-27 13:32:32 +09:00
Nobuyoshi Nakada
dc3d2a3c2f
[ruby/json] Avoid plain char for ctype macros
On some platforms ctype functions are defined as macros accesing tables.
A plain char may be `signed` or `unsigned` per implementations and the
extension result implementation dependent.

gcc warns such case:

```
parser.c: In function 'rstring_cache_fetch':
parser.c:138:33: warning: array subscript has type 'char' [-Wchar-subscripts]
  138 |     if (RB_UNLIKELY(!isalpha(str[0]))) {
      |                              ~~~^~~
parser.c: In function 'rsymbol_cache_fetch':
parser.c:190:33: warning: array subscript has type 'char' [-Wchar-subscripts]
  190 |     if (RB_UNLIKELY(!isalpha(str[0]))) {
      |                              ~~~^~~
```

https://github.com/ruby/json/commit/4431b362f6
2025-01-30 16:56:02 +09:00
tompng
86b262179d [ruby/json] Reject invalid number: - -.1 -e0
https://github.com/ruby/json/commit/b9bfeecfa9
2025-01-20 14:20:55 +01:00
tompng
525d7a68e4 [ruby/json] Raise parse error on invalid comments
https://github.com/ruby/json/commit/2f57f40467
2025-01-20 14:20:55 +01:00
tompng
c026e44bb5 [ruby/json] Fix parsing incomplete unicode escape "\uaaa"
https://github.com/ruby/json/commit/86c0d4eb7e
2025-01-20 14:20:55 +01:00
Jean Boussier
33708f2dc4 [ruby/json] Fix a regression in the parser with leading /
Ref: https://github.com/ruby/ruby/pull/12598

This could lead to an infinite loop.

https://github.com/ruby/json/commit/f8cfa2696a
2025-01-20 10:31:56 +01:00
Jean Boussier
df8f93848e [ruby/json] json_string_unescape: use memchr to search for backslashes
https://github.com/ruby/json/commit/5e6cfcf724
2025-01-20 16:09:00 +09:00
Jean Boussier
ba8f22c040 [ruby/json] Cleanup json_decode_float
Move all the decimal_class option parsing in the constructor.

https://github.com/ruby/json/commit/e9adefdc38
2025-01-20 16:09:00 +09:00
Jean Boussier
6c6f9672e2 [ruby/json] parser.c: Pass the JSON_ParserConfig pointer
Doesn't make a measurable performance difference but is a
bit clearer.

https://github.com/ruby/json/commit/314d117c61
2025-01-20 16:09:00 +09:00
Jean Boussier
f664e863d8 [ruby/json] Use RSTRING_END
https://github.com/ruby/json/commit/dd9c46c805
2025-01-20 16:09:00 +09:00
Jean Boussier
e4b54b0a36 [ruby/json] Replace fbuffer by stack buffers or RB_ALLOCV in parser.c
We only use that buffer for parsing integer and floats, these
are unlikely to be very big, and if so we can just use RB_ALLOCV as it will
almost always end in a small `alloca`.

This allow to no longer need `rb_protect` around the parser.

https://github.com/ruby/json/commit/994859916a
2025-01-20 16:09:00 +09:00
Jean Boussier
99e9eb5380 [ruby/json] Implement write barriers for ParserConfig objects
https://github.com/ruby/json/commit/591056a526
2025-01-20 16:09:00 +09:00
Jean Boussier
ef585744c0 Finalize Kevin's handrolled parser.
And get rid of the Ragel parser.

This is 7% faster on activitypub, 15% after on twitter and 11% faster
on citm_catalog.

There might be some more optimization opportunities, I did a quick
optimization pass to fix a regression in string parsing, but other
than that I haven't dug much in performance.
2025-01-20 16:09:00 +09:00
Jean Boussier
599fbeaffa [ruby/json] Refactor JSON::Ext::Parser to split configuration and parsing state
Ref: https://github.com/ruby/json/pull/718

The existing `Parser` interface is pretty bad, as it forces to
instantiate a new instance for each document.

Instead it's preferable to only take the config and do all the
initialization needed, and then keep the parsing state on the
stack on in ephemeral memory.

This refactor makes the `JSON::Coder` pull request much easier to
implement in a performant way.

https://github.com/ruby/json/commit/c8d5236a92

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2025-01-14 09:08:02 +01:00
Naohisa Goto
979b19b741 [ruby/json] Add support for Solaris 10 which lacks strnlen()
Check for existence of strnlen() and use alternative code if it is missing.

https://github.com/ruby/json/commit/48d4bbc3a0
2024-12-19 08:45:31 +09:00
Jean Boussier
6805e88935 [ruby/json] Stop using rb_gc_mark_locations
It's using `rb_gc_mark_maybe` under the hood, which isn't what
we need.

https://github.com/ruby/json/commit/e10d0bffcd
2024-11-26 15:11:05 +09:00
Jean Boussier
a4183781ec [ruby/json] Only use the key cache if the Hash is in an Array
Otherwise the likeliness of seeing that key again is really low, and looking up
the cache is just a waste.

Before:

```
== Parsing small hash (65 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. https://github.com/ruby/json/commit/9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   343.049k i/100ms
                  oj   213.943k i/100ms
          Oj::Parser    31.583k i/100ms
           rapidjson   303.433k i/100ms
Calculating -------------------------------------
                json      3.704M (± 1.5%) i/s  (270.01 ns/i) -     18.525M in   5.003078s
                  oj      2.200M (± 1.1%) i/s  (454.46 ns/i) -     11.125M in   5.056526s
          Oj::Parser    285.369k (± 4.8%) i/s    (3.50 μs/i) -      1.453M in   5.103866s
           rapidjson      3.216M (± 1.6%) i/s  (310.95 ns/i) -     16.082M in   5.001973s

Comparison:
                json:  3703517.4 i/s
           rapidjson:  3215983.0 i/s - 1.15x  slower
                  oj:  2200417.1 i/s - 1.68x  slower
          Oj::Parser:   285369.1 i/s - 12.98x  slower

== Parsing test from oj (258 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. https://github.com/ruby/json/commit/9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    54.539k i/100ms
                  oj    41.473k i/100ms
          Oj::Parser    24.064k i/100ms
           rapidjson    51.466k i/100ms
Calculating -------------------------------------
                json    549.386k (± 1.6%) i/s    (1.82 μs/i) -      2.781M in   5.064316s
                  oj    417.003k (± 1.3%) i/s    (2.40 μs/i) -      2.115M in   5.073047s
          Oj::Parser    226.500k (± 4.7%) i/s    (4.42 μs/i) -      1.131M in   5.005466s
           rapidjson    526.124k (± 1.0%) i/s    (1.90 μs/i) -      2.676M in   5.087176s

Comparison:
                json:   549385.6 i/s
           rapidjson:   526124.3 i/s - 1.04x  slower
                  oj:   417003.4 i/s - 1.32x  slower
          Oj::Parser:   226500.4 i/s - 2.43x  slower
```

After:

```
== Parsing small hash (65 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. https://github.com/ruby/json/commit/9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   361.394k i/100ms
                  oj   217.203k i/100ms
          Oj::Parser    28.855k i/100ms
           rapidjson   303.404k i/100ms
Calculating -------------------------------------
                json      3.859M (± 2.9%) i/s  (259.13 ns/i) -     19.515M in   5.061302s
                  oj      2.191M (± 1.6%) i/s  (456.49 ns/i) -     11.077M in   5.058043s
          Oj::Parser    315.132k (± 7.1%) i/s    (3.17 μs/i) -      1.587M in   5.065707s
           rapidjson      3.156M (± 4.0%) i/s  (316.88 ns/i) -     15.777M in   5.008949s

Comparison:
                json:  3859046.5 i/s
           rapidjson:  3155778.5 i/s - 1.22x  slower
                  oj:  2190616.0 i/s - 1.76x  slower
          Oj::Parser:   315132.4 i/s - 12.25x  slower

== Parsing test from oj (258 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. https://github.com/ruby/json/commit/9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    55.682k i/100ms
                  oj    40.343k i/100ms
          Oj::Parser    25.119k i/100ms
           rapidjson    51.500k i/100ms
Calculating -------------------------------------
                json    555.808k (± 1.4%) i/s    (1.80 μs/i) -      2.784M in   5.010092s
                  oj    412.283k (± 1.7%) i/s    (2.43 μs/i) -      2.098M in   5.089900s
          Oj::Parser    279.306k (±13.3%) i/s    (3.58 μs/i) -      1.356M in   5.022079s
           rapidjson    517.177k (± 2.7%) i/s    (1.93 μs/i) -      2.626M in   5.082352s

Comparison:
                json:   555808.3 i/s
           rapidjson:   517177.1 i/s - 1.07x  slower
                  oj:   412283.2 i/s - 1.35x  slower
          Oj::Parser:   279306.5 i/s - 1.99x  slower
```

https://github.com/ruby/json/commit/00c45ddc9f
2024-11-14 11:21:39 +09:00
Jean Boussier
58317328b6 [ruby/json] Rename parse_float into parse_number
https://github.com/ruby/json/commit/e51e796697
2024-11-11 09:40:10 +09:00
Aaron Patterson
c991f75c19 [ruby/json] Reduce comparisons when parsing numbers
Before this commit, we would try to scan for a float, then if that
failed, scan for an integer.  But floats and integers have many bytes in
common, so we would end up scanning the same bytes multiple times.

This patch combines integer and float scanning machines so that we only
have to scan bytes once.  If the machine finds "float parts", then it
executes the "isFloat" transition in the machine, which sets a boolean
letting us know that the parser found a float.

If we didn't find a float, but we did match, then we know it's an int.

https://github.com/ruby/json/commit/0c0e0930cd
2024-11-11 09:40:10 +09:00
Jean Boussier
d188a6883f [ruby/json] Implement a fast path for integer parsing
`rb_cstr2inum` isn't very fast because it handles tons of
different scenarios, and also require a NULL terminated string
which forces us to copy the number into a secondary buffer.

But since the parser already computed the length, we can much more
cheaply do this with a very simple function as long as the number
is small enough to fit into a native type (`long long`).

If the number is too long, we can fallback to the `rb_cstr2inum`
slowpath.

Before:

```
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    40.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    425.941 (± 1.9%) i/s    (2.35 ms/i) -      2.160k in   5.072833s
                  oj    349.617 (± 1.7%) i/s    (2.86 ms/i) -      1.750k in   5.006953s
          Oj::Parser    464.767 (± 1.7%) i/s    (2.15 ms/i) -      2.340k in   5.036381s
           rapidjson    382.413 (± 2.4%) i/s    (2.61 ms/i) -      1.938k in   5.070757s

Comparison:
                json:      425.9 i/s
          Oj::Parser:      464.8 i/s - 1.09x  faster
           rapidjson:      382.4 i/s - 1.11x  slower
                  oj:      349.6 i/s - 1.22x  slower
```

After:

```
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    46.000 i/100ms
                  oj    33.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    462.332 (± 3.2%) i/s    (2.16 ms/i) -      2.346k in   5.080504s
                  oj    351.140 (± 1.1%) i/s    (2.85 ms/i) -      1.782k in   5.075616s
          Oj::Parser    473.500 (± 1.3%) i/s    (2.11 ms/i) -      2.385k in   5.037695s
           rapidjson    395.052 (± 3.5%) i/s    (2.53 ms/i) -      1.989k in   5.042275s

Comparison:
                json:      462.3 i/s
          Oj::Parser:      473.5 i/s - same-ish: difference falls within error
           rapidjson:      395.1 i/s - 1.17x  slower
                  oj:      351.1 i/s - 1.32x  slower
```

https://github.com/ruby/json/commit/3a4dc9e1b4
2024-11-06 23:31:30 +01:00
Jean Boussier
6cea370b23 [ruby/json] parser.rl: parse_string implement a fast path
If we assume most string don't contain any escape sequence we can avoid
a lot of costly operations when it holds true.

Before:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   884.000 i/100ms
                  oj   789.000 i/100ms
          Oj::Parser   943.000 i/100ms
           rapidjson   584.000 i/100ms
Calculating -------------------------------------
                json      8.897k (± 1.3%) i/s  (112.40 μs/i) -     45.084k in   5.068520s
                  oj      7.967k (± 1.5%) i/s  (125.52 μs/i) -     40.239k in   5.051985s
          Oj::Parser      9.564k (± 1.4%) i/s  (104.56 μs/i) -     48.093k in   5.029626s
           rapidjson      5.947k (± 1.4%) i/s  (168.16 μs/i) -     29.784k in   5.009437s

Comparison:
                json:     8896.5 i/s
          Oj::Parser:     9563.8 i/s - 1.08x  faster
                  oj:     7966.8 i/s - 1.12x  slower
           rapidjson:     5946.7 i/s - 1.50x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    83.000 i/100ms
                  oj    64.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    54.000 i/100ms
Calculating -------------------------------------
                json    823.083 (± 1.8%) i/s    (1.21 ms/i) -      4.150k in   5.043805s
                  oj    632.538 (± 1.4%) i/s    (1.58 ms/i) -      3.200k in   5.060073s
          Oj::Parser    769.122 (± 1.8%) i/s    (1.30 ms/i) -      3.850k in   5.007501s
           rapidjson    548.494 (± 1.5%) i/s    (1.82 ms/i) -      2.754k in   5.022153s

Comparison:
                json:      823.1 i/s
          Oj::Parser:      769.1 i/s - 1.07x  slower
                  oj:      632.5 i/s - 1.30x  slower
           rapidjson:      548.5 i/s - 1.50x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    41.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    427.162 (± 1.2%) i/s    (2.34 ms/i) -      2.173k in   5.087666s
                  oj    351.463 (± 2.8%) i/s    (2.85 ms/i) -      1.768k in   5.035149s
          Oj::Parser    461.849 (± 3.7%) i/s    (2.17 ms/i) -      2.340k in   5.074461s
           rapidjson    395.155 (± 1.8%) i/s    (2.53 ms/i) -      1.989k in   5.034927s

Comparison:
                json:      427.2 i/s
          Oj::Parser:      461.8 i/s - 1.08x  faster
           rapidjson:      395.2 i/s - 1.08x  slower
                  oj:      351.5 i/s - 1.22x  slower
```

After:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   953.000 i/100ms
                  oj   813.000 i/100ms
          Oj::Parser   956.000 i/100ms
           rapidjson   563.000 i/100ms
Calculating -------------------------------------
                json      9.525k (± 1.2%) i/s  (104.98 μs/i) -     47.650k in   5.003252s
                  oj      8.117k (± 0.5%) i/s  (123.20 μs/i) -     40.650k in   5.008283s
          Oj::Parser      9.590k (± 3.2%) i/s  (104.27 μs/i) -     48.756k in   5.089794s
           rapidjson      6.020k (± 0.9%) i/s  (166.10 μs/i) -     30.402k in   5.050155s

Comparison:
                json:     9525.3 i/s
          Oj::Parser:     9590.1 i/s - same-ish: difference falls within error
                  oj:     8116.7 i/s - 1.17x  slower
           rapidjson:     6020.5 i/s - 1.58x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    87.000 i/100ms
                  oj    64.000 i/100ms
          Oj::Parser    75.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    866.563 (± 0.8%) i/s    (1.15 ms/i) -      4.350k in   5.020138s
                  oj    643.567 (± 0.8%) i/s    (1.55 ms/i) -      3.264k in   5.072101s
          Oj::Parser    777.346 (± 3.5%) i/s    (1.29 ms/i) -      3.900k in   5.023933s
           rapidjson    557.158 (± 0.7%) i/s    (1.79 ms/i) -      2.805k in   5.034731s

Comparison:
                json:      866.6 i/s
          Oj::Parser:      777.3 i/s - 1.11x  slower
                  oj:      643.6 i/s - 1.35x  slower
           rapidjson:      557.2 i/s - 1.56x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. https://github.com/ruby/json/commit/7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    41.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    40.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    429.216 (± 1.2%) i/s    (2.33 ms/i) -      2.173k in   5.063351s
                  oj    354.755 (± 1.1%) i/s    (2.82 ms/i) -      1.785k in   5.032374s
          Oj::Parser    465.114 (± 3.7%) i/s    (2.15 ms/i) -      2.360k in   5.081634s
           rapidjson    387.135 (± 1.3%) i/s    (2.58 ms/i) -      1.950k in   5.037787s

Comparison:
                json:      429.2 i/s
          Oj::Parser:      465.1 i/s - 1.08x  faster
           rapidjson:      387.1 i/s - 1.11x  slower
                  oj:      354.8 i/s - 1.21x  slower
```

https://github.com/ruby/json/commit/96bd97c61e
2024-11-06 23:31:30 +01:00
Nobuyoshi Nakada
8254f6492c [ruby/json] Categorize deprecated warning
https://github.com/ruby/json/commit/1acce7aceb
2024-11-06 23:31:30 +01:00
Jean Boussier
ca8f21ace8 [ruby/json] Resync 2024-11-05 18:00:36 +01:00
Jean Boussier
ed22e68379 [ruby/json] JSON::Ext::Parser mark the name cache entries when not on the heap
This is somewhat dead code as unless you are using `JSON::Parser.new`
direcltly we never allocate `JSON::Ext::Parser` anymore.

But still, we should mark all its reference in case some code out there
uses that.

Followup: #675

https://github.com/ruby/json/commit/8bf74a977b
2024-11-05 18:00:36 +01:00
Jean Boussier
ee4fa4ccee [ruby/json] json_string_unescape: Use the returned RString as buffer
Rather than to copy into a buffer to unescape and then copy that
buffer into the final string, we can directly copy into the final
string.

The downside is that if the string contains a lot of escaping, we
end up returning a string that's larger than strictly necessary, but
it's probably fine.

Before:

```
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    56.000 i/100ms
                  oj    58.000 i/100ms
           oj strict    74.000 i/100ms
          Oj::Parser    76.000 i/100ms
           rapidjson    52.000 i/100ms
Calculating -------------------------------------
                json    556.659 (± 2.9%) i/s    (1.80 ms/i) -      2.800k in   5.034719s
                  oj    604.077 (± 3.8%) i/s    (1.66 ms/i) -      3.016k in   5.001546s
           oj strict    706.942 (± 3.5%) i/s    (1.41 ms/i) -      3.552k in   5.030954s
          Oj::Parser    752.917 (± 3.2%) i/s    (1.33 ms/i) -      3.800k in   5.052707s
           rapidjson    546.470 (± 3.5%) i/s    (1.83 ms/i) -      2.756k in   5.049855s

Comparison:
                json:      556.7 i/s
          Oj::Parser:      752.9 i/s - 1.35x  faster
           oj strict:      706.9 i/s - 1.27x  faster
                  oj:      604.1 i/s - 1.09x  faster
           rapidjson:      546.5 i/s - same-ish: difference falls within error

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    29.000 i/100ms
                  oj    32.000 i/100ms
           oj strict    38.000 i/100ms
          Oj::Parser    42.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    317.858 (± 3.1%) i/s    (3.15 ms/i) -      1.595k in   5.023245s
                  oj    348.168 (± 2.6%) i/s    (2.87 ms/i) -      1.760k in   5.058431s
           oj strict    394.599 (± 2.8%) i/s    (2.53 ms/i) -      1.976k in   5.012073s
          Oj::Parser    403.771 (± 3.0%) i/s    (2.48 ms/i) -      2.058k in   5.101578s
           rapidjson    383.441 (± 3.7%) i/s    (2.61 ms/i) -      1.938k in   5.061355s

Comparison:
                json:      317.9 i/s
          Oj::Parser:      403.8 i/s - 1.27x  faster
           oj strict:      394.6 i/s - 1.24x  faster
           rapidjson:      383.4 i/s - 1.21x  faster
                  oj:      348.2 i/s - 1.10x  faster
```

After:

```
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    56.000 i/100ms
                  oj    62.000 i/100ms
           oj strict    73.000 i/100ms
          Oj::Parser    76.000 i/100ms
           rapidjson    54.000 i/100ms
Calculating -------------------------------------
                json    561.009 (± 7.5%) i/s    (1.78 ms/i) -      2.800k in   5.039548s
                  oj    601.124 (± 4.3%) i/s    (1.66 ms/i) -      3.038k in   5.064686s
           oj strict    707.455 (± 3.4%) i/s    (1.41 ms/i) -      3.577k in   5.062540s
          Oj::Parser    751.799 (± 3.1%) i/s    (1.33 ms/i) -      3.800k in   5.059509s
           rapidjson    535.641 (± 3.2%) i/s    (1.87 ms/i) -      2.700k in   5.045816s

Comparison:
                json:      561.0 i/s
          Oj::Parser:      751.8 i/s - 1.34x  faster
           oj strict:      707.5 i/s - 1.26x  faster
                  oj:      601.1 i/s - same-ish: difference falls within error
           rapidjson:      535.6 i/s - same-ish: difference falls within error

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    30.000 i/100ms
                  oj    32.000 i/100ms
           oj strict    36.000 i/100ms
          Oj::Parser    42.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    313.248 (± 7.3%) i/s    (3.19 ms/i) -      1.560k in   5.014118s
                  oj    341.977 (± 4.1%) i/s    (2.92 ms/i) -      1.728k in   5.063332s
           oj strict    387.062 (± 6.2%) i/s    (2.58 ms/i) -      1.944k in   5.045961s
          Oj::Parser    400.423 (± 4.0%) i/s    (2.50 ms/i) -      2.016k in   5.044513s
           rapidjson    379.046 (± 6.1%) i/s    (2.64 ms/i) -      1.911k in   5.064461s

Comparison:
                json:      313.2 i/s
          Oj::Parser:      400.4 i/s - 1.28x  faster
           oj strict:      387.1 i/s - 1.24x  faster
           rapidjson:      379.0 i/s - 1.21x  faster
                  oj:      342.0 i/s - same-ish: difference falls within error
```

https://github.com/ruby/json/commit/5e1ec4a268
2024-11-01 13:04:24 +09:00
Jean Boussier
b8b33efd4d [ruby/json] Remove String#-@ check in extconf.rb
Now that older rubies have been droped, we no longer need to check
for all that.

https://github.com/ruby/json/commit/35cf2b84e0
2024-11-01 13:04:24 +09:00
Jean Boussier
165cc6cf40 [ruby/json] json_string_unescape: assume the string doesn't need escaping
If that assumption holds true, then we don't need to copy the
string into a buffer to unescape it. For small string is just saves
copying, but for large ones it also saves a malloc/free combo.

Before:

```
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    52.000 i/100ms
                  oj    61.000 i/100ms
           oj strict    70.000 i/100ms
          Oj::Parser    71.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    510.111 (± 2.9%) i/s    (1.96 ms/i) -      2.548k in   5.000029s
                  oj    610.232 (± 3.1%) i/s    (1.64 ms/i) -      3.050k in   5.003725s
           oj strict    713.231 (± 3.2%) i/s    (1.40 ms/i) -      3.570k in   5.010902s
          Oj::Parser    762.598 (± 3.0%) i/s    (1.31 ms/i) -      3.834k in   5.033130s
           rapidjson    553.029 (± 7.4%) i/s    (1.81 ms/i) -      2.750k in   5.022630s

Comparison:
                json:      510.1 i/s
          Oj::Parser:      762.6 i/s - 1.49x  faster
           oj strict:      713.2 i/s - 1.40x  faster
                  oj:      610.2 i/s - 1.20x  faster
           rapidjson:      553.0 i/s - same-ish: difference falls within error

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    28.000 i/100ms
                  oj    33.000 i/100ms
           oj strict    37.000 i/100ms
          Oj::Parser    43.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    303.853 (± 3.6%) i/s    (3.29 ms/i) -      1.540k in   5.076079s
                  oj    348.009 (± 2.0%) i/s    (2.87 ms/i) -      1.749k in   5.027738s
           oj strict    396.679 (± 3.3%) i/s    (2.52 ms/i) -      1.998k in   5.042271s
          Oj::Parser    406.699 (± 2.2%) i/s    (2.46 ms/i) -      2.064k in   5.077587s
           rapidjson    393.463 (± 3.3%) i/s    (2.54 ms/i) -      1.976k in   5.028501s

Comparison:
                json:      303.9 i/s
          Oj::Parser:      406.7 i/s - 1.34x  faster
           oj strict:      396.7 i/s - 1.31x  faster
           rapidjson:      393.5 i/s - 1.29x  faster
                  oj:      348.0 i/s - 1.15x  faster
```

After:

```
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    56.000 i/100ms
                  oj    62.000 i/100ms
           oj strict    72.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    568.025 (± 2.1%) i/s    (1.76 ms/i) -      2.856k in   5.030272s
                  oj    630.936 (± 1.4%) i/s    (1.58 ms/i) -      3.162k in   5.012630s
           oj strict    705.784 (±11.2%) i/s    (1.42 ms/i) -      3.456k in   5.006706s
          Oj::Parser    783.989 (± 1.7%) i/s    (1.28 ms/i) -      3.927k in   5.010343s
           rapidjson    557.630 (± 2.0%) i/s    (1.79 ms/i) -      2.805k in   5.032388s

Comparison:
                json:      568.0 i/s
          Oj::Parser:      784.0 i/s - 1.38x  faster
           oj strict:      705.8 i/s - 1.24x  faster
                  oj:      630.9 i/s - 1.11x  faster
           rapidjson:      557.6 i/s - same-ish: difference falls within error

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    29.000 i/100ms
                  oj    33.000 i/100ms
           oj strict    38.000 i/100ms
          Oj::Parser    43.000 i/100ms
           rapidjson    37.000 i/100ms
Calculating -------------------------------------
                json    319.271 (± 3.1%) i/s    (3.13 ms/i) -      1.595k in   5.001128s
                  oj    347.946 (± 1.7%) i/s    (2.87 ms/i) -      1.749k in   5.028395s
           oj strict    396.914 (± 3.0%) i/s    (2.52 ms/i) -      2.014k in   5.079645s
          Oj::Parser    409.311 (± 2.7%) i/s    (2.44 ms/i) -      2.064k in   5.046626s
           rapidjson    394.752 (± 1.5%) i/s    (2.53 ms/i) -      1.998k in   5.062776s

Comparison:
                json:      319.3 i/s
          Oj::Parser:      409.3 i/s - 1.28x  faster
           oj strict:      396.9 i/s - 1.24x  faster
           rapidjson:      394.8 i/s - 1.24x  faster
                  oj:      347.9 i/s - 1.09x  faster
```

https://github.com/ruby/json/commit/7e0f66546a
2024-11-01 13:04:24 +09:00
Jean Boussier
081689b9e2 [ruby/json] parser.rl: extract build_string
https://github.com/ruby/json/commit/7e557ee291
2024-11-01 13:04:24 +09:00
Benoit Daloze
6412e6f6c3 [ruby/json] Use String#encode instead of rb_str_conv_enc()
* rb_str_conv_enc() returns the source string unmodified
  if the conversion did not work. But we should be consistent with
  the generator here and only accept BINARY or convertible to UTF-8.

https://github.com/ruby/json/commit/1344ad6f66
2024-11-01 13:04:24 +09:00
Jean Boussier
3782600f0f [ruby/json] Emit warnings when dumping binary strings
Because of it's Ruby 1.8 heritage, the C extension doesn't care
much about strings encoding. We should get stricter over time.

https://github.com/ruby/json/commit/42402fc13f
2024-11-01 13:04:24 +09:00
Jean Boussier
f2b8829df0 Deprecate unsafe default options of JSON.load
[Feature #19528]

Ref: https://bugs.ruby-lang.org/issues/19528

`load` is understood as the default method for serializer kind of libraries, and
the default options of `JSON.load` has caused many security vulnerabilities over the
years.

The plan is to do like YAML/Psych, deprecate these default options and direct
users toward using `JSON.unsafe_load` so at least it's obvious it should be
used against untrusted data.
2024-11-01 13:04:24 +09:00
Jean Boussier
59eebeca02 [ruby/json] Allocate the initial generator buffer on the stack
Ref: https://github.com/ruby/json/issues/655
Followup: https://github.com/ruby/json/issues/657

Assuming the generator might be used for fairly small documents
we can start with a reasonable buffer size of the stack, and if
we outgrow it, we can spill on the heap.

In a way this is optimizing for micro-benchmarks, but there are
valid use case for fiarly small JSON document in actual real world
scenarios, so trashing the GC less in such case make sense.

Before:

```
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                  Oj   518.700k i/100ms
          JSON reuse   483.370k i/100ms
Calculating -------------------------------------
                  Oj      5.722M (± 1.8%) i/s  (174.76 ns/i) -     29.047M in   5.077823s
          JSON reuse      5.278M (± 1.5%) i/s  (189.46 ns/i) -     26.585M in   5.038172s

Comparison:
                  Oj:  5722283.8 i/s
          JSON reuse:  5278061.7 i/s - 1.08x  slower
```

After:

```
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                  Oj   517.837k i/100ms
          JSON reuse   548.871k i/100ms
Calculating -------------------------------------
                  Oj      5.693M (± 1.6%) i/s  (175.65 ns/i) -     28.481M in   5.004056s
          JSON reuse      5.855M (± 1.2%) i/s  (170.80 ns/i) -     29.639M in   5.063004s

Comparison:
                  Oj:  5692985.6 i/s
          JSON reuse:  5854857.9 i/s - 1.03x  faster
```

https://github.com/ruby/json/commit/fe607f4806
2024-11-01 13:04:24 +09:00
Peter Zhu
e077be119b [ruby/json] Remove double semicolon at end of line in parser
https://github.com/ruby/json/commit/f6d6ca3c17
2024-10-30 10:13:49 +09:00
Jean Boussier
5d176436ce [ruby/json] Allocate the FBuffer struct on the stack
Ref: https://github.com/ruby/json/issues/655

The actual buffer is still on the heap, but this saves a pair
of malloc/free.

This helps a lot on micro-benchmarks

Before:

```
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                  Oj   531.598k i/100ms
          JSON reuse   417.666k i/100ms
Calculating -------------------------------------
                  Oj      5.735M (± 1.3%) i/s  (174.35 ns/i) -     28.706M in   5.005900s
          JSON reuse      4.604M (± 1.4%) i/s  (217.18 ns/i) -     23.389M in   5.080779s

Comparison:
                  Oj:  5735475.6 i/s
          JSON reuse:  4604380.3 i/s - 1.25x  slower
```

After:

```
ruby 3.3.4 (2024-07-09 revision https://github.com/ruby/json/commit/be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                  Oj   518.700k i/100ms
          JSON reuse   483.370k i/100ms
Calculating -------------------------------------
                  Oj      5.722M (± 1.8%) i/s  (174.76 ns/i) -     29.047M in   5.077823s
          JSON reuse      5.278M (± 1.5%) i/s  (189.46 ns/i) -     26.585M in   5.038172s

Comparison:
                  Oj:  5722283.8 i/s
          JSON reuse:  5278061.7 i/s - 1.08x  slower
```

Bench:

```ruby
require 'benchmark/ips'
require 'oj'
require 'json'

json_encoder = JSON::State.new(JSON.dump_default_options)
test_data = [1, "string", { a: 1, b: 2 }, [3, 4, 5]]

Oj.default_options = Oj.default_options.merge(mode: :compat)

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("Oj") do
    Oj.dump(test_data)
  end

  x.report("JSON reuse") do
    json_encoder.generate(test_data)
  end

  x.compare!(order: :baseline)
end
```

https://github.com/ruby/json/commit/72110f7992
2024-10-30 10:13:48 +09:00
Jean Boussier
fc9f0cb8c5 [ruby/json] JSON.dump / String#to_json: raise on invalid encoding
This regressed since 2.7.2.

https://github.com/ruby/json/commit/35407d6635
2024-10-26 18:44:15 +09:00
Jean Boussier
70f554efb4 [ruby/json] raise_parse_error: avoid UB
Fix: https://github.com/ruby/json/pull/625

Declaring the buffer in a sub block cause bugs on some compilers.

https://github.com/ruby/json/commit/90967c9eb0
2024-10-26 18:44:15 +09:00
Jean Boussier
07fc21cfad [ruby/json] Ext::Parser avoid costly check on decimal_class when it is nil
Closes: https://github.com/ruby/json/pull/512

https://github.com/ruby/json/commit/d882a45d82

Co-Authored-By: lukeg <luke.gru@gmail.com>
2024-10-26 18:44:15 +09:00
Jean Boussier
9045258c88 [ruby/json] Limit the size of ParserError exception messages
Fix: https://github.com/ruby/json/issues/534

Only include up to 32 bytes of unparseable the source.

https://github.com/ruby/json/commit/f44995cfb6
2024-10-26 18:44:15 +09:00
Jean Boussier
7dfc1f3d66 [ruby/json] parser.c: refactor raise_parse_error
https://github.com/ruby/json/commit/09e1df2643
2024-10-26 18:44:15 +09:00
Jean Boussier
618085f48d [ruby/json] Get rid of the remaining tabs.
https://github.com/ruby/json/commit/1a9af430d2
2024-10-26 18:44:15 +09:00
Jean Boussier
e0f8732023 Reduce allocations in parse and load argument handling
Avoid needless hash allocations and such that degrade performance
significantly on micro-benchmarks.
2024-10-26 18:44:15 +09:00