213 Commits

Author SHA1 Message Date
Nobuyoshi Nakada
d44a41d814
Add rb_node_get_type for debuggers 2025-01-09 19:24:27 +09:00
Yusuke Endoh
a32981b6b8 Remove a useless check
Here `nb` should never be NULL. If it were, the following
`nb->buffer_list` would be strange.

A follow-up to ddd8da4b6ba3dfcca21ca710e7cef2fa3b9632d7
2024-11-29 12:27:04 +09:00
yui-knk
6be539aab5 Change UNDEF Node structure
Change UNDEF Node to hold their items to keep the original grammar
structure.

For example:

```
undef a, b
```

Before:

```
@ NODE_BLOCK (id: 4, line: 1, location: (1,6)-(1,10))*
+- nd_head (1):
|   @ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,7))
|   +- nd_undef:
|       @ NODE_SYM (id: 0, line: 1, location: (1,6)-(1,7))
|       +- string: :a
+- nd_head (2):
    @ NODE_UNDEF (id: 3, line: 1, location: (1,9)-(1,10))
    +- nd_undef:
        @ NODE_SYM (id: 2, line: 1, location: (1,9)-(1,10))
        +- string: :b
```

After:

```
@ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,10))*
+- nd_undefs:
    +- length: 2
    +- element (0):
    |   @ NODE_SYM (id: 0, line: 1, location: (1,6)-(1,7))
    |   +- string: :a
    +- element (1):
        @ NODE_SYM (id: 2, line: 1, location: (1,9)-(1,10))
        +- string: :b
```
2024-07-20 11:25:26 +09:00
yui-knk
4c41203bbb Remove needless header file include 2024-04-29 21:26:44 +09:00
yui-knk
29aaf4abe6 Remove ast_new field from struct rb_parser_config_struct
`ast_new` can be embedded into `rb_ast_new`.
2024-04-28 17:58:57 +09:00
HASUMI Hitoshi
ddd8da4b6b [Universal parser] Improve AST structure
This patch moves `ast->node_buffer->config` to `ast->config` aiming to improve readability and maintainability of the source.

## Background

We could not add the `config` field to the `rb_ast_t *` due to the five-word restriction of the IMEMO object.
But it is now doable by merging https://github.com/ruby/ruby/pull/10618

## About assigning `&rb_global_parser_config` to `ast->config` in `ast_alloc()`

The approach of not setting `ast->config` in `ast_alloc()` means that the client, CRuby in this scenario, that directly calls `ast_alloc()` will be responsible for releasing it if a resource that is passed to AST needs to be released.

However, we have put on hold whether we can guarantee the above so far, thus, this patch looks like that.

```
// ruby_parser.c
static VALUE
ast_alloc(void)
{
    rb_ast_t *ast;
    VALUE vast = TypedData_Make_Struct(0, rb_ast_t, &ast_data_type, ast);
#ifdef UNIVERSAL_PARSER
    ast = (rb_ast_t *)DATA_PTR(vast);
    ast->config = &rb_global_parser_config;
#endif
    return vast;
}
```
2024-04-28 12:08:21 +09:00
HASUMI Hitoshi
55a402bb75 Add line_count field to rb_ast_body_t
This patch adds `int line_count` field to `rb_ast_body_t` structure.
Instead, we no longer cast `script_lines` to Fixnum.

## Background

Ref https://github.com/ruby/ruby/pull/10618

In the PR above, we have decoupled IMEMO from `rb_ast_t`.
This means we could lift the five-words-restriction of the structure
that forced us to unionize `rb_ast_t *` and `FIXNUM` in one field.

## Relating refactor

- Remove the second parameter of `rb_ruby_ast_new()` function

## Attention

I will remove a code that assigns -1 to line_count, in `rb_binding_add_dynavars()`
of vm.c, because I don't think it is necessary.
But I will make another PR for this so that we can atomically revert
in case I was wrong (See the comment on the code)
2024-04-27 12:08:26 +09:00
HASUMI Hitoshi
2244c58b00 [Universal parser] Decouple IMEMO from rb_ast_t
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.

## Background

We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby.
To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.

## Summary (file by file)

- `rubyparser.h`
  - Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
  - Use TypedData_Make_Struct VALUE which wraps `rb_ast_t` `in ast_alloc()` so that GC can manage it
    - You can retrieve `rb_ast_t` from the VALUE by `rb_ruby_ast_data_get()`
  - Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
  - rb_ruby_ast_new() which internally `calls ast_alloc()` is to create VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
  - Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
  - This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
  - Almost all change is replacement `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
  - Fix `node_memsize()`
    - Now it includes `rb_ast_local_table_link`, `tokens` and script_lines
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
  - Follow-up due to the above changes
- `imemo.{c|h}`
  - If an object with `imemo_ast` appears, considers it a bug

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-04-26 11:21:08 +09:00
yui-knk
f87c216c94 Remove needless header file include 2024-04-21 15:50:45 +09:00
yui-knk
ebaa87f626 Remove unused functions from struct rb_parser_config_struct 2024-04-16 07:15:45 +09:00
HASUMI Hitoshi
9b1e97b211 [Universal parser] DeVALUE of p->debug_lines and ast->body.script_lines
This patch is part of universal parser work.

## Summary
- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
  - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
  - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
  - With regard to this, please see *Future tasks* below

## Future tasks
- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-members-restriction of Ruby object,
  - So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
  - Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
2024-04-15 20:51:54 +09:00
yui-knk
6f7e8e278f Don't set T_TYPES of NODE
T_TYPES was needed once Ripper jumbled NODEs and other type
objects. However such hack was already removed.
Therefore don't need to set T_TYPES of NODE.
2024-04-08 11:39:00 +09:00
yui-knk
cebbe18eed Remove needless check
`nodetype_markable_p` always returns `false` then
`rb_ast_node_type_change` never calls `rb_bug`.
2024-04-05 09:19:57 +09:00
yui-knk
fc8fe78c07 Merge two node_buffer_list_t fields into one
All types of Node are managed by `node_buffer_list_t unmarkable`
therefore merge them into `node_buffer_list_t buffer_list`.
2024-04-05 09:19:57 +09:00
yui-knk
39afab6083 Remove unused macros from node.c 2024-04-05 08:22:58 +09:00
yui-knk
f057741c5d NODE_LIT is not used anymore 2024-04-04 13:17:26 +09:00
HASUMI Hitoshi
9a19cfd4cd [Universal Parser] Reduce dependence on RArray in parse.y
- Introduce `rb_parser_ary_t` structure to partly eliminate RArray from parse.y
  - In this patch, `parser_params->tokens` and `parser_params->ast->node_buffer->tokens` are now `rb_parser_ary_t *`
  - Instead, `ast_node_all_tokens()` internally creates a Ruby Array object from the `rb_parser_ary_t`
  - Also, delete `rb_ast_tokens()` and `rb_ast_set_tokens()` in node.c

- Implement `rb_parser_str_escape()`
  - This is a port of the `rb_str_escape()` function in string.c
  - `rb_parser_str_escape()` does not depend on `VALUE` (RString)
  - Instead, it uses `rb_parser_stirng_t *`
  - This function works when --dump=y option passed

- Because WIP of the universal parser, similar functions like `rb_parser_tokens_free()` exist in both node.c and parse.y. Refactoring them may be needed in some way in the future

- Although we considered redesigning the structure: `ast->node_buffer->tokens` into `ast->tokens`, we leave it as it is because `rb_ast_t` is an imemo. (We will address it in the future)
2024-03-12 17:17:52 +09:00
Peter Zhu
330830dd1a Add IMEMO_NEW
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
2024-02-21 11:33:05 -05:00
yui-knk
e7ab5d891c Introduce NODE_REGX to manage regexp literal 2024-02-21 08:06:48 +09:00
Peter Zhu
c184aa8740 Use rb_gc_mark_and_move for imemo 2024-02-20 10:39:30 -05:00
yui-knk
89cfc15207 [Feature #20257] Rearchitect Ripper
Introduce another semantic value stack for Ripper so that
Ripper can manage both Node and Ruby Object separately.
This rearchitectutre of Ripper solves these issues.
Therefore adding test cases for them.

* [Bug 10436] https://bugs.ruby-lang.org/issues/10436
* [Bug 18988] https://bugs.ruby-lang.org/issues/18988
* [Bug 20055] https://bugs.ruby-lang.org/issues/20055

Checked the differences of `Ripper.sexp` for files under `/test/ruby`
are only on test_pattern_matching.rb.
The differences comes from the differences between
`new_hash_pattern_tail` functions between parser and Ripper.
Ripper `new_hash_pattern_tail` didn’t call `assignable` then
`kw_rest_arg` wasn’t marked as local variable.
This is also fixed by this commit.

```
--- a/./tmp/before/test_pattern_matching.rb
+++ b/./tmp/after/test_pattern_matching.rb
@@ -3607,7 +3607,7 @@
                  [:in,
                   [:hshptn, nil, [], [:var_field, [:@ident, “a”, [984, 13]]]],
                   [[:binary,
-                    [:vcall, [:@ident, “a”, [985, 10]]],
+                    [:var_ref, [:@ident, “a”, [985, 10]]],
                     :==,
                     [:hash, nil]]],
                   nil]]],
@@ -3662,7 +3662,7 @@
                  [:in,
                   [:hshptn, nil, [], [:var_field, [:@ident, “a”, [993, 13]]]],
                   [[:binary,
-                    [:vcall, [:@ident, “a”, [994, 10]]],
+                    [:var_ref, [:@ident, “a”, [994, 10]]],
                     :==,
                     [:hash,
                      [:assoclist_from_args,
@@ -3813,7 +3813,7 @@
                    [:command,
                     [:@ident, “raise”, [1022, 10]],
                     [:args_add_block,
-                     [[:vcall, [:@ident, “b”, [1022, 16]]]],
+                     [[:var_ref, [:@ident, “b”, [1022, 16]]]],
                      false]]],
                   [:else, [[:var_ref, [:@kw, “true”, [1024, 10]]]]]]]],
                nil,
@@ -3876,7 +3876,7 @@
                      [:@int, “0”, [1033, 15]]],
                     :“&&“,
                     [:binary,
-                     [:vcall, [:@ident, “b”, [1033, 20]]],
+                     [:var_ref, [:@ident, “b”, [1033, 20]]],
                      :==,
                      [:hash, nil]]]],
                   nil]]],
@@ -3946,7 +3946,7 @@
                      [:@int, “0”, [1042, 15]]],
                     :“&&“,
                     [:binary,
-                     [:vcall, [:@ident, “b”, [1042, 20]]],
+                     [:var_ref, [:@ident, “b”, [1042, 20]]],
                      :==,
                      [:hash,
                       [:assoclist_from_args,
@@ -5206,7 +5206,7 @@
                      [[:assoc_new,
                        [:@label, “c:“, [1352, 22]],
                        [:@int, “0”, [1352, 25]]]]]],
-                   [:vcall, [:@ident, “r”, [1352, 29]]]],
+                   [:var_ref, [:@ident, “r”, [1352, 29]]]],
                   false]]],
                [:binary,
                 [:call,
@@ -5299,7 +5299,7 @@
                       [:assoc_new,
                        [:@label, “c:“, [1367, 34]],
                        [:@int, “0”, [1367, 37]]]]]],
-                   [:vcall, [:@ident, “r”, [1367, 41]]]],
+                   [:var_ref, [:@ident, “r”, [1367, 41]]]],
                   false]]],
                [:binary,
                 [:call,
@@ -5931,7 +5931,7 @@
              [:in,
               [:hshptn, nil, [], [:var_field, [:@ident, “r”, [1533, 11]]]],
               [[:binary,
-                [:vcall, [:@ident, “r”, [1534, 8]]],
+                [:var_ref, [:@ident, “r”, [1534, 8]]],
                 :==,
                 [:hash,
                  [:assoclist_from_args,
```
2024-02-20 17:33:58 +09:00
Peter Zhu
c5f22b5b75 Make all fields in AST movable 2024-02-16 10:15:35 -05:00
yui-knk
33c1e082d0 Remove ruby object from string nodes
String nodes holds ruby string object on `VALUE nd_lit`.
This commit changes it to `struct rb_parser_string *string`
to reduce dependency on ruby object.
Sometimes these strings are concatenated with other string
therefore string concatenate functions are needed.
2024-02-09 14:20:17 +09:00
Nobuyoshi Nakada
0610f555ea
Constify rb_global_parser_config 2024-01-14 17:55:11 +09:00
yui-knk
b35e21b388 Remove reference counter from rb_parser_config
It's allocated outside of parser then no need to track
reference count in rb_parser_config.
2024-01-12 21:17:41 +09:00
yui-knk
52d9e55903 Statically allocate parser config 2024-01-12 21:17:41 +09:00
yui-knk
db476cc71c Introduce NODE_SYM to manage symbol literal
`:sym` was managed by `NODE_LIT` with `Symbol` object.
This commit introduces `NODE_SYM` so that

1. Symbol literal is detectable from AST Node
2. Reduce dependency on ruby object
2024-01-09 16:07:19 +09:00
S-H-GAMELINKS
1b8d01136c Introduce Numeric Node's 2024-01-07 09:24:34 +09:00
yui-knk
7a050638b1 Introduce NODE_FILE
`__FILE__` was managed by `NODE_STR` with `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that

1. `__FILE__` is detectable from AST Node
2. Reduce dependency ruby object
2024-01-02 14:19:42 +09:00
Nobuyoshi Nakada
13c9cbe09e
Embed rb_args_info in rb_node_args_t 2023-10-30 00:19:43 +09:00
Nobuyoshi Nakada
a405b28e85 Delete heredoc line mark references 2023-10-14 11:08:43 +09:00
yui-knk
d293d9e191 Expand pattern_info struct into ARYPTN Node and FNDPTN Node 2023-09-30 13:11:32 +09:00
Peter Zhu
97564ddf2b Fix memory leak in the parser
Reproduction script:

```
require "ripper"

10.times do
  20_000.times do
    Ripper.parse("")
  end

  puts `ps -o rss= -p #{$$}`
end
```

Before:

```
28032
34432
40704
47232
53632
60032
66432
72832
79232
85632
```

After:

```
21760
21760
21760
21760
21760
21760
21760
21760
21760
21760
```
2023-09-29 16:51:17 -04:00
yui-knk
74c6781153 Change RNode structure from union to struct
All kind of AST nodes use same struct RNode, which has u1, u2, u3 union members
for holding different kind of data.
This has two problems.

1. Low flexibility of data structure

Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However they use same
structure definition, need to allocate three union members for NODE_TRUE and
need to separate NODE_OP_ASGN2 into another node.
This change removes the restriction so make it possible to
change data structure by each node type.

2. No compile time check for union member access

It’s developer’s responsibility for using correct member for each node type when it’s union.
This change clarifies which node has which type of fields and enables compile time check.

This commit also changes node_buffer_elem_struct buf management to handle
different size data with alignment.
2023-09-28 11:58:10 +09:00
yui-knk
fb7a2ddb4b Directly free structure managed by imemo tmpbuf
NODE_ARGS, NODE_ARYPTN, NODE_FNDPTN manage memory of their
structure by imemo tmpbuf Object.
However rb_ast_struct has reference to NODE. Then these
memory can be freed directly when rb_ast_struct is freed.

This commit reduces parser's dependency on CRuby functions.
2023-09-22 11:25:53 +09:00
yui-knk
19c62b400d Replace parser & node compile_option from Hash to bit field
This commit reduces dependency to CRuby object.
2023-06-17 16:41:08 +09:00
yui-knk
b481b673d7 [Feature #19719] Universal Parser
Introduce Universal Parser mode for the parser.
This commit includes these changes:

* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
  are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
2023-06-12 18:23:48 +09:00
yui-knk
5f65e8c5d5 Rename rb_node_name to the original name
98637d421dbe8bcf86cc2effae5e26bb96a6a4da changes the name of
the function. However this function is exported as global,
then change the name to origin one for keeping compatibility.
2023-05-24 20:54:48 +09:00
yui-knk
98637d421d Move ruby_node_name to node.c and rename prefix of the function 2023-05-23 18:05:35 +09:00
yui-knk
d8601621ed Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.

This commit adds these methods.

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
2022-11-21 09:01:34 +09:00
yui-knk
4bfdf6d06d Move error from top_stmts and top_stmt to stmt
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.

```
module Z
  class Foo
    foo.
  end

  def bar
  end
end
```

[Feature #19013]
2022-10-08 17:59:11 +09:00
Wolf
c69ad738dc Initialize node_id
In some causes node_id might have been left uninitialized leading to
undefined behavior on access. So always set it to -1, so we have *some*
valid value in there.
2022-08-01 10:36:36 +09:00
Takashi Kokubun
5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Nobuyoshi Nakada
54f0e63a8c Remove NODE_DASGN_CURR [Feature #18406]
This `NODE` type was used in pre-YARV implementation, to improve
the performance of assignment to dynamic local variable defined at
the innermost scope.  It has no longer any actual difference with
`NODE_DASGN`, except for the node dump.
2021-12-13 12:53:03 +09:00
S.H
ec7f14d9fa
Add nd_type_p macro 2021-12-04 00:01:24 +09:00
Yusuke Endoh
feda058531 Refactor hacky ID tables to struct rb_ast_id_table_t
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.

This change converts the hacky implementation to a normal struct.
2021-11-21 08:59:24 +09:00
Yusuke Endoh
753cfbdbf3 node.c (dump_node): update format explanation for NODE_ARGS 2021-11-17 23:38:52 +09:00
Yusuke Endoh
5a7b4dba26 node.c (dump_node): trivial refactoring 2021-11-17 23:38:19 +09:00
Nobuyoshi Nakada
6504ca006b
Show node IDs in dump 2021-07-12 12:10:16 +09:00
Yusuke Endoh
acae5f363d ast.rb: RubyVM::AST.parse and .of accepts save_script_lines: true
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
2021-06-18 02:34:27 +09:00