152 Commits

Author SHA1 Message Date
yui-knk
db476cc71c Introduce NODE_SYM to manage symbol literal
`:sym` was managed by `NODE_LIT` with `Symbol` object.
This commit introduces `NODE_SYM` so that

1. Symbol literal is detectable from AST Node
2. Reduce dependency on ruby object
2024-01-09 16:07:19 +09:00
yui-knk
7ffff3e043 Change numeric node value functions argument to NODE *
Change the argument to align with other node value functions
like `rb_node_line_lineno_val`.
2024-01-08 14:02:48 +09:00
S-H-GAMELINKS
1b8d01136c Introduce Numeric Node's 2024-01-07 09:24:34 +09:00
yui-knk
7a050638b1 Introduce NODE_FILE
`__FILE__` was managed by `NODE_STR` with `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that

1. `__FILE__` is detectable from AST Node
2. Reduce dependency ruby object
2024-01-02 14:19:42 +09:00
yui-knk
1ade170a6c Introduce NODE_LINE
`__LINE__` was managed by `NODE_LIT` with `Integer` object.
This commit introduces `NODE_LINE` so that

1. `__LINE__` is detectable from AST Node
2. Reduce dependency ruby object
2023-12-29 18:32:27 +09:00
Nobuyoshi Nakada
45eee0cd94
Remove duplicate to_path conversion
`rb_file_open_str` calls `FilePathValue`, and the converted result is
not used in this function.
2023-11-02 10:06:03 +09:00
Nobuyoshi Nakada
13c9cbe09e
Embed rb_args_info in rb_node_args_t 2023-10-30 00:19:43 +09:00
yui-knk
08e25985d1 Expand OP_ASGN1 nd_args to nd_index and nd_rvalue
ARGSCAT has been used for nd_args to hold index and rvalue,
because there was limitation on the number of members for Node.
We can easily change structure of node now, let's expand it.
2023-10-20 07:56:20 +09:00
yui-knk
3049b5e348 Differentiate VAR nodes 2023-10-09 13:33:36 +09:00
yui-knk
09b33ea15a Differentiate CALL nodes 2023-10-09 13:33:36 +09:00
yui-knk
529a651f82 Differentiate ASGN nodes 2023-10-07 17:54:35 +09:00
yui-knk
f28d380374 Pass nd_value to NODE_REQUIRED_KEYWORD_P 2023-10-07 17:54:35 +09:00
Nobuyoshi Nakada
a5cc6341c0
Remove NODE_VALUES
This node type was added for the multi-value experiment back in 2004.
The feature itself was removed after a few years, but this is its
remnant.
2023-10-06 03:39:58 +09:00
Nobuyoshi Nakada
696022a0cb Differentiate NODE_BREAK/NODE_NEXT/NODE_RETURN 2023-10-05 14:23:42 +09:00
Nobuyoshi Nakada
70e1635950 Move internal NODE_DEF_TEMP to parse.y 2023-10-05 14:23:42 +09:00
yui-knk
08239fd6af Use rb_node_args_t and rb_node_args_aux_t instead of NODE 2023-10-01 19:38:03 +09:00
yui-knk
cecd1de2eb Use rb_node_opt_arg_t and rb_node_kw_arg_t instead of NODE 2023-10-01 09:19:42 +09:00
yui-knk
d293d9e191 Expand pattern_info struct into ARYPTN Node and FNDPTN Node 2023-09-30 13:11:32 +09:00
yui-knk
68ae87546e Merge NODE_DEF_TEMP and NODE_DEF_TEMP2 2023-09-29 19:36:34 +09:00
yui-knk
37a783a30c Merge RNode_OP_ASGN2 and RNode_OP_ASGN22 2023-09-29 08:36:39 +09:00
yui-knk
74c6781153 Change RNode structure from union to struct
All kind of AST nodes use same struct RNode, which has u1, u2, u3 union members
for holding different kind of data.
This has two problems.

1. Low flexibility of data structure

Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However they use same
structure definition, need to allocate three union members for NODE_TRUE and
need to separate NODE_OP_ASGN2 into another node.
This change removes the restriction so make it possible to
change data structure by each node type.

2. No compile time check for union member access

It’s developer’s responsibility for using correct member for each node type when it’s union.
This change clarifies which node has which type of fields and enables compile time check.

This commit also changes node_buffer_elem_struct buf management to handle
different size data with alignment.
2023-09-28 11:58:10 +09:00
Nobuyoshi Nakada
6aa16f9ec1 Move SCRIPT_LINES__ away from parse.y 2023-08-25 18:23:05 +09:00
yui-knk
b481b673d7 [Feature #19719] Universal Parser
Introduce Universal Parser mode for the parser.
This commit includes these changes:

* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
  are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
2023-06-12 18:23:48 +09:00
yui-knk
5f65e8c5d5 Rename rb_node_name to the original name
98637d421dbe8bcf86cc2effae5e26bb96a6a4da changes the name of
the function. However this function is exported as global,
then change the name to origin one for keeping compatibility.
2023-05-24 20:54:48 +09:00
yui-knk
98637d421d Move ruby_node_name to node.c and rename prefix of the function 2023-05-23 18:05:35 +09:00
Nobuyoshi Nakada
2490b2e121 Add utility macros DECIMAL_SIZE_OF and DECIMAL_SIZE_OF_BYTES 2023-02-14 15:18:21 +09:00
yui-knk
979dd02e2f Check if the argument is Thread::Backtrace::Location object
[Bug #19262]
2023-01-06 09:22:09 +09:00
yui-knk
d8601621ed Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.

This commit adds these methods.

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
2022-11-21 09:01:34 +09:00
eileencodes
3391c51eff Add node_id_for_backtrace_location function
We want to use error highlight with eval'd code, specifically ERB
templates. We're able to recover the generated code for eval'd templates
and can get a parse tree for the ERB generated code, but we don't have a
way to get the node id from the backtrace location. So we can't pass the
right node into error highlight.

This patch gives us an API to get the node id from the backtrace
location so we can find the node in the AST.

Error Highlight PR: https://github.com/ruby/error_highlight/pull/26

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2022-10-31 13:39:56 +09:00
yui-knk
4bfdf6d06d Move error from top_stmts and top_stmt to stmt
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.

```
module Z
  class Foo
    foo.
  end

  def bar
  end
end
```

[Feature #19013]
2022-10-08 17:59:11 +09:00
yui-knk
fbbdbdd891 Add error_tolerant option to RubyVM::AST
If this option is enabled, SyntaxError is not raised and Node is
returned even if passed script is broken.

[Feature #19013]
2022-10-08 17:59:11 +09:00
Peter Zhu
5f10bd634f Add ISEQ_BODY macro
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
2022-03-24 10:03:51 -04:00
Yusuke Endoh
0dc7816c43 Make RubyVM::AST.of work with code written in -e command-line option
[Bug #18434]
2021-12-26 20:57:34 +09:00
Yusuke Endoh
6a51c3e80c Make AST.of possible even under eval when keep_script_lines is enabled
Now the following code works without an exception.

```
RubyVM.keep_script_lines = true

eval(<<END)
def foo
end
END

p RubyVM::AbstractSyntaxTree.of(method(:foo))
```
2021-12-19 04:00:51 +09:00
Yusuke Endoh
acac2b8128 Make RubyVM::AbstractSyntaxTree.of raise for backtrace location in eval
This check is needed to fix a bug of error_highlight when NameError
occurred in eval'ed code.
https://github.com/ruby/error_highlight/pull/16

The same check for proc/method has been already introduced since
64ac984129a7a4645efe5ac57c168ef880b479b2.
2021-12-19 03:51:37 +09:00
Nobuyoshi Nakada
54f0e63a8c Remove NODE_DASGN_CURR [Feature #18406]
This `NODE` type was used in pre-YARV implementation, to improve
the performance of assignment to dynamic local variable defined at
the innermost scope.  It has no longer any actual difference with
`NODE_DASGN`, except for the node dump.
2021-12-13 12:53:03 +09:00
S.H
ec7f14d9fa
Add nd_type_p macro 2021-12-04 00:01:24 +09:00
Yusuke Endoh
feda058531 Refactor hacky ID tables to struct rb_ast_id_table_t
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.

This change converts the hacky implementation to a normal struct.
2021-11-21 08:59:24 +09:00
Yusuke Endoh
09fa773e04
ast.c: Use kept script_lines data instead of re-opening the source file (#5019)
ast.c: Use kept script_lines data instead of re-open the source file
2021-10-26 01:58:01 +09:00
Yusuke Endoh
1b300789ff ast.c: AST.of against C method should return nil (as Ruby 2.6--3.0) 2021-09-18 21:52:18 +09:00
Yusuke Endoh
ed9d9cee76 ast.c: AST.of checks if a given method object is defined in C
[Bug #18178]
2021-09-18 21:28:35 +09:00
S-H-GAMELINKS
bdd6d8746f Replace RBOOL macro 2021-09-05 23:01:27 +09:00
Yusuke Endoh
cad83fa3c4 ast.c: Rename "save_script_lines" to "keep_script_lines"
... as per ko1's preference. He is preparing to extend this feature to
ISeq for his new debugger. He prefers "keep" to "save" for this wording.
This API is internal and not included in any released version, so I
change it in advance.
2021-08-20 16:18:36 +09:00
Jeremy Evans
64ac984129 Make RubyVM::AbstractSyntaxTree.of raise for method/proc created in eval
This changes Thread::Location::Backtrace#absolute_path to return
nil for methods/procs defined in eval.  If the realpath of an iseq
is nil, that indicates it was defined in eval, in which case you
cannot use RubyVM::AbstractSyntaxTree.of.

Fixes [Bug #16983]

Co-authored-by: Koichi Sasada <ko1@atdot.net>
2021-07-29 13:51:03 -07:00
Yusuke Endoh
ed8e265d4b Experimentally expose RubyVM::AST::Node#node_id
Now ISeq#to_a includes the node_id list for each bytecode instruction.
I want a way to retrieve the AST::Node instance corresponding to an
instruction for a research purpose including TypeProf-based LSP server.
2021-06-21 21:15:25 +09:00
Yusuke Endoh
0a36cab1b5 Enable USE_ISEQ_NODE_ID by default
... which is formally called EXPERIMENTAL_ISEQ_NODE_ID.

See also ff69ef27b06eed1ba750e7d9cab8322f351ed245.

https://bugs.ruby-lang.org/issues/17930
2021-06-18 03:35:38 +09:00
Yusuke Endoh
dfba87cd62 Make it possible to get AST::Node from Thread::Backtrace::Location
RubyVM::AST.of(Thread::Backtrace::Location) returns a node that
corresponds to the location. Typically, the node is a method call, but
not always.

This change also includes iseq's dump/load support of node_ids for each
instructions.
2021-06-18 03:35:38 +09:00
Yusuke Endoh
fb01411ae8 node.h: Reduce struct size to fit with Ruby object size (five VALUEs)
by merging `rb_ast_body_t#line_count` and `#script_lines`.

Fortunately `line_count == RARRAY_LEN(script_lines)` was always
satisfied. When script_lines is saved, it has an array of lines, and
when not saved, it has a Fixnum that represents the old line_count.
2021-06-18 02:34:27 +09:00
Yusuke Endoh
acae5f363d ast.rb: RubyVM::AST.parse and .of accepts save_script_lines: true
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
2021-06-18 02:34:27 +09:00
Yusuke Endoh
ff69ef27b0 compile.c: Pass node instead of nd_line(node) to ADD_INSN* functions
... then, new_insn_core extracts nd_line(node).

Also, if a macro "EXPERIMENTAL_ISEQ_NODE_ID" is defined, this changeset
keeps nd_node_id(node) for each instruction. This is intended for
TypeProf to identify what AST::Node corresponds to each instruction.

This patch is originally authored by @yui-knk for showing which column a
NoMethodError occurred.

https://github.com/ruby/ruby/compare/master...yui-knk:feature/node_id

Co-Authored-By: Yuichiro Kaneko <yui-knk@ruby-lang.org>
2021-05-07 17:02:15 +09:00