RubyNode consists of two parts: a C extension, which provides the core functionality, and a Ruby library, which builds higher-level methods on top of it.
To get only the C extension, require "rubynode_ext"; to get the full functionality (recommended), just require "rubynode".
Sections: Introduction to Ruby NODEs, Accessing the fields using RubyNode, Aliases for better readability, Getting RubyNodes, Higher level methods, More examples.
Ruby mainly uses the NODEs to store the AST (abstract syntax tree) of parsed Ruby files, but they are also used for some other things. On the C side NODE is a struct:
typedef struct RNode {
    unsigned long flags;
    char *nd_file;
    union {
        struct RNode *node;
        ID id;
        VALUE value;
        VALUE (*cfunc)(ANYARGS);
        ID *tbl;
    } u1;
    union {
        struct RNode *node;
        ID id;
        long argc;
        VALUE value;
    } u2;
    union {
        struct RNode *node;
        ID id;
        long state;
        struct global_entry *entry;
        long cnt;
        VALUE value;
    } u3;
} NODE;
The important parts are the 3 unions u1, u2 and u3, which can store values of different types. There is also an nd_file field, which contains the name of the file from which this node was parsed, and a flags field, which contains some flags, the node type and the line number (all ORed together).
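To make the "ORed together" layout concrete, here is a small sketch in plain Ruby that packs a node type and a line number into one integer and extracts them again. The shift and mask values (TYPE_SHIFT, TYPE_MASK, LINE_SHIFT) and the helper names are assumptions chosen for illustration; the real layout is defined in Ruby's C headers and may differ between Ruby versions.

```ruby
# Hypothetical bit layout, for illustration only (not taken from Ruby's headers):
# bits 0..10 general flags, bits 11..18 node type, bits 19 and up line number.
TYPE_SHIFT = 11
TYPE_MASK  = 0xff
LINE_SHIFT = TYPE_SHIFT + 8

# Pack general flags, a node type number and a line number into one integer.
def pack_flags(general, type, line)
  general | (type << TYPE_SHIFT) | (line << LINE_SHIFT)
end

# Extract the node type number from the packed value.
def type_of(flags)
  (flags >> TYPE_SHIFT) & TYPE_MASK
end

# Extract the line number from the packed value.
def line_of(flags)
  flags >> LINE_SHIFT
end

flags = pack_flags(0x3, 4, 42)
p type_of(flags)  # prints 4
p line_of(flags)  # prints 42
```

With RubyNode you do not need to do this by hand: the type and line methods described below perform the extraction.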
The node types are defined as an enum in C:
enum node_type {
    NODE_METHOD,
    NODE_FBODY,
    NODE_CFUNC,
    NODE_SCOPE,
    NODE_BLOCK,
    ...
If you have an instance of RubyNode, you can access all these fields using the following methods:
file | the contents of nd_file as String
flags | the raw flags field as Integer
line | the line number (extracted from flags) as Integer
type | the node type as Symbol (e.g. NODE_SCOPE => :scope)
The unions can be accessed using the following methods:
u1_as_long
u1_cfunc
u1_id
u1_node
u1_tbl
u1_value
u2_argc
u2_id
u2_node
u2_value
u3_cnt
u3_id
u3_node
u3_state
u3_value
The *_id methods return a Symbol if the value of this union is a valid ID. If the value is 0 or 1, they return the corresponding Fixnum (this is because some node types store 0 and 1 instead of an ID for special cases). Otherwise nil is returned.
The *_value and *_node methods return the object or the node (wrapped as RubyNode) that is stored in the union. If the union holds neither an object nor a node, nil is returned. The *_value and *_node methods can be used interchangeably; they are just aliases.
u1_as_long, u2_argc, u3_cnt and u3_state always return the raw “long” that is stored in the union, as Integer.
u1_cfunc returns the unsigned value of the function pointer that is stored in the union, as Integer.
u1_tbl returns the local variable table for a scope node as an Array; for all other node types it returns nil.
None of these methods raises an exception; they just return nil if the value that is stored in the union does not have the requested type.
u3.entry is not supported (for obvious reasons).
It would not be very readable to always access the unions directly, so Ruby uses defines on the C side to remedy this:
#define nd_head u1.node
#define nd_alen u2.argc
#define nd_next u3.node
#define nd_cond u1.node
#define nd_body u2.node
#define nd_else u3.node
...
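The same idea can be sketched in Ruby with alias_method. FakeNode below is a hypothetical stand-in built from a Struct, not RubyNode itself; it only illustrates how several readable names can point at the same union accessor.

```ruby
# FakeNode is a hypothetical stand-in for illustration; it is not RubyNode.
# Each Struct member plays the role of one union accessor.
FakeNode = Struct.new(:u1_node, :u2_argc, :u2_node, :u3_node) do
  # Names used by list-like nodes
  alias_method :nd_head, :u1_node
  alias_method :nd_alen, :u2_argc
  alias_method :nd_next, :u3_node

  # Names used by conditional nodes; note that nd_cond and nd_head
  # read the very same slot.
  alias_method :nd_cond, :u1_node
  alias_method :nd_body, :u2_node
  alias_method :nd_else, :u3_node
end

node = FakeNode.new(:first_element, 1, nil, false)
p node.nd_head  # prints :first_element
p node.nd_cond  # prints :first_element (same slot, different name)
p node.nd_alen  # prints 1
```

Which alias is meaningful depends on the node type, which is why the C side gives the same union member several names.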
RubyNode also makes these aliases available. To get a full list of the available aliases for your Ruby version you can use something like the following in irb:
>> puts RubyNode.instance_methods.grep(/^nd_/).sort
nd_1st
nd_2nd
nd_aid
nd_alen
nd_argc
nd_args
nd_beg
nd_body
...
Now that we know how to work with RubyNode instances, it would be nice to have some to try it out ;-). There is no way to instantiate RubyNode instances yourself; you can only get them through one of the following methods.
You can access the body node of methods by using the body_node method.
Procs have three different nodes: the body node, which contains the actual code; the var node, which describes the arguments of the proc; and the cref node, which is the lexical class/module nesting in which this proc was defined. Those nodes can be accessed using the methods body_node, var_node and cref_node.
RubyNode adds a parse_to_nodes method to String, which parses the given string using Ruby’s parser. The parsing is done in the current context/binding/scope, so it basically returns the AST that eval would see, given this string.
For Ruby 1.8.x there is also the method parse_begin_to_nodes, which returns the AST for all BEGIN blocks in the string. Those BEGIN blocks won’t be in the AST returned by parse_to_nodes. In Ruby 1.9 parse_to_nodes returns a combined AST. parse_to_nodes and parse_begin_to_nodes also accept two optional arguments: file and line. The arguments work similarly to those of Ruby’s eval and default to "(string)" and 1.
For more details please see the examples below.
All the functionality described above is provided by the C extension. The following methods are only available if you require "rubynode" (instead of just "rubynode_ext").
It would be a bit tedious to only work with the above methods to access the attributes of nodes. You would have to know which node type has which attributes, for example. Fortunately RubyNode provides a nicer way: the method attribs_hash. This method returns a hash that contains all attributes of the node. Example:
>> n = "1 + 2".parse_to_nodes.nd_next
=> #<RubyNode :call>
>> n.attribs_hash
=> {:mid=>:+, :recv=>#<RubyNode :lit>, :args=>#<RubyNode :array>}
This is nice but it is still a bit tedious, because you would then probably do something like the following:
>> n.attribs_hash[:recv].attribs_hash
=> {:lit=>1}
>> n.attribs_hash[:args].attribs_hash
=> {:next=>false, :head=>#<RubyNode :lit>, :alen=>1}
>> n.attribs_hash[:args].attribs_hash[:head].attribs_hash
=> {:lit=>2}
So, there is an even better way: the method transform. It is basically a recursive version of attribs_hash: it transforms a node tree into a tree of arrays and hashes. Example:
>> n.transform
=> [:call, {:mid=>:+, :recv=>[:lit, {:lit=>1}], :args=>[:array, [[:lit, {:lit=>2}]]]}]
So the #<RubyNode :call> became [:call, "attribs_hash applied recursively"].
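Because the result of transform consists of plain arrays, hashes, symbols and literals, it can be processed with ordinary Ruby code. The following sketch evaluates the tree for "1 + 2" without needing RubyNode at all: the tree is copied from the transform output above as a literal, and the helper name evaluate is hypothetical. It only understands :lit and :call nodes and assumes that a call without arguments has a false or missing :args entry.

```ruby
# The transformed tree for "1 + 2", copied from the transform output above.
tree = [:call, {:mid=>:+, :recv=>[:lit, {:lit=>1}], :args=>[:array, [[:lit, {:lit=>2}]]]}]

# Hypothetical helper: evaluates a transformed tree that consists only of
# :lit and :call nodes.
def evaluate(node)
  type, attribs = node
  case type
  when :lit
    attribs[:lit]
  when :call
    receiver = evaluate(attribs[:recv])
    # Assumption: :args is either false/missing or an [:array, [elements]] pair.
    arg_nodes = attribs[:args] ? attribs[:args][1] : []
    receiver.send(attribs[:mid], *arg_nodes.map { |n| evaluate(n) })
  else
    raise ArgumentError, "unsupported node type: #{type.inspect}"
  end
end

p evaluate(tree)  # prints 3
```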
You might have noticed that the :array node doesn’t have a hash as the second element in its array. This is some special magic to make it easier to work with :array nodes. If you really want to see the node tree of :array nodes, you can get that, too:
>> n.nd_args.transform
=> [:array, [[:lit, {:lit=>2}]]]
>> n.nd_args.transform(:keep_array_nodes => true)
=> [:array, {:next=>false, :head=>[:lit, {:lit=>2}], :alen=>1}]
The same magic is also done for :block nodes:
>> bl = "foo; bar".parse_to_nodes
=> #<RubyNode :block>
>> bl.transform
=> [:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
>> pp bl.transform(:keep_block_nodes => true)
[:block,
 {:next=>[:block, {:next=>false, :head=>[:vcall, {:mid=>:bar}]}],
  :head=>[:vcall, {:mid=>:foo}]}]
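The kept form shows that :array and :block nodes are linked lists chained through :next. As a rough sketch of what the default flattening amounts to (this is not RubyNode's actual implementation), the hypothetical helper flatten_list below walks such a chain and collects the :head entries. The input literals are copied from the outputs above.

```ruby
# Hypothetical helper: turns a [type, {:head=>..., :next=>...}] chain into
# [type, [head1, head2, ...]]. A false :next marks the end of the list.
def flatten_list(node)
  type, attribs = node
  heads = []
  while attribs
    heads << attribs[:head]
    nxt = attribs[:next]
    attribs = nxt ? nxt[1] : nil
  end
  [type, heads]
end

kept_block = [:block,
  {:next=>[:block, {:next=>false, :head=>[:vcall, {:mid=>:bar}]}],
   :head=>[:vcall, {:mid=>:foo}]}]
kept_array = [:array, {:next=>false, :head=>[:lit, {:lit=>2}], :alen=>1}]

# Both comparisons print true: the flattened chains match the default output.
p flatten_list(kept_block) == [:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
p flatten_list(kept_array) == [:array, [[:lit, {:lit=>2}]]]
```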
transform also strips :newline nodes (only relevant for Ruby 1.8; Ruby 1.9 doesn’t have :newline nodes), but if you really want those, you can get them:
>> pp bl.transform
[:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
=> nil
>> pp bl.transform(:keep_newline_nodes => true)
[:block,
 [[:newline, {:next=>[:vcall, {:mid=>:foo}]}],
  [:newline, {:next=>[:vcall, {:mid=>:bar}]}]]]
And finally, transform can also include the original RubyNode instance in the hash, if you later need access to the filename, line number or the flags:
>> pp bl.transform(:include_node => true)
[:block,
 [[:vcall, {:mid=>:foo, :node=>#<RubyNode :vcall>}],
  [:vcall, {:mid=>:bar, :node=>#<RubyNode :vcall>}]]]
=> nil
>> pp n.transform(:include_node => true)
[:call,
 {:mid=>:+,
  :recv=>[:lit, {:lit=>1, :node=>#<RubyNode :lit>}],
  :args=>[:array, [[:lit, {:lit=>2, :node=>#<RubyNode :lit>}]]],
  :node=>#<RubyNode :call>}]
The options :keep_array_nodes, :keep_block_nodes, :keep_newline_nodes and :include_node can also be combined.
>> class A
>>   def foo(x)
>>     @bar + x
>>   end
>> end
=> nil
>> pp A.instance_method(:foo).body_node.transform
[:scope,
 {:next=>
   [:block,
    [[:args, {:rest=>-1, :cnt=>1, :opt=>false}],
     [:call,
      {:mid=>:+,
       :recv=>[:ivar, {:vid=>:@bar}],
       :args=>[:array, [[:lvar, {:cnt=>2, :vid=>:x}]]]}]]],
  :rval=>[:cref, {:next=>[:cref, {:next=>false, :clss=>Object}], :clss=>A}],
  :tbl=>[:x]}]
=> nil
>> pp A.new.method(:foo).body_node.transform
[:scope,
 {:next=>
   [:block,
    [[:args, {:rest=>-1, :cnt=>1, :opt=>false}],
     [:call,
      {:mid=>:+,
       :recv=>[:ivar, {:vid=>:@bar}],
       :args=>[:array, [[:lvar, {:cnt=>2, :vid=>:x}]]]}]]],
  :rval=>[:cref, {:next=>[:cref, {:next=>false, :clss=>Object}], :clss=>A}],
  :tbl=>[:x]}]
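Since transform returns plain data, trees like the one above are easy to search. The sketch below uses a hypothetical helper, collect_nodes, that walks arrays and hashes recursively and gathers every node of a given type. The sample tree is a literal copy of the :block part of the output above, so the snippet runs without RubyNode.

```ruby
# Hypothetical helper: returns the second element (attribute hash or element
# list) of every node of the wanted type found anywhere in a transformed tree.
def collect_nodes(obj, wanted, found = [])
  case obj
  when Array
    # A transformed node is a two-element array starting with the type Symbol.
    found << obj[1] if obj.size == 2 && obj[0] == wanted
    obj.each { |element| collect_nodes(element, wanted, found) }
  when Hash
    obj.each_value { |value| collect_nodes(value, wanted, found) }
  end
  found
end

body = [:block,
  [[:args, {:rest=>-1, :cnt=>1, :opt=>false}],
   [:call,
    {:mid=>:+,
     :recv=>[:ivar, {:vid=>:@bar}],
     :args=>[:array, [[:lvar, {:cnt=>2, :vid=>:x}]]]}]]]

p collect_nodes(body, :ivar).map { |attribs| attribs[:vid] }  # prints [:@bar]
p collect_nodes(body, :lvar).map { |attribs| attribs[:vid] }  # prints [:x]
```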
>> add_23 = proc { |x| x + 23 }
=> #<Proc:0xb7edafd8@(irb):9>
>> add_23.body_node.transform
=> [:call, {:mid=>:+, :recv=>[:dvar, {:vid=>:x}], :args=>[:array, [[:lit, {:lit=>23}]]]}]
>> add_23.var_node.transform
=> [:dasgn_curr, {:value=>false, :vid=>:x}]
>> add_23.cref_node.transform
=> [:cref, {:next=>false, :clss=>Object}]
As mentioned above, the parsing is done in the current context, so the result can differ depending on local variables:
>> defined? z
=> nil
>> "z".parse_to_nodes.transform
=> [:vcall, {:mid=>:z}]
>> z = 42
=> 42
>> defined? z
=> "local-variable"
>> "z".parse_to_nodes.transform
=> [:lvar, {:cnt=>4, :vid=>:z}]
BEGIN blocks
Ruby 1.8:
>> "BEGIN { p 1 }; p 2".parse_to_nodes.transform
=> [:fcall, {:mid=>:p, :args=>[:array, [[:lit, {:lit=>2}]]]}]
>> pp "BEGIN { p 1 }; p 2".parse_begin_to_nodes.transform
[:scope,
 {:next=>[:fcall, {:mid=>:p, :args=>[:array, [[:lit, {:lit=>1}]]]}],
  :rval=>false,
  :tbl=>nil}]
Ruby 1.9:
>> pp "BEGIN { p 1 }; p 2".parse_to_nodes.transform
[:prelude,
 {:head=>
   [:scope,
    {:rval=>false,
     :tbl=>nil,
     :next=>[:fcall, {:args=>[:array, [[:lit, {:lit=>1}]]], :mid=>:p}]}],
  :body=>[:fcall, {:args=>[:array, [[:lit, {:lit=>2}]]], :mid=>:p}]}]