RubyNode API

RubyNode consists of two parts: a C extension, which provides the core functionality, and a Ruby library, which builds higher-level methods on top of it.

To load only the C extension, require "rubynode_ext". To get the full functionality (recommended), just require "rubynode".

Sections: Introduction to Ruby NODEs, Accessing the fields using RubyNode, Aliases for better readability, Getting RubyNodes, Higher level methods, More examples.

Introduction to Ruby NODEs

Ruby uses NODEs mainly to store the AST (abstract syntax tree) of parsed Ruby files, but they are also used for some other things. On the C side, NODE is a struct:

typedef struct RNode {
    unsigned long flags;
    char *nd_file;
    union {
        struct RNode *node;
        ID id;
        VALUE value;
        VALUE (*cfunc)(ANYARGS);
        ID *tbl;
    } u1;
    union {
        struct RNode *node;
        ID id;
        long argc;
        VALUE value;
    } u2;
    union {
        struct RNode *node;
        ID id;
        long state;
        struct global_entry *entry;
        long cnt;
        VALUE value;
    } u3;
} NODE;

The important parts are the three unions u1, u2 and u3, which can store values of different types. There is also an nd_file field, which contains the name of the file from which this node was parsed, and a flags field, which contains some flags, the node type and the line number (all ORed together).

The node types are defined as an enum in C:

enum node_type {
    NODE_METHOD,
    NODE_FBODY,
    NODE_CFUNC,
    NODE_SCOPE,
    NODE_BLOCK,
...
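
To make the "all ORed together" part more concrete, here is a small sketch in plain Ruby that packs a node type and a line number into one integer and extracts them again. The shift and mask values are assumptions modelled on Ruby 1.8's node.h and may differ in your Ruby version; pack_flags, unpack_type and unpack_line are made-up helper names, not part of RubyNode.

```ruby
# Assumed bit layout (modelled on Ruby 1.8's node.h; treat as illustrative only).
TYPE_SHIFT = 11              # assumed position of the node type bits
TYPE_MASK  = 0xff            # assumed width of the node type: 8 bits
LINE_SHIFT = TYPE_SHIFT + 8  # assumed: the line number sits above the type

# The first entries of the node_type enum shown above (a C enum counts from 0).
NODE_TYPES = [:method, :fbody, :cfunc, :scope, :block].freeze

# OR the three parts together into a single integer.
def pack_flags(type_index, line, other_flags = 0)
  other_flags | (type_index << TYPE_SHIFT) | (line << LINE_SHIFT)
end

# Extract the node type as a Symbol.
def unpack_type(flags)
  NODE_TYPES[(flags >> TYPE_SHIFT) & TYPE_MASK]
end

# Extract the line number.
def unpack_line(flags)
  flags >> LINE_SHIFT
end

flags = pack_flags(NODE_TYPES.index(:scope), 42)
unpack_type(flags)  # => :scope
unpack_line(flags)  # => 42
```

With RubyNode itself you never have to do this by hand, because the type and line methods described in the next section return these values directly.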

Accessing the fields using RubyNode

If you have an instance of RubyNode, you can access all these fields using the following methods:

file: the contents of nd_file as a String
flags: the raw flags field as an Integer
line: the line number (extracted from flags) as an Integer
type: the node type as a Symbol (e.g. NODE_SCOPE => :scope)

The unions can be accessed using the following methods:

u1_as_long
u1_cfunc
u1_id
u1_node
u1_tbl
u1_value
u2_argc
u2_id
u2_node
u2_value
u3_cnt
u3_id
u3_node
u3_state
u3_value

The *_id methods return a Symbol if the value stored in the union is a valid ID. If the value is 0 or 1, they return the corresponding Fixnum (some node types store 0 or 1 instead of an ID for special cases). Otherwise they return nil.
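
The return rule of the *_id methods can be pictured in plain Ruby. This is only a sketch: KNOWN_IDS and id_result are made-up names standing in for the interpreter's internal ID table and for the C code of the extension, and the ID numbers are arbitrary.

```ruby
# Hypothetical stand-in for the interpreter's ID table (not part of RubyNode).
KNOWN_IDS = { 9001 => :foo, 9002 => :bar }.freeze

# Mimics the documented return rule of u1_id, u2_id and u3_id.
def id_result(raw)
  return raw if raw == 0 || raw == 1  # special markers come back as Integers
  KNOWN_IDS[raw]                      # Symbol for a known ID, nil otherwise
end

id_result(9001)  # => :foo
id_result(1)     # => 1
id_result(1234)  # => nil
```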

The *_value and *_node methods return the object or the node (wrapped as a RubyNode) that is stored in the union. If the union holds neither an object nor a node, nil is returned. The *_value and *_node methods are aliases of each other and can be used interchangeably.

u1_as_long, u2_argc, u3_cnt and u3_state always return the raw “long” that is stored in the union, as an Integer.

u1_cfunc returns the unsigned value of the function pointer that is stored in the union, as an Integer.

u1_tbl returns the local variable table of a scope node as an Array; for all other node types it returns nil.

None of these methods raises an exception: if the value stored in the union does not have the requested type, they just return nil.

u3.entry is not supported, since it is a pointer to an internal C struct (struct global_entry) that has no Ruby-level representation.

Aliases for better readability

It would not be very readable to always access the unions directly, so Ruby uses #define macros on the C side to remedy this:

#define nd_head  u1.node
#define nd_alen  u2.argc
#define nd_next  u3.node

#define nd_cond  u1.node
#define nd_body  u2.node
#define nd_else  u3.node
...

RubyNode also makes these aliases available. To get a full list of the available aliases for your Ruby version you can use something like the following in irb:

>> puts RubyNode.instance_methods.grep(/^nd_/).sort
nd_1st
nd_2nd
nd_aid
nd_alen
nd_argc
nd_args
nd_beg
nd_body
...
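
To see how such aliases relate to the union accessors, here is a small stand-in written in plain Ruby. FakeNode is hypothetical and only mirrors the three #defines nd_head, nd_alen and nd_next from above; the real aliases are provided by RubyNode and depend on your Ruby version.

```ruby
# FakeNode is a made-up stand-in; real RubyNode instances cannot be created by hand.
FakeNode = Struct.new(:u1_node, :u2_argc, :u3_node) do
  # Each alias is just another name for one of the union accessors.
  alias_method :nd_head, :u1_node
  alias_method :nd_alen, :u2_argc
  alias_method :nd_next, :u3_node
end

node = FakeNode.new(:first_element, 1, nil)
node.nd_head  # => :first_element
node.nd_alen  # => 1

# The same grep as in the irb session above, applied to the stand-in.
alias_names = FakeNode.instance_methods.grep(/^nd_/).sort
```

On current Ruby versions instance_methods returns Symbols, so alias_names holds :nd_alen, :nd_head and :nd_next here.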

Getting RubyNodes

Now that we know how to work with RubyNode instances, it would be nice to have some to try it out ;-). There is no way to instantiate RubyNode yourself; you can only get instances through one of the following methods.

(Unbound)Method

You can access the body node of methods by using the body_node method.

Proc

A Proc has three different nodes: the body node, which contains the actual code; the var node, which describes the arguments of the proc; and the cref node, which is the lexical class/module nesting in which the proc was defined. These nodes can be accessed using the methods body_node, var_node and cref_node.

String

RubyNode adds a parse_to_nodes method to String, which will parse the given string using Ruby’s parser. The parsing will be done in the current context/binding/scope, so it basically returns the AST that eval would see, given this string.

For Ruby 1.8.x there is also the method parse_begin_to_nodes, which returns the AST for all BEGIN blocks in the string. Those BEGIN blocks won’t be in the AST returned by parse_to_nodes. In Ruby 1.9, parse_to_nodes returns a combined AST. Both parse_to_nodes and parse_begin_to_nodes accept two optional arguments, file and line, which work like those of Ruby’s eval and default to "(string)" and 1.

For more details please see the examples below.

Higher level methods

All the functionality described above is provided by the C extension. The following methods are only available if you require "rubynode" (instead of just "rubynode_ext").

Working only with the methods above to access the attributes of nodes would be a bit tedious: you would have to know, for example, which node type has which attributes. Fortunately, RubyNode provides a nicer way: the method attribs_hash, which returns a hash containing all attributes of the node. Example:

>> n = "1 + 2".parse_to_nodes.nd_next
=> #<RubyNode :call>
>> n.attribs_hash
=> {:mid=>:+, :recv=>#<RubyNode :lit>, :args=>#<RubyNode :array>}

This is nice, but still a bit tedious, because you would then probably do something like the following:

>> n.attribs_hash[:recv].attribs_hash
=> {:lit=>1}
>> n.attribs_hash[:args].attribs_hash
=> {:next=>false, :head=>#<RubyNode :lit>, :alen=>1}
>> n.attribs_hash[:args].attribs_hash[:head].attribs_hash
=> {:lit=>2}

So there is an even better way: the method transform. It is basically a recursive version of attribs_hash that transforms a node tree into a tree of arrays and hashes. Example:

>> n.transform
=> [:call, {:mid=>:+, :recv=>[:lit, {:lit=>1}], :args=>[:array, [[:lit, {:lit=>2}]]]}]

So the #<RubyNode :call> became [:call, "attribs_hash applied recursively"]. You might have noticed that the :array node doesn’t have a hash as the second element of its array. This is some special magic to make :array nodes easier to work with: the linked list of elements is flattened into a plain array. If you really want to see the node tree of :array nodes, you can get that, too:

>> n.nd_args.transform
=> [:array, [[:lit, {:lit=>2}]]]
>> n.nd_args.transform(:keep_array_nodes => true)
=> [:array, {:next=>false, :head=>[:lit, {:lit=>2}], :alen=>1}]

The same magic is also done for :block nodes:

>> bl = "foo; bar".parse_to_nodes
=> #<RubyNode :block>
>> bl.transform
=> [:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
>> pp bl.transform(:keep_block_nodes => true)
[:block,
 {:next=>[:block, {:next=>false, :head=>[:vcall, {:mid=>:bar}]}],
  :head=>[:vcall, {:mid=>:foo}]}]
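
The relation between the two forms can be sketched in plain Ruby: the "kept" form is a linked list of :head/:next pairs, and the default form is that list collected into an array. flatten_chain is a made-up helper for illustration, not RubyNode's actual implementation, and it only flattens the top-level chain.

```ruby
# Collapse a chain of [type, {:head=>..., :next=>...}] links into [type, [heads...]].
def flatten_chain(tree)
  type, attribs = tree
  elements = []
  while attribs
    elements << attribs[:head]
    nxt = attribs[:next]
    # :next is false at the end of the chain, otherwise another [type, hash] link.
    attribs = nxt ? nxt[1] : nil
  end
  [type, elements]
end

# The :keep_block_nodes and :keep_array_nodes outputs shown above.
kept_block = [:block,
              {:next=>[:block, {:next=>false, :head=>[:vcall, {:mid=>:bar}]}],
               :head=>[:vcall, {:mid=>:foo}]}]
kept_array = [:array, {:next=>false, :head=>[:lit, {:lit=>2}], :alen=>1}]

flatten_chain(kept_block)  # => [:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
flatten_chain(kept_array)  # => [:array, [[:lit, {:lit=>2}]]]
```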

transform also strips :newline nodes (this only matters for Ruby 1.8; Ruby 1.9 doesn’t have :newline nodes). If you really want those, you can get them:

>> pp bl.transform
[:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
=> nil
>> pp bl.transform(:keep_newline_nodes => true)
[:block,
 [[:newline, {:next=>[:vcall, {:mid=>:foo}]}],
  [:newline, {:next=>[:vcall, {:mid=>:bar}]}]]]
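
What stripping means here can be shown with a few lines of plain Ruby that work on the transformed output: every [:newline, {:next=>X}] wrapper is replaced by X. strip_newlines is a made-up helper that only illustrates the idea; it is not how RubyNode does it internally.

```ruby
# Recursively replace every [:newline, {:next=>X}] wrapper by X.
def strip_newlines(tree)
  case tree
  when Array
    if tree[0] == :newline && tree[1].is_a?(Hash)
      strip_newlines(tree[1][:next])
    else
      tree.map { |element| strip_newlines(element) }
    end
  when Hash
    tree.each_with_object({}) { |(key, value), result| result[key] = strip_newlines(value) }
  else
    tree  # Symbols, numbers, false and so on are left alone
  end
end

with_newlines = [:block,
                 [[:newline, {:next=>[:vcall, {:mid=>:foo}]}],
                  [:newline, {:next=>[:vcall, {:mid=>:bar}]}]]]

strip_newlines(with_newlines)
# => [:block, [[:vcall, {:mid=>:foo}], [:vcall, {:mid=>:bar}]]]
```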

Finally, transform can also include the original RubyNode instance in each hash, in case you later need access to the file name, line number or flags:

>> pp bl.transform(:include_node => true)
[:block,
 [[:vcall, {:mid=>:foo, :node=>#<RubyNode :vcall>}],
  [:vcall, {:mid=>:bar, :node=>#<RubyNode :vcall>}]]]
=> nil
>> pp n.transform(:include_node => true)
[:call,
 {:mid=>:+,
  :recv=>[:lit, {:lit=>1, :node=>#<RubyNode :lit>}],
  :args=>[:array, [[:lit, {:lit=>2, :node=>#<RubyNode :lit>}]]],
  :node=>#<RubyNode :call>}]

The options :keep_array_nodes, :keep_block_nodes, :keep_newline_nodes and :include_node can also be combined.
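
Because the result of transform consists only of arrays, hashes and plain values, it can be processed with ordinary Ruby code. As a sketch, here is a tiny evaluator that understands just the :lit and :call nodes from the "1 + 2" example above. evaluate is a made-up helper, and the tree is written out by hand so that the snippet runs without RubyNode.

```ruby
# Evaluate a transformed tree that consists only of :lit and :call nodes.
def evaluate(tree)
  type, attribs = tree
  case type
  when :lit
    attribs[:lit]
  when :call
    receiver = evaluate(attribs[:recv])
    # In the default transform output an :array node is [:array, [element, ...]].
    args = attribs[:args] ? attribs[:args][1].map { |arg| evaluate(arg) } : []
    receiver.public_send(attribs[:mid], *args)
  else
    raise ArgumentError, "unsupported node type: #{type.inspect}"
  end
end

# The output of n.transform from above, copied by hand.
tree = [:call, {:mid=>:+, :recv=>[:lit, {:lit=>1}], :args=>[:array, [[:lit, {:lit=>2}]]]}]
evaluate(tree)  # => 3
```

Other node types would need their own branches; the else branch makes it obvious when one is missing.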

More examples

(Unbound)Method

>> class A
>>   def foo(x)
>>     @bar + x
>>   end
>> end
=> nil
>> pp A.instance_method(:foo).body_node.transform
[:scope,
 {:next=>
   [:block,
    [[:args, {:rest=>-1, :cnt=>1, :opt=>false}],
     [:call,
      {:mid=>:+,
       :recv=>[:ivar, {:vid=>:@bar}],
       :args=>[:array, [[:lvar, {:cnt=>2, :vid=>:x}]]]}]]],
  :rval=>[:cref, {:next=>[:cref, {:next=>false, :clss=>Object}], :clss=>A}],
  :tbl=>[:x]}]
=> nil
>> pp A.new.method(:foo).body_node.transform
[:scope,
 {:next=>
   [:block,
    [[:args, {:rest=>-1, :cnt=>1, :opt=>false}],
     [:call,
      {:mid=>:+,
       :recv=>[:ivar, {:vid=>:@bar}],
       :args=>[:array, [[:lvar, {:cnt=>2, :vid=>:x}]]]}]]],
  :rval=>[:cref, {:next=>[:cref, {:next=>false, :clss=>Object}], :clss=>A}],
  :tbl=>[:x]}]
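
A transformed method body like this is easy to search. The following sketch collects every :mid (method name) that occurs in a tree; called_methods is a made-up helper, and the tree is a hand-copied version of the output above with the :rval entry left out, so the snippet runs without RubyNode.

```ruby
# Collect the :mid of every node in a transformed tree, in traversal order.
def called_methods(tree, found = [])
  case tree
  when Array
    # A node looks like [type, {attribs}]; remember its :mid if it has one.
    found << tree[1][:mid] if tree[1].is_a?(Hash) && tree[1].key?(:mid)
    tree.each { |element| called_methods(element, found) }
  when Hash
    tree.each_value { |value| called_methods(value, found) }
  end
  found
end

foo_body = [:scope,
            {:next=>
              [:block,
               [[:args, {:rest=>-1, :cnt=>1, :opt=>false}],
                [:call,
                 {:mid=>:+,
                  :recv=>[:ivar, {:vid=>:@bar}],
                  :args=>[:array, [[:lvar, {:cnt=>2, :vid=>:x}]]]}]]],
             :tbl=>[:x]}]

called_methods(foo_body)  # => [:+]
```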

Proc

>> add_23 = proc { |x| x + 23 }
=> #<Proc:0xb7edafd8@(irb):9>
>> add_23.body_node.transform
=> [:call, {:mid=>:+, :recv=>[:dvar, {:vid=>:x}], :args=>[:array, [[:lit, {:lit=>23}]]]}]
>> add_23.var_node.transform
=> [:dasgn_curr, {:value=>false, :vid=>:x}]
>> add_23.cref_node.transform
=> [:cref, {:next=>false, :clss=>Object}]

Parsing strings

As mentioned above, the parsing is done in the current context, so the result can differ depending on local variables:

>> defined? z
=> nil
>> "z".parse_to_nodes.transform
=> [:vcall, {:mid=>:z}]
>> z = 42
=> 42
>> defined? z
=> "local-variable" 
>> "z".parse_to_nodes.transform
=> [:lvar, {:cnt=>4, :vid=>:z}]

BEGIN blocks

Ruby 1.8:

>> "BEGIN { p 1 }; p 2".parse_to_nodes.transform
=> [:fcall, {:mid=>:p, :args=>[:array, [[:lit, {:lit=>2}]]]}]
>> pp "BEGIN { p 1 }; p 2".parse_begin_to_nodes.transform
[:scope,
 {:next=>[:fcall, {:mid=>:p, :args=>[:array, [[:lit, {:lit=>1}]]]}],
  :rval=>false,
  :tbl=>nil}]

Ruby 1.9:

>> pp "BEGIN { p 1 }; p 2".parse_to_nodes.transform
[:prelude,
 {:head=>
   [:scope,
    {:rval=>false,
     :tbl=>nil,
     :next=>[:fcall, {:args=>[:array, [[:lit, {:lit=>1}]]], :mid=>:p}]}],
  :body=>[:fcall, {:args=>[:array, [[:lit, {:lit=>2}]]], :mid=>:p}]}]