Parent: Array
The Tokens class represents a list of tokens returned from a Scanner.
A token is not a special object, just a two-element Array consisting of
* the token text (the original source of the token, a String) or a token action (:open, :close, :begin_line, :end_line), and
* the token kind (a Symbol representing the type of the token).
A token looks like this:
  ['# It looks like this', :comment]
  ['3.1415926', :float]
  ['$^', :error]
Some scanners also yield sub-tokens, represented by special token actions, namely :open and :close.
The Ruby scanner, for example, splits “a string” into:
  [[:open, :string], ['"', :delimiter], ['a string', :content], ['"', :delimiter], [:close, :string]]
Tokens is the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.
Thus, the syntax below becomes clear:
  CodeRay.scan('price = 2.59', :ruby).html
  #        the Tokens object is here ^
See how small it is? ;)
Tokens gives you the power to handle pre-scanned code very easily: You can convert it to a webpage, a YAML file, or dump it into a gzip’ed string that you put in your DB.
It also allows you to generate tokens directly (without using a scanner), to load them from a file, and still use any Encoder that CodeRay provides.
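Since Tokens is essentially an Array of pairs, generating tokens by hand can be sketched with a plain Ruby Array; the token values below are made up for illustration, and no Scanner (or CodeRay itself) is involved:

```ruby
# Sketch: tokens generated by hand, without a Scanner. A plain Array stands
# in for a Tokens object, which has the same two-element-pair shape.
tokens = [
  ['price', :ident],
  [' ',     :space],
  ['=',     :operator],
  [' ',     :space],
  ['2.59',  :float],
]

# With CodeRay loaded, such pairs could be fed to any Encoder;
# here we just rebuild the original source text:
source = tokens.map { |text, kind| text }.join
puts source  # prints "price = 2.59"
```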
Tokens’ subclass TokenStream allows streaming to save memory.
Unzips the dump using GZip.gunzip, then undumps the result using Marshal.load.
The result is commonly a Tokens object, but this is not guaranteed.
  # File lib/coderay/tokens.rb, line 267
  def Tokens.load dump
    require 'coderay/helpers/gzip_simple'
    dump = dump.gunzip
    @dump = Marshal.load dump
  end
Dumps the object into a String that can be saved in files or databases.
The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.
The returned String object includes Undumping, so it has an #undump method. See Tokens.load.
You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.
See GZip module.
  # File lib/coderay/tokens.rb, line 228
  def dump gzip_level = 7
    require 'coderay/helpers/gzip_simple'
    dump = Marshal.dump self
    dump = dump.gzip gzip_level
    dump.extend Undumping
  end
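The dump/load round trip can be sketched with Ruby's standard Zlib library, on the assumption that CodeRay's GZip helper wraps Zlib; the helper itself is not required for the sketch:

```ruby
require 'zlib'

# Round trip comparable to tokens.dump(7) / Tokens.load(dump), using the
# standard library instead of coderay/helpers/gzip_simple.
tokens = [['# comment', :comment], ['3.1415926', :float]]

dump   = Zlib::Deflate.deflate(Marshal.dump(tokens), 7)  # Marshal, then zip
loaded = Marshal.load(Zlib::Inflate.inflate(dump))       # unzip, then undump

loaded == tokens  # true: the token structure survives the round trip
```

Because the dump is a plain String, it can be written to a file or stored in a database column as-is.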
Iterates over all tokens.
If a filter is given, only tokens of that kind are yielded.
  # File lib/coderay/tokens.rb, line 67
  def each kind_filter = nil, &block
    unless kind_filter
      super(&block)
    else
      super() do |text, kind|
        next unless kind == kind_filter
        yield text, kind
      end
    end
  end
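On a plain array of token pairs, the kind filter behaves like this (the token values are illustrative):

```ruby
tokens = [['# hi', :comment], ['3.14', :float], ['x', :ident], ['2.59', :float]]

# Equivalent of tokens.each(:float) { ... } on a plain Array:
floats = []
tokens.each do |text, kind|
  next unless kind == :float   # the kind_filter check
  floats << text
end
floats  # => ["3.14", "2.59"]
```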
Iterates over all text tokens. Action tokens like [:open, :string] are left out.
Example:
tokens.each_text_token { |text, kind| text.replace html_escape(text) }
  # File lib/coderay/tokens.rb, line 83
  def each_text_token
    each do |text, kind|
      next unless text.is_a? ::String
      yield text, kind
    end
  end
Encode the tokens using encoder.
encoder can be
* a symbol like :html or :statistic
* an Encoder class
* an Encoder object
options are passed to the encoder.
  # File lib/coderay/tokens.rb, line 98
  def encode encoder, options = {}
    unless encoder.is_a? Encoders::Encoder
      # Resolve a Symbol to its Encoder class; a Class is used directly.
      encoder_class = encoder.is_a?(Class) ? encoder : Encoders[encoder]
      encoder = encoder_class.new options
    end
    encoder.encode_tokens self, options
  end
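The three accepted forms can be sketched with a hypothetical stand-in encoder; EncoderSketch and ENCODER_TABLE below are illustrations, not CodeRay's real classes (the real lookup is Encoders[encoder]):

```ruby
# Hypothetical stand-in for an Encoder: joins all text tokens into a String.
class EncoderSketch
  def initialize options = {}
    @options = options
  end

  def encode_tokens tokens, options = {}
    tokens.map { |text, kind| text if text.is_a? String }.join
  end
end

# Stand-in for the Encoders[] lookup table.
ENCODER_TABLE = { text: EncoderSketch }

def resolve_and_encode encoder, tokens, options = {}
  encoder = ENCODER_TABLE[encoder] if encoder.is_a? Symbol # symbol form
  encoder = encoder.new options    if encoder.is_a? Class  # class form
  encoder.encode_tokens tokens, options                    # object form
end

resolve_and_encode :text, [['a', :ident], [' ', :space], ['b', :ident]]
# => "a b"
```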
Ensures that every :open token has a corresponding :close one.
TODO: Test this!
  # File lib/coderay/tokens.rb, line 165
  def fix
    tokens = self.class.new
    # Check token nesting using a stack of kinds.
    opened = []
    for type, kind in self
      case type
      when :open
        opened.push [:close, kind]
      when :begin_line
        opened.push [:end_line, kind]
      when :close, :end_line
        expected = opened.pop
        if [type, kind] != expected
          # Unexpected :close; decide what to do based on the kind:
          # - token was never opened: delete the :close (just skip it)
          next unless opened.rindex expected
          # - token was opened earlier: also close tokens in between
          tokens << token until (token = opened.pop) == expected
        end
      end
      tokens << [type, kind]
    end
    # Close remaining opened tokens
    tokens << token while token = opened.pop
    tokens
  end
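The repair idea can be sketched as a simplified reimplementation on a plain array; this handles only :open/:close (the :begin_line/:end_line pair would work the same way) and is an illustration, not CodeRay's exact code:

```ruby
# Simplified sketch of Tokens#fix: balance :open/:close pairs with a stack.
def fix_tokens tokens
  fixed  = []
  opened = []   # stack of [:close, kind] pairs still expected
  tokens.each do |type, kind|
    case type
    when :open
      opened.push [:close, kind]
      fixed << [type, kind]
    when :close
      if opened.rindex [type, kind]
        # opened earlier: first close any tokens opened in between
        fixed << opened.pop until opened.last == [type, kind]
        opened.pop
        fixed << [type, kind]
      end
      # else: never opened, so the stray :close is dropped
    else
      fixed << [type, kind]   # plain text token, kept as-is
    end
  end
  # close whatever is still open at the end
  fixed << opened.pop until opened.empty?
  fixed
end
```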
  # File lib/coderay/tokens.rb, line 192
  def fix!
    replace fix
  end
Redirects unknown methods to encoder calls.
For example, if you call tokens.html, the HTML encoder is used to highlight the tokens.
  # File lib/coderay/tokens.rb, line 120
  def method_missing meth, options = {}
    Encoders[meth].new(options).encode_tokens self
  end
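The dispatch pattern can be sketched on a plain Array subclass; the lambda table below is a hypothetical stand-in for CodeRay's Encoders[meth] lookup:

```ruby
# Hypothetical sketch of the encoder dispatch via method_missing.
class TokenListSketch < Array
  ENCODERS = {
    text: ->(tokens, options) { tokens.map { |t, k| t }.join }
  }

  def method_missing meth, options = {}
    encoder = ENCODERS[meth] or return super
    encoder.call self, options
  end

  def respond_to_missing? meth, include_private = false
    ENCODERS.key?(meth) || super
  end
end

tokens = TokenListSketch.new
tokens.push ['a', :ident], [' ', :space], ['b', :ident]
tokens.text  # => "a b"
```

Defining respond_to_missing? alongside method_missing keeps respond_to? honest, which is the idiomatic way to use this pattern in Ruby.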
Returns the tokens compressed by joining consecutive tokens of the same kind.
This cannot be undone, but should yield the same output in most Encoders; it basically makes the output smaller.
Combined with dump, it saves space for the cost of time.
If the scanner is written carefully, this is not required - for example, consecutive //-comment lines could already be joined in one comment token by the Scanner.
  # File lib/coderay/tokens.rb, line 135
  def optimize
    last_kind = last_text = nil
    new = self.class.new
    for text, kind in self
      if text.is_a? String
        if kind == last_kind
          last_text << text
        else
          new << [last_text, last_kind] if last_kind
          last_text = text
          last_kind = kind
        end
      else
        new << [last_text, last_kind] if last_kind
        last_kind = last_text = nil
        new << [text, kind]
      end
    end
    new << [last_text, last_kind] if last_kind
    new
  end
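The joining step can be sketched on a plain array; this is an illustrative reimplementation, not CodeRay's code:

```ruby
# Sketch of Tokens#optimize: join consecutive text tokens of the same kind.
def optimize_tokens tokens
  optimized = []
  tokens.each do |text, kind|
    last = optimized.last
    if text.is_a?(String) && last && last[1] == kind && last[0].is_a?(String)
      last[0] << text                 # merge into the previous token
    elsif text.is_a? String
      optimized << [text.dup, kind]   # dup so merging never mutates the input
    else
      optimized << [text, kind]       # action tokens pass through unchanged
    end
  end
  optimized
end

optimize_tokens [['# a', :comment], ["\n", :comment], ['# b', :comment]]
# => [["# a\n# b", :comment]]
```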
Compact the object itself; see optimize.
# File lib/coderay/tokens.rb, line 158 158: def optimize! 159: replace optimize 160: end
TODO: Scanner#split_into_lines
Makes sure that:
* newlines are single tokens (which means all other tokens are single-line)
* there are no open tokens at the end of a line
This makes life simple for encoders that work line by line, like HTML with list-style line numbering.
  # File lib/coderay/tokens.rb, line 205
  def split_into_lines
    raise NotImplementedError
  end
  # File lib/coderay/tokens.rb, line 209
  def split_into_lines!
    replace split_into_lines
  end
Whether the object is a TokenStream.
Returns false.
  # File lib/coderay/tokens.rb, line 60
  def stream?
    false
  end
Returns all text tokens joined into a single String.
  # File lib/coderay/tokens.rb, line 247
  def text
    map { |t, k| t if t.is_a? ::String }.join
  end
The total size of the tokens. Should be equal to the input size before scanning.
  # File lib/coderay/tokens.rb, line 238
  def text_size
    size = 0
    each_text_token do |t, k|
      size += t.size
    end
    size
  end
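For the string-splitting example above, the text sizes add back up to the length of the scanned source; on a plain array the same sum looks like this:

```ruby
tokens = [[:open, :string], ['"', :delimiter], ['hi', :content],
          ['"', :delimiter], [:close, :string]]

# Equivalent of text_size on a plain array: action tokens contribute nothing.
size = tokens.sum { |text, kind| text.is_a?(String) ? text.size : 0 }
size  # => 4, the length of the scanned input "hi" including its quotes
```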
Turns the tokens into a String using Encoders::Text.
options are passed to the encoder if given.
  # File lib/coderay/tokens.rb, line 112
  def to_s options = {}
    encode :text, options
  end