CodeRay::Tokens

The Tokens class represents a list of tokens returned from a Scanner.

A token is not a special object, just a two-element Array consisting of

  • the token text (the original source of the token, a String) or a token action (like :open and :close)

  • the token kind (a Symbol representing the type of the token)

A token looks like this:

  ['# It looks like this', :comment]
  ['3.1415926', :float]
  ['$^', :error]

Some scanners also yield sub-tokens, represented by special token actions, namely :open and :close.

The Ruby scanner, for example, splits “a string” into:

 [
  [:open, :string],
  ['"', :delimiter],
  ['a string', :content],
  ['"', :delimiter],
  [:close, :string]
 ]

Tokens is the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.

Thus, the syntax below becomes clear:

  CodeRay.scan('price = 2.59', :ruby).html
  # the Tokens object is here -------^

See how small it is? ;)

Tokens gives you the power to handle pre-scanned code very easily: You can convert it to a webpage, a YAML file, or dump it into a gzip’ed string that you put in your DB.
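
For example (a rough sketch; it assumes the :html and :yaml encoders that ship with CodeRay and the dump method described below):

  tokens = CodeRay.scan('price = 2.59', :ruby)   # a Tokens object

  page = tokens.html    # highlighted HTML, via the HTML encoder
  data = tokens.yaml    # YAML representation, via the YAML encoder
  blob = tokens.dump    # gzip'ed Marshal dump, e.g. for a database column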

It also allows you to generate tokens directly (without using a scanner), to load them from a file, and still use any Encoder that CodeRay provides.
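
A minimal hand-made token list might look like this (the kinds chosen here are only illustrative):

  require 'coderay'

  tokens = CodeRay::Tokens.new
  tokens << ['# hand-made', :comment]   # a token is just [text, kind]
  tokens << ["\n", :space]
  tokens << ['42', :integer]

  puts tokens.encode(:text)             # any Encoder can process these tokens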

Tokens’ subclass TokenStream allows streaming to save memory.

Constants

ClassOfKind

Attributes

scanner[RW]

The Scanner instance that created the tokens.

Public Class Methods

load(dump)

Unzip the dump using GZip.gunzip, then undump the result using Marshal.load.

The result is commonly a Tokens object, but this is not guaranteed.

  # File lib/coderay/tokens.rb, line 267
  def Tokens.load dump
    require 'coderay/helpers/gzip_simple'
    dump = dump.gunzip
    @dump = Marshal.load dump
  end
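
Typical usage might look like this (dump_string stands for a string produced earlier by Tokens#dump):

  tokens = CodeRay::Tokens.load dump_string
  tokens.html   # the result behaves like any other Tokens object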

Public Instance Methods

dump(gzip_level = 7)

Dumps the object into a String that can be saved in files or databases.

The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.

The returned String object includes Undumping, so it has an #undump method. See Tokens.load.

You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.

See GZip module.

  # File lib/coderay/tokens.rb, line 228
  def dump gzip_level = 7
    require 'coderay/helpers/gzip_simple'
    dump = Marshal.dump self
    dump = dump.gzip gzip_level
    dump.extend Undumping
  end
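
A possible roundtrip through a file (the file name is just an example):

  tokens = CodeRay.scan('price = 2.59', :ruby)
  File.open('tokens.dump', 'wb') { |f| f.write tokens.dump }

  dump = File.open('tokens.dump', 'rb') { |f| f.read }
  CodeRay::Tokens.load(dump).html
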
each(kind_filter = nil, &block)

Iterates over all tokens.

If a filter is given, only tokens of that kind are yielded.

  # File lib/coderay/tokens.rb, line 67
  def each kind_filter = nil, &block
    unless kind_filter
      super(&block)
    else
      super() do |text, kind|
        next unless kind == kind_filter
        yield text, kind
      end
    end
  end
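
For example, to print only the comments of some Ruby code (ruby_code is a placeholder for your input string):

  CodeRay.scan(ruby_code, :ruby).each(:comment) do |text, kind|
    puts text
  end
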
each_text_token()

Iterates over all text tokens. Range tokens like [:open, :string] are left out.

Example:

  tokens.each_text_token { |text, kind| text.replace html_escape(text) }

  # File lib/coderay/tokens.rb, line 83
  def each_text_token
    each do |text, kind|
      next unless text.is_a? ::String
      yield text, kind
    end
  end
encode(encoder, options = {})

Encode the tokens using encoder.

encoder can be

  • a symbol like :html or :statistic

  • an Encoder class

  • an Encoder object

options are passed to the encoder.

  # File lib/coderay/tokens.rb, line 98
  def encode encoder, options = {}
    unless encoder.is_a? Encoders::Encoder
      # Look up the Encoder class if a symbol like :html was given.
      encoder_class = encoder.is_a?(Class) ? encoder : Encoders[encoder]
      encoder = encoder_class.new options
    end
    encoder.encode_tokens self, options
  end
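
All three forms of the encoder argument do the same thing; a sketch:

  tokens = CodeRay.scan('price = 2.59', :ruby)
  html_encoder = CodeRay::Encoders[:html]   # the Encoder class behind :html

  tokens.encode :html               # by symbol
  tokens.encode html_encoder        # by Encoder class
  tokens.encode html_encoder.new    # by Encoder object
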
fix()

Ensures that every :open token has a corresponding :close token.

TODO: Test this!

  # File lib/coderay/tokens.rb, line 165
  def fix
    tokens = self.class.new
    # Check token nesting using a stack of kinds.
    opened = []
    for type, kind in self
      case type
      when :open
        opened.push [:close, kind]
      when :begin_line
        opened.push [:end_line, kind]
      when :close, :end_line
        expected = opened.pop
        if [type, kind] != expected
          # Unexpected :close; decide what to do based on the kind:
          # - token was never opened: delete the :close (just skip it)
          next unless opened.rindex expected
          # - token was opened earlier: also close tokens in between
          tokens << token until (token = opened.pop) == expected
        end
      end
      tokens << [type, kind]
    end
    # Close remaining opened tokens
    tokens << token while token = opened.pop
    tokens
  end
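
A sketch of the intended effect (note the TODO above), using made-up tokens with a missing :close:

  broken = CodeRay::Tokens.new
  broken << [:open, :string] << ['"', :delimiter] << ['abc', :content]
  fixed = broken.fix
  # fixed ends with the missing [:close, :string] token
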
fix!()

Fixes the object itself; see fix.

  # File lib/coderay/tokens.rb, line 192
  def fix!
    replace fix
  end
method_missing(meth, options = {})

Redirects unknown methods to encoder calls.

For example, if you call tokens.html, the HTML encoder is used to highlight the tokens.

  # File lib/coderay/tokens.rb, line 120
  def method_missing meth, options = {}
    Encoders[meth].new(options).encode_tokens self
  end
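
So these two calls are effectively equivalent:

  tokens.html            # via method_missing
  tokens.encode :html    # explicit
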
optimize()

Returns the tokens compressed by joining consecutive tokens of the same kind.

This cannot be undone, but it should yield the same output in most Encoders. It basically makes the token list smaller.

Combined with dump, it saves space at the cost of time.

If the scanner is written carefully, this is not required - for example, consecutive //-comment lines could already be joined into one comment token by the Scanner.

  # File lib/coderay/tokens.rb, line 135
  def optimize
    last_kind = last_text = nil
    new = self.class.new
    for text, kind in self
      if text.is_a? String
        if kind == last_kind
          last_text << text
        else
          new << [last_text, last_kind] if last_kind
          last_text = text
          last_kind = kind
        end
      else
        new << [last_text, last_kind] if last_kind
        last_kind = last_text = nil
        new << [text, kind]
      end
    end
    new << [last_text, last_kind] if last_kind
    new
  end
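
A sketch of the effect (the kinds are only illustrative):

  tokens = CodeRay::Tokens.new
  tokens << ['# line 1', :comment] << ["\n", :comment] << ['# line 2', :comment]
  tokens.optimize.size   # => 1; the three consecutive :comment pieces are joined
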
optimize!()

Compact the object itself; see optimize.

  # File lib/coderay/tokens.rb, line 158
  def optimize!
    replace optimize
  end
split_into_lines()

TODO: Scanner#split_into_lines

Makes sure that:

  • newlines are single tokens (which means all other tokens are single-line)

  • there are no open tokens at the end of a line

This makes things simple for line-oriented encoders, like HTML with list-style line numbering.

  # File lib/coderay/tokens.rb, line 205
  def split_into_lines
    raise NotImplementedError
  end
split_into_lines!()
  # File lib/coderay/tokens.rb, line 209
  def split_into_lines!
    replace split_into_lines
  end
stream?()

Whether the object is a TokenStream.

Returns false.

  # File lib/coderay/tokens.rb, line 60
  def stream?
    false
  end
text()

Return all text tokens joined into a single string.

  # File lib/coderay/tokens.rb, line 247
  def text
    map { |t, k| t if t.is_a? ::String }.join
  end
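
Since the text tokens carry the original source, this usually reconstructs the scanned input:

  CodeRay.scan('price = 2.59', :ruby).text   # => "price = 2.59"
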
text_size()

The total size of the tokens. Should be equal to the input size before scanning.

  # File lib/coderay/tokens.rb, line 238
  def text_size
    size = 0
    each_text_token do |t, k|
      size += t.size   # accumulate the length of each text token
    end
    size
  end
to_s(options = {})

Turn into a string using Encoders::Text.

options are passed to the encoder if given.

  # File lib/coderay/tokens.rb, line 112
  def to_s options = {}
    encode :text, options
  end
