PDF::Reader::Buffer

A string tokeniser that recognises PDF grammar. When passed an IO stream or a string, repeated calls to token() will return the next token from the source.

This is very low level, and getting the raw tokens is not very useful in itself.

This will usually be used in conjunction with PDF::Reader::Parser, which converts the raw tokens into objects we can work with (strings, ints, arrays, etc.).

Attributes

pos[R]

Public Class Methods

new(io, opts = {})

Creates a new buffer.

Params:

io - an IO stream or string with the raw data to tokenise

options:

:seek - a byte offset to seek to before starting to tokenise
:content_stream - set to true if buffer will be tokenising a
                  content stream. Defaults to false
# File lib/pdf/reader/buffer.rb, line 53
def initialize(io, opts = {})
  @io = io
  @tokens = []
  @in_content_stream = opts[:content_stream]

  @io.seek(opts[:seek]) if opts[:seek]
  @pos = @io.pos
end
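
The effect of the :seek option can be illustrated without the library. The sketch below uses Ruby's standard StringIO as a stand-in for the PDF source and repeats the two constructor lines that handle the option. The sample data and the offset 9 are made up for illustration.

```ruby
require 'stringio'

# Hypothetical sample data: a header line followed by an object definition.
io = StringIO.new("%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n")

# 9 is the byte offset of "1 0 obj" in the sample data above.
opts = { :seek => 9 }

# The same two steps the constructor performs for the :seek option.
io.seek(opts[:seek]) if opts[:seek]
pos = io.pos

# Reading now starts at the requested offset rather than at the header.
rest = io.read(7)
```

After the seek, pos reflects the requested offset, which is the value the pos attribute is initialised with.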

Public Instance Methods

empty?()

Returns true if there are no more tokens left.

# File lib/pdf/reader/buffer.rb, line 64
def empty?
  prepare_tokens if @tokens.size < 3

  @tokens.empty?
end
find_first_xref_offset()

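Returns the byte offset where the first XRef table in the source can be found. The offset is read from the line that precedes the %%EOF marker within the last 1024 bytes of the source. Raises MalformedPDFError if the marker is missing or no line precedes it.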

# File lib/pdf/reader/buffer.rb, line 116
def find_first_xref_offset
  @io.seek(-1024, IO::SEEK_END) rescue @io.seek(0)
  data = @io.read(1024)

  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
  lines = data.split(/[\n\r]+/).reverse
  eof_index = lines.index { |l| l.strip == "%%EOF" }

  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
  lines[eof_index+1].to_i
end
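
The tail scan above can be tried in isolation. The following is a minimal standalone sketch, not the library's method: first_xref_offset is a hypothetical helper, ArgumentError stands in for MalformedPDFError, and the trailer text is invented sample data.

```ruby
require 'stringio'

# Standalone sketch of the tail scan (hypothetical helper, not library code).
def first_xref_offset(io)
  # Look at the last 1024 bytes, or the whole input if it is shorter.
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue SystemCallError
    io.seek(0)
  end
  data = io.read(1024).to_s

  # EOL markers may be \r, \n, or both, so split on any run of them.
  lines = data.split(/[\n\r]+/).reverse
  eof_index = lines.index { |l| l.strip == "%%EOF" }

  # ArgumentError is a stand-in for the library's MalformedPDFError.
  raise ArgumentError, "no EOF marker" if eof_index.nil?
  raise ArgumentError, "no offset before EOF marker" if eof_index >= lines.size - 1

  # In the reversed list, the line written before %%EOF comes right after it.
  lines[eof_index + 1].to_i
end

# Invented sample trailer using CRLF line endings.
tail = "trailer\n<< /Size 6 /Root 1 0 R >>\r\nstartxref\r\n416\r\n%%EOF\r\n"
offset = first_xref_offset(StringIO.new(tail))
```

Reversing the lines means the search finds the last %%EOF marker in the file first, which matters for PDFs that contain more than one.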
read(bytes, opts = {})

Returns raw bytes from the underlying IO stream.

bytes - the number of bytes to read

options:

:skip_eol - if true, the IO stream is advanced past any LF or CR
            bytes before it reads any data. This is to handle
            content streams, which have a CRLF or LF after the stream
            token.
# File lib/pdf/reader/buffer.rb, line 81
def read(bytes, opts = {})
  reset_pos

  if opts[:skip_eol]
    done = false
    while !done
      chr = @io.read(1)
      if chr.nil?
        return nil
      elsif chr != "\n" && chr != "\r"
        @io.seek(-1, IO::SEEK_CUR)
        done = true
      end
    end
  end

  bytes = @io.read(bytes)
  save_pos
  bytes
end
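
The :skip_eol behaviour can be sketched on its own with a StringIO. This is an illustration under assumptions, not the library's method: read_after_eol is a hypothetical helper, the content stream is invented, and the position bookkeeping done by reset_pos and save_pos is left out.

```ruby
require 'stringio'

# Standalone sketch of the :skip_eol loop (hypothetical helper).
def read_after_eol(io, length)
  loop do
    chr = io.read(1)
    return nil if chr.nil?               # input ended while skipping
    next if chr == "\n" || chr == "\r"   # still inside the EOL run
    io.seek(-1, IO::SEEK_CUR)            # un-read the first data byte
    break
  end
  io.read(length)
end

# Invented content stream: the "stream" keyword is followed by CRLF.
io = StringIO.new("stream\r\nBT /F1 12 Tf ET\nendstream")
io.read(6)                               # consume the "stream" keyword
payload = read_after_eol(io, 15)
```

Without the skip, the first two bytes returned would be the CRLF rather than stream data, and the read would stop two bytes short of the intended payload.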
token()

Returns the next token from the source: a string if a token is found, or nil if there are no tokens left.

# File lib/pdf/reader/buffer.rb, line 105
def token
  reset_pos
  prepare_tokens if @tokens.size < 3
  merge_indirect_reference
  prepare_tokens if @tokens.size < 3

  @tokens.shift
end
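
Because token() returns nil once the source is exhausted, a caller can drain a buffer with a simple loop. The sketch below shows that calling pattern only. WhitespaceBuffer is a hypothetical stand-in that splits on whitespace and knows nothing about PDF grammar, so its tokens will differ from the real buffer's for anything beyond trivial input.

```ruby
# Hypothetical stand-in that mimics only the token/empty? contract.
class WhitespaceBuffer
  def initialize(source)
    @tokens = source.split
  end

  # True once every token has been handed out.
  def empty?
    @tokens.empty?
  end

  # Next token as a string, or nil when none are left.
  def token
    @tokens.shift
  end
end

buffer = WhitespaceBuffer.new("1 0 obj endobj")

# Drain the buffer: the loop ends when token returns nil.
tokens = []
while (tok = buffer.token)
  tokens << tok
end
```

An equivalent loop can test empty? before each call to token instead of checking for nil.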
