org.gjt.xpp.impl.tokenizer
Class Tokenizer

java.lang.Object
  extended by org.gjt.xpp.impl.tokenizer.Tokenizer

public class Tokenizer
extends java.lang.Object

Simple XML Tokenizer (SXT) performs input stream tokenizing. Advantages:

Limitations:

Author:
Aleksander Slominski

Field Summary
static byte ATTR_CHARACTERS
           
static byte ATTR_CONTENT
           
static byte ATTR_NAME
           
 char[] buf
           
static byte CDSECT
           
static byte CHAR_REF
           
static byte CHARACTERS
           
static byte COMMENT
           
static byte CONTENT
           
static byte DOCTYPE
           
static byte EMPTY_ELEMENT
           
static byte END_DOCUMENT
           
static byte ENTITY_REF
           
static byte ETAG_NAME
           
protected static int LOOKUP_MAX
           
protected static char LOOKUP_MAX_CHAR
           
protected static boolean[] lookupNameChar
           
protected static boolean[] lookupNameStartChar
           
 int nsColonCount
           
 boolean paramNotifyAttValue
           
 boolean paramNotifyCDSect
           
 boolean paramNotifyCharacters
           
 boolean paramNotifyCharRef
           
 boolean paramNotifyComment
           
 boolean paramNotifyDoctype
           
 boolean paramNotifyEntityRef
           
 boolean paramNotifyPI
           
 boolean parsedContent
          This flag decides which buffer will be used to retrieve content for the current token.
 char[] pc
          This is the buffer for parsed content, such as the actual value of an entity ('&lt;' in buf, but '<' in pc).
 int pcEnd
           
 int pcStart
          Range [pcStart, pcEnd) defines part of pc that is content of current token iff parsedContent == true
static byte PI
           
 int pos
          position of next char that will be read from buffer
 int posEnd
           
 int posNsColon
           
 int posStart
          Range [posStart, posEnd) defines part of buf that is content of current token iff parsedContent == false
 boolean seenContent
           
static byte STAG_END
           
static byte STAG_NAME
           
 
Constructor Summary
Tokenizer()
           
 
Method Summary
 int getBufferShrinkOffset()
           
 int getColumnNumber()
           
 int getHardLimit()
           
 int getLineNumber()
           
 java.lang.String getPosDesc()
          Return a string describing the current position of the parser as text 'at line %d (row) and column %d (colum) [seen %s...]'.
 int getSoftLimit()
           
 boolean isAllowedMixedContent()
           
 boolean isBufferShrinkable()
           
protected  boolean isNameChar(char ch)
           
protected  boolean isNameStartChar(char ch)
           
protected  boolean isS(char ch)
          Determine if ch is whitespace ([3] S)
 byte next()
          Return next recognized token or END_DOCUMENT if no more input.
 void reset()
           
 void setAllowedMixedContent(boolean enable)
          Set support for mixed content.
 void setBufferShrinkable(boolean shrinkable)
           
 void setHardLimit(int value)
          Set hard limit on internal buffer size.
 void setInput(char[] data)
          Reset tokenizer state and set new input source
 void setInput(char[] data, int off, int len)
           
 void setInput(java.io.Reader r)
          Reset tokenizer state and set new input source
 void setNotifyAll(boolean enable)
          Set notification of all XML content tokens: Characters, Comment, CDSect, Doctype, PI, EntityRef, CharRef and AttValue (tokens for STag, ETag and Attribute are always sent).
 void setParseContent(boolean enable)
          Allow reporting parsed content for element content and attribute content (no need to deal with low level tokens such as in setNotifyAll).
 void setSoftLimit(int value)
          Set soft limit on internal buffer size.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

END_DOCUMENT

public static final byte END_DOCUMENT
See Also:
Constant Field Values

CONTENT

public static final byte CONTENT
See Also:
Constant Field Values

CHARACTERS

public static final byte CHARACTERS
See Also:
Constant Field Values

CDSECT

public static final byte CDSECT
See Also:
Constant Field Values

COMMENT

public static final byte COMMENT
See Also:
Constant Field Values

DOCTYPE

public static final byte DOCTYPE
See Also:
Constant Field Values

PI

public static final byte PI
See Also:
Constant Field Values

ENTITY_REF

public static final byte ENTITY_REF
See Also:
Constant Field Values

CHAR_REF

public static final byte CHAR_REF
See Also:
Constant Field Values

ETAG_NAME

public static final byte ETAG_NAME
See Also:
Constant Field Values

EMPTY_ELEMENT

public static final byte EMPTY_ELEMENT
See Also:
Constant Field Values

STAG_END

public static final byte STAG_END
See Also:
Constant Field Values

STAG_NAME

public static final byte STAG_NAME
See Also:
Constant Field Values

ATTR_NAME

public static final byte ATTR_NAME
See Also:
Constant Field Values

ATTR_CHARACTERS

public static final byte ATTR_CHARACTERS
See Also:
Constant Field Values

ATTR_CONTENT

public static final byte ATTR_CONTENT
See Also:
Constant Field Values

paramNotifyCharacters

public boolean paramNotifyCharacters

paramNotifyComment

public boolean paramNotifyComment

paramNotifyCDSect

public boolean paramNotifyCDSect

paramNotifyDoctype

public boolean paramNotifyDoctype

paramNotifyPI

public boolean paramNotifyPI

paramNotifyCharRef

public boolean paramNotifyCharRef

paramNotifyEntityRef

public boolean paramNotifyEntityRef

paramNotifyAttValue

public boolean paramNotifyAttValue

buf

public char[] buf

pos

public int pos
position of next char that will be read from buffer


posStart

public int posStart
Range [posStart, posEnd) defines part of buf that is content of current token iff parsedContent == false


posEnd

public int posEnd

posNsColon

public int posNsColon

nsColonCount

public int nsColonCount

seenContent

public boolean seenContent

parsedContent

public boolean parsedContent
This flag decides which buffer will be used to retrieve content for the current token. If true, use pc and [pcStart, pcEnd); if false, use buf and [posStart, posEnd).
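
The selection rule described for this field can be sketched as below. This is a minimal, self-contained illustration, not code from this package: TokenView, currentText() and sample() are hypothetical names, and only the field names mirror the public fields documented on this page.

```java
// Hypothetical stand-in that mirrors the documented public fields of Tokenizer.
class TokenView {
    char[] buf;            // raw input characters
    int posStart, posEnd;  // [posStart, posEnd) in buf
    char[] pc;             // parsed content (entities already resolved)
    int pcStart, pcEnd;    // [pcStart, pcEnd) in pc
    boolean parsedContent; // true: read from pc, false: read from buf

    // Returns the text of the current token from whichever buffer is active.
    String currentText() {
        if (parsedContent) {
            return new String(pc, pcStart, pcEnd - pcStart);
        }
        return new String(buf, posStart, posEnd - posStart);
    }

    // Builds a view over raw input "a &lt; b" whose parsed form is "a < b".
    static TokenView sample(boolean parsed) {
        TokenView t = new TokenView();
        t.buf = "a &lt; b".toCharArray();
        t.posStart = 0;
        t.posEnd = t.buf.length;
        t.pc = "a < b".toCharArray();
        t.pcStart = 0;
        t.pcEnd = t.pc.length;
        t.parsedContent = parsed;
        return t;
    }
}
```

With parsedContent set, currentText() reads the resolved text from pc; otherwise it reads the raw characters, entity reference included, from buf.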


pc

public char[] pc
This is the buffer for parsed content, such as the actual value of an entity ('&lt;' in buf, but '<' in pc).


pcStart

public int pcStart
Range [pcStart, pcEnd) defines part of pc that is content of current token iff parsedContent == true


pcEnd

public int pcEnd

LOOKUP_MAX

protected static final int LOOKUP_MAX
See Also:
Constant Field Values

LOOKUP_MAX_CHAR

protected static final char LOOKUP_MAX_CHAR
See Also:
Constant Field Values

lookupNameStartChar

protected static boolean[] lookupNameStartChar

lookupNameChar

protected static boolean[] lookupNameChar
Constructor Detail

Tokenizer

public Tokenizer()
Method Detail

reset

public void reset()

setInput

public void setInput(java.io.Reader r)
Reset tokenizer state and set new input source


setInput

public void setInput(char[] data)
Reset tokenizer state and set new input source


setInput

public void setInput(char[] data,
                     int off,
                     int len)

setNotifyAll

public void setNotifyAll(boolean enable)
Set notification of all XML content tokens: Characters, Comment, CDSect, Doctype, PI, EntityRef, CharRef and AttValue (tokens for STag, ETag and Attribute are always sent).
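
One plausible reading of this method is that it flips the eight paramNotify* flags listed in the field summary together. The sketch below assumes exactly that; NotifyFlags and enabledCount() are hypothetical names, and the real implementation may do more or differ.

```java
// Hypothetical stand-in; only the flag names come from the documented fields.
class NotifyFlags {
    boolean paramNotifyCharacters, paramNotifyComment, paramNotifyCDSect,
            paramNotifyDoctype, paramNotifyPI, paramNotifyEntityRef,
            paramNotifyCharRef, paramNotifyAttValue;

    // Assumed behaviour: one switch for all eight optional token kinds.
    void setNotifyAll(boolean enable) {
        paramNotifyCharacters = enable;
        paramNotifyComment = enable;
        paramNotifyCDSect = enable;
        paramNotifyDoctype = enable;
        paramNotifyPI = enable;
        paramNotifyEntityRef = enable;
        paramNotifyCharRef = enable;
        paramNotifyAttValue = enable;
    }

    // Counts enabled flags, only to make the effect observable.
    int enabledCount() {
        boolean[] all = {
            paramNotifyCharacters, paramNotifyComment, paramNotifyCDSect,
            paramNotifyDoctype, paramNotifyPI, paramNotifyEntityRef,
            paramNotifyCharRef, paramNotifyAttValue
        };
        int n = 0;
        for (boolean b : all) {
            if (b) n++;
        }
        return n;
    }
}
```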


setParseContent

public void setParseContent(boolean enable)
Allow reporting parsed content for element content and attribute content (no need to deal with low level tokens such as in setNotifyAll).


isAllowedMixedContent

public boolean isAllowedMixedContent()

setAllowedMixedContent

public void setAllowedMixedContent(boolean enable)
Set support for mixed content. If mixed content is disabled, the tokenizer will do its best to ensure that no element has a mixed content model; ignorable whitespace will also not be reported as element content.


getSoftLimit

public int getSoftLimit()

setSoftLimit

public void setSoftLimit(int value)
                  throws TokenizerException
Set soft limit on internal buffer size. This is the suggested size that the tokenizer will try to keep.

Throws:
TokenizerException

getHardLimit

public int getHardLimit()

setHardLimit

public void setHardLimit(int value)
                  throws TokenizerException
Set hard limit on internal buffer size. If input (such as element content) is bigger than the hard limit, the tokenizer will throw XmlTokenizerBufferOverflowException.

Throws:
TokenizerException
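
To make the difference between the soft and hard limits concrete, here is a rough sketch of one buffer policy that would match the descriptions above: grow past the soft limit when a token needs it, refuse to grow past the hard limit, and fall back to the soft limit afterwards. LimitedBuffer and its methods are hypothetical and are not taken from this package; the actual Tokenizer may manage its buffer differently and signals overflow with its own exception type.

```java
import java.util.Arrays;

// Hypothetical buffer policy; class and method names are illustrative only.
class LimitedBuffer {
    private final int softLimit; // preferred capacity
    private final int hardLimit; // absolute maximum capacity
    private char[] data;

    LimitedBuffer(int softLimit, int hardLimit) {
        if (softLimit <= 0 || hardLimit < softLimit) {
            throw new IllegalArgumentException("need 0 < softLimit <= hardLimit");
        }
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
        this.data = new char[softLimit];
    }

    // Ensures room for `needed` chars, growing past the soft limit if required
    // but never past the hard limit.
    void ensureCapacity(int needed) {
        if (needed > hardLimit) {
            throw new IllegalStateException(
                "buffer overflow: " + needed + " > hard limit " + hardLimit);
        }
        if (needed > data.length) {
            int newSize = Math.min(hardLimit, Math.max(needed, data.length * 2));
            data = Arrays.copyOf(data, newSize);
        }
    }

    // Drops back to the preferred size, discarding the buffer contents.
    void release() {
        if (data.length > softLimit) {
            data = new char[softLimit];
        }
    }

    int capacity() {
        return data.length;
    }
}
```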

getBufferShrinkOffset

public int getBufferShrinkOffset()

setBufferShrinkable

public void setBufferShrinkable(boolean shrinkable)
                         throws TokenizerException
Throws:
TokenizerException

isBufferShrinkable

public boolean isBufferShrinkable()

getPosDesc

public java.lang.String getPosDesc()
Return a string describing the current position of the parser as text 'at line %d (row) and column %d (colum) [seen %s...]'.


getLineNumber

public int getLineNumber()

getColumnNumber

public int getColumnNumber()

isNameStartChar

protected boolean isNameStartChar(char ch)

isNameChar

protected boolean isNameChar(char ch)

isS

protected boolean isS(char ch)
Determine if ch is whitespace ([3] S)
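
For reference, production [3] of the XML 1.0 specification defines S as one or more of the characters #x20, #x9, #xD and #xA. A check along those lines could look like the sketch below; XmlChars is a hypothetical holder class, and the actual method body in this package is not shown on this page.

```java
// Hypothetical holder class; the check follows XML 1.0 production [3].
class XmlChars {
    // S ::= (#x20 | #x9 | #xD | #xA)+ ; this tests a single character.
    static boolean isS(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }
}
```

Note that other Unicode space characters, such as the no-break space U+00A0, are not XML whitespace under this production.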


next

public byte next()
          throws TokenizerException,
                 java.io.IOException
Return next recognized token or END_DOCUMENT if no more input.

This is a simple automaton (in pseudo-code):

 byte next() {
    while(state != END_DOCUMENT) {
      ch = more();  // read character from input
      state = func(ch, state); // do transition
      if(state is accepting)
        return state;  // return token to caller
    }
    return END_DOCUMENT;  // no more input
 }
 

For speed (and simplicity?) it uses a few helper procedures such as readName() or isS().
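
The loop described for next() can be made concrete with a much smaller automaton. The sketch below is only an illustration of the read, transition, return-on-accept pattern: MiniTokenizer, its states and its token values are hypothetical, it recognizes only start tags, end tags and text, and it does not diagnose malformed input.

```java
// Hypothetical miniature tokenizer; not the Tokenizer documented on this page.
class MiniTokenizer {
    // Token kinds returned to the caller (values are arbitrary here).
    static final byte END_DOCUMENT = 0, CONTENT = 1, STAG_NAME = 2, ETAG_NAME = 3;
    // Internal, non-accepting states.
    private static final int START = 0, IN_CONTENT = 1, SEEN_LT = 2,
                             IN_STAG = 3, IN_ETAG = 4;

    private final char[] buf;
    private int pos;       // position of next char to read
    int posStart, posEnd;  // [posStart, posEnd) is the current token text

    MiniTokenizer(String input) {
        buf = input.toCharArray();
    }

    byte next() {
        int state = START;
        posStart = posEnd = pos;
        while (pos < buf.length) {
            char ch = buf[pos]; // read character from input
            switch (state) {    // do transition
                case START:
                    state = (ch == '<') ? SEEN_LT : IN_CONTENT;
                    pos++;
                    break;
                case IN_CONTENT:
                    if (ch == '<') { // accepting: leave '<' for the next call
                        posEnd = pos;
                        return CONTENT;
                    }
                    pos++;
                    break;
                case SEEN_LT:
                    if (ch == '/') {
                        state = IN_ETAG;
                        pos++;
                    } else {
                        state = IN_STAG; // re-examine ch as part of the name
                    }
                    posStart = pos;
                    break;
                case IN_STAG:
                case IN_ETAG:
                    if (ch == '>') { // accepting: name is complete
                        posEnd = pos;
                        pos++;
                        return state == IN_STAG ? STAG_NAME : ETAG_NAME;
                    }
                    pos++;
                    break;
            }
        }
        if (state == IN_CONTENT) { // trailing text at end of input
            posEnd = pos;
            return CONTENT;
        }
        return END_DOCUMENT;
    }

    // Text of the current token.
    String text() {
        return new String(buf, posStart, posEnd - posStart);
    }
}
```

A caller would invoke next() in a loop until it returns END_DOCUMENT, reading text() after each token.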

Throws:
TokenizerException
java.io.IOException


Copyright (c) 2003 IU Extreme! Lab http://www.extreme.indiana.edu/ All Rights Reserved.

Note: this package is deprecated in favor of XPP3, which implements the XmlPull API.