java.lang.Object
  org.gjt.xpp.impl.tokenizer.Tokenizer
public class Tokenizer
Simple XML Tokenizer (SXT) performs input stream tokenizing. Advantages:
Field Summary | |
---|---|
static byte |
ATTR_CHARACTERS
|
static byte |
ATTR_CONTENT
|
static byte |
ATTR_NAME
|
char[] |
buf
|
static byte |
CDSECT
|
static byte |
CHAR_REF
|
static byte |
CHARACTERS
|
static byte |
COMMENT
|
static byte |
CONTENT
|
static byte |
DOCTYPE
|
static byte |
EMPTY_ELEMENT
|
static byte |
END_DOCUMENT
|
static byte |
ENTITY_REF
|
static byte |
ETAG_NAME
|
protected static int |
LOOKUP_MAX
|
protected static char |
LOOKUP_MAX_CHAR
|
protected static boolean[] |
lookupNameChar
|
protected static boolean[] |
lookupNameStartChar
|
int |
nsColonCount
|
boolean |
paramNotifyAttValue
|
boolean |
paramNotifyCDSect
|
boolean |
paramNotifyCharacters
|
boolean |
paramNotifyCharRef
|
boolean |
paramNotifyComment
|
boolean |
paramNotifyDoctype
|
boolean |
paramNotifyEntityRef
|
boolean |
paramNotifyPI
|
boolean |
parsedContent
This flag decides which buffer will be used to retrieve content for the current token. |
char[] |
pc
This is the buffer for parsed content, such as the actual value of an entity ('&lt;' in buf, but '<' in pc) |
int |
pcEnd
|
int |
pcStart
Range [pcStart, pcEnd) defines part of pc that is content of current token iff parsedContent == true |
static byte |
PI
|
int |
pos
position of next char that will be read from buffer |
int |
posEnd
|
int |
posNsColon
|
int |
posStart
Range [posStart, posEnd) defines part of buf that is content of current token iff parsedContent == false |
boolean |
seenContent
|
static byte |
STAG_END
|
static byte |
STAG_NAME
|
Constructor Summary | |
---|---|
Tokenizer()
|
Method Summary | |
---|---|
int |
getBufferShrinkOffset()
|
int |
getColumnNumber()
|
int |
getHardLimit()
|
int |
getLineNumber()
|
java.lang.String |
getPosDesc()
Return string describing current position of parser as text 'at line %d (row) and column %d (column) [seen %s...]'. |
int |
getSoftLimit()
|
boolean |
isAllowedMixedContent()
|
boolean |
isBufferShrinkable()
|
protected boolean |
isNameChar(char ch)
|
protected boolean |
isNameStartChar(char ch)
|
protected boolean |
isS(char ch)
Determine if ch is whitespace ([3] S) |
byte |
next()
Return next recognized token or END_DOCUMENT if no more input. |
void |
reset()
|
void |
setAllowedMixedContent(boolean enable)
Set support for mixed content. |
void |
setBufferShrinkable(boolean shrinkable)
|
void |
setHardLimit(int value)
Set hard limit on internal buffer size. |
void |
setInput(char[] data)
Reset tokenizer state and set new input source |
void |
setInput(char[] data,
int off,
int len)
|
void |
setInput(java.io.Reader r)
Reset tokenizer state and set new input source |
void |
setNotifyAll(boolean enable)
Set notification of all XML content tokens: Characters, Comment, CDSect, Doctype, PI, EntityRef, CharRef and AttValue (tokens for STag, ETag and Attribute are always sent). |
void |
setParseContent(boolean enable)
Allow reporting parsed content for element content and attribute content (no need to deal with low level tokens such as in setNotifyAll). |
void |
setSoftLimit(int value)
Set soft limit on internal buffer size. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
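The `isS(char ch)` entry refers to production [3] of the XML 1.0 specification, `S ::= (#x20 | #x9 | #xD | #xA)+`. A check for a single character of that production could look like the sketch below. `XmlWhitespace` is a hypothetical class name used for illustration, and the actual method body in Tokenizer may differ (for example, it may use a lookup table).

```java
// Hypothetical helper; illustrates XML production [3] S for one character.
class XmlWhitespace {
    // True only for space, tab, carriage return and line feed.
    static boolean isS(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }
}
```

This set is narrower than `Character.isWhitespace`, which also accepts characters such as form feed that XML does not treat as white space.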
Field Detail |
---|
public static final byte END_DOCUMENT
public static final byte CONTENT
public static final byte CHARACTERS
public static final byte CDSECT
public static final byte COMMENT
public static final byte DOCTYPE
public static final byte PI
public static final byte ENTITY_REF
public static final byte CHAR_REF
public static final byte ETAG_NAME
public static final byte EMPTY_ELEMENT
public static final byte STAG_END
public static final byte STAG_NAME
public static final byte ATTR_NAME
public static final byte ATTR_CHARACTERS
public static final byte ATTR_CONTENT
public boolean paramNotifyCharacters
public boolean paramNotifyComment
public boolean paramNotifyCDSect
public boolean paramNotifyDoctype
public boolean paramNotifyPI
public boolean paramNotifyCharRef
public boolean paramNotifyEntityRef
public boolean paramNotifyAttValue
public char[] buf
public int pos
public int posStart
public int posEnd
public int posNsColon
public int nsColonCount
public boolean seenContent
public boolean parsedContent
public char[] pc
public int pcStart
public int pcEnd
protected static final int LOOKUP_MAX
protected static final char LOOKUP_MAX_CHAR
protected static boolean[] lookupNameStartChar
protected static boolean[] lookupNameChar
Constructor Detail |
---|
public Tokenizer()
Method Detail |
---|
public void reset()
public void setInput(java.io.Reader r)
public void setInput(char[] data)
public void setInput(char[] data, int off, int len)
public void setNotifyAll(boolean enable)
public void setParseContent(boolean enable)
public boolean isAllowedMixedContent()
public void setAllowedMixedContent(boolean enable)
public int getSoftLimit()
public void setSoftLimit(int value) throws TokenizerException
TokenizerException
public int getHardLimit()
public void setHardLimit(int value) throws TokenizerException
TokenizerException
public int getBufferShrinkOffset()
public void setBufferShrinkable(boolean shrinkable) throws TokenizerException
TokenizerException
public boolean isBufferShrinkable()
public java.lang.String getPosDesc()
public int getLineNumber()
public int getColumnNumber()
protected boolean isNameStartChar(char ch)
protected boolean isNameChar(char ch)
protected boolean isS(char ch)
public byte next() throws TokenizerException, java.io.IOException
This is a simple automaton (in pseudo-code):
    byte next() {
      while(state != END_DOCUMENT) {
        ch = more();              // read character from input
        state = func(ch, state);  // do transition
        if(state is accepting)
          return state;           // return token to caller
      }
    }
For speed (and simplicity?) it uses a few helper procedures such as readName() or isS().
TokenizerException
java.io.IOException
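To make the loop described for next() concrete, here is a toy tokenizer written in the same spirit: read a character, make a state transition, and return as soon as an accepting state is reached. It is a sketch under simplifying assumptions, not the SXT implementation. `MiniTokenizer`, its token constants and its internal states are hypothetical; it handles only plain start tags, end tags and text, with no attributes, entities or error reporting.

```java
// Toy automaton; names and token codes are illustrative only.
class MiniTokenizer {
    // Accepting states double as token types returned to the caller.
    static final byte END_DOCUMENT = 0, CONTENT = 1, STAG = 2, ETAG = 3;
    // Internal (non-accepting) states.
    private static final byte START = 10, IN_TEXT = 11, TAG_OPEN = 12,
                              IN_STAG = 13, IN_ETAG = 14;

    private final char[] buf;
    private int pos;        // next character to read
    int posStart, posEnd;   // [posStart, posEnd) is the current token text

    MiniTokenizer(String input) { buf = input.toCharArray(); }

    byte next() {
        byte state = START;
        while (true) {
            if (pos >= buf.length) {
                // Out of input: flush pending text, otherwise signal the end.
                // An unterminated tag is silently dropped in this toy version.
                if (state == IN_TEXT) { posEnd = pos; return CONTENT; }
                return END_DOCUMENT;
            }
            char ch = buf[pos];
            switch (state) {
                case START:
                    posStart = pos;
                    state = (ch == '<') ? TAG_OPEN : IN_TEXT;
                    pos++;
                    break;
                case IN_TEXT:
                    // '<' ends the text run and is left for the next call.
                    if (ch == '<') { posEnd = pos; return CONTENT; }
                    pos++;
                    break;
                case TAG_OPEN:
                    if (ch == '/') { state = IN_ETAG; pos++; }
                    else { state = IN_STAG; }   // name starts at this char
                    posStart = pos;
                    break;
                case IN_STAG:
                case IN_ETAG:
                    if (ch == '>') {
                        posEnd = pos;
                        pos++;
                        return state == IN_STAG ? STAG : ETAG;
                    }
                    pos++;
                    break;
            }
        }
    }

    // Text of the token most recently returned by next().
    String text() { return new String(buf, posStart, posEnd - posStart); }
}
```

For the input `<a>hi</a>`, successive calls to `next()` yield STAG (text `a`), CONTENT (text `hi`), ETAG (text `a`) and then END_DOCUMENT.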