de.pdark.decentxml
Class XMLTokenizer

java.lang.Object
  extended by de.pdark.decentxml.XMLTokenizer
Direct Known Subclasses:
DTDTokenizer

public class XMLTokenizer
extends java.lang.Object

This class chops an XMLSource into tokens.

Use it to parse XML yourself, or use XMLParser to parse XML into a Document.

Author:
digulla
See Also:
XMLSource, XMLParser, Document

Nested Class Summary
static class XMLTokenizer.Type
          Types of tokens the tokenizer can return
 
Field Summary
protected  boolean inStartElement
          true if the tokenizer is currently inside a start tag
protected  int pos
          The current position in the source
protected  XMLSource source
           
 
Constructor Summary
XMLTokenizer(XMLSource source)
           
 
Method Summary
protected  Token createToken()
          All tokens are created here.
protected  void expect(char expected)
          Check that the next character is the expected one and skip it
 CharValidator getCharValidator()
           
 EntityResolver getEntityResolver()
           
 int getOffset()
          Get the current parsing position (for error handling, for example).
 XMLSource getSource()
           
 boolean isTreatEntitiesAsText()
           
protected  java.lang.String lookAheadForErrorMessage(java.lang.String conditionalPrefix, int pos, int len)
           
 Token next()
          Fetch the next token from the source.
protected  char nextChar(java.lang.String errorMessage)
           
protected  void nextChars(java.lang.String expected, int startPos, java.lang.String errorMessage)
           
protected  void parseAttribute(Token token)
          Read the attribute of an element.
protected  void parseBeginElement(Token token)
          Read the name of an element.
protected  void parseBeginSomething(Token token)
          Read one of "<tag", "<?pi", "<!--", "<![CDATA[" or an end tag.
protected  void parseCData(Token token)
          Parse a CDATA element.
protected  void parseComment(Token token)
          Read a comment.
protected  void parseDocType(Token token)
          Parse a doctype declaration
protected  void parseEndElement(Token token)
          Read an end tag.
protected  void parseEntity(Token token)
           
protected  void parseExcalamation(Token token)
          Parse "<!--" or "<![CDATA["
protected  void parseName(java.lang.String objectName)
          Read an XML name
protected  void parseProcessingInstruction(Token token)
          Read a processing instruction.
protected  void parseText(Token token)
          Read a piece of text.
 XMLTokenizer setCharValidator(CharValidator charValidator)
           
 XMLTokenizer setEntityResolver(EntityResolver resolver)
           
 void setOffset(int offset)
          Set the current parsing position.
 XMLTokenizer setTreatEntitiesAsText(boolean treatEntitiesAsText)
           
protected  void skipChar(char c)
          Advance one or two positions, depending on whether the current character is the high part of a surrogate pair.
protected  void skipWhiteSpace()
          Advance the current position past any whitespace in the input
protected  void verifyEntity(int start, int end)
          Verify an entity.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

source

protected final XMLSource source

pos

protected int pos
The current position in the source


inStartElement

protected boolean inStartElement
true if the tokenizer is currently inside a start tag

Constructor Detail

XMLTokenizer

public XMLTokenizer(XMLSource source)
Method Detail

setTreatEntitiesAsText

public XMLTokenizer setTreatEntitiesAsText(boolean treatEntitiesAsText)

isTreatEntitiesAsText

public boolean isTreatEntitiesAsText()

getCharValidator

public CharValidator getCharValidator()

setCharValidator

public XMLTokenizer setCharValidator(CharValidator charValidator)

getEntityResolver

public EntityResolver getEntityResolver()

setEntityResolver

public XMLTokenizer setEntityResolver(EntityResolver resolver)

next

public Token next()
Fetch the next token from the source. Returns null if there are no more tokens in the input.

Returns:
The next token or null at EOF
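
The null-at-EOF contract suggests a simple consumer loop. The sketch below illustrates only that loop shape. MiniTokenizer is a hypothetical stand-in written for this example, not a decentxml class, and it splits the input far more crudely than XMLTokenizer would.

```java
// Hypothetical stand-in that mimics the "null at EOF" contract of next().
// It is NOT the decentxml implementation and performs no XML validation.
final class MiniTokenizer {
    private final String input;
    private int pos = 0;

    MiniTokenizer(String input) {
        this.input = input;
    }

    /** Returns the next chunk of markup or text, or null at end of input. */
    String next() {
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        if (input.charAt(pos) == '<') {
            int end = input.indexOf('>', pos);
            // Unterminated markup: hand back the rest of the input as one chunk.
            pos = (end < 0) ? input.length() : end + 1;
        } else {
            int end = input.indexOf('<', pos);
            pos = (end < 0) ? input.length() : end;
        }
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        MiniTokenizer tokenizer = new MiniTokenizer("<a>text</a>");
        String token;
        // Same loop shape as with a tokenizer whose next() returns null at EOF.
        while ((token = tokenizer.next()) != null) {
            System.out.println(token);
        }
    }
}
```

With the real class, the loop variable would presumably be a Token rather than a String.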

createToken

protected Token createToken()
All tokens are created here.

Use this method to create custom tokens with additional information.

Returns:
a new, pre-initialized token
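
Because every token is created in one place, a subclass can swap in its own token type by overriding that single method. The sketch below shows the pattern with hypothetical stand-in classes (SketchToken, LineToken, SketchTokenizer, LineTrackingTokenizer). None of them exist in decentxml, and the fields they carry are assumptions made for illustration.

```java
// Hypothetical stand-ins for Token and the tokenizer. Only the factory-method
// pattern is the point here, not the real decentxml classes.
class SketchToken {
    int startOffset;
}

class LineToken extends SketchToken {
    int line; // additional information carried by the custom token
}

class SketchTokenizer {
    protected int pos = 0;

    /** Single place where tokens are created, so subclasses can swap the type. */
    protected SketchToken createToken() {
        SketchToken token = new SketchToken();
        token.startOffset = pos;
        return token;
    }

    SketchToken next() {
        SketchToken token = createToken();
        pos++; // pretend every token is exactly one character long
        return token;
    }
}

class LineTrackingTokenizer extends SketchTokenizer {
    protected int currentLine = 1;

    @Override
    protected SketchToken createToken() {
        LineToken token = new LineToken();
        token.startOffset = pos;
        token.line = currentLine;
        return token;
    }
}
```

Code that receives tokens from LineTrackingTokenizer can cast them to LineToken to read the extra field.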

getSource

public XMLSource getSource()

getOffset

public int getOffset()
Get the current parsing position (for error handling, for example).

This value is only approximate because the tokenizer may be anywhere in the stream when it is queried.


setOffset

public void setOffset(int offset)
Set the current parsing position. You can use this to restart parsing after an error or to jump around in the input.


parseBeginSomething

protected void parseBeginSomething(Token token)
Read one of "<tag", "<?pi", "<!--", "<![CDATA[" or an end tag.


parseBeginElement

protected void parseBeginElement(Token token)
Read the name of an element.

The resulting token will contain the '<' plus any whitespace between it and the name plus the name itself but no whitespace after the name.
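
To make the token boundaries concrete, here is a minimal standalone sketch of the span described above. BeginElementSpan is a hypothetical helper, not part of decentxml, and its notion of a name (any run of characters other than whitespace, '>', '/' and '=') is a deliberate simplification of the real XML name rules.

```java
// Simplified sketch: a "name" here is any run of characters other than
// whitespace, '>', '/' and '='. Real XML name rules are stricter.
final class BeginElementSpan {
    /** Returns the text of the begin-element token starting at pos. */
    static String read(String input, int pos) {
        if (input.charAt(pos) != '<') {
            throw new IllegalArgumentException("Expected '<' at offset " + pos);
        }
        int end = pos + 1;
        // Whitespace between '<' and the name is part of the token.
        while (end < input.length() && Character.isWhitespace(input.charAt(end))) {
            end++;
        }
        // The name itself.
        while (end < input.length() && isNameChar(input.charAt(end))) {
            end++;
        }
        // Whitespace after the name is left for whatever token follows.
        return input.substring(pos, end);
    }

    private static boolean isNameChar(char c) {
        return !Character.isWhitespace(c) && c != '>' && c != '/' && c != '=';
    }
}
```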


parseEndElement

protected void parseEndElement(Token token)
Read an end tag.

The resulting token will contain the '</' and '>' plus the name plus any whitespace between those three.


parseExcalamation

protected void parseExcalamation(Token token)
Parse "<!--" or "<![CDATA["


parseDocType

protected void parseDocType(Token token)
Parse a doctype declaration

The resulting token will contain "


parseCData

protected void parseCData(Token token)
Parse a CDATA element.

The resulting token will contain the "<![CDATA[" plus the terminating "]]>".


parseComment

protected void parseComment(Token token)
Read a comment.

The resulting token will contain the "<!--" plus the terminating "-->".


parseProcessingInstruction

protected void parseProcessingInstruction(Token token)
Read a processing instruction.

The resulting token will contain the "<?" plus the terminating "?>".
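
CDATA sections, comments and processing instructions are each described as a token that runs from an opening delimiter through its terminating delimiter. A minimal sketch of that shared shape follows. DelimitedSpan is a hypothetical helper, not decentxml code, and it skips the content checks a real tokenizer would likely perform (for example, rejecting "--" inside a comment).

```java
// Hypothetical helper: extracts a token that starts with an opening
// delimiter and ends with the first matching closing delimiter.
final class DelimitedSpan {
    /**
     * Returns the text from pos through the first occurrence of close,
     * inclusive. The input must contain open at pos.
     */
    static String read(String input, int pos, String open, String close) {
        if (!input.startsWith(open, pos)) {
            throw new IllegalArgumentException(
                    "Expected " + open + " at offset " + pos);
        }
        // Search after the opening delimiter so the two cannot overlap.
        int end = input.indexOf(close, pos + open.length());
        if (end < 0) {
            throw new IllegalArgumentException(
                    "Missing " + close + " after offset " + pos);
        }
        return input.substring(pos, end + close.length());
    }
}
```

For example, DelimitedSpan.read(xml, pos, "<!--", "-->") yields a comment token, and the pairs "<![CDATA[" / "]]>" and "<?" / "?>" cover the other two cases.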


parseAttribute

protected void parseAttribute(Token token)
Read the attribute of an element.

The resulting token will contain the name, "=" plus the quotes and the value.
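
The following standalone sketch shows which characters such an attribute token would cover. AttributeSpan is a hypothetical helper, not decentxml code. It assumes that whitespace around "=" is tolerated and kept inside the token, which the description above does not state, and it does not validate the attribute name.

```java
// Hypothetical helper: returns the span "name = quote value quote".
final class AttributeSpan {
    /** Returns the attribute token text starting at pos. */
    static String read(String input, int pos) {
        int end = pos;
        // Name: everything up to '=' or whitespace (no name validation here).
        while (end < input.length() && input.charAt(end) != '='
                && !Character.isWhitespace(input.charAt(end))) {
            end++;
        }
        end = skipWhitespace(input, end);
        if (end >= input.length() || input.charAt(end) != '=') {
            throw new IllegalArgumentException("Expected '=' at offset " + end);
        }
        end = skipWhitespace(input, end + 1);
        if (end >= input.length()
                || (input.charAt(end) != '"' && input.charAt(end) != '\'')) {
            throw new IllegalArgumentException("Expected a quote at offset " + end);
        }
        // The value ends at the next occurrence of the same quote character.
        char quote = input.charAt(end);
        int close = input.indexOf(quote, end + 1);
        if (close < 0) {
            throw new IllegalArgumentException("Unterminated attribute value");
        }
        return input.substring(pos, close + 1);
    }

    private static int skipWhitespace(String input, int pos) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return pos;
    }
}
```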


parseName

protected void parseName(java.lang.String objectName)
Read an XML name


parseText

protected void parseText(Token token)
Read a piece of text.

The resulting token will contain the text as is with all the entity and numeric character references.


skipChar

protected void skipChar(char c)
Advance one or two positions, depending on whether the current character is the high part of a surrogate pair.
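
Java strings are UTF-16, so a code point outside the Basic Multilingual Plane occupies two char values, and skipping only one would leave the position in the middle of a pair. A minimal sketch of the idea, using the standard Character.isHighSurrogate check, follows. SurrogateSkip is a hypothetical helper that returns the new position instead of updating a field, and it is not the decentxml implementation.

```java
// Hypothetical helper illustrating the one-or-two-position advance.
final class SurrogateSkip {
    /** Returns the position just past the character c found at pos. */
    static int skipChar(int pos, char c) {
        // A high surrogate is the first half of a two-char UTF-16 pair.
        return pos + (Character.isHighSurrogate(c) ? 2 : 1);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', one supplementary code point, 'b'
        int pos = 0;
        while (pos < s.length()) {
            int next = skipChar(pos, s.charAt(pos));
            System.out.println(pos + " -> " + next);
            pos = next;
        }
    }
}
```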


verifyEntity

protected void verifyEntity(int start,
                            int end)
Verify an entity. If no entityResolver is installed, this does nothing.


parseEntity

protected void parseEntity(Token token)

nextChars

protected void nextChars(java.lang.String expected,
                         int startPos,
                         java.lang.String errorMessage)

nextChar

protected char nextChar(java.lang.String errorMessage)

expect

protected void expect(char expected)
Check that the next character is the expected one and skip it
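
A check-then-advance helper of this kind can be sketched as follows. ExpectSketch is a hypothetical class, not decentxml code. The exception type (IllegalStateException) and the message format are assumptions, since the documentation above does not say how a mismatch is reported.

```java
// Hypothetical scanner showing the "check, then skip" behaviour.
final class ExpectSketch {
    private final String input;
    private int pos = 0;

    ExpectSketch(String input) {
        this.input = input;
    }

    int position() {
        return pos;
    }

    /** Verifies that the next character equals expected, then moves past it. */
    void expect(char expected) {
        if (pos >= input.length()) {
            throw new IllegalStateException(
                    "Expected '" + expected + "' but reached end of input");
        }
        char actual = input.charAt(pos);
        if (actual != expected) {
            // The position is left unchanged on a mismatch.
            throw new IllegalStateException(
                    "Expected '" + expected + "' but found '" + actual
                            + "' at offset " + pos);
        }
        pos++;
    }
}
```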


lookAheadForErrorMessage

protected java.lang.String lookAheadForErrorMessage(java.lang.String conditionalPrefix,
                                                    int pos,
                                                    int len)

skipWhiteSpace

protected void skipWhiteSpace()
Advance the current position past any whitespace in the input
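
As a rough illustration, skipping whitespace amounts to advancing while the current character is one of the four characters XML 1.0 treats as white space. WhitespaceSkip below is a hypothetical helper that returns the new position. Whether the real method uses exactly this character set or delegates to the CharValidator is an assumption not confirmed by this page.

```java
// Hypothetical helper: advances past XML 1.0 white space
// (space, tab, carriage return, line feed).
final class WhitespaceSkip {
    /** Returns the first position at or after pos that is not white space. */
    static int skipWhiteSpace(String input, int pos) {
        while (pos < input.length() && isXmlWhitespace(input.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }
}
```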



Copyright © 2008-2011. All Rights Reserved.