public class Parser extends Object
Parses the document into a tree of nodes using the NodeTokenizer. Nodes are defined by a token or offset range in the document, Token. Attributes in beginning nodes are also parsed into token offsets by the AttributeTokenizer.
A document tree is built representing nodes in the target document. The
document can be an HTML fragment that is not well-formed or an XML
fragment of an XHTML document.
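The idea of defining a node by an offset range, rather than by a copied string, can be sketched as follows. This is a minimal illustration and not the Shale Token interface: the OffsetToken class, its constructor, and its accessor names are hypothetical.

```java
final class OffsetTokenDemo {

    /** Hypothetical token: remembers where a node sits in the document. */
    static final class OffsetToken {
        private final StringBuffer document;
        private final int beginOffset; // inclusive
        private final int endOffset;   // exclusive

        OffsetToken(StringBuffer document, int beginOffset, int endOffset) {
            if (beginOffset < 0 || endOffset > document.length() || beginOffset > endOffset) {
                throw new IllegalArgumentException("offsets outside the document");
            }
            this.document = document;
            this.beginOffset = beginOffset;
            this.endOffset = endOffset;
        }

        int getBeginOffset() {
            return beginOffset;
        }

        int getEndOffset() {
            return endOffset;
        }

        /** Recovers the text on demand from the shared document buffer. */
        String getRawText() {
            return document.substring(beginOffset, endOffset);
        }
    }

    public static void main(String[] args) {
        StringBuffer document = new StringBuffer("<p>Hello<br>");
        OffsetToken beginTag = new OffsetToken(document, 0, 3);
        OffsetToken bodyText = new OffsetToken(document, 3, 8);
        System.out.println(beginTag.getRawText()); // <p>
        System.out.println(bodyText.getRawText()); // Hello
    }
}
```

Because every token points back into the same buffer, no text is copied until a caller asks for it.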
Modifier and Type | Field and Description |
---|---|
static org.apache.shale.clay.parser.Parser.Rule[] | BEGIN_CDATA_RULES: Declare an array of Parser.Rule instances that validate a begin CDATA Token. |
static org.apache.shale.clay.parser.Parser.Rule[] | BEGIN_COMMENT_TAG_RULES: Declare an array of Parser.Rule instances that validate a begin comment Token. |
static org.apache.shale.clay.parser.Parser.Rule[] | BEGIN_TAG_RULES: Declare an array of Parser.Rule instances that validate a beginning Token. |
static org.apache.shale.clay.parser.Parser.Rule[] | DOCTYPE_TAG_RULES: Declare an array of Parser.Rule instances that validate a document type Token. |
static org.apache.shale.clay.parser.Parser.Rule[] | END_CDATA_RULES: Declare an array of Parser.Rule instances that validate an end CDATA Token. |
static String | END_CHARSET_TOKEN: The end of the comment token used to override the template encoding type. |
static org.apache.shale.clay.parser.Parser.Rule[] | END_COMMENT_TAG_RULES: Declare an array of Parser.Rule instances that validate an end comment Token. |
static String | START_CHARSET_TOKEN: The start of the comment token used to override the template encoding type. |
Constructor and Description |
---|
Parser() |
Modifier and Type | Method and Description |
---|---|
protected Node | buildNode(Token token) |
protected void | discoverNodeAttributes(Node node): If the Node is a starting tag and not a comment, use the AttributeTokenizer to realize the node attributes. |
protected void | discoverNodeName(Node node) |
protected void | discoverNodeOverrides(Node node) |
protected void | discoverNodeShape(Node node): Determine if the Node is a starting, ending, or body text tag. |
protected Node | findBeginingNode(Node current, Node node) |
protected boolean | isNodeNameEqual(Node node1, Node node2): Compares two Node instances by name. |
protected boolean | isOptionalEndingTag(String nodeName): Determines if an HTML nodeName is a type of tag that can optionally have an ending tag. |
protected boolean | isSelfTerminating(String nodeName): Checks to see if the nodeName is within the SELF_TERMINATING table of values. |
protected boolean | isValidOptionalEndingTagParent(String nodeName, String parentNodeName): Checks to see if an optional ending tag has a valid parent. |
List | parse(StringBuffer document): Parse a document fragment into graphs of Node. |
public static final String START_CHARSET_TOKEN
The start of the comment token used to override the template encoding type.
public static final String END_CHARSET_TOKEN
The end of the comment token used to override the template encoding type.
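How a start and end token pair can delimit an encoding override might be sketched as below. The delimiter strings here are hypothetical placeholders, since the actual values of START_CHARSET_TOKEN and END_CHARSET_TOKEN are not shown on this page, and the findCharset helper is likewise an illustrative name rather than part of the Parser API.

```java
final class CharsetOverrideSketch {

    // Hypothetical delimiters, standing in for the real token values.
    static final String START = "<!-- charset: ";
    static final String END = " -->";

    /**
     * Returns the text between START and END, or the fallback when the
     * comment is missing, unterminated, or empty.
     */
    static String findCharset(CharSequence document, String fallback) {
        String text = document.toString();
        int start = text.indexOf(START);
        if (start < 0) {
            return fallback;
        }
        int valueBegin = start + START.length();
        int valueEnd = text.indexOf(END, valueBegin);
        if (valueEnd < 0) {
            return fallback;
        }
        String value = text.substring(valueBegin, valueEnd).trim();
        return value.isEmpty() ? fallback : value;
    }

    public static void main(String[] args) {
        System.out.println(findCharset("<!-- charset: ISO-8859-1 --><html></html>", "UTF-8"));
        System.out.println(findCharset("<html></html>", "UTF-8"));
    }
}
```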
public static final org.apache.shale.clay.parser.Parser.Rule[] BEGIN_CDATA_RULES
Declare an array of Parser.Rule instances that validate a begin CDATA Token.
public static final org.apache.shale.clay.parser.Parser.Rule[] END_CDATA_RULES
Declare an array of Parser.Rule instances that validate an end CDATA Token.
public static final org.apache.shale.clay.parser.Parser.Rule[] BEGIN_COMMENT_TAG_RULES
Declare an array of Parser.Rule instances that validate a begin comment Token.
public static final org.apache.shale.clay.parser.Parser.Rule[] END_COMMENT_TAG_RULES
Declare an array of Parser.Rule instances that validate an end comment Token.
public static final org.apache.shale.clay.parser.Parser.Rule[] DOCTYPE_TAG_RULES
Declare an array of Parser.Rule instances that validate a document type Token.
public static final org.apache.shale.clay.parser.Parser.Rule[] BEGIN_TAG_RULES
Declare an array of Parser.Rule instances that validate a beginning Token.
protected boolean isOptionalEndingTag(String nodeName)
Determines if an HTML nodeName is a type of tag that can optionally have an ending tag.
Parameters:
nodeName - the name of the HTML node
Returns:
true if the nodeName is in the OPTIONAL-ENDING_TAG array; otherwise, false is returned
protected boolean isValidOptionalEndingTagParent(String nodeName, String parentNodeName)
Checks to see if an optional ending tag has a valid parent. This is used to detect an implicit ending tag.
Parameters:
nodeName - name of the optional ending tag
parentNodeName - name of the parent
Returns:
true if the parentNodeName is a valid parent for the nodeName; otherwise, a false value is returned
protected Node findBeginingNode(Node current, Node node)
Parameters:
current - top of the stack
node - ending node
public List parse(StringBuffer document)
Parse a document fragment into graphs of Node. The resulting type is a list because the fragment might not be well-formed.
Parameters:
document - input source
Returns:
Node
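Why the result is a list rather than a single root can be illustrated with a small stack-based builder. The sketch below is not the Shale implementation: SimpleNode, buildForest, and the pre-split tag strings are hypothetical simplifications. Beginning tags are pushed on a stack, an ending tag searches the stack for its matching beginning tag, and every node opened while the stack is empty becomes a separate root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ForestSketch {

    /** Hypothetical node type: a name plus child nodes. */
    static final class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<SimpleNode>();

        SimpleNode(String name) {
            this.name = name;
        }
    }

    /**
     * Builds root nodes from pre-split tags such as "<p>" and "</p>".
     * Tokenizing and attributes are left out to keep the sketch short.
     */
    static List<SimpleNode> buildForest(String... tags) {
        List<SimpleNode> roots = new ArrayList<SimpleNode>();
        Deque<SimpleNode> open = new ArrayDeque<SimpleNode>();
        for (String tag : tags) {
            boolean ending = tag.startsWith("</");
            String name = tag.substring(ending ? 2 : 1, tag.length() - 1);
            if (!ending) {
                SimpleNode node = new SimpleNode(name);
                if (open.isEmpty()) {
                    roots.add(node); // nothing is open, so this starts a new graph
                } else {
                    open.peek().children.add(node);
                }
                open.push(node);
            } else if (isOpen(open, name)) {
                // Pop up to and including the matching beginning tag;
                // unclosed inner tags are closed implicitly.
                while (!open.pop().name.equalsIgnoreCase(name)) {
                    // keep popping
                }
            }
            // An ending tag without a matching beginning tag is ignored.
        }
        return roots;
    }

    private static boolean isOpen(Deque<SimpleNode> open, String name) {
        for (SimpleNode node : open) {
            if (node.name.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<SimpleNode> roots =
            buildForest("<p>", "<b>", "</p>", "<div>", "</div>", "</span>");
        for (SimpleNode root : roots) {
            System.out.println(root.name + " children=" + root.children.size());
        }
    }
}
```

In this run the fragment has two top-level elements and a stray ending tag, so the builder returns two roots instead of failing.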
protected boolean isNodeNameEqual(Node node1, Node node2)
Compares two Node instances by name. This method is used to match a beginning tag with an ending tag while building the document stack. Returns true if the node name properties are the same.
Parameters:
node1 - first node
node2 - second node
Returns:
true if they are the same
protected boolean isSelfTerminating(String nodeName)
Checks to see if the nodeName is within the SELF_TERMINATING table of values.
Parameters:
nodeName - to check for self termination
Returns:
true if it is self terminating; otherwise false
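A table lookup of this kind might look like the sketch below. The table contents are an assumption (a handful of HTML elements that never take an ending tag), since the real SELF_TERMINATING values are not listed on this page, and the case-insensitive comparison is also an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SelfTerminatingSketch {

    // Assumed contents; the actual SELF_TERMINATING table may differ.
    private static final Set<String> SELF_TERMINATING = new HashSet<String>(
        Arrays.asList("br", "hr", "img", "input", "link", "meta"));

    /** Returns true when the name is found in the table, ignoring case. */
    static boolean isSelfTerminating(String nodeName) {
        return nodeName != null
            && SELF_TERMINATING.contains(nodeName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isSelfTerminating("BR"));  // true
        System.out.println(isSelfTerminating("div")); // false
    }
}
```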
protected Node buildNode(Token token)
Parameters:
token - node offset in the document
protected void discoverNodeShape(Node node)
Determine if the Node is a starting, ending, or body text tag. The array of Parser.Shape instances is used to determine the type of Node the Token represents.
Parameters:
node - target node
protected void discoverNodeName(Node node)
Parameters:
node - target
protected void discoverNodeAttributes(Node node)
If the Node is a starting tag and not a comment, use the AttributeTokenizer to realize the node attributes.
Parameters:
node - target
Copyright © 2004-2013 Apache Software Foundation. All Rights Reserved.