org.cyberneko.html
Class HTMLTagBalancer

java.lang.Object
  extended by org.cyberneko.html.HTMLTagBalancer
All Implemented Interfaces:
HTMLComponent, XMLComponent, XMLDocumentFilter, XMLDocumentHandler, XMLDocumentSource

public class HTMLTagBalancer
extends Object
implements XMLDocumentFilter, HTMLComponent

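Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can add missing parent elements, automatically close elements with optional end tags, and handle mismatched inline element tags.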

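This component recognizes the following features, identified by the constants described under Field Detail: NAMESPACES, AUGMENTATIONS, REPORT_ERRORS, DOCUMENT_FRAGMENT (and its deprecated form, DOCUMENT_FRAGMENT_DEPRECATED), and IGNORE_OUTSIDE_CONTENT.

This component recognizes the following properties: NAMES_ELEMS, NAMES_ATTRS, and ERROR_REPORTER.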

Version:
$Id$
Author:
Andy Clark
See Also:
HTMLElements
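
The general idea of balancing tags with a stack of open elements can be illustrated with a small sketch. This is not NekoHTML's implementation: the class BalancerSketch, its balance method, and the string-based event format are hypothetical, and the sketch ignores element-specific rules such as those held in HTMLElements. It only closes elements left open and drops end tags that have no matching start tag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not part of org.cyberneko.html.
final class BalancerSketch {

    // Events are plain strings: "<name>", "</name>", or text.
    static List<String> balance(List<String> events) {
        Deque<String> open = new ArrayDeque<>(); // innermost open element first
        List<String> out = new ArrayList<>();
        for (String ev : events) {
            if (ev.startsWith("</")) {
                String name = ev.substring(2, ev.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag: nothing to close, so drop it
                }
                // Close every element opened after the one being ended.
                while (!open.peek().equals(name)) {
                    out.add("</" + open.pop() + ">");
                }
                open.pop();
                out.add(ev);
            } else if (ev.startsWith("<")) {
                open.push(ev.substring(1, ev.length() - 1));
                out.add(ev);
            } else {
                out.add(ev); // character data passes through unchanged
            }
        }
        // Close anything still open at the end of the document.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }
}
```

For the input events <p>, <b>, x, </p>, </i>, this sketch emits <p>, <b>, x, </b>, </p>: the open <b> is closed before </p>, and the unmatched </i> is dropped.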

Nested Class Summary
static class HTMLTagBalancer.Info
          Element info for each start element.
static class HTMLTagBalancer.InfoStack
          Unsynchronized stack of element information.
 
Field Summary
protected static String AUGMENTATIONS
          Include infoset augmentations.
protected static String DOCUMENT_FRAGMENT
          Document fragment balancing only.
protected static String DOCUMENT_FRAGMENT_DEPRECATED
          Document fragment balancing only (deprecated).
protected static String ERROR_REPORTER
          Error reporter.
protected  boolean fAugmentations
          Include infoset augmentations.
protected  boolean fDocumentFragment
          Document fragment balancing only.
protected  XMLDocumentHandler fDocumentHandler
          The document handler.
protected  XMLDocumentSource fDocumentSource
          The document source.
protected  HTMLTagBalancer.InfoStack fElementStack
          The element stack.
protected  HTMLErrorReporter fErrorReporter
          Error reporter.
protected  boolean fIgnoreOutsideContent
          Ignore outside content.
protected  HTMLTagBalancer.InfoStack fInlineStack
          The inline stack.
protected  short fNamesAttrs
          Modify HTML attribute names.
protected  short fNamesElems
          Modify HTML element names.
protected  boolean fNamespaces
          Namespaces.
protected  boolean fReportErrors
          Report errors.
protected  boolean fSeenAnything
          True if seen anything.
protected  boolean fSeenBodyElement
          True if seen <body> element.
protected  boolean fSeenDoctype
          True if a doctype declaration has been seen.
protected  boolean fSeenHeadElement
          True if seen <head> element.
protected  boolean fSeenRootElement
          True if root element has been seen.
protected  boolean fSeenRootElementEnd
          True if seen the end of the document element.
protected static String IGNORE_OUTSIDE_CONTENT
          Ignore outside content.
protected static String NAMES_ATTRS
          Modify HTML attribute names: { "upper", "lower", "default" }.
protected static String NAMES_ELEMS
          Modify HTML element names: { "upper", "lower", "default" }.
protected static short NAMES_LOWERCASE
          Lowercase HTML names.
protected static short NAMES_MATCH
          Match HTML element names.
protected static short NAMES_NO_CHANGE
          Don't modify HTML names.
protected static short NAMES_UPPERCASE
          Uppercase HTML names.
protected static String NAMESPACES
          Namespaces.
protected static String REPORT_ERRORS
          Report errors.
protected static HTMLEventInfo SYNTHESIZED_ITEM
          Synthesized event info item.
 
Constructor Summary
HTMLTagBalancer()
           
 
Method Summary
protected  void callEndElement(QName element, Augmentations augs)
          Call document handler end element.
protected  void callStartElement(QName element, XMLAttributes attrs, Augmentations augs)
          Call document handler start element.
 void characters(XMLString text, Augmentations augs)
          Characters.
 void comment(XMLString text, Augmentations augs)
          Comment.
 void doctypeDecl(String rootElementName, String publicId, String systemId, Augmentations augs)
          Doctype declaration.
protected  XMLAttributes emptyAttributes()
          Returns a set of empty attributes.
 void emptyElement(QName elem, XMLAttributes attrs, Augmentations augs)
          Empty element.
 void endCDATA(Augmentations augs)
          End CDATA section.
 void endDocument(Augmentations augs)
          End document.
 void endElement(QName element, Augmentations augs)
          End element.
 void endGeneralEntity(String name, Augmentations augs)
          End entity.
 void endPrefixMapping(String prefix, Augmentations augs)
          End prefix mapping.
 XMLDocumentHandler getDocumentHandler()
          Returns the document handler.
 XMLDocumentSource getDocumentSource()
          Returns the document source.
protected  HTMLElements.Element getElement(String name)
          Returns an HTML element.
protected  int getElementDepth(HTMLElements.Element element)
          Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
 Boolean getFeatureDefault(String featureId)
          Returns the default state for a feature.
protected static short getNamesValue(String value)
          Converts HTML names string value to constant value.
protected  int getParentDepth(HTMLElements.Element[] parents, short bounds)
          Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
 Object getPropertyDefault(String propertyId)
          Returns the default state for a property.
 String[] getRecognizedFeatures()
          Returns recognized features.
 String[] getRecognizedProperties()
          Returns recognized properties.
 void ignorableWhitespace(XMLString text, Augmentations augs)
          Ignorable whitespace.
protected static String modifyName(String name, short mode)
          Modifies the given name based on the specified mode.
 void processingInstruction(String target, XMLString data, Augmentations augs)
          Processing instruction.
 void reset(XMLComponentManager manager)
          Resets the component.
 void setDocumentHandler(XMLDocumentHandler handler)
          Sets the document handler.
 void setDocumentSource(XMLDocumentSource source)
          Sets the document source.
 void setFeature(String featureId, boolean state)
          Sets a feature.
 void setProperty(String propertyId, Object value)
          Sets a property.
 void startCDATA(Augmentations augs)
          Start CDATA section.
 void startDocument(XMLLocator locator, String encoding, Augmentations augs)
          Start document.
 void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
          Start document.
 void startElement(QName elem, XMLAttributes attrs, Augmentations augs)
          Start element.
 void startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
          Start entity.
 void startPrefixMapping(String prefix, String uri, Augmentations augs)
          Start prefix mapping.
protected  Augmentations synthesizedAugs()
          Returns an augmentations object with a synthesized item added.
 void textDecl(String version, String encoding, Augmentations augs)
          Text declaration.
 void xmlDecl(String version, String encoding, String standalone, Augmentations augs)
          XML declaration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAMESPACES

protected static final String NAMESPACES
Namespaces.

See Also:
Constant Field Values

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

See Also:
Constant Field Values

REPORT_ERRORS

protected static final String REPORT_ERRORS
Report errors.

See Also:
Constant Field Values

DOCUMENT_FRAGMENT_DEPRECATED

protected static final String DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).

See Also:
Constant Field Values

DOCUMENT_FRAGMENT

protected static final String DOCUMENT_FRAGMENT
Document fragment balancing only.

See Also:
Constant Field Values

IGNORE_OUTSIDE_CONTENT

protected static final String IGNORE_OUTSIDE_CONTENT
Ignore outside content.

See Also:
Constant Field Values

NAMES_ELEMS

protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

See Also:
Constant Field Values

NAMES_ATTRS

protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

See Also:
Constant Field Values

ERROR_REPORTER

protected static final String ERROR_REPORTER
Error reporter.

See Also:
Constant Field Values

NAMES_NO_CHANGE

protected static final short NAMES_NO_CHANGE
Don't modify HTML names.

See Also:
Constant Field Values

NAMES_MATCH

protected static final short NAMES_MATCH
Match HTML element names.

See Also:
Constant Field Values

NAMES_UPPERCASE

protected static final short NAMES_UPPERCASE
Uppercase HTML names.

See Also:
Constant Field Values

NAMES_LOWERCASE

protected static final short NAMES_LOWERCASE
Lowercase HTML names.

See Also:
Constant Field Values

SYNTHESIZED_ITEM

protected static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.


fNamespaces

protected boolean fNamespaces
Namespaces.


fAugmentations

protected boolean fAugmentations
Include infoset augmentations.


fReportErrors

protected boolean fReportErrors
Report errors.


fDocumentFragment

protected boolean fDocumentFragment
Document fragment balancing only.


fIgnoreOutsideContent

protected boolean fIgnoreOutsideContent
Ignore outside content.


fNamesElems

protected short fNamesElems
Modify HTML element names.


fNamesAttrs

protected short fNamesAttrs
Modify HTML attribute names.


fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.


fDocumentSource

protected XMLDocumentSource fDocumentSource
The document source.


fDocumentHandler

protected XMLDocumentHandler fDocumentHandler
The document handler.


fElementStack

protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.


fInlineStack

protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.


fSeenAnything

protected boolean fSeenAnything
True if seen anything. Important for xml declaration.


fSeenDoctype

protected boolean fSeenDoctype
True if a doctype declaration has been seen.


fSeenRootElement

protected boolean fSeenRootElement
True if root element has been seen.


fSeenRootElementEnd

protected boolean fSeenRootElementEnd
True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
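
The role of such a flag can be sketched as follows. This is a minimal illustration, assuming that events arriving after the document element has ended are simply dropped; the source does not say how HTMLTagBalancer itself treats them. The class RootEndGuardSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.cyberneko.html.
final class RootEndGuardSketch {
    // False until the end tag of the document element has been forwarded.
    private boolean seenRootElementEnd = false;
    private final List<String> forwarded = new ArrayList<>();

    void characters(String text) {
        if (seenRootElementEnd) {
            return; // assumption: content after </html> is dropped
        }
        forwarded.add(text);
    }

    void endElement(String name) {
        if (seenRootElementEnd) {
            return; // an end tag after the root would make the stream ill-formed
        }
        forwarded.add("</" + name + ">");
        if (name.equalsIgnoreCase("html")) {
            seenRootElementEnd = true;
        }
    }

    List<String> forwarded() {
        return forwarded;
    }
}
```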


fSeenHeadElement

protected boolean fSeenHeadElement
True if seen <head> element.


fSeenBodyElement

protected boolean fSeenBodyElement
True if seen <body> element.

Constructor Detail

HTMLTagBalancer

public HTMLTagBalancer()
Method Detail

getFeatureDefault

public Boolean getFeatureDefault(String featureId)
Returns the default state for a feature.

Specified by:
getFeatureDefault in interface HTMLComponent

getPropertyDefault

public Object getPropertyDefault(String propertyId)
Returns the default state for a property.

Specified by:
getPropertyDefault in interface HTMLComponent

getRecognizedFeatures

public String[] getRecognizedFeatures()
Returns recognized features.

Specified by:
getRecognizedFeatures in interface XMLComponent

getRecognizedProperties

public String[] getRecognizedProperties()
Returns recognized properties.

Specified by:
getRecognizedProperties in interface XMLComponent

reset

public void reset(XMLComponentManager manager)
           throws XMLConfigurationException
Resets the component.

Specified by:
reset in interface XMLComponent
Throws:
XMLConfigurationException

setFeature

public void setFeature(String featureId,
                       boolean state)
                throws XMLConfigurationException
Sets a feature.

Specified by:
setFeature in interface XMLComponent
Throws:
XMLConfigurationException

setProperty

public void setProperty(String propertyId,
                        Object value)
                 throws XMLConfigurationException
Sets a property.

Specified by:
setProperty in interface XMLComponent
Throws:
XMLConfigurationException

setDocumentHandler

public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.

Specified by:
setDocumentHandler in interface XMLDocumentSource

getDocumentHandler

public XMLDocumentHandler getDocumentHandler()
Returns the document handler.

Specified by:
getDocumentHandler in interface XMLDocumentSource

startDocument

public void startDocument(XMLLocator locator,
                          String encoding,
                          NamespaceContext nscontext,
                          Augmentations augs)
                   throws XNIException
Start document.

Specified by:
startDocument in interface XMLDocumentHandler
Throws:
XNIException

xmlDecl

public void xmlDecl(String version,
                    String encoding,
                    String standalone,
                    Augmentations augs)
             throws XNIException
XML declaration.

Specified by:
xmlDecl in interface XMLDocumentHandler
Throws:
XNIException

doctypeDecl

public void doctypeDecl(String rootElementName,
                        String publicId,
                        String systemId,
                        Augmentations augs)
                 throws XNIException
Doctype declaration.

Specified by:
doctypeDecl in interface XMLDocumentHandler
Throws:
XNIException

endDocument

public void endDocument(Augmentations augs)
                 throws XNIException
End document.

Specified by:
endDocument in interface XMLDocumentHandler
Throws:
XNIException

comment

public void comment(XMLString text,
                    Augmentations augs)
             throws XNIException
Comment.

Specified by:
comment in interface XMLDocumentHandler
Throws:
XNIException

processingInstruction

public void processingInstruction(String target,
                                  XMLString data,
                                  Augmentations augs)
                           throws XNIException
Processing instruction.

Specified by:
processingInstruction in interface XMLDocumentHandler
Throws:
XNIException

startElement

public void startElement(QName elem,
                         XMLAttributes attrs,
                         Augmentations augs)
                  throws XNIException
Start element.

Specified by:
startElement in interface XMLDocumentHandler
Throws:
XNIException

emptyElement

public void emptyElement(QName elem,
                         XMLAttributes attrs,
                         Augmentations augs)
                  throws XNIException
Empty element.

Specified by:
emptyElement in interface XMLDocumentHandler
Throws:
XNIException

startGeneralEntity

public void startGeneralEntity(String name,
                               XMLResourceIdentifier id,
                               String encoding,
                               Augmentations augs)
                        throws XNIException
Start entity.

Specified by:
startGeneralEntity in interface XMLDocumentHandler
Throws:
XNIException

textDecl

public void textDecl(String version,
                     String encoding,
                     Augmentations augs)
              throws XNIException
Text declaration.

Specified by:
textDecl in interface XMLDocumentHandler
Throws:
XNIException

endGeneralEntity

public void endGeneralEntity(String name,
                             Augmentations augs)
                      throws XNIException
End entity.

Specified by:
endGeneralEntity in interface XMLDocumentHandler
Throws:
XNIException

startCDATA

public void startCDATA(Augmentations augs)
                throws XNIException
Start CDATA section.

Specified by:
startCDATA in interface XMLDocumentHandler
Throws:
XNIException

endCDATA

public void endCDATA(Augmentations augs)
              throws XNIException
End CDATA section.

Specified by:
endCDATA in interface XMLDocumentHandler
Throws:
XNIException

characters

public void characters(XMLString text,
                       Augmentations augs)
                throws XNIException
Characters.

Specified by:
characters in interface XMLDocumentHandler
Throws:
XNIException

ignorableWhitespace

public void ignorableWhitespace(XMLString text,
                                Augmentations augs)
                         throws XNIException
Ignorable whitespace.

Specified by:
ignorableWhitespace in interface XMLDocumentHandler
Throws:
XNIException

endElement

public void endElement(QName element,
                       Augmentations augs)
                throws XNIException
End element.

Specified by:
endElement in interface XMLDocumentHandler
Throws:
XNIException

setDocumentSource

public void setDocumentSource(XMLDocumentSource source)
Sets the document source.

Specified by:
setDocumentSource in interface XMLDocumentHandler

getDocumentSource

public XMLDocumentSource getDocumentSource()
Returns the document source.

Specified by:
getDocumentSource in interface XMLDocumentHandler

startDocument

public void startDocument(XMLLocator locator,
                          String encoding,
                          Augmentations augs)
                   throws XNIException
Start document.

Throws:
XNIException

startPrefixMapping

public void startPrefixMapping(String prefix,
                               String uri,
                               Augmentations augs)
                        throws XNIException
Start prefix mapping.

Throws:
XNIException

endPrefixMapping

public void endPrefixMapping(String prefix,
                             Augmentations augs)
                      throws XNIException
End prefix mapping.

Throws:
XNIException

getElement

protected HTMLElements.Element getElement(String name)
Returns an HTML element.


callStartElement

protected final void callStartElement(QName element,
                                      XMLAttributes attrs,
                                      Augmentations augs)
                               throws XNIException
Call document handler start element.

Throws:
XNIException

callEndElement

protected final void callEndElement(QName element,
                                    Augmentations augs)
                             throws XNIException
Call document handler end element.

Throws:
XNIException

getElementDepth

protected final int getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.

Parameters:
element - The element.
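
A depth lookup of this kind can be sketched over a simple list of open element names. The sketch assumes that depth is counted from the top of the stack, so the innermost open element has depth 1; the source does not state the counting convention. The class ElementDepthSketch and its String-based signature are hypothetical stand-ins for the real HTMLElements.Element and InfoStack types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.cyberneko.html.
final class ElementDepthSketch {
    // Open elements, outermost first; the last entry is the top of the stack.
    private final List<String> openElements = new ArrayList<>();

    void push(String name) {
        openElements.add(name);
    }

    // Returns the distance from the top of the stack to the nearest open
    // element with the given name (1 = innermost), or -1 if none is open.
    int getElementDepth(String name) {
        for (int i = openElements.size() - 1; i >= 0; i--) {
            if (openElements.get(i).equalsIgnoreCase(name)) {
                return openElements.size() - i;
            }
        }
        return -1;
    }
}
```

With html, body, p, and b pushed in that order, the sketch returns 1 for b, 3 for body, and -1 for table.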

getParentDepth

protected int getParentDepth(HTMLElements.Element[] parents,
                             short bounds)
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.

Parameters:
parents - The parent elements.

emptyAttributes

protected final XMLAttributes emptyAttributes()
Returns a set of empty attributes.


synthesizedAugs

protected final Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.


modifyName

protected static final String modifyName(String name,
                                         short mode)
Modifies the given name based on the specified mode.
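
A name modification of this kind might look like the following sketch. The numeric values of the mode constants are assumptions made for illustration (this page lists the constants but not their values), and NameModeSketch is a hypothetical class, not the library's code.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of org.cyberneko.html.
final class NameModeSketch {
    // Constant values are assumed for this sketch.
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    // Returns the name converted according to the mode; any other mode
    // leaves the name unchanged.
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }
}
```

Locale.ROOT is used here so that case conversion does not depend on the default locale of the running JVM.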


getNamesValue

protected static final short getNamesValue(String value)
Converts HTML names string value to constant value.

See Also:
NAMES_NO_CHANGE, NAMES_LOWERCASE, NAMES_UPPERCASE
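
The conversion from the property strings "upper", "lower", and "default" to mode constants can be sketched as below. The constant values and the fallback to NAMES_NO_CHANGE for any unrecognized string are assumptions made for illustration; NamesValueSketch is a hypothetical class.

```java
// Hypothetical illustration only; not part of org.cyberneko.html.
final class NamesValueSketch {
    // Constant values are assumed for this sketch.
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    // Maps a names property value to its constant. "default" and any
    // unrecognized value map to NAMES_NO_CHANGE (an assumption).
    static short getNamesValue(String value) {
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        return NAMES_NO_CHANGE;
    }
}
```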


(C) Copyright 2002-2004, Andy Clark. All rights reserved.