org.cyberneko.html
Class HTMLConfiguration

java.lang.Object
  extended by org.apache.xerces.util.ParserConfigurationSettings
      extended by org.cyberneko.html.HTMLConfiguration
All Implemented Interfaces:
XMLComponentManager, XMLParserConfiguration, XMLPullParserConfiguration

public class HTMLConfiguration
extends ParserConfigurationSettings
implements XMLPullParserConfiguration

An XNI-based parser configuration for parsing HTML documents. This configuration can be used directly to parse HTML documents, or in conjunction with any XNI-based tool, such as the Xerces2 implementation.

This configuration recognizes the following features:

This configuration recognizes the following properties:

For complete usage information, refer to the documentation.

Version:
$Id$
Author:
Andy Clark
See Also:
HTMLScanner, HTMLTagBalancer, HTMLErrorReporter

Nested Class Summary
protected  class HTMLConfiguration.ErrorReporter
          Defines an error reporter for reporting HTML errors.
 
Field Summary
protected static String AUGMENTATIONS
          Include infoset augmentations.
protected static String BALANCE_TAGS
          Balance tags.
protected static String ERROR_DOMAIN
          Error domain.
protected static String ERROR_REPORTER
          Error reporter.
protected  boolean fCloseStream
          Stream opened by parser.
protected  XMLDocumentHandler fDocumentHandler
          Document handler.
protected  HTMLScanner fDocumentScanner
          Document scanner.
protected  XMLDTDContentModelHandler fDTDContentModelHandler
          DTD content model handler.
protected  XMLDTDHandler fDTDHandler
          DTD handler.
protected  XMLEntityResolver fEntityResolver
          Entity resolver.
protected  XMLErrorHandler fErrorHandler
          Error handler.
protected  HTMLErrorReporter fErrorReporter
          Error reporter.
protected  Vector fHTMLComponents
          Components.
protected static String FILTERS
          Pipeline filters.
protected  Locale fLocale
          Locale.
protected  NamespaceBinder fNamespaceBinder
          Namespace binder.
protected  HTMLTagBalancer fTagBalancer
          HTML tag balancer.
protected static String NAMES_ATTRS
          Modify HTML attribute names: { "upper", "lower", "default" }.
protected static String NAMES_ELEMS
          Modify HTML element names: { "upper", "lower", "default" }.
protected static String NAMESPACES
          Namespaces.
protected static String REPORT_ERRORS
          Report errors.
protected static String SIMPLE_ERROR_FORMAT
          Simple report format.
protected static boolean XERCES_2_0_0
          Parser version is Xerces 2.0.0.
protected static boolean XERCES_2_0_1
          Parser version is Xerces 2.0.1.
protected static boolean XML4J_4_0_x
          Parser version is XML4J 4.0.x.
 
Fields inherited from class org.apache.xerces.util.ParserConfigurationSettings
fFeatures, fParentSettings, fProperties, fRecognizedFeatures, fRecognizedProperties, PARSER_SETTINGS
 
Constructor Summary
HTMLConfiguration()
          Default constructor.
 
Method Summary
protected  void addComponent(HTMLComponent component)
          Adds a component.
 void cleanup()
          If the application decides to terminate parsing before the XML document is fully parsed, the application should call this method to free any resources allocated during parsing.
 XMLDocumentHandler getDocumentHandler()
          Returns the document handler.
 XMLDTDContentModelHandler getDTDContentModelHandler()
          Returns the DTD content model handler.
 XMLDTDHandler getDTDHandler()
          Returns the DTD handler.
 XMLEntityResolver getEntityResolver()
          Returns the entity resolver.
 XMLErrorHandler getErrorHandler()
          Returns the error handler.
 Locale getLocale()
          Returns the locale.
 boolean parse(boolean complete)
          Parses the document in a pull parsing fashion.
 void parse(XMLInputSource source)
          Parses a document.
 void pushInputSource(XMLInputSource inputSource)
          Pushes an input source onto the current entity stack.
protected  void reset()
          Resets the parser configuration.
 void setDocumentHandler(XMLDocumentHandler handler)
          Sets the document handler.
 void setDTDContentModelHandler(XMLDTDContentModelHandler handler)
          Sets the DTD content model handler.
 void setDTDHandler(XMLDTDHandler handler)
          Sets the DTD handler.
 void setEntityResolver(XMLEntityResolver resolver)
          Sets the entity resolver.
 void setErrorHandler(XMLErrorHandler handler)
          Sets the error handler.
 void setFeature(String featureId, boolean state)
          Sets a feature.
 void setInputSource(XMLInputSource inputSource)
          Sets the input source for the document to parse.
 void setLocale(Locale locale)
          Sets the locale.
 void setProperty(String propertyId, Object value)
          Sets a property.
 
Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings
addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.xerces.xni.parser.XMLParserConfiguration
addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty
 

Field Detail

NAMESPACES

protected static final String NAMESPACES
Namespaces.

See Also:
Constant Field Values

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

See Also:
Constant Field Values

REPORT_ERRORS

protected static final String REPORT_ERRORS
Report errors.

See Also:
Constant Field Values

SIMPLE_ERROR_FORMAT

protected static final String SIMPLE_ERROR_FORMAT
Simple report format.

See Also:
Constant Field Values

BALANCE_TAGS

protected static final String BALANCE_TAGS
Balance tags.

See Also:
Constant Field Values

NAMES_ELEMS

protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

See Also:
Constant Field Values

NAMES_ATTRS

protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

See Also:
Constant Field Values
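
The three values accepted by NAMES_ELEMS and NAMES_ATTRS can be illustrated with a small sketch. This is not the library's code: the class NameCaseSketch and the method applyNameCase are hypothetical, and the sketch assumes that "default" leaves a name exactly as written in the source, which this page does not state.

```java
import java.util.Locale;

final class NameCaseSketch {
    /**
     * Returns the name as it might be reported under the given mode.
     * Assumption: "default" means "leave the name unchanged".
     */
    static String applyNameCase(String name, String mode) {
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ENGLISH);
            case "lower":
                return name.toLowerCase(Locale.ENGLISH);
            case "default":
                return name; // assumed behavior, see lead-in
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyNameCase("Body", "upper"));   // prints BODY
        System.out.println(applyNameCase("Body", "lower"));   // prints body
        System.out.println(applyNameCase("Body", "default")); // prints Body
    }
}
```

Locale.ENGLISH is used so that the case conversion does not depend on the platform's default locale.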

FILTERS

protected static final String FILTERS
Pipeline filters.

See Also:
Constant Field Values

ERROR_REPORTER

protected static final String ERROR_REPORTER
Error reporter.

See Also:
Constant Field Values

ERROR_DOMAIN

protected static final String ERROR_DOMAIN
Error domain.

See Also:
Constant Field Values

fDocumentHandler

protected XMLDocumentHandler fDocumentHandler
Document handler.


fDTDHandler

protected XMLDTDHandler fDTDHandler
DTD handler.


fDTDContentModelHandler

protected XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.


fErrorHandler

protected XMLErrorHandler fErrorHandler
Error handler.


fEntityResolver

protected XMLEntityResolver fEntityResolver
Entity resolver.


fLocale

protected Locale fLocale
Locale.


fCloseStream

protected boolean fCloseStream
True if the stream was opened by the parser, in which case the parser must close it when parsing terminates.


fHTMLComponents

protected Vector fHTMLComponents
Components.


fDocumentScanner

protected HTMLScanner fDocumentScanner
Document scanner.


fTagBalancer

protected HTMLTagBalancer fTagBalancer
HTML tag balancer.


fNamespaceBinder

protected NamespaceBinder fNamespaceBinder
Namespace binder.


fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.


XERCES_2_0_0

protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.


XERCES_2_0_1

protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.


XML4J_4_0_x

protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.

Constructor Detail

HTMLConfiguration

public HTMLConfiguration()
Default constructor.

Method Detail

pushInputSource

public void pushInputSource(XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time the input source was pushed.

Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

Parameters:
inputSource - The new input source to start scanning.
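
The stack behavior described above can be sketched with plain JDK readers. This is an illustration of the idea only, not the scanner's implementation: the class EntityStackSketch, its methods, and the '!' marker that triggers a push are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

final class EntityStackSketch {
    // Top of the stack is the source currently being scanned.
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(Reader document) {
        stack.push(document);
    }

    /** Pushes new content; it is read before the rest of the current source. */
    void pushInputSource(Reader source) {
        stack.push(source);
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            stack.pop().close(); // end of this entity: resume the one beneath it
        }
        return -1;
    }

    /** Scans the document, pushing the inserted text when the marker '!' is seen. */
    static String scan(String document, String inserted) {
        EntityStackSketch scanner = new EntityStackSketch(new StringReader(document));
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = scanner.read()) != -1) {
                if (c == '!') {
                    scanner.pushInputSource(new StringReader(inserted));
                } else {
                    out.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(scan("<p>a!b</p>", "<b>x</b>")); // prints <p>a<b>x</b>b</p>
    }
}
```

The inserted text is passed as a complete string, which mirrors the hint above: the content is fully buffered before it is pushed, so it is scanned as one uninterrupted unit.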

setFeature

public void setFeature(String featureId,
                       boolean state)
                throws XMLConfigurationException
Sets a feature.

Specified by:
setFeature in interface XMLParserConfiguration
Throws:
XMLConfigurationException

setProperty

public void setProperty(String propertyId,
                        Object value)
                 throws XMLConfigurationException
Sets a property.

Specified by:
setProperty in interface XMLParserConfiguration
Throws:
XMLConfigurationException

setDocumentHandler

public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.

Specified by:
setDocumentHandler in interface XMLParserConfiguration

getDocumentHandler

public XMLDocumentHandler getDocumentHandler()
Returns the document handler.

Specified by:
getDocumentHandler in interface XMLParserConfiguration

setDTDHandler

public void setDTDHandler(XMLDTDHandler handler)
Sets the DTD handler.

Specified by:
setDTDHandler in interface XMLParserConfiguration

getDTDHandler

public XMLDTDHandler getDTDHandler()
Returns the DTD handler.

Specified by:
getDTDHandler in interface XMLParserConfiguration

setDTDContentModelHandler

public void setDTDContentModelHandler(XMLDTDContentModelHandler handler)
Sets the DTD content model handler.

Specified by:
setDTDContentModelHandler in interface XMLParserConfiguration

getDTDContentModelHandler

public XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.

Specified by:
getDTDContentModelHandler in interface XMLParserConfiguration

setErrorHandler

public void setErrorHandler(XMLErrorHandler handler)
Sets the error handler.

Specified by:
setErrorHandler in interface XMLParserConfiguration

getErrorHandler

public XMLErrorHandler getErrorHandler()
Returns the error handler.

Specified by:
getErrorHandler in interface XMLParserConfiguration

setEntityResolver

public void setEntityResolver(XMLEntityResolver resolver)
Sets the entity resolver.

Specified by:
setEntityResolver in interface XMLParserConfiguration

getEntityResolver

public XMLEntityResolver getEntityResolver()
Returns the entity resolver.

Specified by:
getEntityResolver in interface XMLParserConfiguration

setLocale

public void setLocale(Locale locale)
Sets the locale.

Specified by:
setLocale in interface XMLParserConfiguration

getLocale

public Locale getLocale()
Returns the locale.

Specified by:
getLocale in interface XMLParserConfiguration

parse

public void parse(XMLInputSource source)
           throws XNIException,
                  IOException
Parses a document.

Specified by:
parse in interface XMLParserConfiguration
Throws:
XNIException
IOException

setInputSource

public void setInputSource(XMLInputSource inputSource)
                    throws XMLConfigurationException,
                           IOException
Sets the input source for the document to parse.

Specified by:
setInputSource in interface XMLPullParserConfiguration
Parameters:
inputSource - The document's input source.
Throws:
XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
IOException - Thrown on I/O error.
See Also:
parse(boolean)

parse

public boolean parse(boolean complete)
              throws XNIException,
                     IOException
Parses the document in a pull parsing fashion.

Specified by:
parse in interface XMLPullParserConfiguration
Parameters:
complete - True if the pull parser should parse the remaining document completely.
Returns:
True if there is more document to parse.
Throws:
XNIException - Any XNI exception, possibly wrapping another exception.
IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
See Also:
setInputSource(org.apache.xerces.xni.parser.XMLInputSource)

cleanup

public void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, the application should call this method to free any resources allocated during parsing, for example by closing all opened streams.

Specified by:
cleanup in interface XMLPullParserConfiguration
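
The contract of parse(boolean) and cleanup() suggests a simple driving loop: call parse(false) while it returns true, and call cleanup() if the loop is abandoned early. The sketch below shows that loop against a hypothetical stand-in interface so that it runs without Xerces or NekoHTML on the classpath. PullParser, TokenParser and drive are illustrative names; only the method shapes parse(boolean) and cleanup() come from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PullLoopSketch {
    /** Hypothetical stand-in mirroring the shape of parse(boolean) and cleanup(). */
    interface PullParser {
        boolean parse(boolean complete);
        void cleanup();
    }

    /** Toy parser: each parse(false) call reports one token. */
    static final class TokenParser implements PullParser {
        private final Iterator<String> tokens;
        final List<String> reported = new ArrayList<>();
        boolean cleanedUp;

        TokenParser(String... tokens) {
            this.tokens = Arrays.asList(tokens).iterator();
        }

        @Override
        public boolean parse(boolean complete) {
            do {
                if (!tokens.hasNext()) {
                    return false; // nothing left to parse
                }
                reported.add(tokens.next());
            } while (complete); // complete == true: keep going to the end
            return tokens.hasNext(); // true if there is more document to parse
        }

        @Override
        public void cleanup() {
            cleanedUp = true; // a real parser would close opened streams here
        }
    }

    /** Drives the parser step by step, giving up after maxSteps calls. */
    static int drive(PullParser parser, int maxSteps) {
        int steps = 0;
        try {
            boolean more = true;
            while (more && steps < maxSteps) {
                more = parser.parse(false);
                steps++;
            }
        } finally {
            parser.cleanup(); // frees resources even when parsing stops early
        }
        return steps;
    }

    public static void main(String[] args) {
        TokenParser parser = new TokenParser("<p>", "text", "</p>");
        System.out.println(drive(parser, 2)); // prints 2
        System.out.println(parser.reported);  // prints [<p>, text]
    }
}
```

Calling cleanup() unconditionally in a finally block is a choice made by this sketch; the description above only requires it when parsing is terminated before the document is fully parsed.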

addComponent

protected void addComponent(HTMLComponent component)
Adds a component.


reset

protected void reset()
              throws XMLConfigurationException
Resets the parser configuration.

Throws:
XMLConfigurationException


(C) Copyright 2002-2004, Andy Clark. All rights reserved.