com.gargoylesoftware.htmlunit.html
Class HTMLParser

java.lang.Object
  extended by com.gargoylesoftware.htmlunit.html.HTMLParser

public final class HTMLParser
extends Object

SAX parser implementation that uses the NekoHTML HTMLConfiguration to parse HTML into an HtmlUnit-specific DOM (HU-DOM) tree.

Note that the parser currently does not handle CDATA sections or comments, i.e. these do not appear in the resulting DOM tree.
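
To illustrate how a SAX-driven builder turns parse events into a tree, and how comments can end up absent from the result, here is a minimal sketch that uses the JDK's XML SAX parser rather than NekoHTML. The class name SaxTreeSketch and its bracketed output format are assumptions made for illustration; this is not HtmlUnit's implementation. In SAX, comments are reported only through the optional LexicalHandler interface, so a handler that implements just the content callbacks never sees them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a JDK SAX handler, not HtmlUnit's HTMLParser.
class SaxTreeSketch {

    /** Parses well-formed XHTML and renders its element/text tree as a bracketed string. */
    static String render(String xhtml) {
        final StringBuilder out = new StringBuilder();
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Emits any buffered text, since characters() may arrive in several chunks.
            private void flushText() {
                String t = text.toString().trim();
                if (!t.isEmpty()) {
                    out.append(" \"").append(t).append('"');
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flushText();
                out.append('(').append(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                out.append(')');
            }
            // No LexicalHandler is registered, so comment events are never delivered.
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The comment is missing from the rendered tree: (html(body(p "Hello")))
        System.out.println(render("<html><body><!-- hidden --><p>Hello</p></body></html>"));
    }
}
```

The sketch buffers character data because a SAX parser is free to split one text run across several characters() calls.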

Version:
$Revision: 2132 $
Author:
Christian Sell, David K. Taylor, Chris Erskine, Ahmed Ashour

Method Summary
static IElementFactory getFactory(String tagName)
          Returns a factory for creating HtmlElements representing the given tag.
static boolean getIgnoreOutsideContent()
          Gets the state of the flag to ignore content outside the BODY and HTML tags.
static HtmlPage parse(WebResponse webResponse, WebWindow webWindow)
          Parses the HTML content from the given WebResponse into an object tree representation.
static void parseFragment(DomNode parent, String source)
          Parses the HTML content from the given string into an object tree representation.
static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
          Sets the flag to control validation of the HTML content that is outside of the BODY and HTML tags.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

setIgnoreOutsideContent

public static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
Sets the flag to control validation of the HTML content that is outside of the BODY and HTML tags. The flag is false by default, to maintain compatibility with current NekoHTML defaults.

Parameters:
ignoreOutsideContent - the new value of the flag

getIgnoreOutsideContent

public static boolean getIgnoreOutsideContent()
Gets the state of the flag to ignore content outside the BODY and HTML tags.

Returns:
the current state of the flag
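
Because the getter and setter are static, the flag is shared state rather than a per-parse option, so code that changes it temporarily may want to restore the previous value afterwards. The sketch below shows that save-and-restore pattern. ParserFlags is a hypothetical stand-in that only mirrors the two method signatures documented here, so the example runs without HtmlUnit on the classpath; withIgnoreOutsideContent is likewise an illustrative helper, not part of the library.

```java
import java.util.function.Supplier;

class IgnoreOutsideContentSketch {

    /** Hypothetical stand-in mirroring the documented static flag accessors. */
    static final class ParserFlags {
        private static boolean ignoreOutsideContent = false; // documented default

        static void setIgnoreOutsideContent(boolean value) {
            ignoreOutsideContent = value;
        }

        static boolean getIgnoreOutsideContent() {
            return ignoreOutsideContent;
        }
    }

    /** Runs the action with the flag set to the given value, then restores the old value. */
    static <T> T withIgnoreOutsideContent(boolean value, Supplier<T> action) {
        boolean previous = ParserFlags.getIgnoreOutsideContent();
        ParserFlags.setIgnoreOutsideContent(value);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so later parses see the old setting.
            ParserFlags.setIgnoreOutsideContent(previous);
        }
    }

    public static void main(String[] args) {
        boolean during = withIgnoreOutsideContent(true, ParserFlags::getIgnoreOutsideContent);
        // Prints "true false": set inside the action, restored afterwards.
        System.out.println(during + " " + ParserFlags.getIgnoreOutsideContent());
    }
}
```

This pattern is not thread-safe: a parse running on another thread while the flag is changed would see the temporary value.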

getFactory

public static IElementFactory getFactory(String tagName)
Returns a factory for creating HtmlElements representing the given tag.

Parameters:
tagName - an HTML element tag name
Returns:
a factory for creating HtmlElements representing the given tag

parseFragment

public static void parseFragment(DomNode parent,
                                 String source)
                          throws SAXException,
                                 IOException
Parses the HTML content from the given string into an object tree representation.

Parameters:
parent - the parent for the new nodes
source - the (X)HTML to be parsed
Throws:
SAXException - if a SAX error occurs
IOException - if an IO error occurs
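
The contract above (parse a source string and attach the resulting nodes under an existing parent) can be sketched with the JDK's DOM API. This is only an analogue for well-formed XHTML: it works on org.w3c.dom nodes instead of HtmlUnit's DomNode, and the class name FragmentSketch, the helper newBuilder, and the temporary wrapper element are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative only: a JDK DOM analogue, not HtmlUnit's parseFragment.
class FragmentSketch {

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the source and appends the resulting nodes to parent (assumed to be an element). */
    static void parseFragment(Node parent, String source) {
        Document fragmentDoc;
        try {
            // A fragment may have several top-level nodes, so wrap it in a single root.
            String wrapped = "<wrapper>" + source + "</wrapper>";
            fragmentDoc = newBuilder().parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception e) {
            throw new IllegalArgumentException("source is not well-formed XHTML", e);
        }
        Document owner = parent.getOwnerDocument();
        Node child = fragmentDoc.getDocumentElement().getFirstChild();
        while (child != null) {
            // importNode copies the node into the target document; the source is unchanged.
            parent.appendChild(owner.importNode(child, true));
            child = child.getNextSibling();
        }
    }

    public static void main(String[] args) {
        Document doc = newBuilder().newDocument();
        Element body = doc.createElement("body");
        doc.appendChild(body);
        parseFragment(body, "<p>one</p><p>two</p>");
        System.out.println(body.getChildNodes().getLength()); // prints 2
    }
}
```

Unlike this strict XML analogue, an HTML parser such as the one documented here is expected to accept markup that is not well-formed.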

parse

public static HtmlPage parse(WebResponse webResponse,
                             WebWindow webWindow)
                      throws IOException
Parses the HTML content from the given WebResponse into an object tree representation.

Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
the page object which forms the root of the DOM tree, or null if the <HTML> tag is missing
Throws:
IOException - if an IO error occurs


Copyright © 2002-2010 Gargoyle Software Inc. All Rights Reserved.