public final class HTMLParser extends Object
SAX parser implementation that uses the NekoHTML HTMLConfiguration to parse HTML into a HtmlUnit-specific DOM (HU-DOM) tree.
Note that the parser currently does not handle CDATA or comment sections, i.e., these do not appear in the resulting DOM tree.
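The general technique of turning SAX callbacks into a tree can be sketched with the JDK's own SAX parser. This is a minimal illustration only, not HtmlUnit's implementation: it uses javax.xml.parsers instead of NekoHTML, so it accepts only well-formed XHTML, and the Node and TreeBuilder types are hypothetical names invented for this example. In this sketch the comment never reaches the tree because a plain content handler has no comment callback (SAX reports comments through the separate LexicalHandler interface). Whether that is also the reason for the limitation noted above is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTreeSketch {

    /** Hypothetical minimal tree node; not an HtmlUnit type. */
    public static final class Node {
        public final String name;
        public final List<Node> children = new ArrayList<>();
        public final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    /** Builds a tree from SAX callbacks by keeping a stack of open elements. */
    static final class TreeBuilder extends DefaultHandler {
        final Node root = new Node("#document");
        private final Deque<Node> open = new ArrayDeque<>();

        TreeBuilder() {
            open.push(root);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Attach the new element to the innermost open element, then descend into it.
            Node node = new Node(qName);
            open.peek().children.add(node);
            open.push(node);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            open.peek().text.append(ch, start, length);
        }
    }

    /** Parses well-formed XHTML into the hypothetical Node tree. */
    public static Node parse(String xhtml) throws Exception {
        TreeBuilder builder = new TreeBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), builder);
        return builder.root;
    }

    public static void main(String[] args) throws Exception {
        Node doc = parse("<html><body><!-- note --><p>Hello</p></body></html>");
        Node body = doc.children.get(0).children.get(0);
        // The comment is not represented, so body has a single child element.
        System.out.println(body.name + " has " + body.children.size() + " child element(s)");
    }
}
```

The stack of open elements is the only state the builder needs: each start event attaches a node to the element on top of the stack, and each end event pops it.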
Modifier and Type | Method and Description |
---|---|
static IElementFactory | getFactory(String tagName) |
static boolean | getIgnoreOutsideContent(): Gets the state of the flag to ignore content outside the BODY and HTML tags. |
static HtmlPage | parse(WebResponse webResponse, WebWindow webWindow): Parses the HTML content from the given WebResponse into an object tree representation. |
static void | parseFragment(DomNode parent, String source): Parses the HTML content from the given string into an object tree representation. |
static void | setIgnoreOutsideContent(boolean ignoreOutsideContent): Sets the flag to control validation of the HTML content that is outside of the BODY and HTML tags. |
public static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
Parameters:
ignoreOutsideContent - boolean flag to set
public static boolean getIgnoreOutsideContent()
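Because both accessors are static, a change to the flag applies to every later parse in the same JVM. One way to limit that effect is to save the current value, set the new one, and restore the old value in a finally block. The sketch below shows that pattern against ParserFlagStandIn, a hypothetical stand-in class that only mirrors the two documented signatures so the example compiles without HtmlUnit on the classpath. With the real library the calls would be HTMLParser.getIgnoreOutsideContent() and HTMLParser.setIgnoreOutsideContent(boolean). The helper name withIgnoreOutsideContent and the default value of false are assumptions made for illustration.

```java
import java.util.function.Supplier;

public class IgnoreOutsideContentSketch {

    /** Hypothetical stand-in mirroring the two static accessors documented above. */
    public static final class ParserFlagStandIn {
        // Assumed default for this sketch; the documentation does not state the real default.
        private static boolean ignoreOutsideContent = false;

        public static boolean getIgnoreOutsideContent() {
            return ignoreOutsideContent;
        }

        public static void setIgnoreOutsideContent(boolean value) {
            ignoreOutsideContent = value;
        }
    }

    /** Runs the action with the flag set to the given value, then restores the earlier value. */
    public static <T> T withIgnoreOutsideContent(boolean value, Supplier<T> action) {
        boolean previous = ParserFlagStandIn.getIgnoreOutsideContent();
        ParserFlagStandIn.setIgnoreOutsideContent(value);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so other callers see the old setting.
            ParserFlagStandIn.setIgnoreOutsideContent(previous);
        }
    }

    public static void main(String[] args) {
        boolean inside = withIgnoreOutsideContent(true, ParserFlagStandIn::getIgnoreOutsideContent);
        System.out.println("inside: " + inside);
        System.out.println("after: " + ParserFlagStandIn.getIgnoreOutsideContent());
    }
}
```

This pattern does not make the flag thread-safe: code parsing on another thread while the action runs would still observe the temporary value.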
public static IElementFactory getFactory(String tagName)
Parameters:
tagName - an HTML element tag name
public static void parseFragment(DomNode parent, String source) throws SAXException, IOException
Parameters:
parent - the parent for the new nodes
source - the (X)HTML to be parsed
Throws:
SAXException - if a SAX error occurs
IOException - if an IO error occurs
public static HtmlPage parse(WebResponse webResponse, WebWindow webWindow) throws IOException
Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
null if the <HTML> tag is missing
Throws:
IOException - if an I/O error occurs
Copyright © 2002-2012 Gargoyle Software Inc. All Rights Reserved.