org.htmlparser.beans

Class FilterBean

public class FilterBean extends Object implements Serializable

Extract nodes from a URL using a filter.
 
     FilterBean fb = new FilterBean ();
     fb.setFilters (new NodeFilter[] { new TagNameFilter ("META") });
     fb.setURL ("http://cbc.ca");
     System.out.println (fb.getNodes ().toHtml ());
 
 
Field Summary
protected NodeFilter[] mFilters
The filter set.
protected NodeList mNodes
The nodes extracted from the URL.
protected Parser mParser
The parser used to filter.
protected PropertyChangeSupport mPropertySupport
Bound property support.
protected boolean mRecursive
The recursion behaviour for elements of the filter array.
static String PROP_CONNECTION_PROPERTY
Property name in event where the connection changes.
static String PROP_NODES_PROPERTY
Property name in event where the URL contents change.
static String PROP_TEXT_PROPERTY
Property name in event where the URL contents change.
static String PROP_URL_PROPERTY
Property name in event where the URL changes.
Constructor Summary
FilterBean()
Create a FilterBean object.
Method Summary
void addPropertyChangeListener(PropertyChangeListener listener)
Add a PropertyChangeListener to the listener list.
protected NodeList applyFilters()
Apply each of the filters.
URLConnection getConnection()
Get the current connection.
NodeFilter[] getFilters()
Get the current filter set.
NodeList getNodes()
Return the nodes of the URL matching the filter.
Parser getParser()
Get the parser used to fetch nodes.
boolean getRecursive()
Get the current recursion behaviour.
String getText()
Convenience method to apply a StringBean to the filter results.
String getURL()
Get the current URL.
static void main(String[] args)
Unit test.
void removePropertyChangeListener(PropertyChangeListener listener)
Remove a PropertyChangeListener from the listener list.
void setConnection(URLConnection connection)
Set the parser's connection.
void setFilters(NodeFilter[] filters)
Set the filters for the bean.
protected void setNodes()
Fetch the URL contents and filter it.
void setParser(Parser parser)
Set the parser for the bean.
void setRecursive(boolean recursive)
Set the recursion behaviour.
void setURL(String url)
Set the URL to extract nodes from.
protected void updateNodes(NodeList nodes)
Assign the Nodes property, firing the property change.

Field Detail

mFilters

protected NodeFilter[] mFilters
The filter set.

mNodes

protected NodeList mNodes
The nodes extracted from the URL.

mParser

protected Parser mParser
The parser used to filter.

mPropertySupport

protected PropertyChangeSupport mPropertySupport
Bound property support.

mRecursive

protected boolean mRecursive
The recursion behaviour for elements of the filter array. If true the filters are applied recursively.

PROP_CONNECTION_PROPERTY

public static final String PROP_CONNECTION_PROPERTY
Property name in event where the connection changes.

PROP_NODES_PROPERTY

public static final String PROP_NODES_PROPERTY
Property name in event where the URL contents change.

PROP_TEXT_PROPERTY

public static final String PROP_TEXT_PROPERTY
Property name in event where the URL contents change.

PROP_URL_PROPERTY

public static final String PROP_URL_PROPERTY
Property name in event where the URL changes.

Constructor Detail

FilterBean

public FilterBean()
Create a FilterBean object.

Method Detail

addPropertyChangeListener

public void addPropertyChangeListener(PropertyChangeListener listener)
Add a PropertyChangeListener to the listener list. The listener is registered for all properties.

Parameters: listener The PropertyChangeListener to be added.
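
The listener mechanism described above follows the standard JavaBeans bound-property pattern. The following is a minimal sketch of that pattern using only java.beans; BoundPropertySketch and its PROP_URL constant are hypothetical stand-ins, not FilterBean itself, and the property name string is an assumption because the values of the PROP_* constants are not shown on this page.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in bean: same listener pattern, not the real FilterBean. */
class BoundPropertySketch {
    /** Assumed property name, for illustration only. */
    static final String PROP_URL = "URL";

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String url;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener); // registered for all properties
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        support.removePropertyChangeListener(listener);
    }

    void setURL(String newUrl) {
        String old = url;
        url = newUrl;
        // PropertyChangeSupport skips the event when old and new are equal and non-null.
        support.firePropertyChange(PROP_URL, old, newUrl);
    }

    /** Registers a listener, changes the property, removes the listener, changes it again. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        BoundPropertySketch bean = new BoundPropertySketch();
        PropertyChangeListener listener = evt ->
            log.add(evt.getPropertyName() + ": " + evt.getOldValue() + " -> " + evt.getNewValue());
        bean.addPropertyChangeListener(listener);
        bean.setURL("http://cbc.ca");
        bean.removePropertyChangeListener(listener);
        bean.setURL("http://example.com"); // not logged: the listener was removed
        return log;
    }
}
```

With the real bean, a listener would typically compare evt.getPropertyName() against one of the PROP_* constants listed under Field Detail.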

applyFilters

protected NodeList applyFilters()
Apply each of the filters. The first filter is applied to the output of the parser. Subsequent filters are applied to the output of the prior filter.

Returns: A list of nodes passed through all filters. If there are no filters, returns the entire page.

Throws: ParserException If an encoding change occurs or there is some other problem.
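
The chaining rule above (each filter sees only what the previous filter let through, and an empty filter set passes everything) can be sketched with plain collections. This is a simplified model under stated assumptions: strings stand in for nodes, java.util.function.Predicate stands in for NodeFilter, and recursion into children is ignored. FilterChainSketch is a hypothetical name, not library code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of filter chaining; strings stand in for nodes. */
class FilterChainSketch {
    static List<String> applyFilters(List<String> parsed, List<Predicate<String>> filters) {
        // With no filters the loop never runs and the whole page is returned.
        List<String> current = new ArrayList<>(parsed);
        for (Predicate<String> filter : filters) {
            List<String> next = new ArrayList<>();
            for (String node : current) {
                if (filter.test(node)) {
                    next.add(node);
                }
            }
            current = next; // the next filter only sees the survivors
        }
        return current;
    }
}
```

Because each stage narrows the previous result, the chain behaves like a logical AND of the filters in this flat model.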

getConnection

public URLConnection getConnection()
Get the current connection.

Returns: The connection that the parser has or null if it hasn't been set or the parser hasn't been constructed yet.

getFilters

public NodeFilter[] getFilters()
Get the current filter set.

Returns: The current filters.

getNodes

public NodeList getNodes()
Return the nodes of the URL matching the filter. This is the primary output of the bean.

Returns: The nodes from the URL matching the current filter.

getParser

public Parser getParser()
Get the parser used to fetch nodes.

Returns: The parser used by the bean.

getRecursive

public boolean getRecursive()
Get the current recursion behaviour.

Returns: The recursion behaviour currently in use, that is, whether the filters also apply to children, children's children, etc.

getText

public String getText()
Convenience method to apply a StringBean to the filter results. This may yield duplicate or multiple text elements if the node list contains nodes from two or more levels in the same nested tag hierarchy. If the node list contains only one tag, it provides access to the text within that node.

Returns: The textual contents of the nodes that pass through the filter set, as collected by the StringBean.
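
The duplication caveat is easier to see with a toy tree. In the sketch below, a list that holds both an outer node and one of its own descendants reports the inner text twice, while a list holding only the outer node reports it once. NestedTextSketch and its Node type are hypothetical; the real text collection is done by StringBean and may format its output differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of text collection over a node list. */
class NestedTextSketch {
    /** Minimal node: its own text plus child nodes. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Text of one node followed by the text of all its descendants. */
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder(node.text);
        for (Node child : node.children) {
            sb.append(textOf(child));
        }
        return sb.toString();
    }

    /** Text of every node in the list, one line per list entry. */
    static String textOf(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node node : nodes) {
            sb.append(textOf(node)).append('\n');
        }
        return sb.toString();
    }
}
```

If the filter result mixes levels of the same hierarchy, narrowing the filters so that only the outermost matching tag survives avoids the repeated text in this model.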

getURL

public String getURL()
Get the current URL.

Returns: The URL from which text has been extracted, or null if this property has not been set yet.

main

public static void main(String[] args)
Unit test.

Parameters: args Pass args[0] as the URL to process, and optionally a node name for filtering.

removePropertyChangeListener

public void removePropertyChangeListener(PropertyChangeListener listener)
Remove a PropertyChangeListener from the listener list. This removes a registered PropertyChangeListener.

Parameters: listener The PropertyChangeListener to be removed.

setConnection

public void setConnection(URLConnection connection)
Set the parser's connection. The text from the URL will be fetched, which may be expensive, so this property should be set last.

Parameters: connection New value of property Connection.

setFilters

public void setFilters(NodeFilter[] filters)
Set the filters for the bean. If the parser has been set, it is reset and the nodes are refetched with the new filters.

Parameters: filters The filter set to use.
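
Since changing the filters refetches the nodes once a parser is present, the order in which properties are set affects how many fetches occur. The sketch below models that behaviour with a fetch counter. It is an illustration under assumptions, not the library's code: RefetchSketch is hypothetical, a Supplier stands in for the parser, and a single Predicate stands in for the filter array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical model: setters refetch only when a source (parser stand-in) exists. */
class RefetchSketch {
    private Supplier<List<String>> source;          // null until set
    private Predicate<String> filter = s -> true;   // default: accept everything
    private List<String> nodes;
    int fetchCount;                                  // counts simulated fetches

    void setFilter(Predicate<String> newFilter) {
        filter = newFilter;
        if (source != null) {
            refresh(); // source already set: refetch with the new filter
        }
    }

    void setSource(Supplier<List<String>> newSource) {
        source = newSource;
        if (source != null) {
            refresh(); // used immediately to fetch the nodes
        }
    }

    List<String> getNodes() {
        return nodes;
    }

    private void refresh() {
        fetchCount++;
        List<String> kept = new ArrayList<>();
        for (String node : source.get()) {
            if (filter.test(node)) {
                kept.add(node);
            }
        }
        nodes = kept;
    }
}
```

In this model, setting the filter before the source costs one fetch, while the reverse order costs two, which is the reason the expensive property is best set last.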

setNodes

protected void setNodes()
Fetch the URL contents and filter it. Only does work if there is a valid parser with its URL set.

setParser

public void setParser(Parser parser)
Set the parser for the bean. The parser is used immediately to fetch the nodes, which for a null filter means all the nodes.

Parameters: parser The parser to use.

setRecursive

public void setRecursive(boolean recursive)
Set the recursion behaviour.

Parameters: recursive If true the extractAllNodesThatMatch() call is performed recursively.
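
The difference between recursive and non-recursive matching can be sketched on a small tree. With recursion off, only the nodes in the list itself are tested; with recursion on, their children, children's children, and so on are tested too. RecursionSketch, its Node type, and the extract method are hypothetical names that model the described behaviour, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of recursive versus top-level-only matching. */
class RecursionSketch {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(Node child) { children.add(child); return this; }
    }

    static List<Node> extract(List<Node> nodes, Predicate<Node> filter, boolean recursive) {
        List<Node> out = new ArrayList<>();
        for (Node node : nodes) {
            if (filter.test(node)) {
                out.add(node);
            }
            if (recursive) {
                // Descend into children, grandchildren, and so on.
                out.addAll(extract(node.children, filter, true));
            }
        }
        return out;
    }

    /** Counts "meta" nodes in html -> (head -> meta, body -> meta), starting from [html]. */
    static int countMeta(boolean recursive) {
        Node html = new Node("html")
            .add(new Node("head").add(new Node("meta")))
            .add(new Node("body").add(new Node("meta")));
        List<Node> top = new ArrayList<>();
        top.add(html);
        return extract(top, n -> n.name.equals("meta"), recursive).size();
    }
}
```

In this model a non-recursive search from the document root finds no nested tags at all, so a tag-name filter generally needs recursion enabled to be useful.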

setURL

public void setURL(String url)
Set the URL to extract nodes from. The text from the URL will be fetched, which may be expensive, so this property should be set last.

Parameters: url The URL that text should be fetched from.

updateNodes

protected void updateNodes(NodeList nodes)
Assign the Nodes property, firing the property change.

Parameters: nodes The new value of the Nodes property.

HTML Parser is an open source library released under LGPL.