com.quiotix.html.parser
Class HtmlScrubber

java.lang.Object
  |
  +--com.quiotix.html.parser.HtmlVisitor
        |
        +--com.quiotix.html.parser.HtmlScrubber

public class HtmlScrubber
extends HtmlVisitor

HtmlScrubber is a Visitor which walks an HtmlDocument and cleans it up. It can change tags and tag attributes to uppercase or lowercase, strip out unnecessary quotes from attribute values, and strip trailing spaces before a newline.
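The class follows the Visitor pattern. Each element of an HtmlDocument is dispatched to the visit overload for its type, and the scrubber cleans that element up. The sketch below shows that shape with a deliberately simplified, self-contained element model. The types Element, Tag, Text, Visitor and LowercaseScrubber are hypothetical and are not the HtmlDocument or HtmlVisitor API. The sketch covers only the downcasing behavior, and it assumes that elements are modified in place.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ScrubberSketch {

    // Hypothetical, simplified stand-ins for the HtmlDocument element classes.
    interface Element {
        void accept(Visitor v);
    }

    static class Tag implements Element {
        String name;
        // Attribute name -> value; LinkedHashMap keeps source order.
        Map<String, String> attributes = new LinkedHashMap<>();

        Tag(String name) {
            this.name = name;
        }

        @Override
        public void accept(Visitor v) {
            v.visit(this); // double dispatch to visit(Tag)
        }
    }

    static class Text implements Element {
        String text;

        Text(String text) {
            this.text = text;
        }

        @Override
        public void accept(Visitor v) {
            v.visit(this); // double dispatch to visit(Text)
        }
    }

    interface Visitor {
        void visit(Tag t);

        void visit(Text t);
    }

    // Lowercases tag names and attribute names in place; attribute values
    // and text content are left untouched.
    static class LowercaseScrubber implements Visitor {
        @Override
        public void visit(Tag t) {
            t.name = t.name.toLowerCase(Locale.ROOT);
            Map<String, String> lowered = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : t.attributes.entrySet()) {
                lowered.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
            }
            t.attributes = lowered;
        }

        @Override
        public void visit(Text t) {
            // Nothing to do for text in this sketch.
        }
    }

    public static void main(String[] args) {
        Tag link = new Tag("A");
        link.attributes.put("HREF", "Index.html");
        List<Element> document = new ArrayList<>();
        document.add(link);
        document.add(new Text("Home"));

        Visitor scrubber = new LowercaseScrubber();
        for (Element e : document) {
            e.accept(scrubber);
        }
        System.out.println(link.name + " " + link.attributes); // a {href=Index.html}
    }
}
```

The sketch treats "tag attributes" as attribute names and leaves their values alone. This is an assumption. It is a cautious one, because values such as URLs can be case-sensitive.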

Author:
Brian Goetz, Quiotix
Additional contributions by: Thorsten Weber

Field Summary
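static int ATTR_DOWNCASE
          Change tag attributes to lowercase.
static int ATTR_UPCASE
          Change tag attributes to uppercase.
static int DEFAULT_OPTIONS
          The options used by HtmlScrubber(): downcase tags and tag attributes, and strip out unnecessary quotes.
protected int flags
protected boolean inPreBlock
protected HtmlDocument.HtmlElement previousElement
static int STRIP_QUOTES
          Strip out unnecessary quotes from attribute values.
static int TAGS_DOWNCASE
          Change tags to lowercase.
static int TAGS_UPCASE
          Change tags to uppercase.
static int TRIM_SPACES
          Strip trailing spaces before a newline.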
 
Constructor Summary
HtmlScrubber()
          Create an HtmlScrubber with the default options (downcase tags and tag attributes, and strip out unnecessary quotes).
HtmlScrubber(int flags)
          Create an HtmlScrubber with the desired set of options.
 
Method Summary
 void start()
 void visit(HtmlDocument.Annotation a)
 void visit(HtmlDocument.Comment c)
 void visit(HtmlDocument.EndTag t)
 void visit(HtmlDocument.Newline n)
 void visit(HtmlDocument.Tag t)
 void visit(HtmlDocument.TagBlock bl)
 void visit(HtmlDocument.Text t)
 
Methods inherited from class com.quiotix.html.parser.HtmlVisitor
finish, visit, visit
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TAGS_UPCASE

public static final int TAGS_UPCASE

TAGS_DOWNCASE

public static final int TAGS_DOWNCASE

ATTR_UPCASE

public static final int ATTR_UPCASE

ATTR_DOWNCASE

public static final int ATTR_DOWNCASE

STRIP_QUOTES

public static final int STRIP_QUOTES
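The documentation does not say when a quote counts as unnecessary. This sketch assumes one plausible rule, taken from the HTML 4.01 specification rather than from HtmlScrubber's source. Under that rule, an attribute value may be left unquoted only if it is non-empty and contains nothing but letters, digits, hyphens, periods, underscores and colons. The names safeToUnquote and stripQuotes are hypothetical helpers used only for this illustration.

```java
public class UnquoteSketch {

    // Assumed rule (HTML 4.01): a value may be unquoted only if it is
    // non-empty and consists solely of ASCII letters, digits, hyphens,
    // periods, underscores and colons.
    static boolean safeToUnquote(String value) {
        if (value.isEmpty()) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == ':';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Removes a matching pair of surrounding double or single quotes when
    // the inner value is safe to leave unquoted; otherwise returns raw as-is.
    static String stripQuotes(String raw) {
        if (raw.length() >= 2) {
            char first = raw.charAt(0);
            char last = raw.charAt(raw.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                String inner = raw.substring(1, raw.length() - 1);
                if (safeToUnquote(inner)) {
                    return inner;
                }
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"100\""));        // 100
        System.out.println(stripQuotes("\"index.html\"")); // index.html
        System.out.println(stripQuotes("\"two words\""));  // "two words"
        System.out.println(stripQuotes("''"));             // ''
    }
}
```

A stricter rule keeps more quotes than the modern HTML syntax requires, but it never produces an attribute that an older parser would misread.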

TRIM_SPACES

public static final int TRIM_SPACES
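TRIM_SPACES is described only as stripping trailing spaces before a newline. The class also declares inPreBlock and previousElement fields. This suggests, though the source does not confirm it, that the scrubber remembers the element preceding each Newline and leaves whitespace inside <pre> blocks alone. The sketch below models that idea over a hypothetical flat token list. Kind, Token and trimSpaces are illustrative names and are not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

public class TrimSketch {

    // Hypothetical token kinds standing in for HtmlDocument element types.
    enum Kind { TEXT, NEWLINE, PRE_START, PRE_END }

    static class Token {
        final Kind kind;
        String text; // only meaningful for TEXT tokens

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Strips trailing spaces from a TEXT token that is immediately followed
    // by a NEWLINE, except inside <pre>, where whitespace is significant.
    static void trimSpaces(List<Token> tokens) {
        boolean inPre = false;
        Token previous = null; // plays the role of a "previous element" field
        for (Token t : tokens) {
            switch (t.kind) {
                case PRE_START:
                    inPre = true;
                    break;
                case PRE_END:
                    inPre = false;
                    break;
                case NEWLINE:
                    if (!inPre && previous != null && previous.kind == Kind.TEXT) {
                        previous.text = stripTrailingSpaces(previous.text);
                    }
                    break;
                default:
                    break;
            }
            previous = t;
        }
    }

    // Removes only ' ' characters from the end of s.
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        Token outside = new Token(Kind.TEXT, "hello   ");
        Token insidePre = new Token(Kind.TEXT, "keep   ");
        List<Token> tokens = new ArrayList<>();
        tokens.add(outside);
        tokens.add(new Token(Kind.NEWLINE, null));
        tokens.add(new Token(Kind.PRE_START, null));
        tokens.add(insidePre);
        tokens.add(new Token(Kind.NEWLINE, null));
        tokens.add(new Token(Kind.PRE_END, null));
        trimSpaces(tokens);
        System.out.println("[" + outside.text + "][" + insidePre.text + "]"); // [hello][keep   ]
    }
}
```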

DEFAULT_OPTIONS

public static final int DEFAULT_OPTIONS

flags

protected int flags

previousElement

protected HtmlDocument.HtmlElement previousElement

inPreBlock

protected boolean inPreBlock

Constructor Detail

HtmlScrubber

public HtmlScrubber()
Create an HtmlScrubber with the default options (downcase tags and tag attributes, and strip out unnecessary quotes).

HtmlScrubber

public HtmlScrubber(int flags)
Create an HtmlScrubber with the desired set of options.
Parameters:
flags - A bitmask representing the desired scrubbing options
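Because flags is a bitmask, options are combined with bitwise OR and tested with bitwise AND. With the documented constants, a call would look like new HtmlScrubber(HtmlScrubber.DEFAULT_OPTIONS | HtmlScrubber.TRIM_SPACES). The self-contained sketch below shows only the bit arithmetic. The numeric values of the constants are not documented here, so the distinct powers of two it uses are an assumption. Defining DEFAULT_OPTIONS as TAGS_DOWNCASE | ATTR_DOWNCASE | STRIP_QUOTES is also an assumption, inferred from the description of the no-argument constructor.

```java
public class FlagsSketch {

    // Assumed values: any distinct powers of two would work the same way.
    static final int TAGS_UPCASE = 1;
    static final int TAGS_DOWNCASE = 2;
    static final int ATTR_UPCASE = 4;
    static final int ATTR_DOWNCASE = 8;
    static final int STRIP_QUOTES = 16;
    static final int TRIM_SPACES = 32;

    // Assumed to mirror the documented defaults of HtmlScrubber():
    // downcase tags and tag attributes, strip unnecessary quotes.
    static final int DEFAULT_OPTIONS = TAGS_DOWNCASE | ATTR_DOWNCASE | STRIP_QUOTES;

    // True if the given option bit is set in flags.
    static boolean isSet(int flags, int option) {
        return (flags & option) != 0;
    }

    public static void main(String[] args) {
        // Defaults plus trailing-space trimming.
        int flags = DEFAULT_OPTIONS | TRIM_SPACES;
        System.out.println(isSet(flags, TRIM_SPACES)); // true
        System.out.println(isSet(flags, TAGS_UPCASE)); // false

        // Defaults with quote stripping switched off.
        int keepQuotes = DEFAULT_OPTIONS & ~STRIP_QUOTES;
        System.out.println(isSet(keepQuotes, STRIP_QUOTES)); // false
    }
}
```

Because each option occupies its own bit, options can be added with | and removed with & ~ without affecting the others.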

Method Detail

start

public void start()
Overrides:
start in class HtmlVisitor

visit

public void visit(HtmlDocument.Tag t)
Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.EndTag t)
Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Text t)
Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Comment c)
Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Newline n)
Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Annotation a)
Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.TagBlock bl)
Overrides:
visit in class HtmlVisitor