org.w3c.tidy
Class TidyUtils

java.lang.Object
  extended by org.w3c.tidy.TidyUtils

public final class TidyUtils
extends java.lang.Object

Utility class with handy methods, mainly for String handling or for reproducing C behaviours.

Version:
$Revision $ ($Author $)
Author:
Fabrizio Giustina

Field Summary
private static short DIGIT
          char type: digit.
private static short LETTER
          char type: letter.
private static short[] lexmap
          used to classify chars for lexical purposes.
private static short LOWERCASE
          char type: lowercase.
private static short NAMECHAR
          char type: namechar.
private static short NEWLINE
          char type: newline.
private static short UPPERCASE
          char type: uppercase.
private static short WHITE
          char type: whitespace.
 
Constructor Summary
private TidyUtils()
          utility class, don't instantiate.
 
Method Summary
static boolean findBadSubString(java.lang.String s, java.lang.String p, int len)
          Return true if substring s is in p and isn't all in upper case.
static char foldCase(char c, boolean tocaps, boolean xmlTags)
          Fold case of a char.
static byte[] getBytes(java.lang.String str)
          Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
static java.lang.String getString(byte[] bytes, int offset, int length)
          Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
static boolean isCharEncodingSupported(java.lang.String name)
          Is the given character encoding supported?
static boolean isDigit(char c)
          Is the given char a digit?
(package private) static boolean isInValuesIgnoreCase(java.lang.String[] validValues, java.lang.String valueToCheck)
          Check if the string valueToCheck is contained in validValues array (case-insensitive comparison).
static boolean isLetter(char c)
          Is the given char a letter?
static boolean isLower(char c)
          Determines if the specified character is a lowercase character.
static boolean isNamechar(char c)
          Is the given char valid in name? (letter, digit or "-", ".", ":", "_")
(package private) static boolean isQuote(int c)
          Is the given character a single or double quote?
static boolean isUpper(char c)
          Determines if the specified character is an uppercase character.
static boolean isWhite(char c)
          Determines if the specified character is whitespace.
(package private) static boolean isxdigit(char c)
          Is the character a hex digit?
(package private) static boolean isXMLLetter(char c)
          Is the given char a valid xml letter?
(package private) static boolean isXMLNamechar(char c)
          Is the given char valid in xml name?
static int lastChar(java.lang.String str)
          Return the last char in string.
private static short map(char c)
          Returns the constant which defines the classification of char in lexmap.
private static void mapStr(java.lang.String str, short code)
          Classify chars in String and put them in lexmap.
(package private) static boolean toBoolean(int value)
          Converts an int to a boolean.
static char toLower(char c)
          Maps the given character to its lowercase equivalent.
(package private) static int toUnsigned(int c)
          convert an int to unsigned (& 0xFF).
static char toUpper(char c)
          Maps the given character to its uppercase equivalent.
(package private) static int wstrnchr(java.lang.String s1, int len1, char cc)
          return offset of cc from beginning of s1, -1 if not found.
(package private) static boolean wsubstr(java.lang.String s1, java.lang.String s2)
          Same as wsubstrn, but without a specified length.
(package private) static boolean wsubstrn(java.lang.String s1, int len1, java.lang.String s2)
          check if the first String contains the second one.
(package private) static boolean wsubstrncase(java.lang.String s1, int len1, java.lang.String s2)
          check if the first String contains the second one (ignore case).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DIGIT

private static final short DIGIT
char type: digit.

See Also:
Constant Field Values

LETTER

private static final short LETTER
char type: letter.

See Also:
Constant Field Values

NAMECHAR

private static final short NAMECHAR
char type: namechar.

See Also:
Constant Field Values

WHITE

private static final short WHITE
char type: whitespace.

See Also:
Constant Field Values

NEWLINE

private static final short NEWLINE
char type: newline.

See Also:
Constant Field Values

LOWERCASE

private static final short LOWERCASE
char type: lowercase.

See Also:
Constant Field Values

UPPERCASE

private static final short UPPERCASE
char type: uppercase.

See Also:
Constant Field Values

lexmap

private static short[] lexmap
used to classify chars for lexical purposes.

Constructor Detail

TidyUtils

private TidyUtils()
utility class, don't instantiate.

Method Detail

toBoolean

static boolean toBoolean(int value)
Converts an int to a boolean.

Parameters:
value - int value
Returns:
true if value is != 0

toUnsigned

static int toUnsigned(int c)
convert an int to unsigned (& 0xFF).

Parameters:
c - signed int
Returns:
unsigned int
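
A minimal sketch of the two C-style conversions above, written as a standalone class and not taken from the JTidy source. The class name ConversionSketch is hypothetical; the method bodies follow only what the descriptions state (value != 0 and c & 0xFF).

```java
// Hypothetical stand-in for the package-private helpers; not the JTidy source.
final class ConversionSketch {

    private ConversionSketch() {
        // utility class, don't instantiate
    }

    // C treats any non-zero int as true.
    static boolean toBoolean(int value) {
        return value != 0;
    }

    // Keep only the low 8 bits, so a sign-extended byte such as -1 becomes 255.
    static int toUnsigned(int c) {
        return c & 0xFF;
    }
}
```

The mask matters mostly when a byte has been widened to an int: the widening preserves the sign, and the mask recovers the value in the range 0 to 255.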

wsubstrn

static boolean wsubstrn(java.lang.String s1,
                        int len1,
                        java.lang.String s2)
check if the first String contains the second one.

Parameters:
s1 - full String
len1 - maximum position in String
s2 - String to search for
Returns:
true if s1 contains s2 in the range 0-len1

wsubstrncase

static boolean wsubstrncase(java.lang.String s1,
                            int len1,
                            java.lang.String s2)
check if the first String contains the second one (ignore case).

Parameters:
s1 - full String
len1 - maximum position in String
s2 - String to search for
Returns:
true if s1 contains s2 in the range 0-len1
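
The phrase "in the range 0-len1" can be read in more than one way. The sketch below assumes one reading: the first occurrence of s2 must start at an index no greater than len1. The class name SubstringSketch is hypothetical, and the bodies are an illustration of that reading, not the library's code.

```java
import java.util.Locale;

// Hypothetical illustration of wsubstrn / wsubstrncase under a stated assumption.
final class SubstringSketch {

    private SubstringSketch() {
    }

    // Assumption: a match counts only if it starts at an index <= len1.
    static boolean wsubstrn(String s1, int len1, String s2) {
        int index = s1.indexOf(s2);
        return index > -1 && index <= len1;
    }

    // Same check after lowercasing both strings with a fixed locale.
    static boolean wsubstrncase(String s1, int len1, String s2) {
        return wsubstrn(s1.toLowerCase(Locale.ROOT), len1, s2.toLowerCase(Locale.ROOT));
    }
}
```

Checking only the first occurrence is enough here: if the earliest match already starts past len1, every later match does too.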

wstrnchr

static int wstrnchr(java.lang.String s1,
                    int len1,
                    char cc)
return offset of cc from beginning of s1, -1 if not found.

Parameters:
s1 - String
len1 - maximum offset (offsets greater than len1 are ignored and returned as -1)
cc - character to search for
Returns:
index of cc in s1
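
A small sketch of the bounded character search, following the parameter description literally: an offset greater than len1 is reported as -1. How the boundary case (offset equal to len1) is treated in the real method is not stated here, so the sketch's choice to accept it is an assumption. CharSearchSketch is a hypothetical name.

```java
// Hypothetical illustration of wstrnchr; boundary handling is an assumption.
final class CharSearchSketch {

    private CharSearchSketch() {
    }

    static int wstrnchr(String s1, int len1, char cc) {
        int index = s1.indexOf(cc);
        // indexOf already returns -1 when cc is absent; offsets past len1 are also hidden.
        return index > len1 ? -1 : index;
    }
}
```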

wsubstr

static boolean wsubstr(java.lang.String s1,
                       java.lang.String s2)
Same as wsubstrn, but without a specified length.

Parameters:
s1 - full String
s2 - String to search for
Returns:
true if s2 is found in s1 (case insensitive search)

isxdigit

static boolean isxdigit(char c)
Is the character a hex digit?

Parameters:
c - char
Returns:
true if the given character is a hex digit
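
For illustration, a hex-digit test in the style of C's isxdigit could look like the sketch below. It assumes only the ASCII ranges 0-9, a-f and A-F count; HexDigitSketch is a hypothetical name and the body is not taken from the library.

```java
// Hypothetical ASCII-only hex digit check, modelled on C's isxdigit.
final class HexDigitSketch {

    private HexDigitSketch() {
    }

    static boolean isxdigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }
}
```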

isInValuesIgnoreCase

static boolean isInValuesIgnoreCase(java.lang.String[] validValues,
                                    java.lang.String valueToCheck)
Check if the string valueToCheck is contained in validValues array (case-insensitive comparison).

Parameters:
validValues - array of valid values
valueToCheck - value to search for
Returns:
true if valueToCheck is found in validValues
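
The lookup described above amounts to a linear scan with a case-insensitive comparison. A minimal sketch, assuming String.equalsIgnoreCase is an acceptable comparison (the class name ValuesSketch is hypothetical):

```java
// Hypothetical illustration of a case-insensitive membership test.
final class ValuesSketch {

    private ValuesSketch() {
    }

    static boolean isInValuesIgnoreCase(String[] validValues, String valueToCheck) {
        for (String candidate : validValues) {
            if (candidate.equalsIgnoreCase(valueToCheck)) {
                return true;
            }
        }
        return false;
    }
}
```

A check like this suits attribute values that are compared without regard to case, for example testing "AUTO" against the array {"yes", "no", "auto"}.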

findBadSubString

public static boolean findBadSubString(java.lang.String s,
                                       java.lang.String p,
                                       int len)
Return true if substring s is in p and isn't all in upper case. This is used to check the case of SYSTEM, PUBLIC, DTD and EN.

Parameters:
s - substring
p - full string
len - how many chars to check in p
Returns:
true if substring s is in p and isn't all in upper case
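
One way to picture this check: scan the first len chars of p for a case-insensitive occurrence of s, and report true when the occurrence found is not written entirely in upper case. The sketch below assumes exactly that, clamps len to the length of p, and looks only at the first occurrence; it is a hypothetical illustration (class name BadSubStringSketch), not the library's algorithm.

```java
import java.util.Locale;

// Hypothetical illustration of the DOCTYPE keyword case check.
final class BadSubStringSketch {

    private BadSubStringSketch() {
    }

    static boolean findBadSubString(String s, String p, int len) {
        int limit = Math.min(len, p.length()); // assumption: never read past the end of p
        int n = s.length();
        for (int i = 0; i + n <= limit; i++) {
            String candidate = p.substring(i, i + n);
            if (candidate.equalsIgnoreCase(s)) {
                // Found the keyword; it is "bad" unless it is already all upper case.
                return !candidate.equals(candidate.toUpperCase(Locale.ROOT));
            }
        }
        return false;
    }
}
```

With this sketch, searching for "PUBLIC" in a declaration that spells it "public" returns true, while the correctly capitalised form returns false.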

isXMLLetter

static boolean isXMLLetter(char c)
Is the given char a valid xml letter?

Parameters:
c - char
Returns:
true if the char is a valid xml letter

isXMLNamechar

static boolean isXMLNamechar(char c)
Is the given char valid in xml name?

Parameters:
c - char
Returns:
true if the char is a valid xml name char

isQuote

static boolean isQuote(int c)
Is the given character a single or double quote?

Parameters:
c - char
Returns:
true if c is " or '

getBytes

public static byte[] getBytes(java.lang.String str)
Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.

Parameters:
str - String
Returns:
utf8 bytes
See Also:
String.getBytes()

getString

public static java.lang.String getString(byte[] bytes,
                                         int offset,
                                         int length)
Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.

Parameters:
bytes - byte array
offset - starting offset in byte array
length - length in byte array starting from offset
Returns:
same as new String(bytes, offset, length, "UTF8")
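
The pattern described for getBytes and getString, wrapping the checked UnsupportedEncodingException in an unchecked Error, can be sketched as follows. The class name Utf8Sketch and the error messages are hypothetical; the sketch uses the charset name "UTF-8" and only standard java.lang.String calls.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical illustration of converting a checked encoding exception to an Error.
final class Utf8Sketch {

    private Utf8Sketch() {
    }

    static byte[] getBytes(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this should be unreachable.
            throw new Error("String to UTF-8 conversion failed: " + e.getMessage());
        }
    }

    static String getString(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new Error("UTF-8 to String conversion failed: " + e.getMessage());
        }
    }
}
```

Because callers see no checked exception, they can round-trip text through bytes without a try block or a throws clause of their own.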

lastChar

public static int lastChar(java.lang.String str)
Return the last char in string. This is useful when the trailing quotemark is missing on an attribute.

Parameters:
str - String
Returns:
last char in String
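
A short sketch of lastChar. What the method returns for a null or empty String is not documented here; the sketch assumes 0 as a sentinel, which the int return type would allow. LastCharSketch is a hypothetical name.

```java
// Hypothetical illustration; the 0 sentinel for null/empty input is an assumption.
final class LastCharSketch {

    private LastCharSketch() {
    }

    static int lastChar(String str) {
        if (str != null && str.length() > 0) {
            return str.charAt(str.length() - 1);
        }
        return 0; // assumed "no character" value
    }
}
```

A caller could compare the result with '"' or '\'' to tell whether an attribute value still ends in its closing quote.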

isWhite

public static boolean isWhite(char c)
Determines if the specified character is whitespace.

Parameters:
c - char
Returns:
true if char is whitespace.

isDigit

public static boolean isDigit(char c)
Is the given char a digit?

Parameters:
c - char
Returns:
true if the given char is a digit

isLetter

public static boolean isLetter(char c)
Is the given char a letter?

Parameters:
c - char
Returns:
true if the given char is a letter

isNamechar

public static boolean isNamechar(char c)
Is the given char valid in name? (letter, digit or "-", ".", ":", "_")

Parameters:
c - char
Returns:
true if char is a name char.

isLower

public static boolean isLower(char c)
Determines if the specified character is a lowercase character.

Parameters:
c - char
Returns:
true if char is lower case.

isUpper

public static boolean isUpper(char c)
Determines if the specified character is an uppercase character.

Parameters:
c - char
Returns:
true if char is upper case.

toLower

public static char toLower(char c)
Maps the given character to its lowercase equivalent.

Parameters:
c - char
Returns:
lowercase char.

toUpper

public static char toUpper(char c)
Maps the given character to its uppercase equivalent.

Parameters:
c - char
Returns:
uppercase char.

foldCase

public static char foldCase(char c,
                            boolean tocaps,
                            boolean xmlTags)
Fold case of a char.

Parameters:
c - char
tocaps - convert to caps
xmlTags - use xml tags? If true no change will be performed
Returns:
folded char
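
The parameter descriptions suggest a simple decision: leave the char alone for XML tags, otherwise fold to upper or lower case depending on tocaps. The sketch below assumes that tocaps == false means folding to lower case, and it uses java.lang.Character in place of the class's own toUpper and toLower. FoldCaseSketch is a hypothetical name.

```java
// Hypothetical illustration of foldCase; the lowercase branch is an assumption.
final class FoldCaseSketch {

    private FoldCaseSketch() {
    }

    static char foldCase(char c, boolean tocaps, boolean xmlTags) {
        if (xmlTags) {
            return c; // XML names are case-sensitive, so nothing is changed
        }
        return tocaps ? Character.toUpperCase(c) : Character.toLowerCase(c);
    }
}
```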

mapStr

private static void mapStr(java.lang.String str,
                           short code)
Classify chars in String and put them in lexmap.

Parameters:
str - String
code - code associated to chars in the String

map

private static short map(char c)
Returns the constant which defines the classification of char in lexmap.

Parameters:
c - char
Returns:
char type
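
The private members lexmap, mapStr and map describe a table-driven classifier: each char gets a set of type flags, and the public isXxx methods test one flag. The sketch below shows that technique under several assumptions: the flag values are made-up powers of two (the real constant values are not listed on this page), the table covers only the first 128 chars, and chars outside it map to no type. LexMapSketch is a hypothetical name.

```java
// Hypothetical illustration of a bit-flag character classification table.
final class LexMapSketch {

    // Assumed flag values; each is a distinct bit so flags can be combined.
    static final short DIGIT = 1;
    static final short LETTER = 2;
    static final short NAMECHAR = 4;
    static final short WHITE = 8;
    static final short NEWLINE = 16;
    static final short LOWERCASE = 32;
    static final short UPPERCASE = 64;

    private static final short[] lexmap = new short[128];

    static {
        mapStr("\r\n\f", (short) (NEWLINE | WHITE));
        mapStr(" \t", WHITE);
        mapStr("-.:_", NAMECHAR);
        mapStr("0123456789", (short) (DIGIT | NAMECHAR));
        mapStr("abcdefghijklmnopqrstuvwxyz", (short) (LOWERCASE | LETTER | NAMECHAR));
        mapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short) (UPPERCASE | LETTER | NAMECHAR));
    }

    private LexMapSketch() {
    }

    // OR the code into the table entry of every char in str.
    private static void mapStr(String str, short code) {
        for (int i = 0; i < str.length(); i++) {
            lexmap[str.charAt(i)] |= code;
        }
    }

    // Look up the flags for c; chars outside the table have no type (assumption).
    private static short map(char c) {
        if (c < 128) {
            return lexmap[c];
        }
        return 0;
    }

    static boolean isWhite(char c) { return (map(c) & WHITE) != 0; }
    static boolean isDigit(char c) { return (map(c) & DIGIT) != 0; }
    static boolean isLetter(char c) { return (map(c) & LETTER) != 0; }
    static boolean isNamechar(char c) { return (map(c) & NAMECHAR) != 0; }
    static boolean isLower(char c) { return (map(c) & LOWERCASE) != 0; }
    static boolean isUpper(char c) { return (map(c) & UPPERCASE) != 0; }
}
```

Building the table once in a static initializer keeps each classification call down to an array read and a bit test.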

isCharEncodingSupported

public static boolean isCharEncodingSupported(java.lang.String name)
Is the given character encoding supported?

Parameters:
name - character encoding name
Returns:
true if encoding is supported, false otherwise.
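
A comparable check can be written with the standard java.nio.charset.Charset API, as in the sketch below. This is a swapped-in technique for illustration: it does not show how the method is implemented in this class, and it performs no mapping of encoding names. Treating null and syntactically illegal names as unsupported is an assumption. EncodingSupportSketch is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical illustration using Charset.isSupported; not the library's code.
final class EncodingSupportSketch {

    private EncodingSupportSketch() {
    }

    static boolean isCharEncodingSupported(String name) {
        if (name == null) {
            return false; // assumption: a missing name is simply unsupported
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that no charset name may contain.
            return false;
        }
    }
}
```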