java.lang.Object
  org.w3c.tidy.EncodingUtils
Nested Class Summary | |
(package private) static interface |
EncodingUtils.GetBytes
Getter callback: called to retrieve 1 or more additional UTF-8 bytes. |
(package private) static interface |
EncodingUtils.PutBytes
Putter callback: called to store 1 or more additional UTF-8 bytes. |
Field Summary | |
static int |
FSM_ASCII
States for ISO 2022. A document in an ISO-2022 based encoding uses ESC sequences called "designators" to switch character sets. |
static int |
FSM_ESC
state ESC. |
static int |
FSM_ESCD
state ESCD. |
static int |
FSM_ESCDP
state ESCDP. |
static int |
FSM_ESCP
state ESCP. |
static int |
FSM_NONASCII
state NONASCII. |
static int |
HIGH_UTF16_SURROGATE
UTF-16 high surrogate. |
static int |
LOW_UTF16_SURROGATE
UTF-16 low surrogate. |
private static int[] |
MAC2UNICODE
John Love-Jensen contributed this table for mapping MacRoman character set to Unicode. |
static int |
MAX_UTF16_FROM_UCS4
Max UTF-16 value. |
static int |
MAX_UTF8_FROM_UCS4
Max UTF-8 valid char value. |
private static int |
NUM_UTF8_SEQUENCES
Number of valid UTF-8 sequences. |
private static int[] |
OFFSET_UTF8_SEQUENCES
Offset for utf8 sequences. |
private static int[] |
SYMBOL2UNICODE
Table to map symbol font characters to Unicode; undefined characters are mapped to 0x0000 and characters without any Unicode equivalent are mapped to '?'. |
static int |
UNICODE_BOM
the default (big-endian) UNICODE BOM. |
static int |
UNICODE_BOM_BE
the big-endian (default) UNICODE BOM. |
static int |
UNICODE_BOM_LE
the little-endian UNICODE BOM. |
static int |
UNICODE_BOM_UTF8
the UTF-8 UNICODE BOM. |
static int |
UTF16_HIGH_SURROGATE_BEGIN
UTF-16 surrogate pair areas: high surrogates begin. |
static int |
UTF16_HIGH_SURROGATE_END
UTF-16 surrogate pair areas: high surrogates end. |
static int |
UTF16_LOW_SURROGATE_BEGIN
UTF-16 surrogate pair areas: low surrogates begin. |
static int |
UTF16_LOW_SURROGATE_END
UTF-16 surrogate pair areas: low surrogates end. |
static int |
UTF16_SURROGATES_BEGIN
UTF-16 surrogates begin. |
private static int |
UTF8_BYTE_SWAP_NOT_A_CHAR
UTF-8 byte swap: invalid char. |
private static int |
UTF8_NOT_A_CHAR
UTF-8 invalid char. |
private static ValidUTF8Sequence[] |
VALID_UTF8
Array of valid UTF8 sequences. |
private static int[] |
WIN2UNICODE
Mapping for Windows Western character set (128-159) to Unicode. |
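The surrogate fields above describe the ranges used when a code point is too large for a single UTF-16 unit. This page does not list the field values, so the following minimal sketch uses the standard Unicode ranges instead; the class `SurrogateSketch` and all of its member names are hypothetical and are not part of `EncodingUtils`.

```java
// Standalone sketch. The constants are the standard Unicode values,
// assumed (not confirmed by this page) to correspond to the fields above.
final class SurrogateSketch {
    static final int SURROGATES_BEGIN = 0x10000; // first code point that needs a pair
    static final int MAX_CODE_POINT = 0x10FFFF;  // last valid Unicode code point
    static final int HIGH_BEGIN = 0xD800;
    static final int HIGH_END = 0xDBFF;
    static final int LOW_BEGIN = 0xDC00;
    static final int LOW_END = 0xDFFF;

    /** True if cp must be written as a high/low surrogate pair in UTF-16. */
    static boolean needsSurrogatePair(int cp) {
        return cp >= SURROGATES_BEGIN && cp <= MAX_CODE_POINT;
    }

    /** Splits a supplementary code point into { high, low } surrogates. */
    static int[] split(int cp) {
        if (!needsSurrogatePair(cp)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = cp - SURROGATES_BEGIN;      // 20-bit value
        int high = HIGH_BEGIN + (v >> 10);  // top 10 bits
        int low = LOW_BEGIN + (v & 0x3FF);  // bottom 10 bits
        return new int[] { high, low };
    }

    /** Recombines a valid high/low pair into a single code point. */
    static int combine(int high, int low) {
        if (high < HIGH_BEGIN || high > HIGH_END || low < LOW_BEGIN || low > LOW_END) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return ((high - HIGH_BEGIN) << 10) + (low - LOW_BEGIN) + SURROGATES_BEGIN;
    }
}
```

For example, `SurrogateSketch.split(0x1F600)` yields the pair `0xD83D`, `0xDE00`, and `combine` reverses it.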
Constructor Summary | |
private |
EncodingUtils()
don't instantiate. |
Method Summary | |
protected static int |
decodeMacRoman(int c)
Function to convert from MacRoman to Unicode. |
(package private) static int |
decodeSymbolFont(int c)
Function to convert from Symbol Font chars to Unicode. |
(package private) static boolean |
decodeUTF8BytesToChar(int[] c,
int firstByte,
byte[] successorBytes,
EncodingUtils.GetBytes getter,
int[] count,
int startInSuccessorBytesArray)
Decodes an array of bytes to a char. |
protected static int |
decodeWin1252(int c)
Function for conversion from Windows-1252 to Unicode. |
(package private) static boolean |
encodeCharToUTF8Bytes(int c,
byte[] encodebuf,
EncodingUtils.PutBytes putter,
int[] count)
Encode a char to an array of bytes. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final int UNICODE_BOM_BE
public static final int UNICODE_BOM
public static final int UNICODE_BOM_LE
public static final int UNICODE_BOM_UTF8
public static final int FSM_ASCII
public static final int FSM_ESC
public static final int FSM_ESCD
public static final int FSM_ESCDP
public static final int FSM_ESCP
public static final int FSM_NONASCII
public static final int MAX_UTF8_FROM_UCS4
public static final int MAX_UTF16_FROM_UCS4
public static final int LOW_UTF16_SURROGATE
public static final int UTF16_SURROGATES_BEGIN
public static final int UTF16_LOW_SURROGATE_BEGIN
public static final int UTF16_LOW_SURROGATE_END
public static final int UTF16_HIGH_SURROGATE_BEGIN
public static final int UTF16_HIGH_SURROGATE_END
public static final int HIGH_UTF16_SURROGATE
private static final int UTF8_BYTE_SWAP_NOT_A_CHAR
private static final int UTF8_NOT_A_CHAR
private static final int[] WIN2UNICODE
private static final int[] MAC2UNICODE
private static final int[] SYMBOL2UNICODE
private static final ValidUTF8Sequence[] VALID_UTF8
private static final int NUM_UTF8_SEQUENCES
private static final int[] OFFSET_UTF8_SEQUENCES
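The `UNICODE_BOM_*` fields name the byte order marks that can open a document. As a rough illustration of how such marks are typically recognised, the sketch below checks the standard BOM byte sequences; the class `BomSketch` and the method `detectBom` are hypothetical, and the literal values are the standard ones rather than values read from `EncodingUtils`.

```java
// Standalone sketch of BOM sniffing using the standard byte sequences.
final class BomSketch {
    /** Returns "UTF-8", "UTF-16BE", "UTF-16LE", or "none" for the start of data. */
    static String detectBom(byte[] data) {
        // UTF-8 BOM: EF BB BF
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2) {
            // Read the first two bytes as a big-endian 16-bit value.
            int first2 = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
            if (first2 == 0xFEFF) {
                return "UTF-16BE"; // bytes FE FF
            }
            if (first2 == 0xFFFE) {
                return "UTF-16LE"; // bytes FF FE
            }
        }
        return "none";
    }
}
```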
Constructor Detail |
private EncodingUtils()
Method Detail |
protected static int decodeWin1252(int c)
c - char to decode
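`decodeWin1252` is protected and its `WIN2UNICODE` table is private, so neither can be shown directly here. The sketch below illustrates the same kind of lookup for the 128-159 range using the published Windows-1252 mapping; the class `Win1252Sketch`, its table name, and the choice of 0x0000 for the five unassigned positions are assumptions, not the library's code.

```java
// Standalone sketch of a Windows-1252 (128-159) to Unicode lookup.
final class Win1252Sketch {
    // Index 0 corresponds to byte 0x80. Unassigned positions
    // (0x81, 0x8D, 0x8F, 0x90, 0x9D) are set to 0x0000 here by assumption.
    private static final int[] C1_TO_UNICODE = {
        0x20AC, 0x0000, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000,
        0x0000, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178
    };

    /** Maps a Windows-1252 char value to Unicode; values outside 128-159 pass through. */
    static int decode(int c) {
        if (c >= 128 && c <= 159) {
            return C1_TO_UNICODE[c - 128];
        }
        return c;
    }
}
```

With this table, `Win1252Sketch.decode(0x80)` gives U+20AC (the euro sign) and `decode(0x99)` gives U+2122 (the trade mark sign).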
protected static int decodeMacRoman(int c)
c - char to decode
static int decodeSymbolFont(int c)
c - char to decode
static boolean decodeUTF8BytesToChar(int[] c, int firstByte, byte[] successorBytes, EncodingUtils.GetBytes getter, int[] count, int startInSuccessorBytesArray)
c - will contain the decoded char
firstByte - first input byte
successorBytes - array containing successor bytes (can be null if a getter is provided).
getter - callback used to get new bytes if successorBytes doesn't contain enough bytes
count - will contain the number of bytes read
startInSuccessorBytesArray - starting offset for bytes in successorBytes
Returns: true if error
static boolean encodeCharToUTF8Bytes(int c, byte[] encodebuf, EncodingUtils.PutBytes putter, int[] count)
c - char to encode
encodebuf - will contain the encoded bytes
putter - if not null it will be called to write bytes to out
count - number of bytes written
Returns: false = ok, true = error
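Both UTF-8 methods above report failure by returning true and hand results back through array parameters. Since they are package-private, the sketch below does not call them; it is a simplified standalone encoder and decoder that follows the same calling convention, without the getter and putter callbacks. The class `Utf8Sketch` and its method names are hypothetical.

```java
// Standalone sketch of UTF-8 encoding/decoding with a "true = error" result.
final class Utf8Sketch {
    /** Encodes c into buf (at least 4 bytes); count[0] gets the byte count. */
    static boolean encode(int c, byte[] buf, int[] count) {
        count[0] = 0;
        // Reject values outside Unicode and lone surrogates.
        if (c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            return true;
        }
        if (c <= 0x7F) {
            buf[0] = (byte) c;
            count[0] = 1;
        } else if (c <= 0x7FF) {
            buf[0] = (byte) (0xC0 | (c >> 6));
            buf[1] = (byte) (0x80 | (c & 0x3F));
            count[0] = 2;
        } else if (c <= 0xFFFF) {
            buf[0] = (byte) (0xE0 | (c >> 12));
            buf[1] = (byte) (0x80 | ((c >> 6) & 0x3F));
            buf[2] = (byte) (0x80 | (c & 0x3F));
            count[0] = 3;
        } else {
            buf[0] = (byte) (0xF0 | (c >> 18));
            buf[1] = (byte) (0x80 | ((c >> 12) & 0x3F));
            buf[2] = (byte) (0x80 | ((c >> 6) & 0x3F));
            buf[3] = (byte) (0x80 | (c & 0x3F));
            count[0] = 4;
        }
        return false;
    }

    /** Decodes one char from the start of bytes into c[0]; count[0] gets bytes read. */
    static boolean decode(int[] c, byte[] bytes, int[] count) {
        count[0] = 0;
        if (bytes.length == 0) {
            return true;
        }
        int first = bytes[0] & 0xFF;
        int n;     // sequence length
        int value; // payload bits collected so far
        int min;   // smallest value allowed for this length (rejects overlong forms)
        if (first <= 0x7F) {
            n = 1; value = first; min = 0;
        } else if ((first & 0xE0) == 0xC0) {
            n = 2; value = first & 0x1F; min = 0x80;
        } else if ((first & 0xF0) == 0xE0) {
            n = 3; value = first & 0x0F; min = 0x800;
        } else if ((first & 0xF8) == 0xF0) {
            n = 4; value = first & 0x07; min = 0x10000;
        } else {
            return true; // stray continuation byte or invalid lead byte
        }
        if (bytes.length < n) {
            return true; // truncated sequence
        }
        for (int i = 1; i < n; i++) {
            int b = bytes[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                return true; // not a continuation byte
            }
            value = (value << 6) | (b & 0x3F);
        }
        if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
            return true;
        }
        c[0] = value;
        count[0] = n;
        return false;
    }
}
```

Encoding U+20AC with this sketch writes the three bytes E2 82 AC, and decoding those bytes returns U+20AC with a count of 3.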