com.ibm.icu.text

Class IDNA

public final class IDNA extends Object

The IDNA API implements the IDNA protocol as defined in the IDNA RFC. The RFC defines two operations: ToASCII and ToUnicode. Domain labels containing non-ASCII code points must be processed by the ToASCII operation before they are passed to resolver libraries. Domain names obtained from resolver libraries must be processed by the ToUnicode operation before they are displayed to the user. IDNA requires that implementations process input strings with Nameprep, which is a profile of Stringprep, and then with Punycode. Implementations of IDNA MUST fully implement Nameprep and Punycode; neither Nameprep nor Punycode is optional. The input and output of the ToASCII and ToUnicode operations are Unicode, and the operations are designed to be chainable, i.e., applying ToASCII or ToUnicode multiple times to an input string yields the same result as applying the operation once:

ToUnicode(ToUnicode(ToUnicode...(ToUnicode(string)))) == ToUnicode(string)
ToASCII(ToASCII(ToASCII...(ToASCII(string)))) == ToASCII(string)
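The chaining property can be checked with a few lines of code. The following is a minimal sketch, not ICU code: it assumes the JDK's `java.net.IDN` class as a stand-in that offers the same two operations, so that it runs without ICU4J on the classpath. The class and helper names (`IdnaChainingSketch`, `toAsciiTimes`, `toUnicodeTimes`) are hypothetical.

```java
import java.net.IDN;

class IdnaChainingSketch {
    // Applies ToASCII to a domain name the given number of times.
    // Assumption: java.net.IDN stands in for the ICU IDNA class here.
    static String toAsciiTimes(String name, int times) {
        String result = name;
        for (int i = 0; i < times; i++) {
            result = IDN.toASCII(result);
        }
        return result;
    }

    // Applies ToUnicode to a domain name the given number of times.
    static String toUnicodeTimes(String name, int times) {
        String result = name;
        for (int i = 0; i < times; i++) {
            result = IDN.toUnicode(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // The first label is "bucher" with U+00FC in place of the 'u'.
        String name = "b\u00fccher.example";
        // Both comparisons print true if the operations are chainable.
        System.out.println(toAsciiTimes(name, 1).equals(toAsciiTimes(name, 3)));
        System.out.println(toUnicodeTimes("xn--bcher-kva.example", 1)
                .equals(toUnicodeTimes("xn--bcher-kva.example", 3)));
    }
}
```

With ICU4J, the equivalent calls would presumably go through `IDNA.convertIDNToASCII(String, int)` and `IDNA.convertIDNToUnicode(String, int)` as documented on this page.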

Author: Ram Viswanadha

UNKNOWN: ICU 2.8

Field Summary
static int ALLOW_UNASSIGNED
Option to allow processing of unassigned code points in the input.
static int DEFAULT
Option to prohibit processing of unassigned code points in the input and to skip checking whether the input conforms to STD-3 ASCII rules.
static int USE_STD3_RULES
Option to check whether the input conforms to STD-3 ASCII rules.
Method Summary
static int compare(StringBuffer s1, StringBuffer s2, int options)
Compare two IDN strings for equivalence.
static int compare(String s1, String s2, int options)
Compare two IDN strings for equivalence.
static int compare(UCharacterIterator s1, UCharacterIterator s2, int options)
Compare two IDN strings for equivalence.
static StringBuffer convertIDNToASCII(UCharacterIterator src, int options)
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.
static StringBuffer convertIDNToASCII(StringBuffer src, int options)
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.
static StringBuffer convertIDNToASCII(String src, int options)
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.
static StringBuffer convertIDNToUnicode(UCharacterIterator src, int options)
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.
static StringBuffer convertIDNToUnicode(StringBuffer src, int options)
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.
static StringBuffer convertIDNToUnicode(String src, int options)
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.
static StringBuffer convertToASCII(String src, int options)
This function implements the ToASCII operation as defined in the IDNA RFC.
static StringBuffer convertToASCII(StringBuffer src, int options)
This function implements the ToASCII operation as defined in the IDNA RFC.
static StringBuffer convertToASCII(UCharacterIterator src, int options)
This function implements the ToASCII operation as defined in the IDNA RFC.
static StringBuffer convertToUnicode(String src, int options)
This function implements the ToUnicode operation as defined in the IDNA RFC.
static StringBuffer convertToUnicode(StringBuffer src, int options)
This function implements the ToUnicode operation as defined in the IDNA RFC.
static StringBuffer convertToUnicode(UCharacterIterator src, int options)
This function implements the ToUnicode operation as defined in the IDNA RFC.

Field Detail

ALLOW_UNASSIGNED

public static final int ALLOW_UNASSIGNED
Option to allow processing of unassigned codepoints in the input

See Also: #convertToUnicode

UNKNOWN: ICU 2.8

DEFAULT

public static final int DEFAULT
Option to prohibit processing of unassigned code points in the input and to skip checking whether the input conforms to STD-3 ASCII rules.

See Also: #convertToUnicode

UNKNOWN: ICU 2.8

USE_STD3_RULES

public static final int USE_STD3_RULES
Option to check if input conforms to STD-3 ASCII rules

See Also: #convertToUnicode

UNKNOWN: ICU 2.8

Method Detail

compare

public static int compare(StringBuffer s1, StringBuffer s2, int options)
Compare two IDN strings for equivalence. This function splits the domain names into labels and compares them. According to the IDN RFC, whenever two labels are compared, they are considered equal if and only if their ASCII forms (obtained by applying ToASCII) match using a case-insensitive ASCII comparison. Two domain names are considered a match if and only if all labels match, regardless of whether the label separators match.

Parameters:
s1 - First IDN string as StringBuffer
s2 - Second IDN string as StringBuffer
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: 0 if the strings are equal, > 0 if s1 > s2 and < 0 if s1 < s2

Throws: ParseException

UNKNOWN: ICU 2.8

compare

public static int compare(String s1, String s2, int options)
Compare two IDN strings for equivalence. This function splits the domain names into labels and compares them. According to the IDN RFC, whenever two labels are compared, they are considered equal if and only if their ASCII forms (obtained by applying ToASCII) match using a case-insensitive ASCII comparison. Two domain names are considered a match if and only if all labels match, regardless of whether the label separators match.

Parameters:
s1 - First IDN string
s2 - Second IDN string
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: 0 if the strings are equal, > 0 if s1 > s2 and < 0 if s1 < s2

Throws: ParseException

UNKNOWN: ICU 2.8
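The comparison rule described for compare can be sketched as follows. This is only an illustration of the rule, not the ICU implementation: it assumes the JDK's `java.net.IDN` class for the ToASCII step, which also normalizes the recognized label separators to a plain dot, and the names `IdnCompareSketch` and `compareIdn` are hypothetical.

```java
import java.net.IDN;

class IdnCompareSketch {
    // Compares two domain names by their ASCII forms, ignoring case.
    // Assumption: java.net.IDN.toASCII stands in for the per-label
    // ToASCII step; it throws IllegalArgumentException on invalid input.
    static int compareIdn(String s1, String s2) {
        String a1 = IDN.toASCII(s1);
        String a2 = IDN.toASCII(s2);
        // Both strings are pure ASCII at this point, so this is an
        // ASCII case-insensitive comparison.
        return a1.compareToIgnoreCase(a2);
    }

    public static void main(String[] args) {
        // Upper-case U-umlaut (U+00DC) and an ideographic full stop (U+3002)
        // on one side, the ACE form with a plain dot on the other.
        System.out.println(compareIdn("B\u00dcCHER\u3002example", "xn--bcher-kva.EXAMPLE"));
    }
}
```

A result of 0 means the two names are equivalent; a nonzero result gives an ordering of the ASCII forms. Unlike the method documented here, this sketch takes no options argument and reports invalid input with an unchecked exception.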

compare

public static int compare(UCharacterIterator s1, UCharacterIterator s2, int options)
Compare two IDN strings for equivalence. This function splits the domain names into labels and compares them. According to the IDN RFC, whenever two labels are compared, they are considered equal if and only if their ASCII forms (obtained by applying ToASCII) match using a case-insensitive ASCII comparison. Two domain names are considered a match if and only if all labels match, regardless of whether the label separators match.

Parameters:
s1 - First IDN string as UCharacterIterator
s2 - Second IDN string as UCharacterIterator
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: 0 if the strings are equal, > 0 if s1 > s2 and < 0 if s1 < s2

Throws: ParseException

UNKNOWN: ICU 2.8

convertIDNToASCII

public static StringBuffer convertIDNToASCII(UCharacterIterator src, int options)
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC. This operation is done on complete domain names, e.g., "www.example.com". It is important to note that this operation can fail. If it fails, the input domain name cannot be used as an Internationalized Domain Name, and the application should have methods defined to deal with the failure. Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules on each, and then convert. This function does not offer that level of granularity. The options, once set, apply to all labels in the domain name.

Parameters:
src - The input string as UCharacterIterator to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertIDNToASCII

public static StringBuffer convertIDNToASCII(StringBuffer src, int options)
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC. This operation is done on complete domain names, e.g., "www.example.com". It is important to note that this operation can fail. If it fails, the input domain name cannot be used as an Internationalized Domain Name, and the application should have methods defined to deal with the failure. Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules on each, and then convert. This function does not offer that level of granularity. The options, once set, apply to all labels in the domain name.

Parameters:
src - The input string as StringBuffer to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertIDNToASCII

public static StringBuffer convertIDNToASCII(String src, int options)
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC. This operation is done on complete domain names, e.g., "www.example.com". It is important to note that this operation can fail. If it fails, the input domain name cannot be used as an Internationalized Domain Name, and the application should have methods defined to deal with the failure. Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules on each, and then convert. This function does not offer that level of granularity. The options, once set, apply to all labels in the domain name.

Parameters:
src - The input string to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8
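Because the conversion to ASCII can fail, callers need a defined failure path. The sketch below shows one way to structure it, and how option bits combine into a single int. It is written against the JDK's `java.net.IDN` class as an assumed stand-in, which signals failure with IllegalArgumentException and names its flags `IDN.ALLOW_UNASSIGNED` and `IDN.USE_STD3_ASCII_RULES`; with the ICU class documented here, the catch clause would instead handle the checked ParseException. `IdnaFailureSketch` and `tryToAscii` are hypothetical names.

```java
import java.net.IDN;
import java.util.Optional;

class IdnaFailureSketch {
    // Returns the ASCII form of the domain name, or an empty Optional
    // if the name cannot be used as an Internationalized Domain Name
    // under the given option bits.
    static Optional<String> tryToAscii(String domain, int flags) {
        try {
            return Optional.of(IDN.toASCII(domain, flags));
        } catch (IllegalArgumentException e) {
            // Assumption: the caller only needs to know that conversion failed.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Option bits are combined with bitwise OR.
        int strict = IDN.ALLOW_UNASSIGNED | IDN.USE_STD3_ASCII_RULES;
        // An underscore is not allowed under STD3 host name rules.
        System.out.println(tryToAscii("under_score.example", strict).isPresent());
        // With no option bits set, the STD3 check is skipped.
        System.out.println(tryToAscii("under_score.example", 0).isPresent());
    }
}
```

Returning an Optional is only one design choice; an application that must explain the failure to the user may prefer to let the exception propagate so the message is preserved.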

convertIDNToUnicode

public static StringBuffer convertIDNToUnicode(UCharacterIterator src, int options)
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC. This operation is done on complete domain names, e.g., "www.example.com". Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules on each, and then convert. This function does not offer that level of granularity. The options, once set, apply to all labels in the domain name.

Parameters:
src - The input string as UCharacterIterator to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertIDNToUnicode

public static StringBuffer convertIDNToUnicode(StringBuffer src, int options)
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC. This operation is done on complete domain names, e.g., "www.example.com". Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules on each, and then convert. This function does not offer that level of granularity. The options, once set, apply to all labels in the domain name.

Parameters:
src - The input string as StringBuffer to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertIDNToUnicode

public static StringBuffer convertIDNToUnicode(String src, int options)
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC. This operation is done on complete domain names, e.g., "www.example.com". Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules on each, and then convert. This function does not offer that level of granularity. The options, once set, apply to all labels in the domain name.

Parameters:
src - The input string to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertToASCII

public static StringBuffer convertToASCII(String src, int options)
This function implements the ToASCII operation as defined in the IDNA RFC. This operation is done on single labels before sending them to something that expects ASCII names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src - The input string to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8
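A label-level operation lets an application choose options per label, which the whole-name convenience functions on this page do not offer. The following is a rough sketch of that idea under stated assumptions: it uses the JDK's `java.net.IDN.toASCII` on one label at a time as a stand-in for convertToASCII, it splits on the four dot characters that the JDK documents as label separators, and the per-label policy shown in `main` (relaxing the STD3 check for labels that start with an underscore) is purely illustrative. `PerLabelSketch` and `toAsciiPerLabel` are hypothetical names.

```java
import java.net.IDN;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class PerLabelSketch {
    // Splits a domain name into labels, asks the policy for the option
    // bits of each label, converts each label separately, and rejoins
    // the results with plain dots.
    static String toAsciiPerLabel(String domain, ToIntFunction<String> flagsFor) {
        // Full stop, ideographic full stop, fullwidth full stop,
        // halfwidth ideographic full stop.
        String[] labels = domain.split("[.\u3002\uFF0E\uFF61]", -1);
        List<String> converted = new ArrayList<>();
        for (String label : labels) {
            // Empty labels (e.g. after a trailing dot) are passed through.
            converted.add(label.isEmpty()
                    ? label
                    : IDN.toASCII(label, flagsFor.applyAsInt(label)));
        }
        return String.join(".", converted);
    }

    public static void main(String[] args) {
        // Hypothetical policy: skip the STD3 check for underscore labels.
        ToIntFunction<String> policy =
                label -> label.startsWith("_") ? 0 : IDN.USE_STD3_ASCII_RULES;
        System.out.println(toAsciiPerLabel("_sip.b\u00fccher.example", policy));
    }
}
```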

convertToASCII

public static StringBuffer convertToASCII(StringBuffer src, int options)
This function implements the ToASCII operation as defined in the IDNA RFC. This operation is done on single labels before sending them to something that expects ASCII names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src - The input string as StringBuffer to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertToASCII

public static StringBuffer convertToASCII(UCharacterIterator src, int options)
This function implements the ToASCII operation as defined in the IDNA RFC. This operation is done on single labels before sending them to something that expects ASCII names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src - The input string as UCharacterIterator to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertToUnicode

public static StringBuffer convertToUnicode(String src, int options)
This function implements the ToUnicode operation as defined in the IDNA RFC. This operation is done on single labels before sending them to something that expects Unicode names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src - The input string to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertToUnicode

public static StringBuffer convertToUnicode(StringBuffer src, int options)
This function implements the ToUnicode operation as defined in the IDNA RFC. This operation is done on single labels before sending them to something that expects Unicode names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src - The input string as StringBuffer to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

convertToUnicode

public static StringBuffer convertToUnicode(UCharacterIterator src, int options)
This function implements the ToUnicode operation as defined in the IDNA RFC. This operation is done on single labels before sending them to something that expects Unicode names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src - The input string as UCharacterIterator to be processed
options - A bit set of options:
- IDNA.DEFAULT Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with ParseException.
- IDNA.ALLOW_UNASSIGNED Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with ParseException.

Returns: StringBuffer the converted String

Throws: ParseException

UNKNOWN: ICU 2.8

Copyright (c) 2007 IBM Corporation and others.