
uidna.h File Reference

UIDNA API implements the IDNA protocol as defined in the IDNA RFC (http://www.ietf.org/rfc/rfc3490.txt).

#include "unicode/utypes.h"
#include "unicode/parseerr.h"

Defines

#define UIDNA_DEFAULT   0x0000
 Option to prohibit processing of unassigned code points in the input and to skip checking whether the input conforms to STD-3 ASCII rules.

#define UIDNA_ALLOW_UNASSIGNED   0x0001
 Option to allow processing of unassigned code points in the input.

#define UIDNA_USE_STD3_RULES   0x0002
 Option to check if input conforms to STD-3 ASCII rules.


Functions

int32_t uidna_toASCII (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 This function implements the ToASCII operation as defined in the IDNA RFC.

int32_t uidna_toUnicode (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 This function implements the ToUnicode operation as defined in the IDNA RFC.

int32_t uidna_IDNToASCII (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.

int32_t uidna_IDNToUnicode (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.

int32_t uidna_compare (const UChar *s1, int32_t length1, const UChar *s2, int32_t length2, int32_t options, UErrorCode *status)
 Compare two IDN strings for equivalence.


Detailed Description

UIDNA API implements the IDNA protocol as defined in the IDNA RFC (http://www.ietf.org/rfc/rfc3490.txt).

The RFC defines two operations: ToASCII and ToUnicode. Domain labels containing non-ASCII code points must be processed by the ToASCII operation before they are passed to resolver libraries. Domain names obtained from resolver libraries must be processed by the ToUnicode operation before they are displayed to the user.

IDNA requires that implementations process input strings with Nameprep (http://www.ietf.org/rfc/rfc3491.txt), which is a profile of Stringprep (http://www.ietf.org/rfc/rfc3454.txt), and then with Punycode (http://www.ietf.org/rfc/rfc3492.txt). Implementations of IDNA MUST fully implement Nameprep and Punycode; neither Nameprep nor Punycode is optional.

The input and output of the ToASCII and ToUnicode operations are Unicode, and the operations are designed to be chainable, i.e., applying ToASCII or ToUnicode multiple times to an input string yields the same result as applying the operation once:

ToUnicode(ToUnicode(ToUnicode...(ToUnicode(string)))) == ToUnicode(string)
ToASCII(ToASCII(ToASCII...(ToASCII(string)))) == ToASCII(string)
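As a rough illustration of the rule that only labels containing non-ASCII code points need ToASCII before being handed to a resolver, the sketch below scans a label for any UTF-16 code unit above 0x7F. The helper name `label_has_non_ascii` is hypothetical and not part of ICU, and `uint16_t` is used as a stand-in for ICU's `UChar` so the snippet compiles without the ICU headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ICU. uint16_t stands in for UChar.
 * Returns 1 if the label contains any code unit outside the ASCII range
 * (0x00..0x7F), 0 otherwise. A negative length means NUL-terminated,
 * mirroring the srcLength convention documented for the uidna functions. */
static int label_has_non_ascii(const uint16_t *label, int32_t length)
{
    int32_t i;

    assert(label != NULL);

    if (length < 0) {
        for (i = 0; label[i] != 0; ++i) {
            if (label[i] > 0x7F) {
                return 1;
            }
        }
        return 0;
    }

    for (i = 0; i < length; ++i) {
        if (label[i] > 0x7F) {
            return 1;
        }
    }
    return 0;
}
```

An application could use such a check only as a cheap pre-filter; the conversion itself, including Nameprep and Punycode, is what uidna_toASCII performs.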

Definition in file uidna.h.


Define Documentation

#define UIDNA_ALLOW_UNASSIGNED   0x0001
 

Option to allow processing of unassigned codepoints in the input.

See also:
uidna_toASCII uidna_toUnicode
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

Definition at line 64 of file uidna.h.

#define UIDNA_DEFAULT   0x0000
 

Option to prohibit processing of unassigned code points in the input and to skip checking whether the input conforms to STD-3 ASCII rules.

See also:
uidna_toASCII uidna_toUnicode
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

Definition at line 57 of file uidna.h.

#define UIDNA_USE_STD3_RULES   0x0002
 

Option to check if input conforms to STD-3 ASCII rules.

See also:
uidna_toASCII uidna_toUnicode
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

Definition at line 71 of file uidna.h.
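Because the three defines are distinct bit values, they can be combined with bitwise OR into the single `options` argument. The following is a minimal sketch of that idea; the two helper functions are hypothetical (not part of ICU), and the `#define` values are simply copied from the documentation above so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the defines documented above. The guard lets the
 * real definitions from uidna.h take precedence if that header was
 * included first. */
#ifndef UIDNA_DEFAULT
#define UIDNA_DEFAULT          0x0000
#define UIDNA_ALLOW_UNASSIGNED 0x0001
#define UIDNA_USE_STD3_RULES   0x0002
#endif

/* Hypothetical helper: build an options bit set from two booleans. */
static int32_t make_idna_options(int allowUnassigned, int useStd3Rules)
{
    int32_t options = UIDNA_DEFAULT;

    if (allowUnassigned) {
        options |= UIDNA_ALLOW_UNASSIGNED;
    }
    if (useStd3Rules) {
        options |= UIDNA_USE_STD3_RULES;
    }
    return options;
}

/* Hypothetical helper: test whether one flag is set in an options word.
 * UIDNA_DEFAULT is zero, so it cannot be tested this way. */
static int idna_option_is_set(int32_t options, int32_t flag)
{
    assert(flag != 0);
    return (options & flag) != 0;
}
```

Passing `UIDNA_ALLOW_UNASSIGNED | UIDNA_USE_STD3_RULES` directly as the `options` argument is equivalent to `make_idna_options(1, 1)`.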


Function Documentation

int32_t uidna_IDNToASCII (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 

Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.

This operation is done on complete domain names, e.g., "www.example.com". It is important to note that this operation can fail. If it fails, the input domain name cannot be used as an Internationalized Domain Name, and the application should have methods defined to deal with the failure.

Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each, and then convert. This function does not offer that level of granularity. Once set, the options apply to all labels in the domain name.

Parameters:
src  Input UChar array containing IDN in Unicode.
srcLength  Number of UChars in src, or -1 if NUL-terminated.
dest  Output UChar array with ASCII (ACE encoded) IDN.
destCapacity  Size of dest.
options  A bit set of options:

  • UIDNA_DEFAULT: Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with the U_UNASSIGNED_CODE_POINT_FOUND error code.
  • UIDNA_ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
  • UIDNA_USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with U_IDNA_STD3_ASCII_RULES_ERROR.
parseError  Pointer to UParseError struct to receive information on position of error if an error is encountered. Can be NULL.
status  ICU in/out error code parameter. Set to U_INVALID_CHAR_FOUND if src contains unmatched single surrogates, U_INDEX_OUTOFBOUNDS_ERROR if src contains too many code points, or U_BUFFER_OVERFLOW_ERROR if destCapacity is not large enough.
Returns:
Number of ASCII characters converted.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.
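The note above mentions dividing a domain name into separate labels. The sketch below shows one way an application might walk the labels of a name before calling uidna_toASCII on each with per-label options. The helpers `is_label_separator`, `next_label`, and `count_labels` are hypothetical and not part of ICU; `uint16_t` stands in for `UChar`. The separator set (U+002E, U+3002, U+FF0E, U+FF61) is the one RFC 3490 requires to be recognized as dots; whether ICU's own splitting is implemented this way is not stated in this reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the four code points RFC 3490 treats as dots. */
static int is_label_separator(uint16_t c)
{
    return c == 0x002E || c == 0x3002 || c == 0xFF0E || c == 0xFF61;
}

/* Hypothetical helper: find the next label at or after *pos.
 * On success returns 1, stores the label's offset and length, and advances
 * *pos past the following separator. Returns 0 when no labels remain. */
static int next_label(const uint16_t *name, int32_t nameLength,
                      int32_t *pos, int32_t *labelStart, int32_t *labelLength)
{
    int32_t i;

    assert(name != NULL && pos != NULL);
    assert(labelStart != NULL && labelLength != NULL);

    if (*pos > nameLength) {
        return 0;
    }
    i = *pos;
    while (i < nameLength && !is_label_separator(name[i])) {
        ++i;
    }
    *labelStart = *pos;
    *labelLength = i - *pos;
    *pos = i + 1;   /* skip the separator; exceeds nameLength after the last label */
    return 1;
}

/* Hypothetical helper: count the labels in a domain name. */
static int32_t count_labels(const uint16_t *name, int32_t nameLength)
{
    int32_t pos = 0, start = 0, length = 0, count = 0;

    while (next_label(name, nameLength, &pos, &start, &length)) {
        ++count;
    }
    return count;
}
```

With this approach a name ending in a separator yields a final empty label, which the caller can treat as the root label or skip.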

int32_t uidna_IDNToUnicode (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 

Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.

This operation is done on complete domain names, e.g., "www.example.com".

Note: The IDNA RFC specifies that a conformant application should divide a domain name into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each, and then convert. This function does not offer that level of granularity. Once set, the options apply to all labels in the domain name.

Parameters:
src  Input UChar array containing IDN in ASCII (ACE encoded) form.
srcLength  Number of UChars in src, or -1 if NUL-terminated.
dest  Output UChar array containing Unicode equivalent of source IDN.
destCapacity  Size of dest.
options  A bit set of options:

  • UIDNA_DEFAULT: Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with the U_UNASSIGNED_CODE_POINT_FOUND error code.
  • UIDNA_ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
  • UIDNA_USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with U_IDNA_STD3_ASCII_RULES_ERROR.
parseError  Pointer to UParseError struct to receive information on position of error if an error is encountered. Can be NULL.
status  ICU in/out error code parameter. Set to U_INVALID_CHAR_FOUND if src contains unmatched single surrogates, U_INDEX_OUTOFBOUNDS_ERROR if src contains too many code points, or U_BUFFER_OVERFLOW_ERROR if destCapacity is not large enough.
Returns:
Number of Unicode characters converted.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

int32_t uidna_compare (const UChar *s1, int32_t length1, const UChar *s2, int32_t length2, int32_t options, UErrorCode *status)
 

Compare two IDN strings for equivalence.

This function splits the domain names into labels and compares them. According to the IDNA RFC, whenever two labels are compared, they are considered equal if and only if their ASCII forms (obtained by applying ToASCII) match using a case-insensitive ASCII comparison. Two domain names are considered a match if and only if all labels match, regardless of whether the label separators match.

Parameters:
s1  First source string.
length1  Length of first source string, or -1 if NUL-terminated.
s2  Second source string.
length2  Length of second source string, or -1 if NUL-terminated.
options  A bit set of options:

  • UIDNA_DEFAULT: Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with the U_UNASSIGNED_CODE_POINT_FOUND error code.
  • UIDNA_ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
  • UIDNA_USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with U_IDNA_STD3_ASCII_RULES_ERROR.
status  ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
Returns:
<0, 0, or >0, as usual for string comparisons.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.
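The matching rule described above (ASCII forms compared case-insensitively) can be sketched for a single pair of labels as follows. This assumes both labels have already been converted with ToASCII; the helpers `ascii_to_lower` and `ascii_labels_match` are hypothetical, not part of ICU, and `uint16_t` stands in for `UChar`. It only illustrates the per-label equality test, not the ordering result that uidna_compare returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold only the ASCII letters A-Z to lower case. */
static uint16_t ascii_to_lower(uint16_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint16_t)(c + 0x20) : c;
}

/* Hypothetical helper: returns 1 if two labels, already in ASCII (ACE)
 * form, are equal under a case-insensitive ASCII comparison, else 0. */
static int ascii_labels_match(const uint16_t *a, int32_t aLength,
                              const uint16_t *b, int32_t bLength)
{
    int32_t i;

    assert(a != NULL && b != NULL);
    assert(aLength >= 0 && bLength >= 0);

    if (aLength != bLength) {
        return 0;
    }
    for (i = 0; i < aLength; ++i) {
        if (ascii_to_lower(a[i]) != ascii_to_lower(b[i])) {
            return 0;
        }
    }
    return 1;
}
```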

int32_t uidna_toASCII (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 

This function implements the ToASCII operation as defined in the IDNA RFC.

This operation is done on single labels before sending them to something that expects ASCII names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src  Input UChar array containing label in Unicode.
srcLength  Number of UChars in src, or -1 if NUL-terminated.
dest  Output UChar array with ASCII (ACE encoded) label.
destCapacity  Size of dest.
options  A bit set of options:

  • UIDNA_DEFAULT: Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with the U_UNASSIGNED_ERROR error code.
  • UIDNA_ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points.
  • UIDNA_USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with U_IDNA_STD3_ASCII_RULES_ERROR.
parseError  Pointer to UParseError struct to receive information on position of error if an error is encountered. Can be NULL.
status  ICU in/out error code parameter. Set to U_INVALID_CHAR_FOUND if src contains unmatched single surrogates, U_INDEX_OUTOFBOUNDS_ERROR if src contains too many code points, or U_BUFFER_OVERFLOW_ERROR if destCapacity is not large enough.
Returns:
Number of ASCII characters converted.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.
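To make the UIDNA_USE_STD3_RULES option more concrete, here is a sketch of the check that RFC 3490 describes for the UseSTD3ASCIIRules flag: ASCII code points in the label must be letters, digits, or hyphen-minus, and the label must not begin or end with a hyphen-minus. The helpers `is_ldh` and `label_satisfies_std3` are hypothetical and not part of ICU; `uint16_t` stands in for `UChar`. In the RFC this check runs on the label after Nameprep, which the sketch does not perform, and this reference does not say how ICU implements the check internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: letter, digit, or hyphen-minus (LDH). */
static int is_ldh(uint16_t c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '-';
}

/* Hypothetical helper: returns 1 if the label passes the STD3 ASCII
 * checks sketched from RFC 3490, 0 otherwise. Non-ASCII code units are
 * not restricted by this check. Label length limits are a separate
 * ToASCII step and are not tested here. */
static int label_satisfies_std3(const uint16_t *label, int32_t length)
{
    int32_t i;

    assert(label != NULL && length >= 0);

    for (i = 0; i < length; ++i) {
        if (label[i] <= 0x7F && !is_ldh(label[i])) {
            return 0;   /* non-LDH ASCII code point */
        }
    }
    if (length > 0 && (label[0] == '-' || label[length - 1] == '-')) {
        return 0;       /* leading or trailing hyphen-minus */
    }
    return 1;
}
```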

int32_t uidna_toUnicode (const UChar *src, int32_t srcLength, UChar *dest, int32_t destCapacity, int32_t options, UParseError *parseError, UErrorCode *status)
 

This function implements the ToUnicode operation as defined in the IDNA RFC.

This operation is done on single labels before sending them to something that expects Unicode names. A label is an individual part of a domain name. Labels are usually separated by dots; e.g., "www.example.com" is composed of 3 labels: "www", "example", and "com".

Parameters:
src  Input UChar array containing ASCII (ACE encoded) label.
srcLength  Number of UChars in src, or -1 if NUL-terminated.
dest  Output UChar array containing Unicode equivalent of label.
destCapacity  Size of dest.
options  A bit set of options:

  • UIDNA_DEFAULT: Use default options, i.e., do not process unassigned code points and do not use STD3 ASCII rules. If unassigned code points are found, the operation fails with the U_UNASSIGNED_ERROR error code.
  • UIDNA_ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations. If this option is set, unassigned code points in the input are treated as normal Unicode code points. Note: This option is required on the ToUnicode operation because the RFC mandates verification of decoded ACE input by applying ToASCII and comparing its output with the source.
  • UIDNA_USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions. If this option is set and the input does not satisfy STD3 rules, the operation fails with U_IDNA_STD3_ASCII_RULES_ERROR.
parseError  Pointer to UParseError struct to receive information on position of error if an error is encountered. Can be NULL.
status  ICU in/out error code parameter. Set to U_INVALID_CHAR_FOUND if src contains unmatched single surrogates, U_INDEX_OUTOFBOUNDS_ERROR if src contains too many code points, or U_BUFFER_OVERFLOW_ERROR if destCapacity is not large enough.
Returns:
Number of Unicode characters converted.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.
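The src parameter above is described as an ACE-encoded label. In RFC 3490 such labels are recognized by the ACE prefix "xn--", matched without regard to case. The sketch below shows how an application might detect that prefix, for example to decide which labels are worth displaying through ToUnicode. The helper `label_has_ace_prefix` is hypothetical and not part of ICU, and `uint16_t` stands in for `UChar`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the label starts with the ACE prefix
 * "xn--" (in any capitalization), 0 otherwise. */
static int label_has_ace_prefix(const uint16_t *label, int32_t length)
{
    static const char prefix[] = "xn--";
    int32_t i;

    assert(label != NULL && length >= 0);

    if (length < 4) {
        return 0;
    }
    for (i = 0; i < 4; ++i) {
        uint16_t c = label[i];
        if (c >= 'A' && c <= 'Z') {
            c = (uint16_t)(c + 0x20);   /* fold ASCII upper case */
        }
        if (c != (uint16_t)prefix[i]) {
            return 0;
        }
    }
    return 1;
}
```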


Generated on Mon Nov 24 14:35:59 2003 for ICU 2.8 by doxygen1.2.11.1 written by Dimitri van Heesch, © 1997-2001