com.ibm.icu.text

Class Collator

public abstract class Collator extends Object implements Comparator, Cloneable

Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

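Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons:

PRIMARY: differences between base characters, for example "a" versus "b".
SECONDARY: accent differences; depending on the language, other differences between letters can also be secondary.
TERTIARY: case differences, and differences between a variant of a letter and its base form.
QUATERNARY: when punctuation is ignored at PRIMARY to TERTIARY strength, distinguishes words with and without punctuation.
IDENTICAL: a tiebreaker that compares the code point values of the NFD form of each string when all other levels are equal.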

Unlike the JDK, ICU4J's Collator supports only two decomposition modes: canonical decomposition and no decomposition. The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION, is not supported. If canonical decomposition is set, the Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.

For more information about the collation service, see the ICU User Guide.

Examples of use

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }

 The following example shows how to compare two strings using the
 Collator for the default locale.

 // Compare two strings in the default locale
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("\u00e0\u0325 is not equal to a\u0325\u0300 without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: \u00e0\u0325 should be equal to a\u0325\u0300 with decomposition");
     }
     else {
         System.out.println("\u00e0\u0325 is equal to a\u0325\u0300 with decomposition");
     }
 }
 else {
     System.out.println("Error: \u00e0\u0325 should not be equal to a\u0325\u0300 without decomposition");
 }
 

Author: Syn Wee Quek

See Also: RuleBasedCollator CollationKey

Since: ICU 2.8

Nested Class Summary
abstract static class Collator.CollatorFactory
A factory used with registerFactory to register multiple collators and provide display names for them.
Field Summary
static int CANONICAL_DECOMPOSITION
Decomposition mode value.
static int FULL_DECOMPOSITION
This is for backwards compatibility with Java APIs only.
static int IDENTICAL
Smallest Collator strength value.
static int NO_DECOMPOSITION
Decomposition mode value.
static int PRIMARY
Strongest collator strength value.
static int QUATERNARY
Fourth level collator strength value.
static int SECONDARY
Second level collator strength value.
static int TERTIARY
Third level collator strength value.
Constructor Summary
protected Collator()
Empty default constructor to make javadocs happy
Method Summary
Object clone()
Clone the collator.
int compare(Object source, Object target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
abstract int compare(String source, String target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
boolean equals(String source, String target)
Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.
static Locale[] getAvailableLocales()
Get the set of locales, as Locale objects, for which collators are installed.
static ULocale[] getAvailableULocales()
Get the set of locales, as ULocale objects, for which collators are installed.
abstract CollationKey getCollationKey(String source)
Transforms the String into a CollationKey suitable for efficient repeated comparison.
int getDecomposition()
Get the decomposition mode of this Collator.
static String getDisplayName(Locale objectLocale, Locale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.
static String getDisplayName(ULocale objectLocale, ULocale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.
static String getDisplayName(Locale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.
static String getDisplayName(ULocale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.
static ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
static ULocale getFunctionalEquivalent(String keyword, ULocale locID)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
static Collator getInstance()
Gets the Collator for the current default locale.
static Collator getInstance(ULocale locale)
Gets the Collator for the desired locale.
static Collator getInstance(Locale locale)
Gets the Collator for the desired locale.
static String[] getKeywords()
Return an array of all possible keywords that are relevant to collation.
static String[] getKeywordValues(String keyword)
Given a keyword, return an array of all values for that keyword that are currently in use.
ULocale getLocale(ULocale.Type type)
Return the locale that was used to create this object, or null.
abstract RawCollationKey getRawCollationKey(String source, RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key.
int getStrength()
Returns this Collator's strength property.
UnicodeSet getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
abstract VersionInfo getUCAVersion()
Get the UCA version of this collator object.
abstract int getVariableTop()
Gets the variable top value of a Collator.
abstract VersionInfo getVersion()
Get the version of this collator object.
static Object registerFactory(Collator.CollatorFactory factory)
Register a collator factory.
static Object registerInstance(Collator collator, ULocale locale)
Register a collator as the default collator for the provided locale.
void setDecomposition(int decomposition)
Set the decomposition mode of this Collator.
void setStrength(int newStrength)
Sets this Collator's strength property.
abstract int setVariableTop(String varTop)
Variable top is a two-byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.
abstract void setVariableTop(int varTop)
Sets the variable top to the supplied collation element value.
static boolean unregister(Object registryKey)
Unregister a collator previously registered using registerInstance.

Field Detail

CANONICAL_DECOMPOSITION

public static final int CANONICAL_DECOMPOSITION

Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.

See Also: NO_DECOMPOSITION, getDecomposition(), setDecomposition(int)

Since: ICU 2.8

FULL_DECOMPOSITION

public static final int FULL_DECOMPOSITION
This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.

Since: ICU 3.4. This API might change or be removed in a future release.

IDENTICAL

public static final int IDENTICAL

Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker: the Unicode code point values of the NFD form of each string are compared, in case there is no difference at the other levels. See class documentation for more explanation.

Note that this value is different from the JDK's.

Since: ICU 2.8

NO_DECOMPOSITION

public static final int NO_DECOMPOSITION

Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note this value is different from the JDK's.

See Also: CANONICAL_DECOMPOSITION, getDecomposition(), setDecomposition(int)

Since: ICU 2.8

PRIMARY

public static final int PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.

See Also: setStrength(int), getStrength()

Since: ICU 2.8

QUATERNARY

public static final int QUATERNARY
Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.

See Also: setStrength(int), getStrength()

Since: ICU 2.8
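The following is a minimal sketch of the QUATERNARY level, assuming ICU4J is on the classpath and that Collator.getInstance returns a RuleBasedCollator for English, which is what ICU4J does for its built-in locales. It calls RuleBasedCollator.setAlternateHandlingShifted(true), a method not described on this page, so that the space in "di Silva" is ignored at PRIMARY to TERTIARY strength. QuaternaryDemo and compareAt are illustrative names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class QuaternaryDemo {
    // Compares "di Silva" with "diSilva" at the given strength. Alternate
    // handling is SHIFTED, so the space is ignored at PRIMARY through
    // TERTIARY and only becomes significant at QUATERNARY.
    static int compareAt(int strength) {
        // Assumption: the English collator is a RuleBasedCollator.
        RuleBasedCollator coll =
                (RuleBasedCollator) Collator.getInstance(ULocale.ENGLISH);
        coll.setAlternateHandlingShifted(true);
        coll.setStrength(strength);
        return coll.compare("di Silva", "diSilva");
    }

    public static void main(String[] args) {
        System.out.println("TERTIARY:   " + compareAt(Collator.TERTIARY));
        System.out.println("QUATERNARY: " + compareAt(Collator.QUATERNARY));
    }
}
```

Run as-is, the TERTIARY comparison should print 0 and the QUATERNARY comparison a non-zero value, because only the fourth level sees the space.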

SECONDARY

public static final int SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.

See Also: setStrength(int), getStrength()

Since: ICU 2.8

TERTIARY

public static final int TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.

See Also: setStrength(int), getStrength()

Since: ICU 2.8
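As an illustration of the PRIMARY, SECONDARY and TERTIARY strengths, the sketch below compares "role" against an accented form ("rôle") and a capitalized form ("Role"). It assumes ICU4J on the classpath and uses the English collator as an arbitrary choice; StrengthDemo and equalAt are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class StrengthDemo {
    // Returns true if a and b compare as equal at the given strength,
    // using a fresh English collator for each call.
    static boolean equalAt(int strength, String a, String b) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        coll.setStrength(strength);
        return coll.equals(a, b);
    }

    public static void main(String[] args) {
        String plain = "role";
        String accented = "r\u00F4le"; // o with circumflex: a secondary (accent) difference
        String capital = "Role";       // upper case R: a tertiary (case) difference
        int[] levels = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < levels.length; i++) {
            System.out.println(names[i]
                    + ": accent ignored? " + equalAt(levels[i], plain, accented)
                    + ", case ignored? " + equalAt(levels[i], plain, capital));
        }
    }
}
```

With PRIMARY strength both pairs should compare as equal; SECONDARY should separate the accented pair only; TERTIARY should separate both.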

Constructor Detail

Collator

protected Collator()
Empty default constructor to make javadocs happy

Since: ICU 2.4

Method Detail

clone

public Object clone()
Clone the collator.

Returns: a clone of this collator.

Since: ICU 2.6
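A sketch of using clone to derive a differently configured collator without touching the original, assuming ICU4J on the classpath. The broad catch is a precaution so the code compiles whether or not the ICU4J version in use declares CloneNotSupportedException on clone(); CloneDemo and copyOf are illustrative names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class CloneDemo {
    // Returns an independent copy of the given collator.
    static Collator copyOf(Collator original) {
        try {
            return (Collator) original.clone();
        } catch (Exception e) {
            // Covers ICU4J versions whose clone() declares CloneNotSupportedException.
            throw new IllegalStateException("collator could not be cloned", e);
        }
    }

    public static void main(String[] args) {
        Collator base = Collator.getInstance(ULocale.ENGLISH);
        Collator primaryOnly = copyOf(base);
        primaryOnly.setStrength(Collator.PRIMARY);
        // Changing the copy leaves the original's strength untouched.
        System.out.println("base is TERTIARY: " + (base.getStrength() == Collator.TERTIARY));
        System.out.println("copy is PRIMARY: " + (primaryOnly.getStrength() == Collator.PRIMARY));
    }
}
```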

compare

public int compare(Object source, Object target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

Parameters: source the source String. target the target String.

Returns: an integer value that is less than zero if source is less than target, zero if source and target are equal, and greater than zero if source is greater than target.

Throws: NullPointerException if either argument is null. IllegalArgumentException if either source or target is not of the class String.

See Also: CollationKey, getCollationKey(String)

Since: ICU 2.8
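Because Collator implements java.util.Comparator, an instance can be handed directly to standard sorting methods, which then call compare(Object, Object). A minimal sketch, assuming ICU4J on the classpath and Java 8 or later for List.sort; SortDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortDemo {
    // Returns a sorted copy of words, using the locale's collator as the Comparator.
    static List<String> sorted(List<String> words, ULocale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");
        List<String> byCodeUnit = new ArrayList<>(words);
        byCodeUnit.sort(null); // natural String order puts upper case first
        System.out.println(byCodeUnit);                     // [Cherry, apple, banana]
        System.out.println(sorted(words, ULocale.ENGLISH)); // [apple, banana, Cherry]
    }
}
```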

compare

public abstract int compare(String source, String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

Parameters: source the source String. target the target String.

Returns: an integer value that is less than zero if source is less than target, zero if source and target are equal, and greater than zero if source is greater than target.

Throws: NullPointerException if either argument is null.

See Also: CollationKey, getCollationKey(String)

Since: ICU 2.8

equals

public boolean equals(String source, String target)
Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.

Parameters: source the source string to be compared. target the target string to be compared.

Returns: true if the strings are equal according to the collation rules, otherwise false.

Throws: NullPointerException if either argument is null.

See Also: Collator

Since: ICU 2.8

getAvailableLocales

public static Locale[] getAvailableLocales()
Get the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.

Returns: the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.

Since: ICU 2.4

getAvailableULocales

public static final ULocale[] getAvailableULocales()
Get the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.

Returns: the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.

Since: ICU 3.0

getCollationKey

public abstract CollationKey getCollationKey(String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.

See the CollationKey class documentation for more information.

Parameters: source the string to be transformed into a CollationKey.

Returns: the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.

See Also: CollationKey Collator Collator

Since: ICU 2.8
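When the same strings are compared many times, for example while sorting, a common pattern is to build one CollationKey per string and sort the keys instead. The sketch below assumes ICU4J on the classpath and relies on CollationKey being Comparable and on CollationKey.getSourceString() to recover the original text; KeySortDemo is a hypothetical name.

```java
import com.ibm.icu.text.CollationKey;
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.Arrays;

public class KeySortDemo {
    // Builds one key per string, sorts the keys, and reads the strings back.
    static String[] sortWithKeys(String[] words, Collator coll) {
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = coll.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey is Comparable; compares the key bytes
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        String[] sorted = sortWithKeys(new String[] {"banana", "apple", "Cherry"}, coll);
        System.out.println(Arrays.toString(sorted)); // [apple, banana, Cherry]
    }
}
```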

getDecomposition

public int getDecomposition()

Get the decomposition mode of this Collator. Decomposition mode determines how Unicode composed characters are handled.

See the Collator class description for more details.

Returns: the decomposition mode

See Also: setDecomposition(int), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION

Since: ICU 2.8

getDisplayName

public static String getDisplayName(Locale objectLocale, Locale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.

Parameters: objectLocale the locale of the collator. displayLocale the locale for the collator's display name.

Returns: the display name

Since: ICU 2.6

getDisplayName

public static String getDisplayName(ULocale objectLocale, ULocale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.

Parameters: objectLocale the locale of the collator. displayLocale the locale for the collator's display name.

Returns: the display name

Since: ICU 3.2. This API might change or be removed in a future release.

getDisplayName

public static String getDisplayName(Locale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.

Parameters: objectLocale the locale of the collator

Returns: the display name

Since: ICU 2.6

getDisplayName

public static String getDisplayName(ULocale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.

Parameters: objectLocale the locale of the collator

Returns: the display name

Since: ICU 3.2. This API might change or be removed in a future release.
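A short sketch combining getAvailableULocales and getDisplayName(ULocale, ULocale) to list installed collators with English display names, assuming ICU4J on the classpath. The actual locales and names printed depend on the ICU4J data in use; ListCollatorsDemo and describe are illustrative names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class ListCollatorsDemo {
    // Formats a locale ID together with its collator display name in English.
    static String describe(ULocale locale) {
        return locale + " -> " + Collator.getDisplayName(locale, ULocale.ENGLISH);
    }

    public static void main(String[] args) {
        ULocale[] locales = Collator.getAvailableULocales();
        System.out.println(locales.length + " locales have collators");
        // Print only the first few entries to keep the output short.
        for (int i = 0; i < Math.min(5, locales.length); i++) {
            System.out.println(describe(locales[i]));
        }
    }
}
```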

getFunctionalEquivalent

public static final ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)
Return the functionally equivalent locale for the given requested locale, with respect to the given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.

Parameters: keyword a particular keyword as enumerated by getKeywords. locID the requested locale. isAvailable if non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. The locale is defined as 'available' if it physically exists within the collation locale data. If non-null, isAvailable must have length >= 1.

Returns: the locale

Since: ICU 3.0
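The caching use case mentioned above might look like the following sketch, which assumes ICU4J on the classpath and uses the keyword "collation" (the only keyword documented for getKeywords). CollatorCache is a hypothetical class, not part of ICU4J. Because cached instances are shared, callers should not modify them.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.HashMap;
import java.util.Map;

public class CollatorCache {
    // Requested locales that share a functional equivalent share one
    // Collator instance. Callers must not modify the shared instances.
    private final Map<ULocale, Collator> cache = new HashMap<>();

    synchronized Collator get(ULocale requested) {
        ULocale key = Collator.getFunctionalEquivalent("collation", requested);
        Collator coll = cache.get(key);
        if (coll == null) {
            coll = Collator.getInstance(key);
            cache.put(key, coll);
        }
        return coll;
    }

    public static void main(String[] args) {
        boolean[] isAvailable = new boolean[1];
        ULocale requested = new ULocale("en_US_CALIFORNIA");
        ULocale equivalent =
                Collator.getFunctionalEquivalent("collation", requested, isAvailable);
        // isAvailable[0] reports whether the requested locale itself was available.
        System.out.println(requested + " -> " + equivalent + " (available: " + isAvailable[0] + ")");

        CollatorCache cache = new CollatorCache();
        System.out.println(cache.get(ULocale.US) == cache.get(ULocale.US)); // true
    }
}
```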

getFunctionalEquivalent

public static final ULocale getFunctionalEquivalent(String keyword, ULocale locID)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

Parameters: keyword a particular keyword as enumerated by getKeywords. locID the requested locale.

Returns: the locale

See Also: getFunctionalEquivalent(String, ULocale, boolean[])

Since: ICU 3.0

getInstance

public static final Collator getInstance()
Gets the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().

Returns: the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the default UCA collator will be returned.

See Also: java.util.Locale#getDefault() getInstance

Since: ICU 2.8

getInstance

public static final Collator getInstance(ULocale locale)
Gets the Collator for the desired locale.

Parameters: locale the desired locale.

Returns: Collator for the desired locale if it is created successfully. Otherwise, if there is no Collator associated with the desired locale, a default UCA collator will be returned.

See Also: java.util.Locale java.util.ResourceBundle getInstance getInstance

Since: ICU 3.0

getInstance

public static final Collator getInstance(Locale locale)
Gets the Collator for the desired locale.

Parameters: locale the desired locale.

Returns: Collator for the desired locale if it is created successfully. Otherwise, if there is no Collator associated with the desired locale, a default UCA collator will be returned.

See Also: java.util.Locale java.util.ResourceBundle getInstance getInstance

Since: ICU 2.8

getKeywords

public static final String[] getKeywords()
Return an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".

Returns: an array of valid collation keywords.

See Also: getKeywordValues(String)

Since: ICU 3.0

getKeywordValues

public static final String[] getKeywordValues(String keyword)
Given a keyword, return an array of all values for that keyword that are currently in use.

Parameters: keyword one of the keywords returned by getKeywords.

See Also: getKeywords()

Since: ICU 3.0
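A sketch that enumerates the collation keywords and their values, then requests a collator with a keyword value embedded in the locale ID. It assumes ICU4J on the classpath; the German "phonebook" value is used for illustration, and the values printed depend on the installed data. KeywordsDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.Arrays;

public class KeywordsDemo {
    public static void main(String[] args) {
        // List every collation keyword and the values currently in use for it.
        for (String keyword : Collator.getKeywords()) {
            System.out.println(keyword + ": "
                    + Arrays.toString(Collator.getKeywordValues(keyword)));
        }
        // A keyword value is requested through the locale ID itself.
        Collator phonebook = Collator.getInstance(new ULocale("de@collation=phonebook"));
        System.out.println("actual locale: " + phonebook.getLocale(ULocale.ACTUAL_LOCALE));
    }
}
```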

getLocale

public final ULocale getLocale(ULocale.Type type)
Return the locale that was used to create this object, or null. This may differ from the locale requested at the time of this object's creation. For example, if an object is created for locale en_US_CALIFORNIA, the actual data may be drawn from en (the actual locale), and en_US may be the most specific locale that exists (the valid locale).

Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.

Parameters: type type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.

Returns: the information specified by type, or null if this object was not constructed from locale data.

See Also: ULocale, ULocale.VALID_LOCALE, ULocale.ACTUAL_LOCALE

Since: ICU 2.8 (retain). This API might change or be removed in a future release.

getRawCollationKey

public abstract RawCollationKey getRawCollationKey(String source, RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key. If key has an internal byte array whose length is too small for the result, the internal byte array will be grown to the exact required size.

Parameters: source the text String to be transformed into a RawCollationKey. key the RawCollationKey to receive the result, or null.

Returns: If key is null, a new instance of RawCollationKey will be created and returned, otherwise the user provided key will be returned.

See Also: Collator Collator RawCollationKey

Since: ICU 2.8
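Because getRawCollationKey can fill a caller-supplied key, repeated key generation can reuse buffers instead of allocating a new key per call. The sketch below assumes ICU4J on the classpath and that RawCollationKey offers a no-argument constructor and compareTo(RawCollationKey); RawKeyDemo is a hypothetical name, and its reused keys make it unsuitable for concurrent use.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.util.ULocale;

public class RawKeyDemo {
    private final Collator coll = Collator.getInstance(ULocale.ENGLISH);
    // Two keys allocated once and refilled on every call; the collator grows
    // their internal byte arrays only when a new key does not fit.
    private final RawCollationKey keyA = new RawCollationKey();
    private final RawCollationKey keyB = new RawCollationKey();

    // Returns -1, 0 or 1. Not thread-safe: the keys are shared instance state.
    int compare(String a, String b) {
        coll.getRawCollationKey(a, keyA);
        coll.getRawCollationKey(b, keyB);
        return Integer.signum(keyA.compareTo(keyB));
    }

    public static void main(String[] args) {
        RawKeyDemo demo = new RawKeyDemo();
        System.out.println(demo.compare("apple", "Banana")); // -1
        System.out.println(demo.compare("apple", "apple"));  // 0
    }
}
```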

getStrength

public int getStrength()

Returns this Collator's strength property. The strength property determines the minimum level of difference considered significant.

See the Collator class description for more details.

Returns: this Collator's current strength property.

See Also: setStrength(int), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL

Since: ICU 2.8

getTailoredSet

public UnicodeSet getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.

Returns: a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.

Since: ICU 2.4
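A sketch that inspects a tailored set, assuming ICU4J on the classpath and that the Spanish ("es") collation data tailors ñ (U+00F1), as CLDR's standard Spanish collation does by sorting it as a separate letter after n. TailoredSetDemo and isTailored are illustrative names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.util.ULocale;

public class TailoredSetDemo {
    // True if s is among the characters or sequences that the locale's
    // collator tailors, i.e. that may sort differently than in the UCA.
    static boolean isTailored(ULocale locale, String s) {
        UnicodeSet tailored = Collator.getInstance(locale).getTailoredSet();
        return tailored.contains(s);
    }

    public static void main(String[] args) {
        UnicodeSet spanish = Collator.getInstance(new ULocale("es")).getTailoredSet();
        System.out.println(spanish.size() + " tailored items: " + spanish.toPattern(true));
        System.out.println("n with tilde tailored? " + isTailored(new ULocale("es"), "\u00F1"));
    }
}
```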

getUCAVersion

public abstract VersionInfo getUCAVersion()
Get the UCA version of this collator object.

Returns: the version object associated with this collator

Since: ICU 2.8

getVariableTop

public abstract int getVariableTop()
Gets the variable top value of a Collator. Lower 16 bits are undefined and should be ignored.

Returns: the variable top value of a Collator.

See Also: setVariableTop(String)

Since: ICU 2.6

getVersion

public abstract VersionInfo getVersion()
Get the version of this collator object.

Returns: the version object associated with this collator

Since: ICU 2.8

registerFactory

public static final Object registerFactory(Collator.CollatorFactory factory)
Register a collator factory.

Parameters: factory the factory to register

Returns: an object that can be used to unregister the registered factory.

Since: ICU 2.6

registerInstance

public static final Object registerInstance(Collator collator, ULocale locale)
Register a collator as the default collator for the provided locale. The collator should not be modified after it is registered.

Parameters: collator the collator to register. locale the locale for which this is the default collator.

Returns: an object that can be used to unregister the registered collator.

Since: ICU 3.2. This API might change or be removed in a future release.
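A sketch of the register and unregister life cycle, assuming ICU4J on the classpath. The locale ID xx_YY is made up for illustration, and the collator is configured before registration because it should not be modified afterward. Registration is process-wide, so the sketch always unregisters in a finally block; RegisterDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class RegisterDemo {
    // Registers a PRIMARY-strength English collator for a made-up locale,
    // checks that getInstance serves it, then removes the registration.
    static boolean registerServeAndUnregister() {
        ULocale custom = new ULocale("xx_YY"); // hypothetical locale ID
        Collator primaryEnglish = Collator.getInstance(ULocale.ENGLISH);
        primaryEnglish.setStrength(Collator.PRIMARY); // configure before registering
        Object registryKey = Collator.registerInstance(primaryEnglish, custom);
        boolean served;
        boolean removed;
        try {
            served = Collator.getInstance(custom).getStrength() == Collator.PRIMARY;
        } finally {
            // Always undo the process-wide registration.
            removed = Collator.unregister(registryKey);
        }
        return served && removed;
    }

    public static void main(String[] args) {
        System.out.println("served and unregistered: " + registerServeAndUnregister());
    }
}
```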

setDecomposition

public void setDecomposition(int decomposition)

Set the decomposition mode of this Collator. Setting this decomposition property with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.

Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.

The default decomposition mode for the Collator is NO_DECOMPOSITION, unless specified otherwise by the locale used to create the Collator.

See getDecomposition for a description of decomposition mode.

Parameters: decomposition the new decomposition mode

Throws: IllegalArgumentException if the given value is not a valid decomposition mode.

See Also: getDecomposition(), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION

Since: ICU 2.8
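The canonical equivalence handled by CANONICAL_DECOMPOSITION can be checked directly with the pair used in the class description: U+00E0 U+0325 and U+0061 U+0325 U+0300 share the same NFD form. A sketch assuming ICU4J on the classpath, with the English collator chosen arbitrarily; DecompositionDemo and its methods are illustrative names. The sketch does not assert the NO_DECOMPOSITION result, which depends on how the ICU4J version in use treats un-normalized input.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class DecompositionDemo {
    // U+00E0 U+0325 (a-grave, combining ring below) and U+0061 U+0325 U+0300
    // are canonically equivalent: both have the NFD form a, U+0325, U+0300.
    static final String PRECOMPOSED = "\u00E0\u0325";
    static final String DECOMPOSED = "a\u0325\u0300";

    static int compareWith(int decompositionMode) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        coll.setDecomposition(decompositionMode);
        return coll.compare(PRECOMPOSED, DECOMPOSED);
    }

    // True if setDecomposition rejects the value with IllegalArgumentException.
    static boolean rejectsMode(int mode) {
        try {
            Collator.getInstance(ULocale.ENGLISH).setDecomposition(mode);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CANONICAL_DECOMPOSITION: "
                + compareWith(Collator.CANONICAL_DECOMPOSITION)); // 0
        System.out.println("NO_DECOMPOSITION: " + compareWith(Collator.NO_DECOMPOSITION));
        System.out.println("-1 rejected: " + rejectsMode(-1)); // true
    }
}
```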

setStrength

public void setStrength(int newStrength)

Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.

The default strength for the Collator is TERTIARY, unless specified otherwise by the locale used to create the Collator.

See the Collator class description for an example of use.

Parameters: newStrength the new strength value.

Throws: IllegalArgumentException if the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.

See Also: getStrength(), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL

Since: ICU 2.8

setVariableTop

public abstract int setVariableTop(String varTop)

Variable top is a two-byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.

Sets the variable top to the collation element value of the supplied string.

Parameters: varTop one or more (if contraction) characters to which the variable top should be set

Returns: an int value containing the value of the variable top in the upper 16 bits. The lower 16 bits are undefined.

Throws: IllegalArgumentException if the varTop argument is not a valid variable top element. A variable top element is invalid when it is a contraction that does not exist in the collation order, or when the PRIMARY strength collation element for the variable top has more than two bytes.

See Also: getVariableTop(), RuleBasedCollator

Since: ICU 2.6
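A sketch of how the variable top changes what SHIFTED alternate handling ignores, assuming ICU4J on the classpath, a RuleBasedCollator for English, and ICU's default variable top, which covers spaces and punctuation. It uses RuleBasedCollator.setAlternateHandlingShifted, which is not described on this page; newer ICU4J releases also deprecate setVariableTop in favor of setMaxVariable. VariableTopDemo and compareHyphen are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class VariableTopDemo {
    // Compares "a-b" with "ab" under SHIFTED alternate handling at the default
    // TERTIARY strength. If varTop is non-null, the variable top is first set
    // to that character, which changes which characters are shifted.
    static int compareHyphen(String varTop) {
        // Assumption: the English collator is a RuleBasedCollator.
        RuleBasedCollator coll =
                (RuleBasedCollator) Collator.getInstance(ULocale.ENGLISH);
        coll.setAlternateHandlingShifted(true);
        if (varTop != null) {
            int top = coll.setVariableTop(varTop);
            // Only the upper 16 bits are defined, so compare just those.
            if ((top >>> 16) != (coll.getVariableTop() >>> 16)) {
                throw new IllegalStateException("variable top was not applied");
            }
        }
        return coll.compare("a-b", "ab");
    }

    public static void main(String[] args) {
        System.out.println("default variable top:  " + compareHyphen(null));
        System.out.println("variable top at space: " + compareHyphen(" "));
    }
}
```

With the default variable top the hyphen is shifted, so "a-b" and "ab" should compare as equal; once the variable top is lowered to the space character, the hyphen becomes significant and the comparison should be non-zero.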

setVariableTop

public abstract void setVariableTop(int varTop)
Sets the variable top to the supplied collation element value. The variable top is taken from the upper 16 bits; the lower 16 bits are ignored.

Parameters: varTop Collation element value, as returned by setVariableTop or getVariableTop

See Also: getVariableTop(), setVariableTop(String)

Since: ICU 2.6

unregister

public static final boolean unregister(Object registryKey)
Unregister a collator previously registered using registerInstance.

Parameters: registryKey the object previously returned by registerInstance.

Returns: true if the collator was successfully unregistered.

Since: ICU 2.6

Copyright (c) 2007 IBM Corporation and others.