com.ibm.icu.text
public abstract class Collator extends Object implements Comparator, Cloneable
Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.
Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are five levels of strength used in comparisons: PRIMARY, SECONDARY, TERTIARY, QUATERNARY and IDENTICAL.
For more information about the collation service, see the user's guide.
Examples of use
// Get the Collator for US English and set its strength to PRIMARY
Collator usCollator = Collator.getInstance(Locale.US);
usCollator.setStrength(Collator.PRIMARY);
if (usCollator.compare("abc", "ABC") == 0) {
    System.out.println("Strings are equivalent");
}

The following example shows how to compare two strings using the Collator for the default locale.

// Compare two strings in the default locale
Collator myCollator = Collator.getInstance();
myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
if (myCollator.compare("à\u0325", "a\u0325̀") != 0) {
    System.out.println("à\u0325 is not equal to a\u0325̀ without decomposition");
    myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
    if (myCollator.compare("à\u0325", "a\u0325̀") != 0) {
        System.out.println("Error: à\u0325 should be equal to a\u0325̀ with decomposition");
    } else {
        System.out.println("à\u0325 is equal to a\u0325̀ with decomposition");
    }
} else {
    System.out.println("Error: à\u0325 should not be equal to a\u0325̀ without decomposition");
}
See Also: RuleBasedCollator CollationKey
UNKNOWN: ICU 2.8
Nested Class Summary | |
---|---|
abstract static class | Collator.CollatorFactory A factory used with registerFactory to register multiple collators and provide display names for them. |
Field Summary | |
---|---|
static int | CANONICAL_DECOMPOSITION Decomposition mode value. |
static int | FULL_DECOMPOSITION This is for backwards compatibility with Java APIs only. |
static int | IDENTICAL Smallest Collator strength value. |
static int | NO_DECOMPOSITION Decomposition mode value. |
static int | PRIMARY Strongest collator strength value. |
static int | QUATERNARY Fourth level collator strength value. |
static int | SECONDARY Second level collator strength value. |
static int | TERTIARY Third level collator strength value. |
Constructor Summary | |
---|---|
protected | Collator() Empty default constructor to make javadocs happy. |
Method Summary | |
---|---|
Object | clone() Clone the collator. |
int | compare(Object source, Object target) Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. |
abstract int | compare(String source, String target) Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. |
boolean | equals(String source, String target) Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode. |
static Locale[] | getAvailableLocales() Get the set of locales, as Locale objects, for which collators are installed. |
static ULocale[] | getAvailableULocales() Get the set of locales, as ULocale objects, for which collators are installed. |
abstract CollationKey | getCollationKey(String source) Transforms the String into a CollationKey suitable for efficient repeated comparison. |
int | getDecomposition() Get the decomposition mode of this Collator. |
static String | getDisplayName(Locale objectLocale, Locale displayLocale) Get the name of the collator for the objectLocale, localized for the displayLocale. |
static String | getDisplayName(ULocale objectLocale, ULocale displayLocale) Get the name of the collator for the objectLocale, localized for the displayLocale. |
static String | getDisplayName(Locale objectLocale) Get the name of the collator for the objectLocale, localized for the current locale. |
static String | getDisplayName(ULocale objectLocale) Get the name of the collator for the objectLocale, localized for the current locale. |
static ULocale | getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable) Return the functionally equivalent locale for the given requested locale, with respect to the given keyword, for the collation service. |
static ULocale | getFunctionalEquivalent(String keyword, ULocale locID) Return the functionally equivalent locale for the given requested locale, with respect to the given keyword, for the collation service. |
static Collator | getInstance() Gets the Collator for the current default locale. |
static Collator | getInstance(ULocale locale) Gets the Collator for the desired locale. |
static Collator | getInstance(Locale locale) Gets the Collator for the desired locale. |
static String[] | getKeywords() Return an array of all possible keywords that are relevant to collation. |
static String[] | getKeywordValues(String keyword) Given a keyword, return an array of all values for that keyword that are currently in use. |
ULocale | getLocale(ULocale.Type type) Return the locale that was used to create this object, or null. |
abstract RawCollationKey | getRawCollationKey(String source, RawCollationKey key) Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user-provided argument key. |
int | getStrength() Returns this Collator's strength property. |
UnicodeSet | getTailoredSet() Get a UnicodeSet that contains all the characters and sequences tailored in this collator. |
abstract VersionInfo | getUCAVersion() Get the UCA version of this collator object. |
abstract int | getVariableTop() Gets the variable top value of a Collator. |
abstract VersionInfo | getVersion() Get the version of this collator object. |
static Object | registerFactory(Collator.CollatorFactory factory) Register a collator factory. |
static Object | registerInstance(Collator collator, ULocale locale) Register a collator as the default collator for the provided locale. |
void | setDecomposition(int decomposition) Set the decomposition mode of this Collator. |
void | setStrength(int newStrength) Sets this Collator's strength property. |
abstract int | setVariableTop(String varTop) Variable top is a two-byte primary value which causes all the code points with primary values less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED. |
abstract void | setVariableTop(int varTop) Sets the variable top to a collation element value supplied. |
static boolean | unregister(Object registryKey) Unregister a collator previously registered using registerInstance. |
Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.
CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.
See Also: NO_DECOMPOSITION Collator Collator
UNKNOWN: ICU 2.8
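The CANONICAL_DECOMPOSITION description above ties the mode to Normalization Form D. As a rough illustration of what "canonical variants" means, the sketch below checks whether the two strings from the class example share the same NFD form. It uses the JDK's java.text.Normalizer rather than any ICU4J class (an assumption made so the snippet runs without the ICU4J jar), and the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class CanonicalEquivalenceDemo {

    /** Returns true if both strings have the same NFD (canonical decomposition) form. */
    public static boolean canonicallyEquivalent(String a, String b) {
        String nfdA = Normalizer.normalize(a, Normalizer.Form.NFD);
        String nfdB = Normalizer.normalize(b, Normalizer.Form.NFD);
        return nfdA.equals(nfdB);
    }

    public static void main(String[] args) {
        // Precomposed a-grave followed by a combining ring below
        String first = "\u00E0\u0325";
        // Plain 'a' followed by combining ring below, then combining grave
        String second = "a\u0325\u0300";

        // The raw code unit sequences differ ...
        System.out.println(first.equals(second));                 // false
        // ... but the NFD forms are identical
        System.out.println(canonicallyEquivalent(first, second)); // true
    }
}
```

A collator running with decomposition enabled is expected to treat such pairs alike, which is what the class-level example checks through compare.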
UNKNOWN: ICU 3.4 This API might change or be removed in a future release.
Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker: the Unicode code point values of the NFD form of each string are compared when no difference is found at any other level. See class documentation for more explanation.
Note that this value is different from the JDK's.
UNKNOWN: ICU 2.8
Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.
Note this value is different from the JDK's.
See Also: CANONICAL_DECOMPOSITION Collator Collator
UNKNOWN: ICU 2.8
UNKNOWN: ICU 2.8
UNKNOWN: ICU 2.8
UNKNOWN: ICU 2.8
UNKNOWN: ICU 2.8
UNKNOWN: ICU 2.4
Returns: a clone of this collator.
UNKNOWN: ICU 2.6
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.
Parameters: source the source String. target the target String.
Returns: an integer value: less than zero if source is less than target, zero if source and target are equal, greater than zero if source is greater than target.
Throws: NullPointerException thrown if either argument is null. IllegalArgumentException thrown if either source or target is not of the class String.
See Also: CollationKey Collator
UNKNOWN: ICU 2.8
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.
Parameters: source the source String. target the target String.
Returns: an integer value: less than zero if source is less than target, zero if source and target are equal, greater than zero if source is greater than target.
Throws: NullPointerException thrown if either argument is null.
See Also: CollationKey Collator
UNKNOWN: ICU 2.8
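Because Collator implements Comparator, the compare methods described above can drive a sort directly. The following is a minimal sketch of that idea. It uses the JDK's java.text.Collator as a stand-in so that it runs without the ICU4J jar; the calls used here (getInstance(Locale) and use as a Comparator) also appear in the summary tables of this class, but the swap to com.ibm.icu.text.Collator is not verified here. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    /** Returns a copy of the list sorted with the collator for the given locale. */
    public static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        // The collator itself is the Comparator
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");

        // Plain String ordering compares code units, so uppercase sorts first
        List<String> byCodeUnit = new ArrayList<>(words);
        byCodeUnit.sort(null);
        System.out.println(byCodeUnit);                       // [Cherry, apple, banana]

        // Collation ordering puts base letters first, case later
        System.out.println(sortForLocale(words, Locale.US));  // [apple, banana, Cherry]
    }
}
```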
Parameters: source the source string to be compared. target the target string to be compared.
Returns: true if the strings are equal according to the collation rules, otherwise false.
Throws: NullPointerException thrown if either argument is null.
See Also: Collator
UNKNOWN: ICU 2.8
Returns: the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.
UNKNOWN: ICU 2.4
Returns: the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.
UNKNOWN: ICU 3.0
Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.
See the CollationKey class documentation for more information.
Parameters: source the string to be transformed into a CollationKey.
Returns: the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.
See Also: CollationKey Collator Collator
UNKNOWN: ICU 2.8
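The getCollationKey description above mentions efficient repeated comparison. One common use is sorting: each string is converted to a key once, and the keys are then compared instead of re-running the collation rules for every pair. The sketch below illustrates this with the JDK's java.text.Collator and java.text.CollationKey as stand-ins (an assumption made so that it runs without the ICU4J jar); the class and method names are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollationKeySortDemo {

    /** Sorts the words by building one CollationKey per word and comparing keys. */
    public static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Build each key exactly once
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }

        // CollationKey is Comparable, so the keys can be sorted directly
        Collections.sort(keys);

        // Recover the original strings in sorted order
        List<String> result = new ArrayList<>();
        for (CollationKey key : keys) {
            result.add(key.getSourceString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");
        System.out.println(sortWithKeys(words, Locale.US)); // [apple, banana, Cherry]
    }
}
```

Keys are only comparable with other keys produced by the same collator, so the sketch creates all keys from a single Collator instance.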
Get the decomposition mode of this Collator. Decomposition mode determines how Unicode composed characters are handled.
See the Collator class description for more details.
Returns: the decomposition mode
See Also: Collator NO_DECOMPOSITION CANONICAL_DECOMPOSITION
UNKNOWN: ICU 2.8
Parameters: objectLocale the locale of the collator displayLocale the locale for the collator's display name
Returns: the display name
UNKNOWN: ICU 2.6
Parameters: objectLocale the locale of the collator displayLocale the locale for the collator's display name
Returns: the display name
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Parameters: objectLocale the locale of the collator
Returns: the display name
UNKNOWN: ICU 2.6
Parameters: objectLocale the locale of the collator
Returns: the display name
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Parameters: keyword a particular keyword as enumerated by getKeywords. locID The requested locale. isAvailable If non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. The locale is defined as 'available' if it physically exists within the collation locale data. If non-null, isAvailable must have length >= 1.
Returns: the locale
UNKNOWN: ICU 3.0
Parameters: keyword a particular keyword as enumerated by getKeywords. locID The requested locale
Returns: the locale
See Also: getFunctionalEquivalent(String, ULocale, boolean[])
UNKNOWN: ICU 3.0
Returns: the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the default UCA collator will be returned.
See Also: java.util.Locale#getDefault() getInstance
UNKNOWN: ICU 2.8
Parameters: locale the desired locale.
Returns: Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the desired locale, a default UCA collator will be returned.
See Also: java.util.Locale java.util.ResourceBundle getInstance getInstance
UNKNOWN: ICU 3.0
Parameters: locale the desired locale.
Returns: Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the desired locale, a default UCA collator will be returned.
See Also: java.util.Locale java.util.ResourceBundle getInstance getInstance
UNKNOWN: ICU 2.8
Returns: an array of valid collation keywords.
See Also: Collator
UNKNOWN: ICU 3.0
Parameters: keyword one of the keywords returned by getKeywords.
See Also: Collator
UNKNOWN: ICU 3.0
Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.
Parameters: type type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.
Returns: the information specified by type, or null if this object was not constructed from locale data.
See Also: ULocale VALID_LOCALE ACTUAL_LOCALE
UNKNOWN: ICU 2.8 (retain) This API might change or be removed in a future release.
Parameters: source the text String to be transformed into a RawCollationKey. key the RawCollationKey in which the result is stored; may be null.
Returns: If key is null, a new instance of RawCollationKey will be created and returned, otherwise the user provided key will be returned.
See Also: Collator Collator RawCollationKey
UNKNOWN: ICU 2.8
Returns this Collator's strength property. The strength property determines the minimum level of difference considered significant.
See the Collator class description for more details.
Returns: this Collator's current strength property.
See Also: Collator PRIMARY SECONDARY TERTIARY QUATERNARY IDENTICAL
UNKNOWN: ICU 2.8
Returns: a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.
UNKNOWN: ICU 2.4
Returns: the version object associated with this collator
UNKNOWN: ICU 2.8
Returns: the variable top value of a Collator.
See Also: Collator
UNKNOWN: ICU 2.6
Returns: the version object associated with this collator
UNKNOWN: ICU 2.8
Parameters: factory the factory to register
Returns: an object that can be used to unregister the registered factory.
UNKNOWN: ICU 2.6
Parameters: collator the collator to register locale the locale for which this is the default collator
Returns: an object that can be used to unregister the registered collator.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Set the decomposition mode of this Collator. Setting this decomposition property with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.
Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.
The default decomposition mode for the Collator is NO_DECOMPOSITION, unless specified otherwise by the locale used to create the Collator. See getDecomposition for a description of decomposition mode.
Parameters: decomposition the new decomposition mode
Throws: IllegalArgumentException If the given value is not a valid decomposition mode.
See Also: Collator NO_DECOMPOSITION CANONICAL_DECOMPOSITION
UNKNOWN: ICU 2.8
Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.
The default strength for the Collator is TERTIARY, unless specified otherwise by the locale used to create the Collator.
See the Collator class description for an example of use.
Parameters: newStrength the new strength value.
Throws: IllegalArgumentException if the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
See Also: Collator PRIMARY SECONDARY TERTIARY QUATERNARY IDENTICAL
UNKNOWN: ICU 2.8
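To make the setStrength description above more concrete, the sketch below compares the same pair of strings at several strength levels. It is written against the JDK's java.text.Collator as a stand-in (an assumption made so that it runs without the ICU4J jar), which has no QUATERNARY level, so only PRIMARY, SECONDARY and TERTIARY are shown. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {

    /** Returns true if a and b compare as equal for US English at the given strength. */
    public static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Case is a tertiary difference, so it is ignored below TERTIARY
        System.out.println(equalAt(Collator.PRIMARY, "abc", "ABC"));   // true
        System.out.println(equalAt(Collator.SECONDARY, "abc", "ABC")); // true
        System.out.println(equalAt(Collator.TERTIARY, "abc", "ABC"));  // false

        // Different base letters differ at every level
        System.out.println(equalAt(Collator.PRIMARY, "abc", "abd"));   // false
    }
}
```

Creating a fresh collator per call keeps the helper free of shared state; code that compares many strings would normally configure one collator and reuse it.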
Variable top is a two-byte primary value which causes all the code points with primary values less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.
Sets the variable top to the collation element value of the supplied string.
Parameters: varTop one or more (if contraction) characters to which the variable top should be set
Returns: an int value containing the value of the variable top in the upper 16 bits. The lower 16 bits are undefined.
Throws: IllegalArgumentException is thrown if the varTop argument is not a valid variable top element. A variable top element is invalid when it is a contraction that does not exist in the collation order or when the PRIMARY strength collation element for the variable top has more than two bytes.
See Also: Collator RuleBasedCollator
UNKNOWN: ICU 2.6
Parameters: varTop Collation element value, as returned by setVariableTop or getVariableTop
UNKNOWN: ICU 2.6
Parameters: registryKey the object previously returned by registerInstance.
Returns: true if the collator was successfully unregistered.
UNKNOWN: ICU 2.6