uset.h File Reference

C API: Unicode Set.

#include "unicode/utypes.h"
#include "unicode/umisc.h"

Compounds

struct  USerializedSet
 A serialized form of a Unicode set.


Typedefs

typedef struct USet USet
 A UnicodeSet.

typedef struct USerializedSet USerializedSet
 A serialized form of a Unicode set.


Enumerations

enum  { USET_IGNORE_SPACE = 1, USET_CASE_INSENSITIVE = 2, USET_CASE = 2, USET_SERIALIZED_STATIC_ARRAY_CAPACITY = 8 }
 Bitmask values to be passed to uset_openPatternOptions() or uset_applyPattern() taking an option parameter.


Functions

USet * uset_open (UChar32 start, UChar32 end)
 Creates a USet object that contains the range of characters start..end, inclusive.

USet * uset_openPattern (const UChar *pattern, int32_t patternLength, UErrorCode *ec)
 Creates a set from the given pattern.

USet * uset_openPatternOptions (const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *ec)
 Creates a set from the given pattern.

void uset_close (USet *set)
 Disposes of the storage used by a USet object.

int32_t uset_applyPattern (USet *set, const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *status)
 Modifies the set to represent the set specified by the given pattern.

int32_t uset_toPattern (const USet *set, UChar *result, int32_t resultCapacity, UBool escapeUnprintable, UErrorCode *ec)
 Returns a string representation of this set.

void uset_add (USet *set, UChar32 c)
 Adds the given character to the given USet.

void uset_addAll (USet *set, const USet *additionalSet)
 Adds all of the elements in the specified set to this set if they're not already present.

void uset_addRange (USet *set, UChar32 start, UChar32 end)
 Adds the given range of characters to the given USet.

void uset_addString (USet *set, const UChar *str, int32_t strLen)
 Adds the given string to the given USet.

void uset_remove (USet *set, UChar32 c)
 Removes the given character from the given USet.

void uset_removeRange (USet *set, UChar32 start, UChar32 end)
 Removes the given range of characters from the given USet.

void uset_removeString (USet *set, const UChar *str, int32_t strLen)
 Removes the given string from the given USet.

void uset_complement (USet *set)
 Inverts this set.

void uset_clear (USet *set)
 Removes all of the elements from this set.

UBool uset_isEmpty (const USet *set)
 Returns TRUE if the given USet contains no characters and no strings.

UBool uset_contains (const USet *set, UChar32 c)
 Returns TRUE if the given USet contains the given character.

UBool uset_containsRange (const USet *set, UChar32 start, UChar32 end)
 Returns TRUE if the given USet contains all characters c where start <= c && c <= end.

UBool uset_containsString (const USet *set, const UChar *str, int32_t strLen)
 Returns TRUE if the given USet contains the given string.

int32_t uset_size (const USet *set)
 Returns the number of characters and strings contained in the given USet.

int32_t uset_getItemCount (const USet *set)
 Returns the number of items in this set.

int32_t uset_getItem (const USet *set, int32_t itemIndex, UChar32 *start, UChar32 *end, UChar *str, int32_t strCapacity, UErrorCode *ec)
 Returns an item of this set.

int32_t uset_serialize (const USet *set, uint16_t *dest, int32_t destCapacity, UErrorCode *pErrorCode)
 Serializes this set into an array of 16-bit integers.

UBool uset_getSerializedSet (USerializedSet *fillSet, const uint16_t *src, int32_t srcLength)
 Given a serialized array, fill in the given serialized set object.

void uset_setSerializedToOne (USerializedSet *fillSet, UChar32 c)
 Set the USerializedSet to contain the given character (and nothing else).

UBool uset_serializedContains (const USerializedSet *set, UChar32 c)
 Returns TRUE if the given USerializedSet contains the given character.

int32_t uset_getSerializedRangeCount (const USerializedSet *set)
 Returns the number of disjoint ranges of characters contained in the given serialized set.

UBool uset_getSerializedRange (const USerializedSet *set, int32_t rangeIndex, UChar32 *pStart, UChar32 *pEnd)
 Returns a range of characters contained in the given serialized set.


Detailed Description

C API: Unicode Set.

This is a C wrapper around the C++ UnicodeSet class.

Definition in file uset.h.


Typedef Documentation

typedef struct USerializedSet USerializedSet
 

A serialized form of a Unicode set.

Limited manipulations are possible directly on a serialized set. See below.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

typedef struct USet USet
 

A UnicodeSet.

Use the uset_* API to manipulate. Create with uset_open*, and destroy with uset_close.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

Definition at line 40 of file uset.h.


Enumeration Type Documentation

anonymous enum
 

Bitmask values to be passed to uset_openPatternOptions() or uset_applyPattern() taking an option parameter.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.
Enumeration values:
USET_IGNORE_SPACE  Ignore white space within patterns unless quoted or escaped.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.
USET_CASE_INSENSITIVE  Enable case insensitive matching.

E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.
USET_CASE  Bitmask for UnicodeSet::closeOver() indicating letter case.

This may be ORed together with other selectors.

Internal:
For internal use only.
USET_SERIALIZED_STATIC_ARRAY_CAPACITY  Enough for any single-code point set.

Internal:
For internal use only.

Definition at line 48 of file uset.h.
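
Because the public option values are bit flags, a caller can combine them with bitwise OR and test them with bitwise AND. The following is a minimal sketch of that idea only. The SKETCH_ names and the helpers make_options and has_option are hypothetical, introduced so the example compiles without the ICU headers; the numeric values are the ones listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the two public option values listed above.
 * The SKETCH_ names are hypothetical; real code would use the
 * constants from unicode/uset.h instead. */
enum {
    SKETCH_IGNORE_SPACE     = 1,
    SKETCH_CASE_INSENSITIVE = 2
};

/* Returns 1 if every bit of `flag` is set in `options`, otherwise 0. */
static int has_option(uint32_t options, uint32_t flag) {
    assert(flag != 0);
    return (options & flag) == flag;
}

/* Builds an options word of the kind passed as the `options` argument. */
static uint32_t make_options(int ignoreSpace, int caseInsensitive) {
    uint32_t options = 0;
    if (ignoreSpace) {
        options |= SKETCH_IGNORE_SPACE;
    }
    if (caseInsensitive) {
        options |= SKETCH_CASE_INSENSITIVE;
    }
    return options;
}
```

With the real header, the combined value would be passed as the options parameter of uset_openPatternOptions() or uset_applyPattern().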


Function Documentation

void uset_add (USet *set, UChar32 c)
 

Adds the given character to the given USet.

After this call, uset_contains(set, c) will return TRUE.

Parameters:
set  the object to which to add the character
c  the character to add
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void uset_addAll (USet *set, const USet *additionalSet)
 

Adds all of the elements in the specified set to this set if they're not already present.

This operation effectively modifies this set so that its value is the union of the two sets. The behavior of this operation is unspecified if the specified collection is modified while the operation is in progress.

Parameters:
set  the object to which to add the set
additionalSet  the source set whose elements are to be added to this set.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

void uset_addRange (USet *set, UChar32 start, UChar32 end)
 

Adds the given range of characters to the given USet.

After this call, uset_containsRange(set, start, end) will return TRUE.

Parameters:
set  the object to which to add the range
start  the first character of the range to add, inclusive
end  the last character of the range to add, inclusive
Stable:
ICU 2.2

void uset_addString (USet *set, const UChar *str, int32_t strLen)
 

Adds the given string to the given USet.

After this call, uset_containsString(set, str, strLen) will return TRUE.

Parameters:
set  the object to which to add the string
str  the string to add
strLen  the length of the string or -1 if null terminated.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

int32_t uset_applyPattern (USet *set, const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *status)
 

Modifies the set to represent the set specified by the given pattern.

See the UnicodeSet class description for the syntax of the pattern language. See also the User Guide chapter about UnicodeSet. Empties the set passed before applying the pattern.

Parameters:
set  The set to which the pattern is to be applied.
pattern  A pointer to UChar string specifying what characters are in the set. The character at pattern[0] must be a '['.
patternLength  The length of the UChar string. -1 if NUL terminated.
options  A bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
status  Returns an error if the pattern cannot be parsed.
Returns:
Upon successful parse, the value is the index of the character after the closing ']' of the parsed pattern. If the status code indicates failure, then the return value is the index of the error in the source.
Draft:
This API has been introduced in ICU 2.8. It is still in draft state and may be modified in a future release.

void uset_clear (USet *set)
 

Removes all of the elements from this set.

This set will be empty after this call returns.

Parameters:
set  the set
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void uset_close (USet *set)
 

Disposes of the storage used by a USet object.

This function should be called exactly once for objects returned by uset_open(), uset_openPattern(), or uset_openPatternOptions().

Parameters:
set  the object to dispose of
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void uset_complement (USet *set)
 

Inverts this set.

This operation modifies this set so that its value is its complement. This operation does not affect the multicharacter strings, if any.

Parameters:
set  the set
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

UBool uset_contains (const USet *set, UChar32 c)
 

Returns TRUE if the given USet contains the given character.

Parameters:
set  the set
c  The codepoint to check for within the set
Returns:
true if set contains c
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

UBool uset_containsRange (const USet *set, UChar32 start, UChar32 end)
 

Returns TRUE if the given USet contains all characters c where start <= c && c <= end.

Parameters:
set  the set
start  the first character of the range to test, inclusive
end  the last character of the range to test, inclusive
Returns:
TRUE if set contains the range
Stable:
ICU 2.2

UBool uset_containsString (const USet *set, const UChar *str, int32_t strLen)
 

Returns TRUE if the given USet contains the given string.

Parameters:
set  the set
str  the string
strLen  the length of the string or -1 if null terminated.
Returns:
true if set contains str
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

int32_t uset_getItem (const USet *set, int32_t itemIndex, UChar32 *start, UChar32 *end, UChar *str, int32_t strCapacity, UErrorCode *ec)
 

Returns an item of this set.

An item is either a range of characters or a single multicharacter string.

Parameters:
set  the set
itemIndex  a non-negative integer in the range 0.. uset_getItemCount(set)-1
start  pointer to variable to receive first character in range, inclusive
end  pointer to variable to receive last character in range, inclusive
str  buffer to receive the string, may be NULL
strCapacity  capacity of str, or 0 if str is NULL
ec  error code
Returns:
the length of the string (>= 2), or 0 if the item is a range, in which case it is the range *start..*end, or -1 if itemIndex is out of range
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

int32_t uset_getItemCount (const USet *set)
 

Returns the number of items in this set.

An item is either a range of characters or a single multicharacter string.

Parameters:
set  the set
Returns:
a non-negative integer counting the character ranges and/or strings contained in set
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

UBool uset_getSerializedRange (const USerializedSet *set, int32_t rangeIndex, UChar32 *pStart, UChar32 *pEnd)
 

Returns a range of characters contained in the given serialized set.

Parameters:
set  the serialized set
rangeIndex  a non-negative integer in the range 0.. uset_getSerializedRangeCount(set)-1
pStart  pointer to variable to receive first character in range, inclusive
pEnd  pointer to variable to receive last character in range, inclusive
Returns:
true if rangeIndex is valid, otherwise false
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.
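
To illustrate how ranges could be read back out of the serialized data, here is a minimal sketch. It rests on an assumption that this page does not state: that the stored entries form an inversion list, alternating a range start (inclusive) and a range limit (exclusive), with a missing final limit meaning the range runs to U+10FFFF. The sketch_ functions are hypothetical helpers, not ICU functions, and they operate on the data that follows the header described under uset_serialize().

```c
#include <assert.h>
#include <stdint.h>

/* Reads logical entry i (0-based) from the units that follow the header.
 * BMP entries occupy one 16-bit unit; supplementary entries occupy two
 * (most significant 16 bits first). */
static uint32_t sketch_entry(const uint16_t *data, int32_t bmpLength, int32_t i) {
    if (i < bmpLength) {
        return data[i];
    } else {
        const uint16_t *p = data + bmpLength + 2 * (i - bmpLength);
        return ((uint32_t)p[0] << 16) | p[1];
    }
}

/* Number of logical entries, n + m, where length = n + 2*m. */
static int32_t sketch_entry_count(int32_t bmpLength, int32_t length) {
    return bmpLength + (length - bmpLength) / 2;
}

/* ASSUMPTION: entries alternate start (inclusive) and limit (exclusive).
 * Returns 1 and fills *pStart / *pEnd if rangeIndex is valid, else 0. */
static int sketch_get_range(const uint16_t *data, int32_t bmpLength,
                            int32_t length, int32_t rangeIndex,
                            uint32_t *pStart, uint32_t *pEnd) {
    int32_t entries = sketch_entry_count(bmpLength, length);
    int32_t i;

    assert(pStart != 0 && pEnd != 0);
    if (rangeIndex < 0 || rangeIndex >= (entries + 1) / 2) {
        return 0;
    }
    i = 2 * rangeIndex;
    *pStart = sketch_entry(data, bmpLength, i);
    if (i + 1 < entries) {
        *pEnd = sketch_entry(data, bmpLength, i + 1) - 1;  /* limit is exclusive */
    } else {
        *pEnd = 0x10FFFF;  /* no stored limit: range is open to the end */
    }
    return 1;
}
```

Real code should call uset_getSerializedRange() rather than decode the array by hand; the sketch only shows why the range count can differ from the number of stored code points.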

int32_t uset_getSerializedRangeCount (const USerializedSet *set)
 

Returns the number of disjoint ranges of characters contained in the given serialized set.

Ignores any strings contained in the set.

Parameters:
set  the serialized set
Returns:
a non-negative integer counting the character ranges contained in set
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

UBool uset_getSerializedSet (USerializedSet *fillSet, const uint16_t *src, int32_t srcLength)
 

Given a serialized array, fill in the given serialized set object.

Parameters:
fillSet  pointer to result
src  pointer to start of array
srcLength  length of array
Returns:
true if the given array is valid, otherwise false
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.
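
As an illustration of what checking a serialized array can involve, the sketch below decodes the header described under uset_serialize() and verifies that the array is long enough to hold what the header announces. SketchSerializedView and sketch_get_view are hypothetical names, and the validity rules are an assumption derived from the documented format, not a description of the ICU implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a serialized array; not ICU's USerializedSet. */
typedef struct {
    const uint16_t *array;  /* first 16-bit unit after the header */
    int32_t bmpLength;      /* n: number of BMP entries */
    int32_t length;         /* n + 2*m: units after the header */
} SketchSerializedView;

/* Returns 1 and fills *view if the header is consistent with srcLength,
 * otherwise returns 0 and leaves *view untouched. */
static int sketch_get_view(SketchSerializedView *view,
                           const uint16_t *src, int32_t srcLength) {
    int32_t length, bmpLength, headerLength;

    assert(view != NULL);
    if (src == NULL || srcLength <= 0) {
        return 0;
    }
    length = src[0];
    if (length & 0x8000) {
        /* Two-unit header: supplementary entries are present (m != 0). */
        if (srcLength < 2) {
            return 0;
        }
        length &= 0x7FFF;
        bmpLength = src[1];
        headerLength = 2;
        /* The supplementary part must be a whole number of unit pairs. */
        if (bmpLength > length || (length - bmpLength) % 2 != 0) {
            return 0;
        }
    } else {
        /* One-unit header: BMP entries only. */
        bmpLength = length;
        headerLength = 1;
    }
    if (srcLength < headerLength + length) {
        return 0;
    }
    view->array = src + headerLength;
    view->bmpLength = bmpLength;
    view->length = length;
    return 1;
}
```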

UBool uset_isEmpty (const USet *set)
 

Returns TRUE if the given USet contains no characters and no strings.

Parameters:
set  the set
Returns:
true if set is empty
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

USet * uset_open (UChar32 start, UChar32 end)
 

Creates a USet object that contains the range of characters start..end, inclusive.

Parameters:
start  first character of the range, inclusive
end  last character of the range, inclusive
Returns:
a newly created USet. The caller must call uset_close() on it when done.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

USet * uset_openPattern (const UChar *pattern, int32_t patternLength, UErrorCode *ec)
 

Creates a set from the given pattern.

See the UnicodeSet class description for the syntax of the pattern language.

Parameters:
pattern  a string specifying what characters are in the set
patternLength  the length of the pattern, or -1 if null terminated
ec  the error code
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

USet * uset_openPatternOptions (const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *ec)
 

Creates a set from the given pattern.

See the UnicodeSet class description for the syntax of the pattern language.

Parameters:
pattern  a string specifying what characters are in the set
patternLength  the length of the pattern, or -1 if null terminated
options  bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
ec  the error code
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void uset_remove (USet *set, UChar32 c)
 

Removes the given character from the given USet.

After this call, uset_contains(set, c) will return FALSE.

Parameters:
set  the object from which to remove the character
c  the character to remove
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void uset_removeRange (USet *set, UChar32 start, UChar32 end)
 

Removes the given range of characters from the given USet.

After this call, uset_containsRange(set, start, end) will return FALSE.

Parameters:
set  the object from which to remove the range
start  the first character of the range to remove, inclusive
end  the last character of the range to remove, inclusive
Stable:
ICU 2.2

void uset_removeString (USet *set, const UChar *str, int32_t strLen)
 

Removes the given string from the given USet.

After this call, uset_containsString(set, str, strLen) will return FALSE.

Parameters:
set  the object from which to remove the string
str  the string to remove
strLen  the length of the string or -1 if null terminated.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

int32_t uset_serialize (const USet *set, uint16_t *dest, int32_t destCapacity, UErrorCode *pErrorCode)
 

Serializes this set into an array of 16-bit integers.

Serialization (currently) only records the characters in the set; multicharacter strings are ignored.

The array has following format (each line is one 16-bit integer):

length = (n+2*m) | (m!=0?0x8000:0)
bmpLength = n; present if m!=0
bmp[0]
bmp[1]
...
bmp[n-1]
supp-high[0]
supp-low[0]
supp-high[1]
supp-low[1]
...
supp-high[m-1]
supp-low[m-1]

The array starts with a header. After the header are n bmp code points, then m supplementary code points. Either n or m or both may be zero. n+2*m is always <= 0x7FFF.

If there are no supplementary characters (if m==0) then the header is one 16-bit integer, 'length', with value n.

If there are supplementary characters (if m!=0) then the header is two 16-bit integers. The first, 'length', has value (n+2*m)|0x8000. The second, 'bmpLength', has value n.

After the header the code points are stored in ascending order. Supplementary code points are stored as most significant 16 bits followed by least significant 16 bits.
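
The layout rules above can be sketched in plain C as follows. This is a minimal illustration of the array format only, not the ICU implementation: sketch_serialize is a hypothetical helper that takes an already sorted list of code point values, and its return convention (-1 when n+2*m exceeds 0x7FFF, the required length when the buffer is too small) is an assumption chosen for the example rather than ICU's UErrorCode reporting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes `count` ascending code point values into `dest` using the layout
 * described above. Returns the total length in 16-bit units, header
 * included, or -1 if n+2*m would exceed 0x7FFF. If the result does not
 * fit in destCapacity, nothing is written and the required length is
 * returned so the caller can retry with a larger buffer. */
static int32_t sketch_serialize(const uint32_t *cps, int32_t count,
                                uint16_t *dest, int32_t destCapacity) {
    int32_t n = 0, m, i, pos = 0, total;

    assert(count >= 0 && destCapacity >= 0);
    assert(dest != NULL || destCapacity == 0);

    if (count > 0x7FFF) {
        return -1;  /* n+2*m >= count, so this is already too long */
    }
    /* Input is ascending, so BMP values (<= 0xFFFF) come first. */
    while (n < count && cps[n] <= 0xFFFF) {
        ++n;
    }
    m = count - n;
    if (n + 2 * m > 0x7FFF) {
        return -1;
    }
    total = n + 2 * m + (m != 0 ? 2 : 1);
    if (total > destCapacity) {
        return total;
    }

    /* Header: one unit if m == 0, otherwise two. */
    dest[pos++] = (uint16_t)((n + 2 * m) | (m != 0 ? 0x8000 : 0));
    if (m != 0) {
        dest[pos++] = (uint16_t)n;
    }
    for (i = 0; i < n; ++i) {
        dest[pos++] = (uint16_t)cps[i];
    }
    for (i = n; i < count; ++i) {
        dest[pos++] = (uint16_t)(cps[i] >> 16);     /* most significant 16 bits */
        dest[pos++] = (uint16_t)(cps[i] & 0xFFFF);  /* least significant 16 bits */
    }
    return total;
}
```

For the values 0x41, 0x5B, 0x10000, 0x10010 this gives n=2 and m=2, so the header is 0x8006 followed by 2, and the total length is 8 units.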

Parameters:
set  the set
dest  pointer to buffer of destCapacity 16-bit integers. May be NULL only if destCapacity is zero.
destCapacity  size of dest, or zero. Must not be negative.
pErrorCode  pointer to the error code. Will be set to U_INDEX_OUTOFBOUNDS_ERROR if n+2*m > 0x7FFF. Will be set to U_BUFFER_OVERFLOW_ERROR if n+2*m+(m!=0?2:1) > destCapacity.
Returns:
the total length of the serialized format, including the header, that is, n+2*m+(m!=0?2:1), or 0 on error other than U_BUFFER_OVERFLOW_ERROR.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

UBool uset_serializedContains (const USerializedSet *set, UChar32 c)
 

Returns TRUE if the given USerializedSet contains the given character.

Parameters:
set  the serialized set
c  The codepoint to check for within the set
Returns:
true if set contains c
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void uset_setSerializedToOne (USerializedSet *fillSet, UChar32 c)
 

Set the USerializedSet to contain the given character (and nothing else).

Parameters:
fillSet  pointer to result
c  The codepoint to set
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

int32_t uset_size (const USet *set)
 

Returns the number of characters and strings contained in the given USet.

Parameters:
set  the set
Returns:
a non-negative integer counting the characters and strings contained in set
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

int32_t uset_toPattern (const USet *set, UChar *result, int32_t resultCapacity, UBool escapeUnprintable, UErrorCode *ec)
 

Returns a string representation of this set.

If the result of calling this function is passed to a uset_openPattern(), it will produce another set that is equal to this one.

Parameters:
set  the set
result  the string to receive the rules, may be NULL
resultCapacity  the capacity of result, may be 0 if result is NULL
escapeUnprintable  if TRUE then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
ec  error code.
Returns:
length of string, possibly larger than resultCapacity
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.
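
Since the returned length may be larger than resultCapacity and result may be NULL, a caller can size the buffer in two passes: measure first, then allocate and fill. The sketch below shows that pattern against a stand-in function so that it compiles without ICU. SketchUChar, stub_to_pattern and fetch_pattern are hypothetical names; stub_to_pattern only imitates the buffer contract described above with a fixed pattern and is not uset_toPattern().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t SketchUChar;  /* stand-in for ICU's UChar */

/* Hypothetical stand-in with the buffer contract described above:
 * copies what fits into `result` and returns the full length needed. */
static int32_t stub_to_pattern(SketchUChar *result, int32_t resultCapacity) {
    static const SketchUChar pattern[] = { '[', 'a', '-', 'z', ']' };
    const int32_t length = (int32_t)(sizeof pattern / sizeof pattern[0]);
    int32_t toCopy = length < resultCapacity ? length : resultCapacity;

    if (result != NULL && toCopy > 0) {
        memcpy(result, pattern, (size_t)toCopy * sizeof pattern[0]);
    }
    return length;
}

/* Two-pass call: measure with a NULL buffer, then allocate and fill.
 * Returns a malloc'ed buffer (the caller frees it) and stores the
 * pattern length in *pLength, or returns NULL if allocation fails. */
static SketchUChar *fetch_pattern(int32_t *pLength) {
    int32_t needed;
    SketchUChar *buf;

    assert(pLength != NULL);
    needed = stub_to_pattern(NULL, 0);               /* pass 1: measure */
    buf = malloc((size_t)needed * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    *pLength = stub_to_pattern(buf, needed);         /* pass 2: fill */
    return buf;
}
```

With the real function, the measuring call would also go through the ec parameter; under ICU's usual convention a too-small buffer is reported as U_BUFFER_OVERFLOW_ERROR, so the error code would need to be reset before the second call.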


Generated on Mon Nov 24 14:36:06 2003 for ICU 2.8 by doxygen 1.2.11.1 written by Dimitri van Heesch, © 1997-2001