Public Methods |
| CanonicalIterator (const UnicodeString &source, UErrorCode &status) |
| Construct a CanonicalIterator object. More...
|
| ~CanonicalIterator () |
| Destructor. Cleans pieces. More...
|
UnicodeString | getSource () |
| Gets the NFD form of the current source we are iterating over. More...
|
void | reset () |
| Resets the iterator so that one can start again from the beginning. More...
|
UnicodeString | next () |
| Get the next canonically equivalent string. More...
|
void | setSource (const UnicodeString &newSource, UErrorCode &status) |
| Set a new source for this iterator. More...
|
virtual UClassID | getDynamicClassID () const |
| ICU "poor man's RTTI", returns a UClassID for the actual class. More...
|
Static Public Methods |
void | permute (UnicodeString &source, UBool skipZeros, Hashtable *result, UErrorCode &status) |
| Dumb recursive implementation of permutation. More...
|
UClassID | getStaticClassID () |
| ICU "poor man's RTTI", returns a UClassID for this class. More...
|
Private Methods |
| CanonicalIterator () |
| CanonicalIterator (const CanonicalIterator &other) |
| Copy constructor. More...
|
CanonicalIterator & | operator= (const CanonicalIterator &other) |
| Assignment operator. More...
|
UnicodeString * | getEquivalents (const UnicodeString &segment, int32_t &result_len, UErrorCode &status) |
Hashtable * | getEquivalents2 (const UChar *segment, int32_t segLen, UErrorCode &status) |
Hashtable * | extract (UChar32 comp, const UChar *segment, int32_t segLen, int32_t segmentPos, UErrorCode &status) |
| See if the decomposition of comp is at segment starting at segmentPos (with canonical rearrangement). If so, take the remainder, and return the equivalents. More...
|
void | cleanPieces () |
Private Attributes |
UnicodeString | source |
UBool | done |
UnicodeString ** | pieces |
int32_t | pieces_length |
int32_t * | pieces_lengths |
int32_t * | current |
int32_t | current_length |
UnicodeString | buffer |
For example, here are some sample results:

Results for: {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}

1: \u0041\u030A\u0064\u0307\u0327 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
2: \u0041\u030A\u0064\u0327\u0307 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D}{COMBINING CEDILLA}{COMBINING DOT ABOVE}
3: \u0041\u030A\u1E0B\u0327 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D WITH DOT ABOVE}{COMBINING CEDILLA}
4: \u0041\u030A\u1E11\u0307 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D WITH CEDILLA}{COMBINING DOT ABOVE}
5: \u00C5\u0064\u0307\u0327 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
6: \u00C5\u0064\u0327\u0307 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D}{COMBINING CEDILLA}{COMBINING DOT ABOVE}
7: \u00C5\u1E0B\u0327 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D WITH DOT ABOVE}{COMBINING CEDILLA}
8: \u00C5\u1E11\u0307 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D WITH CEDILLA}{COMBINING DOT ABOVE}
9: \u212B\u0064\u0307\u0327 = {ANGSTROM SIGN}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
10: \u212B\u0064\u0327\u0307 = {ANGSTROM SIGN}{LATIN SMALL LETTER D}{COMBINING CEDILLA}{COMBINING DOT ABOVE}
11: \u212B\u1E0B\u0327 = {ANGSTROM SIGN}{LATIN SMALL LETTER D WITH DOT ABOVE}{COMBINING CEDILLA}
12: \u212B\u1E11\u0307 = {ANGSTROM SIGN}{LATIN SMALL LETTER D WITH CEDILLA}{COMBINING DOT ABOVE}
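The twelve sample results can be read as every combination of three spellings of the first segment (A with ring) and four spellings of the second (d with dot above and cedilla). The sketch below reproduces that count using only the standard library. It is not ICU code: `combineSegments` and `samplePieces` are hypothetical names, and the local names `pieces`, `current`, and `buffer` merely echo the private attributes listed above. Whether the real iterator advances in this way is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of ICU: take one alternative from each
// segment in every possible way, advancing the indices like an odometer.
std::vector<std::u16string> combineSegments(
        const std::vector<std::vector<std::u16string>> &pieces) {
    std::vector<std::u16string> results;
    if (pieces.empty()) return results;
    for (const auto &alternatives : pieces) {
        if (alternatives.empty()) return results;  // no combination possible
    }

    std::vector<std::size_t> current(pieces.size(), 0);  // one index per segment
    while (true) {
        // Concatenate the currently selected alternative of each segment.
        std::u16string buffer;
        for (std::size_t i = 0; i < pieces.size(); ++i) {
            buffer += pieces[i][current[i]];
        }
        results.push_back(buffer);

        // Advance the rightmost index; on overflow, reset it and carry left.
        std::size_t i = pieces.size();
        while (i > 0) {
            --i;
            if (++current[i] < pieces[i].size()) break;
            current[i] = 0;
            if (i == 0) return results;  // every index wrapped around: done
        }
    }
}

// The two segments of the sample string, with the spellings shown above.
std::vector<std::vector<std::u16string>> samplePieces() {
    return {
        {u"A\u030A", u"\u00C5", u"\u212B"},
        {u"d\u0307\u0327", u"d\u0327\u0307", u"\u1E0B\u0327", u"\u1E11\u0307"},
    };
}
```

Calling `combineSegments(samplePieces())` yields 3 x 4 = 12 strings, in the same order as the numbered list because the last segment varies fastest.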
Note: The code is intended for use with small strings. It has not been optimized for larger ones and is not suitable for them. CanonicalIterator is not intended to be subclassed.
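One plausible reason for the small-string caveat is that the number of orderings of a character sequence grows factorially with its length. The following sketch of a plain recursive permutation makes that growth easy to observe. It is a stand-in written for illustration, not ICU's `permute`: the names `permuteInto` and `permutations` are hypothetical, the `skipZeros` flag and `Hashtable` result are not modeled, and it works on UTF-16 code units rather than code points.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical recursive permutation, not ICU's implementation.
// Moves each remaining code unit in turn onto the prefix, then recurses.
void permuteInto(const std::u16string &source, const std::u16string &prefix,
                 std::set<std::u16string> &result) {
    if (source.empty()) {
        result.insert(prefix);  // the set drops duplicate orderings
        return;
    }
    for (std::size_t i = 0; i < source.size(); ++i) {
        // Everything except the code unit at position i.
        std::u16string rest = source.substr(0, i) + source.substr(i + 1);
        permuteInto(rest, prefix + source[i], result);
    }
}

// Convenience wrapper returning all distinct orderings of source.
std::set<std::u16string> permutations(const std::u16string &source) {
    std::set<std::u16string> result;
    permuteInto(source, std::u16string(), result);
    return result;
}
```

For three distinct code units this produces 6 orderings and for five it produces 120, so the work rises quickly as the input gets longer.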