
RegexMatcher Class Reference

class RegexMatcher bundles together a regular expression pattern and input text to which the expression can be applied. More...

#include <regex.h>

Inheritance diagram for RegexMatcher: UObject, UMemory.

Public Methods

 RegexMatcher (const UnicodeString &regexp, uint32_t flags, UErrorCode &status)
 Construct a RegexMatcher for a regular expression. More...

 RegexMatcher (const UnicodeString &regexp, const UnicodeString &input, uint32_t flags, UErrorCode &status)
 Construct a RegexMatcher for a regular expression. More...

virtual ~RegexMatcher ()
 Destructor. More...

virtual UBool matches (UErrorCode &status)
 Attempts to match the entire input string against the pattern. More...

virtual UBool matches (int32_t startIndex, UErrorCode &status)
 Attempts to match the input string, beginning at startIndex, against the pattern. More...

virtual UBool lookingAt (UErrorCode &status)
 Attempts to match the input string, starting from the beginning, against the pattern. More...

virtual UBool lookingAt (int32_t startIndex, UErrorCode &status)
 Attempts to match the input string, starting from the specified index, against the pattern. More...

virtual UBool find ()
 Find the next pattern match in the input string. More...

virtual UBool find (int32_t start, UErrorCode &status)
 Resets this RegexMatcher and then attempts to find the next substring of the input string that matches the pattern, starting at the specified index. More...

virtual UnicodeString group (UErrorCode &status) const
 Returns a string containing the text matched by the previous match. More...

virtual UnicodeString group (int32_t groupNum, UErrorCode &status) const
 Returns a string containing the text captured by the given group during the previous match operation. More...

virtual int32_t groupCount () const
 Returns the number of capturing groups in this matcher's pattern. More...

virtual int32_t start (UErrorCode &status) const
 Returns the index in the input string of the start of the text matched during the previous match operation. More...

virtual int32_t start (int group, UErrorCode &status) const
 Returns the index in the input string of the start of the text matched by the specified capture group during the previous match operation. More...

virtual int32_t end (UErrorCode &status) const
 Returns the index in the input string of the character following the text matched during the previous match operation. More...

virtual int32_t end (int group, UErrorCode &status) const
 Returns the index in the input string of the character following the text matched by the specified capture group during the previous match operation. More...

virtual UBool touchedEnd ()
 Return TRUE if the most recent match or attempted match touched the end of the input string. More...

virtual RegexMatcher & reset ()
 Resets this matcher. More...

virtual RegexMatcher & reset (int32_t index, UErrorCode &status)
 Resets this matcher and sets the current input position. More...

virtual RegexMatcher & reset (const UnicodeString &input)
 Resets this matcher with a new input string. More...

virtual const UnicodeString & input () const
 Returns the input string being matched. More...

virtual const RegexPattern & pattern () const
 Returns the pattern that is interpreted by this matcher. More...

virtual UnicodeString replaceAll (const UnicodeString &replacement, UErrorCode &status)
 Replaces every substring of the input that matches the pattern with the given replacement string. More...

virtual UnicodeString replaceFirst (const UnicodeString &replacement, UErrorCode &status)
 Replaces the first substring of the input that matches the pattern with the replacement string. More...

virtual RegexMatcher & appendReplacement (UnicodeString &dest, const UnicodeString &replacement, UErrorCode &status)
 Implements a replace operation intended to be used as part of an incremental find-and-replace. More...

virtual UnicodeString & appendTail (UnicodeString &dest)
 As the final step in a find-and-replace operation, append the remainder of the input string, starting at the position following the last match, to the destination string. More...

virtual int32_t split (const UnicodeString &input, UnicodeString dest[], int32_t destCapacity, UErrorCode &status)
 Split a string into fields. More...

void setTrace (UBool state)
 Debug function: enables or disables tracing of the matching engine. More...

virtual UClassID getDynamicClassID () const
 ICU "poor man's RTTI", returns a UClassID for the actual class. More...


Static Public Methods

UClassID getStaticClassID ()
 ICU "poor man's RTTI", returns a UClassID for this class. More...


Private Methods

 RegexMatcher ()
 RegexMatcher (const RegexPattern *pat)
 RegexMatcher (const RegexMatcher &other)
RegexMatcher & operator= (const RegexMatcher &rhs)
void MatchAt (int32_t startIdx, UErrorCode &status)
void backTrack (int32_t &inputIdx, int32_t &patIdx)
UBool isWordBoundary (int32_t pos)
UBool isUWordBoundary (int32_t pos)
REStackFrame * resetStack ()
REStackFrame * StateSave (REStackFrame *fp, int32_t savePatIdx, int32_t frameSize, UErrorCode &status)

Private Attributes

const RegexPattern * fPattern
RegexPattern * fPatternOwned
const UnicodeString * fInput
UBool fMatch
int32_t fMatchStart
int32_t fMatchEnd
int32_t fLastMatchEnd
UVector32 * fStack
REStackFrame * fFrame
int32_t * fData
int32_t fSmallData [8]
UBool fTraceDebug
UErrorCode fDeferredStatus
UBool fTouchedEnd
RuleBasedBreakIterator * fWordBreakItr

Friends

class RegexPattern

Detailed Description

class RegexMatcher bundles together a regular expression pattern and input text to which the expression can be applied.

It includes methods for testing for matches, and for find and replace operations.

Class RegexMatcher is not intended to be subclassed.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

Definition at line 439 of file regex.h.


Constructor & Destructor Documentation

RegexMatcher::RegexMatcher (const UnicodeString &regexp, uint32_t flags, UErrorCode &status)
 

Construct a RegexMatcher for a regular expression.

This is a convenience method that avoids the need to explicitly create a RegexPattern object. Note that if several RegexMatchers need to be created for the same expression, it will be more efficient to separately create and cache a RegexPattern object, and use its matcher() method to create the RegexMatcher objects.

Parameters:
regexp  The Regular Expression to be compiled.
flags  Regular expression options, such as case insensitive matching.
status  Any errors are reported by setting this UErrorCode variable.
See also:
UREGEX_CASE_INSENSITIVE
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

RegexMatcher::RegexMatcher (const UnicodeString &regexp, const UnicodeString &input, uint32_t flags, UErrorCode &status)
 

Construct a RegexMatcher for a regular expression.

This is a convenience method that avoids the need to explicitly create a RegexPattern object. Note that if several RegexMatchers need to be created for the same expression, it will be more efficient to separately create and cache a RegexPattern object, and use its matcher() method to create the RegexMatcher objects.

Parameters:
regexp  The Regular Expression to be compiled.
input  The string to match.
flags  Regular expression options, such as case insensitive matching.
status  Any errors are reported by setting this UErrorCode variable.
See also:
UREGEX_CASE_INSENSITIVE
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

virtual RegexMatcher::~RegexMatcher () [virtual]
 

Destructor.

Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

RegexMatcher::RegexMatcher () [private]


RegexMatcher::RegexMatcher (const RegexPattern *pat) [private]


RegexMatcher::RegexMatcher (const RegexMatcher &other) [private]
 


Member Function Documentation

void RegexMatcher::MatchAt (int32_t startIdx, UErrorCode &status) [private]


REStackFrame* RegexMatcher::StateSave (REStackFrame *fp, int32_t savePatIdx, int32_t frameSize, UErrorCode &status) [inline, private]
 

virtual RegexMatcher& RegexMatcher::appendReplacement (UnicodeString &dest, const UnicodeString &replacement, UErrorCode &status) [virtual]
 

Implements a replace operation intended to be used as part of an incremental find-and-replace.

The input string, starting from the end of the previous match and ending at the start of the current match, is appended to the destination string. Then the replacement string is appended to the destination string, with any references to captured text substituted.

For simple, prepackaged, non-incremental find-and-replace operations, see replaceFirst() or replaceAll().

Parameters:
dest  A UnicodeString to which the results of the find-and-replace are appended.
replacement  A UnicodeString that provides the text to be substituted for the input text that matched the regexp pattern. The replacement text may contain references to captured text from the input.
status  A reference to a UErrorCode to receive any errors. Possible errors are U_REGEX_INVALID_STATE if no match has been attempted or the last match failed, and U_INDEX_OUTOFBOUNDS_ERROR if the replacement text specifies a capture group that does not exist in the pattern.
Returns:
this RegexMatcher
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UnicodeString& RegexMatcher::appendTail (UnicodeString &dest) [virtual]
 

As the final step in a find-and-replace operation, append the remainder of the input string, starting at the position following the last match, to the destination string.

appendTail() is intended to be invoked after one or more invocations of RegexMatcher::appendReplacement().

Parameters:
dest  A UnicodeString to which the results of the find-and-replace are appended.
Returns:
the destination string.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

void RegexMatcher::backTrack (int32_t &inputIdx, int32_t &patIdx) [inline, private]


virtual int32_t RegexMatcher::end (int group, UErrorCode &status) const [virtual]
 

Returns the index in the input string of the character following the text matched by the specified capture group during the previous match operation.

Parameters:
group  the capture group number
status  A reference to a UErrorCode to receive any errors. Possible errors are U_REGEX_INVALID_STATE if no match has been attempted or the last match failed, and U_INDEX_OUTOFBOUNDS_ERROR for a bad capture group number.
Returns:
the index of the last character, plus one, of the text captured by the specified group during the previous match operation. Returns -1 if the capture group was not part of the match.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual int32_t RegexMatcher::end (UErrorCode &status) const [virtual]
 

Returns the index in the input string of the character following the text matched during the previous match operation.

Parameters:
status  A reference to a UErrorCode to receive any errors. Possible errors are U_REGEX_INVALID_STATE if no match has been attempted or the last match failed.
Returns:
the index of the last character matched, plus one.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UBool RegexMatcher::find (int32_t start, UErrorCode &status) [virtual]
 

Resets this RegexMatcher and then attempts to find the next substring of the input string that matches the pattern, starting at the specified index.

Parameters:
start  the position in the input string to begin the search
status  A reference to a UErrorCode to receive any errors.
Returns:
TRUE if a match is found.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UBool RegexMatcher::find () [virtual]
 

Find the next pattern match in the input string.

The find begins searching the input at the location following the end of the previous match, or at the start of the string if there is no previous match. If a match is found, start(), end() and group() will provide more information regarding the match.

Note that if the input string is changed by the application, use find(startPos, status) instead of find(), because the saved starting position may not be valid with the altered input string.

Returns:
TRUE if a match is found.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UClassID RegexMatcher::getDynamicClassID (void) const [virtual]
 

ICU "poor man's RTTI", returns a UClassID for the actual class.

Stable:
ICU 2.2

Reimplemented from UObject.

UClassID RegexMatcher::getStaticClassID (void) [static]
 

ICU "poor man's RTTI", returns a UClassID for this class.

Stable:
ICU 2.2

virtual UnicodeString RegexMatcher::group (int32_t groupNum, UErrorCode &status) const [virtual]
 

Returns a string containing the text captured by the given group during the previous match operation.

Group(0) is the entire match.

Parameters:
groupNum  the capture group number
status  A reference to a UErrorCode to receive any errors. Possible errors are U_REGEX_INVALID_STATE if no match has been attempted or the last match failed and U_INDEX_OUTOFBOUNDS_ERROR for a bad capture group number.
Returns:
the captured text
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UnicodeString RegexMatcher::group (UErrorCode &status) const [virtual]
 

Returns a string containing the text matched by the previous match.

If the pattern can match an empty string, an empty string may be returned.

Parameters:
status  A reference to a UErrorCode to receive any errors. Possible errors are U_REGEX_INVALID_STATE if no match has been attempted or the last match failed.
Returns:
a string containing the matched input text.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual int32_t RegexMatcher::groupCount () const [virtual]
 

Returns the number of capturing groups in this matcher's pattern.

Returns:
the number of capture groups
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual const UnicodeString& RegexMatcher::input () const [virtual]
 

Returns the input string being matched.

The returned string is not a copy, but the live input string. It should not be altered or deleted.

Returns:
the input string
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

UBool RegexMatcher::isUWordBoundary (int32_t pos) [private]


UBool RegexMatcher::isWordBoundary (int32_t pos) [private]


virtual UBool RegexMatcher::lookingAt (int32_t startIndex, UErrorCode &status) [virtual]
 

Attempts to match the input string, starting from the specified index, against the pattern.

The match may be of any length, and is not required to extend to the end of the input string. Contrast with matches().

If the match succeeds then more information can be obtained via the start(), end(), and group() functions.

Parameters:
startIndex  The input string index at which to begin matching.
status  A reference to a UErrorCode to receive any errors.
Returns:
TRUE if there is a match.
Draft:
This API has been introduced in ICU 2.8. It is still in draft state and may be modified in a future release.

virtual UBool RegexMatcher::lookingAt (UErrorCode &status) [virtual]
 

Attempts to match the input string, starting from the beginning, against the pattern.

Like the matches() method, this function always starts at the beginning of the input string; unlike that function, it does not require that the entire input string be matched.

If the match succeeds then more information can be obtained via the start(), end(), and group() functions.

Parameters:
status  A reference to a UErrorCode to receive any errors.
Returns:
TRUE if there is a match at the start of the input string.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UBool RegexMatcher::matches (int32_t startIndex, UErrorCode &status) [virtual]
 

Attempts to match the input string, beginning at startIndex, against the pattern.

The match must extend to the end of the input string.

Parameters:
startIndex  The input string index at which to begin matching.
status  A reference to a UErrorCode to receive any errors.
Returns:
TRUE if there is a match
Draft:
This API has been introduced in ICU 2.8. It is still in draft state and may be modified in a future release.

virtual UBool RegexMatcher::matches (UErrorCode &status) [virtual]
 

Attempts to match the entire input string against the pattern.

Parameters:
status  A reference to a UErrorCode to receive any errors.
Returns:
TRUE if there is a match
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

RegexMatcher& RegexMatcher::operator= (const RegexMatcher &rhs) [private]


virtual const RegexPattern& RegexMatcher::pattern () const [virtual]
 

Returns the pattern that is interpreted by this matcher.

Returns:
the RegexPattern for this RegexMatcher
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UnicodeString RegexMatcher::replaceAll (const UnicodeString &replacement, UErrorCode &status) [virtual]
 

Replaces every substring of the input that matches the pattern with the given replacement string.

This is a convenience function that provides a complete find-and-replace-all operation.

This method first resets this matcher. It then scans the input string looking for matches of the pattern. Input that is not part of any match is left unchanged; each match is replaced in the result by the replacement string. The replacement string may contain references to capture groups.

Parameters:
replacement  a string containing the replacement text.
status  a reference to a UErrorCode to receive any errors.
Returns:
a string containing the results of the find and replace.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UnicodeString RegexMatcher::replaceFirst (const UnicodeString &replacement, UErrorCode &status) [virtual]
 

Replaces the first substring of the input that matches the pattern with the replacement string.

This is a convenience function that provides a complete find-and-replace operation.

This function first resets this RegexMatcher. It then scans the input string looking for a match of the pattern. Input that is not part of the match is appended directly to the result string; the match is replaced in the result by the replacement string. The replacement string may contain references to captured groups.

The state of the matcher (the position at which a subsequent find() would begin) after completing a replaceFirst() is not specified. The RegexMatcher should be reset before doing additional find() operations.

Parameters:
replacement  a string containing the replacement text.
status  a reference to a UErrorCode to receive any errors.
Returns:
a string containing the results of the find and replace.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual RegexMatcher& RegexMatcher::reset (const UnicodeString &input) [virtual]
 

Resets this matcher with a new input string.

This allows instances of RegexMatcher to be reused, which is more efficient than creating a new RegexMatcher for each input string to be processed.

Returns:
this RegexMatcher.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual RegexMatcher& RegexMatcher::reset (int32_t index, UErrorCode &status) [virtual]
 

Resets this matcher and sets the current input position.

The effect is to remove any memory of previous matches, and to cause subsequent find() operations to begin at the specified position in the input string.

Returns:
this RegexMatcher.
Draft:
This API has been introduced in ICU 2.8. It is still in draft state and may be modified in a future release.

virtual RegexMatcher& RegexMatcher::reset (void) [virtual]
 

Resets this matcher.

The effect is to remove any memory of previous matches, and to cause subsequent find() operations to begin at the beginning of the input string.

Returns:
this RegexMatcher.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

REStackFrame* RegexMatcher::resetStack () [private]


void RegexMatcher::setTrace (UBool state)


Debug function: enables or disables tracing of the matching engine.

For internal ICU development use only. DO NOT USE.

Internal:
For internal use only.

virtual int32_t RegexMatcher::split (const UnicodeString &input, UnicodeString dest[], int32_t destCapacity, UErrorCode &status) [virtual]
 

Split a string into fields.

Somewhat like split() from Perl. The pattern matches identify delimiters that separate the input into fields. The input data between the matches becomes the fields themselves.

Parameters:
input  The string to be split into fields. The field delimiters match the pattern (in the "this" object). This matcher will be reset to this input string.
dest  An array of UnicodeStrings to receive the results of the split. This is an array of actual UnicodeString objects, not an array of pointers to strings. Local (stack based) arrays can work well here.
destCapacity  The number of elements in the destination array. If the number of fields found is less than destCapacity, the extra strings in the destination array are not altered. If the number of destination strings is less than the number of fields, the trailing part of the input string, including any field delimiters, is placed in the last destination string.
status  A reference to a UErrorCode to receive any errors.
Returns:
The number of fields into which the input string was split.
Draft:
This API has been introduced in ICU 2.6. It is still in draft state and may be modified in a future release.

virtual int32_t RegexMatcher::start (int group, UErrorCode &status) const [virtual]
 

Returns the index in the input string of the start of the text matched by the specified capture group during the previous match operation.

Return -1 if the capture group exists in the pattern, but was not part of the last match.

Parameters:
group  the capture group number
status  A reference to a UErrorCode to receive any errors. Possible errors are U_REGEX_INVALID_STATE if no match has been attempted or the last match failed, and U_INDEX_OUTOFBOUNDS_ERROR for a bad capture group number
Returns:
the start position of substring matched by the specified group.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual int32_t RegexMatcher::start (UErrorCode &status) const [virtual]
 

Returns the index in the input string of the start of the text matched during the previous match operation.

Parameters:
status  a reference to a UErrorCode to receive any errors.
Returns:
The position in the input string of the start of the last match.
Draft:
This API has been introduced in ICU 2.4. It is still in draft state and may be modified in a future release.

virtual UBool RegexMatcher::touchedEnd () [virtual]


Return TRUE if the most recent match or attempted match touched the end of the input string.

For failed matches, this normally means that some amount of additional input, appended to the existing input string, could have resulted in a match.

Returns:
TRUE if the most recently attempted match reached the end of the input string.
Draft:
This API has been introduced in ICU 2.8. It is still in draft state and may be modified in a future release.


Friends And Related Function Documentation

friend class RegexPattern [friend]
 

Definition at line 860 of file regex.h.


Member Data Documentation

int32_t* RegexMatcher::fData [private]
 

Definition at line 891 of file regex.h.

UErrorCode RegexMatcher::fDeferredStatus [private]
 

Definition at line 896 of file regex.h.

REStackFrame* RegexMatcher::fFrame [private]
 

Definition at line 887 of file regex.h.

const UnicodeString* RegexMatcher::fInput [private]
 

Definition at line 879 of file regex.h.

int32_t RegexMatcher::fLastMatchEnd [private]
 

Definition at line 884 of file regex.h.

UBool RegexMatcher::fMatch [private]
 

Definition at line 881 of file regex.h.

int32_t RegexMatcher::fMatchEnd [private]
 

Definition at line 883 of file regex.h.

int32_t RegexMatcher::fMatchStart [private]
 

Definition at line 882 of file regex.h.

const RegexPattern* RegexMatcher::fPattern [private]
 

Definition at line 876 of file regex.h.

RegexPattern* RegexMatcher::fPatternOwned [private]
 

Definition at line 877 of file regex.h.

int32_t RegexMatcher::fSmallData[8] [private]
 

Definition at line 892 of file regex.h.

UVector32* RegexMatcher::fStack [private]
 

Definition at line 886 of file regex.h.

UBool RegexMatcher::fTouchedEnd [private]
 

Definition at line 899 of file regex.h.

UBool RegexMatcher::fTraceDebug [private]
 

Definition at line 894 of file regex.h.

RuleBasedBreakIterator* RegexMatcher::fWordBreakItr [private]
 

Definition at line 902 of file regex.h.


The documentation for this class was generated from the following file:
regex.h
Generated on Mon Nov 24 14:36:45 2003 for ICU 2.8 by doxygen 1.2.11.1 written by Dimitri van Heesch, © 1997-2001