Class CharMatcher

    • Field Detail

      • WHITESPACE

        @Deprecated
        public static final CharMatcher WHITESPACE
        Deprecated.
        Use whitespace() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is whitespace according to the latest Unicode standard, as illustrated here. This is not the same definition used by other Java APIs. (See a comparison of several definitions of "whitespace".)

        Note: as the Unicode definition evolves, we will modify this constant to keep it up to date.

      • BREAKING_WHITESPACE

        @Deprecated
        public static final CharMatcher BREAKING_WHITESPACE
        Deprecated.
        Use breakingWhitespace() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). See whitespace() for a discussion of that term.
        Since:
        2.0
      • ASCII

        @Deprecated
        public static final CharMatcher ASCII
        Deprecated.
        Use ascii() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is ASCII, meaning that its code point is less than 128.
      • DIGIT

        @Deprecated
        public static final CharMatcher DIGIT
        Deprecated.
        Use digit() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is a digit according to Unicode. If you only care to match ASCII digits, you can use inRange('0', '9').
      • JAVA_DIGIT

        @Deprecated
        public static final CharMatcher JAVA_DIGIT
        Deprecated.
        Use javaDigit() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is a digit according to Java's definition. If you only care to match ASCII digits, you can use inRange('0', '9').
      • JAVA_LETTER

        @Deprecated
        public static final CharMatcher JAVA_LETTER
        Deprecated.
        Use javaLetter() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is a letter according to Java's definition. If you only care to match letters of the Latin alphabet, you can use inRange('a', 'z').or(inRange('A', 'Z')).
      • JAVA_LETTER_OR_DIGIT

        @Deprecated
        public static final CharMatcher JAVA_LETTER_OR_DIGIT
        Deprecated.
        Use javaLetterOrDigit() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is a letter or digit according to Java's definition.
      • JAVA_UPPER_CASE

        @Deprecated
        public static final CharMatcher JAVA_UPPER_CASE
        Deprecated.
        Use javaUpperCase() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is upper case according to Java's definition.
      • JAVA_LOWER_CASE

        @Deprecated
        public static final CharMatcher JAVA_LOWER_CASE
        Deprecated.
        Use javaLowerCase() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is lower case according to Java's definition.
      • JAVA_ISO_CONTROL

        @Deprecated
        public static final CharMatcher JAVA_ISO_CONTROL
        Deprecated.
        Use javaIsoControl() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is an ISO control character as specified by Character.isISOControl(char).
      • INVISIBLE

        @Deprecated
        public static final CharMatcher INVISIBLE
        Deprecated.
        Use invisible() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and PRIVATE_USE according to ICU4J.
      • SINGLE_WIDTH

        @Deprecated
        public static final CharMatcher SINGLE_WIDTH
        Deprecated.
        Use singleWidth() instead. This constant is scheduled to be removed in June 2018.
        Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false (that is, it tends to assume a character is double-width).

        Note: as the reference file evolves, we will modify this constant to keep it up to date.

      • ANY

        @Deprecated
        public static final CharMatcher ANY
        Deprecated.
        Use any() instead. This constant is scheduled to be removed in June 2018.
        Matches any character.
      • NONE

        @Deprecated
        public static final CharMatcher NONE
        Deprecated.
        Use none() instead. This constant is scheduled to be removed in June 2018.
        Matches no characters.
    • Constructor Detail

      • CharMatcher

        protected CharMatcher()
        Constructor for use by subclasses. When subclassing, you may want to override toString() to provide a useful description.
    • Method Detail

      • any

        public static CharMatcher any()
        Matches any character.
        Since:
        19.0 (since 1.0 as constant ANY)
      • none

        public static CharMatcher none()
        Matches no characters.
        Since:
        19.0 (since 1.0 as constant NONE)
      • whitespace

        public static CharMatcher whitespace()
        Determines whether a character is whitespace according to the latest Unicode standard, as illustrated here. This is not the same definition used by other Java APIs. (See a comparison of several definitions of "whitespace".)

        Note: as the Unicode definition evolves, we will modify this matcher to keep it up to date.

        Since:
        19.0 (since 1.0 as constant WHITESPACE)
      • breakingWhitespace

        public static CharMatcher breakingWhitespace()
        Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). See whitespace() for a discussion of that term.
        Since:
        19.0 (since 2.0 as constant BREAKING_WHITESPACE)
      • ascii

        public static CharMatcher ascii()
        Determines whether a character is ASCII, meaning that its code point is less than 128.
        Since:
        19.0 (since 1.0 as constant ASCII)
      • digit

        public static CharMatcher digit()
        Determines whether a character is a digit according to Unicode. If you only care to match ASCII digits, you can use inRange('0', '9').
        Since:
        19.0 (since 1.0 as constant DIGIT)
      • javaDigit

        public static CharMatcher javaDigit()
        Determines whether a character is a digit according to Java's definition. If you only care to match ASCII digits, you can use inRange('0', '9').
        Since:
        19.0 (since 1.0 as constant JAVA_DIGIT)
      • javaLetter

        public static CharMatcher javaLetter()
        Determines whether a character is a letter according to Java's definition. If you only care to match letters of the Latin alphabet, you can use inRange('a', 'z').or(inRange('A', 'Z')).
        Since:
        19.0 (since 1.0 as constant JAVA_LETTER)
      • javaLetterOrDigit

        public static CharMatcher javaLetterOrDigit()
        Determines whether a character is a letter or digit according to Java's definition.
        Since:
        19.0 (since 1.0 as constant JAVA_LETTER_OR_DIGIT)
      • javaUpperCase

        public static CharMatcher javaUpperCase()
        Determines whether a character is upper case according to Java's definition.
        Since:
        19.0 (since 1.0 as constant JAVA_UPPER_CASE)
      • javaLowerCase

        public static CharMatcher javaLowerCase()
        Determines whether a character is lower case according to Java's definition.
        Since:
        19.0 (since 1.0 as constant JAVA_LOWER_CASE)
      • javaIsoControl

        public static CharMatcher javaIsoControl()
        Determines whether a character is an ISO control character as specified by Character.isISOControl(char).
        Since:
        19.0 (since 1.0 as constant JAVA_ISO_CONTROL)
      • invisible

        public static CharMatcher invisible()
        Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and PRIVATE_USE according to ICU4J.
        Since:
        19.0 (since 1.0 as constant INVISIBLE)
      • singleWidth

        public static CharMatcher singleWidth()
        Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false (that is, it tends to assume a character is double-width).

        Note: as the reference file evolves, we will modify this matcher to keep it up to date.

        Since:
        19.0 (since 1.0 as constant SINGLE_WIDTH)
      • is

        public static CharMatcher is(char match)
        Returns a char matcher that matches only the specified character.
      • isNot

        public static CharMatcher isNot(char match)
        Returns a char matcher that matches any character except the one specified.

        To negate another CharMatcher, use negate().

      • anyOf

        public static CharMatcher anyOf(java.lang.CharSequence sequence)
        Returns a char matcher that matches any character present in the given character sequence.
      • noneOf

        public static CharMatcher noneOf(java.lang.CharSequence sequence)
        Returns a char matcher that matches any character not present in the given character sequence.
      • inRange

        public static CharMatcher inRange(char startInclusive,
                                          char endInclusive)
        Returns a char matcher that matches any character in a given range (both endpoints are inclusive). For example, to match any lowercase letter of the English alphabet, use CharMatcher.inRange('a', 'z').
        Throws:
        java.lang.IllegalArgumentException - if endInclusive < startInclusive
      • forPredicate

        public static CharMatcher forPredicate(Predicate<? super java.lang.Character> predicate)
        Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.
      • matches

        public abstract boolean matches(char c)
        Determines a true or false value for the given character.
      • negate

        public CharMatcher negate()
        Returns a matcher that matches any character not matched by this matcher.
      • and

        public CharMatcher and(CharMatcher other)
        Returns a matcher that matches any character matched by both this matcher and other.
      • or

        public CharMatcher or(CharMatcher other)
        Returns a matcher that matches any character matched by either this matcher or other.
      • precomputed

        public CharMatcher precomputed()
        Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. Precomputation takes time and is likely to be worthwhile only if the precomputed matcher is queried many thousands of times.

        This method has no effect (returns this) when called in GWT: it's unclear whether a precomputed matcher is faster, but it certainly consumes more memory, which doesn't seem like a worthwhile tradeoff in a browser.

      • precomputedInternal

        @GwtIncompatible
        CharMatcher precomputedInternal()
        This is the actual implementation of precomputed(), but we bounce calls through a method on Platform so that we can have different behavior in GWT.

        This implementation tries to be smart in a number of ways. It recognizes cases where the negation is cheaper to precompute than the matcher itself; it tries to build small hash tables for matchers that only match a few characters, and so on. In the worst-case scenario, it constructs an eight-kilobyte bit array and queries that. In many situations this produces a matcher which is faster to query than the original.
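
        To make the worst case concrete: a char has 65,536 possible values, so one bit per value needs 65,536 / 8 = 8,192 bytes. The JDK-only sketch below builds such a table. BitTableSketch and its methods are hypothetical names used for illustration, IntPredicate stands in for a CharMatcher, and none of this is Guava's actual implementation.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not Guava's precomputedInternal().
final class BitTableSketch {
    private BitTableSketch() {}

    /** Builds a table with one bit per possible char value (65,536 bits). */
    static BitSet buildTable(IntPredicate matcher) {
        BitSet table = new BitSet(Character.MAX_VALUE + 1);
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (matcher.test(c)) {
                table.set(c);
            }
        }
        return table;
    }

    /** Answers a query with a single bit lookup instead of re-running the predicate. */
    static boolean matches(BitSet table, char c) {
        return table.get(c);
    }

    /** Size of the full table in bytes: 65,536 bits / 8 bits per byte. */
    static int tableSizeInBytes() {
        return (Character.MAX_VALUE + 1) / Byte.SIZE;
    }
}
```

        Building the table costs one predicate call per char value, which is why precomputation is likely to pay off only when the result is queried many times.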

      • precomputedPositive

        @GwtIncompatible
        private static CharMatcher precomputedPositive(int totalCharacters,
                                                       java.util.BitSet table,
                                                       java.lang.String description)
        Helper method for precomputedInternal() that doesn't test whether the negation is cheaper.
      • isSmall

        @GwtIncompatible
        private static boolean isSmall(int totalCharacters,
                                       int tableLength)
      • setBits

        @GwtIncompatible
        void setBits(java.util.BitSet table)
        Sets bits in table matched by this matcher.
      • matchesAnyOf

        public boolean matchesAnyOf(java.lang.CharSequence sequence)
        Returns true if a character sequence contains at least one matching character. Equivalent to !matchesNoneOf(sequence).

        The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.

        Parameters:
        sequence - the character sequence to examine, possibly empty
        Returns:
        true if this matcher matches at least one character in the sequence
        Since:
        8.0
      • matchesAllOf

        public boolean matchesAllOf(java.lang.CharSequence sequence)
        Returns true if a character sequence contains only matching characters.

        The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns false or the end is reached.

        Parameters:
        sequence - the character sequence to examine, possibly empty
        Returns:
        true if this matcher matches every character in the sequence, including when the sequence is empty
      • matchesNoneOf

        public boolean matchesNoneOf(java.lang.CharSequence sequence)
        Returns true if a character sequence contains no matching characters. Equivalent to !matchesAnyOf(sequence).

        The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.

        Parameters:
        sequence - the character sequence to examine, possibly empty
        Returns:
        true if this matcher matches no characters in the sequence, including when the sequence is empty
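
        The empty-sequence behavior of the three matches*Of methods follows directly from the loops described above: with nothing to iterate over, "all" and "none" are vacuously true and "any" is false. A JDK-only sketch, where MatchesSketch is a hypothetical class and IntPredicate stands in for a CharMatcher:

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of the documented default behavior; not Guava's code.
final class MatchesSketch {
    private MatchesSketch() {}

    /** True unless some character fails to match; stops at the first non-match. */
    static boolean matchesAllOf(IntPredicate matcher, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (!matcher.test(sequence.charAt(i))) {
                return false;
            }
        }
        return true; // also reached when the sequence is empty
    }

    /** True unless some character matches; stops at the first match. */
    static boolean matchesNoneOf(IntPredicate matcher, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (matcher.test(sequence.charAt(i))) {
                return false;
            }
        }
        return true; // also reached when the sequence is empty
    }

    /** Defined as the negation of matchesNoneOf, as the documentation states. */
    static boolean matchesAnyOf(IntPredicate matcher, CharSequence sequence) {
        return !matchesNoneOf(matcher, sequence);
    }
}
```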
      • indexIn

        public int indexIn(java.lang.CharSequence sequence)
        Returns the index of the first matching character in a character sequence, or -1 if no matching character is present.

        The default implementation iterates over the sequence in forward order calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to examine from the beginning
        Returns:
        an index, or -1 if no character matches
      • indexIn

        public int indexIn(java.lang.CharSequence sequence,
                           int start)
        Returns the index of the first matching character in a character sequence, starting from a given position, or -1 if no character matches at or after that position.

        The default implementation iterates over the sequence in forward order, beginning at start, calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to examine
        start - the first index to examine; must be nonnegative and no greater than sequence.length()
        Returns:
        the index of the first matching character, guaranteed to be no less than start, or -1 if no character matches
        Throws:
        java.lang.IndexOutOfBoundsException - if start is negative or greater than sequence.length()
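
        The contract for start has two edge cases worth seeing in code: start is itself examined, and start equal to sequence.length() is legal and simply yields -1. A JDK-only sketch under those assumptions; IndexInSketch is a hypothetical class, IntPredicate stands in for a CharMatcher, and the exception message is invented for illustration:

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of the documented contract; not Guava's code.
final class IndexInSketch {
    private IndexInSketch() {}

    static int indexIn(IntPredicate matcher, CharSequence sequence, int start) {
        int length = sequence.length();
        // start == length is allowed: there is nothing left to scan, so the result is -1.
        if (start < 0 || start > length) {
            throw new IndexOutOfBoundsException("start: " + start + ", length: " + length);
        }
        // Forward scan beginning at start (inclusive).
        for (int i = start; i < length; i++) {
            if (matcher.test(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}
```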
      • lastIndexIn

        public int lastIndexIn(java.lang.CharSequence sequence)
        Returns the index of the last matching character in a character sequence, or -1 if no matching character is present.

        The default implementation iterates over the sequence in reverse order calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to examine from the end
        Returns:
        an index, or -1 if no character matches
      • countIn

        public int countIn(java.lang.CharSequence sequence)
        Returns the number of matching characters found in a character sequence.
      • removeFrom

        public java.lang.String removeFrom(java.lang.CharSequence sequence)
        Returns a string containing all non-matching characters of a character sequence, in order. For example:
           
        
           CharMatcher.is('a').removeFrom("bazaar")
        ... returns "bzr".
      • retainFrom

        public java.lang.String retainFrom(java.lang.CharSequence sequence)
        Returns a string containing all matching characters of a character sequence, in order. For example:
           
        
           CharMatcher.is('a').retainFrom("bazaar")
        ... returns "aaa".
      • replaceFrom

        public java.lang.String replaceFrom(java.lang.CharSequence sequence,
                                            char replacement)
        Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement character. For example:
           
        
           CharMatcher.is('a').replaceFrom("radar", 'o')
        ... returns "rodor".

        The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to replace matching characters in
        replacement - the character to append to the result string in place of each matching character in sequence
        Returns:
        the new string
      • replaceFrom

        public java.lang.String replaceFrom(java.lang.CharSequence sequence,
                                            java.lang.CharSequence replacement)
        Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement sequence. For example:
           
        
           CharMatcher.is('a').replaceFrom("yaha", "oo")
        ... returns "yoohoo".

        Note: If the replacement is a fixed string with only one character, you are better off calling replaceFrom(CharSequence, char) directly.

        Parameters:
        sequence - the character sequence to replace matching characters in
        replacement - the characters to append to the result string in place of each matching character in sequence
        Returns:
        the new string
      • trimFrom

        public java.lang.String trimFrom(java.lang.CharSequence sequence)
        Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning and from the end of the string. For example:
           
        
           CharMatcher.anyOf("ab").trimFrom("abacatbab")
        ... returns "cat".

        Note that:

           
        
           CharMatcher.inRange('\0', ' ').trimFrom(str)
        ... is equivalent to String.trim().
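
        The equivalence with String.trim() holds because trim() strips leading and trailing characters whose code is at most U+0020, which is exactly the range '\0' to ' '. The JDK-only sketch below shows the trimming logic with a predicate; TrimSketch is a hypothetical class, IntPredicate stands in for a CharMatcher, and this is not Guava's implementation.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of predicate-based trimming; not Guava's code.
final class TrimSketch {
    private TrimSketch() {}

    static String trimFrom(IntPredicate matcher, CharSequence sequence) {
        int first = 0;
        int last = sequence.length() - 1;
        // Skip matching characters at the beginning.
        while (first <= last && matcher.test(sequence.charAt(first))) {
            first++;
        }
        // Skip matching characters at the end.
        while (last >= first && matcher.test(sequence.charAt(last))) {
            last--;
        }
        // Yields the empty string when every character matched.
        return sequence.subSequence(first, last + 1).toString();
    }
}
```

        With the predicate c -> c <= ' ', trimFrom gives the same result as String.trim() on the same input.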
      • trimLeadingFrom

        public java.lang.String trimLeadingFrom(java.lang.CharSequence sequence)
        Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning of the string. For example:
         
        
           CharMatcher.anyOf("ab").trimLeadingFrom("abacatbab")
        ... returns "catbab".
      • trimTrailingFrom

        public java.lang.String trimTrailingFrom(java.lang.CharSequence sequence)
        Returns a substring of the input character sequence that omits all characters this matcher matches from the end of the string. For example:
         
        
           CharMatcher.anyOf("ab").trimTrailingFrom("abacatbab")
        ... returns "abacat".
      • collapseFrom

        public java.lang.String collapseFrom(java.lang.CharSequence sequence,
                                             char replacement)
        Returns a string copy of the input character sequence, with each group of consecutive characters that match this matcher replaced by a single replacement character. For example:
           
        
           CharMatcher.anyOf("eko").collapseFrom("bookkeeper", '-')
        ... returns "b-p-r".

        The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to replace matching groups of characters in
        replacement - the character to append to the result string in place of each group of matching characters in sequence
        Returns:
        the new string
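
        The collapsing rule can be sketched with a single flag that records whether the previous character belonged to a matching group. This is a simplified single-loop version using only the JDK; the default implementation described above first locates the first match with indexIn(CharSequence). CollapseSketch is a hypothetical class and IntPredicate stands in for a CharMatcher.

```java
import java.util.function.IntPredicate;

// Hypothetical, simplified illustration of collapsing; not Guava's code.
final class CollapseSketch {
    private CollapseSketch() {}

    static String collapseFrom(IntPredicate matcher, CharSequence sequence, char replacement) {
        StringBuilder out = new StringBuilder(sequence.length());
        boolean inMatchingGroup = false;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (matcher.test(c)) {
                // Emit the replacement only once per run of matching characters.
                if (!inMatchingGroup) {
                    out.append(replacement);
                    inMatchingGroup = true;
                }
            } else {
                out.append(c);
                inMatchingGroup = false;
            }
        }
        return out.toString();
    }
}
```

        Applied to "bookkeeper" with a predicate matching 'e', 'k', and 'o', the run "ookkee" and the later single "e" each become one '-', giving "b-p-r" as in the example above.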
      • trimAndCollapseFrom

        public java.lang.String trimAndCollapseFrom(java.lang.CharSequence sequence,
                                                    char replacement)
        Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching characters at the start or end of the sequence are removed without replacement.
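
        A typical use is normalizing whitespace: leading and trailing runs disappear, interior runs become a single space. The JDK-only sketch below combines a trim pass with a collapse pass to show the difference from plain collapsing. TrimAndCollapseSketch is a hypothetical class, IntPredicate stands in for a CharMatcher, and this is not Guava's implementation.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of trim-then-collapse; not Guava's code.
final class TrimAndCollapseSketch {
    private TrimAndCollapseSketch() {}

    static String trimAndCollapseFrom(
            IntPredicate matcher, CharSequence sequence, char replacement) {
        // Step 1: narrow [first, last] so that neither end matches.
        int first = 0;
        int last = sequence.length() - 1;
        while (first <= last && matcher.test(sequence.charAt(first))) {
            first++;
        }
        while (last >= first && matcher.test(sequence.charAt(last))) {
            last--;
        }
        // Step 2: collapse each remaining run of matches into one replacement.
        StringBuilder out = new StringBuilder(Math.max(0, last - first + 1));
        boolean inMatchingGroup = false;
        for (int i = first; i <= last; i++) {
            char c = sequence.charAt(i);
            if (matcher.test(c)) {
                if (!inMatchingGroup) {
                    out.append(replacement);
                    inMatchingGroup = true;
                }
            } else {
                out.append(c);
                inMatchingGroup = false;
            }
        }
        return out.toString();
    }
}
```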
      • finishCollapseFrom

        private java.lang.String finishCollapseFrom(java.lang.CharSequence sequence,
                                                    int start,
                                                    int end,
                                                    char replacement,
                                                    java.lang.StringBuilder builder,
                                                    boolean inMatchingGroup)
      • apply

        @Deprecated
        public boolean apply(java.lang.Character character)
        Deprecated.
        Provided only to satisfy the Predicate interface; use matches(char) instead.
        Description copied from interface: Predicate
        Returns the result of applying this predicate to input (Java 8 users, see notes in the class documentation above). This method is generally expected, but not absolutely required, to have the following properties:
        • Its execution does not cause any observable side effects.
        • The computation is consistent with equals; that is, Objects.equal(a, b) implies that predicate.apply(a) == predicate.apply(b).
        Specified by:
        apply in interface Predicate<java.lang.Character>
      • toString

        public java.lang.String toString()
        Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT).
        Overrides:
        toString in class java.lang.Object
      • showCharacter

        private static java.lang.String showCharacter(char c)
        Returns the Java Unicode escape sequence for the given character, in the form "\u12AB" where "12AB" is the four hexadecimal digits representing the 16 bits of the UTF-16 character.
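
        The escape format can be reproduced with a zero-padded, four-digit hexadecimal conversion. The sketch below uses String.format rather than whatever the private method actually does; UnicodeEscapeSketch is a hypothetical class, and uppercase hex digits are an assumption based on the "12AB" example above.

```java
// Hypothetical illustration of the escape format; not Guava's code.
final class UnicodeEscapeSketch {
    private UnicodeEscapeSketch() {}

    /** Returns a backslash, the letter u, and four uppercase hex digits for c. */
    static String showCharacter(char c) {
        // %04X pads the 16-bit value to exactly four hex digits.
        return String.format("\\u%04X", (int) c);
    }
}
```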