org.apache.lucene.analysis
Class CharTokenizer
public abstract class CharTokenizer
An abstract base class for simple, character-oriented tokenizers.
protected abstract boolean | isTokenChar(char c) - Returns true iff a character should be included in a token.
Token | next() - Returns the next token in the stream, or null at EOS.
protected char | normalize(char c) - Called on each token character to normalize it before it is added to the token.
CharTokenizer
public CharTokenizer(Reader input)
isTokenChar
protected abstract boolean isTokenChar(char c)
Returns true iff a character should be included in a token. This
tokenizer emits, as tokens, sequences of adjacent characters that
satisfy this predicate. Characters for which it returns false define
token boundaries and are not included in tokens.
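The predicate-driven splitting described above can be illustrated with a minimal standalone sketch. It does not use any Lucene classes: SketchCharSplitter and SketchLetterSplitter are hypothetical names, the input is a String rather than a Reader, and tokens are returned as plain strings instead of Token objects. Only the role of isTokenChar is taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only; these classes are NOT part of Lucene.
abstract class SketchCharSplitter {
    // Plays the role described for isTokenChar(char): true means "keep in token".
    protected abstract boolean isTokenChar(char c);

    // Collects runs of adjacent accepted characters; rejected characters
    // end the current run and are dropped.
    final List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final run
        }
        return tokens;
    }
}

// Hypothetical subclass: letters belong to tokens, everything else is a boundary.
class SketchLetterSplitter extends SketchCharSplitter {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}
```

With this sketch, new SketchLetterSplitter().split("Hello, wide world! x2y") yields Hello, wide, world, x, y: the comma, spaces, exclamation mark, and digit act only as boundaries.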
next
public final Token next()
throws IOException
Returns the next token in the stream, or null at end of stream (EOS).
- next in interface TokenStream
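Because next() signals the end of the stream by returning null, callers typically pull tokens in a loop until null appears. The sketch below shows that loop against a hypothetical stand-in, SketchTokenSource, which is not a Lucene class; it splits on whitespace and returns strings purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token stream; NOT a Lucene class.
class SketchTokenSource {
    private final String text;
    private int pos = 0;

    SketchTokenSource(String text) {
        this.text = text;
    }

    // Returns the next run of non-whitespace characters,
    // or null once the input is exhausted.
    String next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= text.length()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }
}

class SketchDrain {
    // Pulls tokens until next() returns null, mirroring the contract above.
    static List<String> drain(SketchTokenSource source) {
        List<String> out = new ArrayList<>();
        for (String t = source.next(); t != null; t = source.next()) {
            out.add(t);
        }
        return out;
    }
}
```

The real next() also declares IOException, so a loop over an actual tokenizer would need to handle or propagate that exception.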
normalize
protected char normalize(char c)
Called on each token character to normalize it before it is added to the
token. The default implementation returns the character unchanged.
Subclasses may override this to, for example, lowercase tokens.
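The lowercasing use case can be sketched as follows. This is a standalone illustration under assumptions, not Lucene code: SketchNormalizingSplitter and SketchLowerCaseSplitter are hypothetical classes that reproduce only the hook structure described here, where every accepted character passes through normalize before it is appended to the token.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only; these classes are NOT part of Lucene.
class SketchNormalizingSplitter {
    // Assumed predicate for this sketch: letters form tokens.
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    // Default hook: leave the character as it is.
    protected char normalize(char c) {
        return c;
    }

    final List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c)); // normalize before adding to the token
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Hypothetical subclass that overrides only the normalization hook.
class SketchLowerCaseSplitter extends SketchNormalizingSplitter {
    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

Keeping normalization in its own hook means a subclass can change how characters are stored without touching the logic that decides where tokens begin and end.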
Copyright © 2000-2006 Apache Software Foundation. All Rights Reserved.