org.apache.lucene.analysis
Class LetterTokenizer
public class LetterTokenizer extends CharTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is,
it defines tokens as maximal strings of adjacent letters, as defined by the
java.lang.Character.isLetter() predicate.
Note: this works reasonably well for most European languages, but poorly for
some Asian languages, where words are not separated by spaces and a whole run
of letters therefore becomes a single token.
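The splitting rule described above can be illustrated with a minimal sketch that uses only the JDK. It is not the Lucene implementation: the class name LetterSplitSketch and the method tokenize are hypothetical, it works on a String rather than a Reader, and it does not model any buffering or token-length limits the real tokenizer may apply. It only shows what "maximal strings of adjacent letters" means in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this class is not part of Lucene.
class LetterSplitSketch {

    // Returns the maximal runs of characters for which
    // Character.isLetter(char) is true, in order of appearance.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letter: extend the token being collected.
                current.append(c);
            } else if (current.length() > 0) {
                // Non-letter: close the current token, if there is one.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit the final token when the text ends on a letter.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation, digits and the apostrophe all act as separators.
        System.out.println(tokenize("Hello, world! It's 2005."));
        // prints [Hello, world, It, s]
    }
}
```

Because CJK ideographs also satisfy Character.isLetter(char), a sentence written without spaces comes back from this sketch as a single token, which is the limitation the note above refers to.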
protected boolean isTokenChar(char c) - Collects only characters which satisfy Character.isLetter(char).
LetterTokenizer
public LetterTokenizer(Reader in)
Construct a new LetterTokenizer.
isTokenChar
protected boolean isTokenChar(char c)
Collects only characters which satisfy Character.isLetter(char).
Specified by: isTokenChar in class CharTokenizer
Copyright © 2000-2005 Apache Software Foundation. All Rights Reserved.