java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
public abstract class CompoundWordTokenFilterBase extends TokenFilter
Base class for decomposition token filters.
Field Summary | |
---|---|
static int | DEFAULT_MAX_SUBWORD_SIZE: The default for maximal length of subwords that get propagated to the output of this filter |
static int | DEFAULT_MIN_SUBWORD_SIZE: The default for minimal length of subwords that get propagated to the output of this filter |
static int | DEFAULT_MIN_WORD_SIZE: The default for minimal word length that gets decomposed |
protected CharArraySet | dictionary |
protected int | maxSubwordSize |
protected int | minSubwordSize |
protected int | minWordSize |
protected boolean | onlyLongestMatch |
protected java.util.LinkedList | tokens |
Fields inherited from class org.apache.lucene.analysis.TokenFilter |
---|
input |
Constructor Summary | |
---|---|
protected | CompoundWordTokenFilterBase(TokenStream input, java.util.Set dictionary) |
protected | CompoundWordTokenFilterBase(TokenStream input, java.util.Set dictionary, boolean onlyLongestMatch) |
protected | CompoundWordTokenFilterBase(TokenStream input, java.util.Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
protected | CompoundWordTokenFilterBase(TokenStream input, java.lang.String[] dictionary) |
protected | CompoundWordTokenFilterBase(TokenStream input, java.lang.String[] dictionary, boolean onlyLongestMatch) |
protected | CompoundWordTokenFilterBase(TokenStream input, java.lang.String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
Method Summary | |
---|---|
protected static void | addAllLowerCase(java.util.Set target, java.util.Collection col) |
protected Token | createToken(int offset, int length, Token prototype) |
protected void | decompose(Token token) |
protected abstract void | decomposeInternal(Token token) |
static java.util.Set | makeDictionary(java.lang.String[] dictionary): Creates a set of words from an array. The resulting Set does case-insensitive matching. TODO: look for a faster dictionary lookup approach. |
protected static char[] | makeLowerCaseCopy(char[] buffer) |
Token | next(Token reusableToken): Returns the next token in the stream, or null at EOS. |
Methods inherited from class org.apache.lucene.analysis.TokenFilter |
---|
close, reset |
Methods inherited from class org.apache.lucene.analysis.TokenStream |
---|
next |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int DEFAULT_MIN_WORD_SIZE
public static final int DEFAULT_MIN_SUBWORD_SIZE
public static final int DEFAULT_MAX_SUBWORD_SIZE
protected final CharArraySet dictionary
protected final java.util.LinkedList tokens
protected final int minWordSize
protected final int minSubwordSize
protected final int maxSubwordSize
protected final boolean onlyLongestMatch
Constructor Detail |
---|
protected CompoundWordTokenFilterBase(TokenStream input, java.lang.String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(TokenStream input, java.lang.String[] dictionary, boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(TokenStream input, java.util.Set dictionary, boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(TokenStream input, java.lang.String[] dictionary)
protected CompoundWordTokenFilterBase(TokenStream input, java.util.Set dictionary)
protected CompoundWordTokenFilterBase(TokenStream input, java.util.Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
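The six overloads differ only in the dictionary type (`java.lang.String[]` or `java.util.Set`) and in which tuning parameters they accept. The sketch below shows one common way such overloads are chained so that the shorter forms delegate to the fullest one. `SketchFilterConfig` is a hypothetical stand-in, not the Lucene class, and the default values and the `false` default for `onlyLongestMatch` are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in; NOT org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase.
class SketchFilterConfig {
    // Assumed values, chosen for illustration only.
    static final int SKETCH_MIN_WORD_SIZE = 5;
    static final int SKETCH_MIN_SUBWORD_SIZE = 2;
    static final int SKETCH_MAX_SUBWORD_SIZE = 15;

    final Set<String> dictionary;
    final int minWordSize;
    final int minSubwordSize;
    final int maxSubwordSize;
    final boolean onlyLongestMatch;

    // Shortest form: only a dictionary; everything else falls back to defaults.
    SketchFilterConfig(String[] dictionary) {
        this(dictionary, false); // assumption: longest-match-only is off by default
    }

    // Middle form: converts the array to a Set and fills in the size defaults.
    SketchFilterConfig(String[] dictionary, boolean onlyLongestMatch) {
        this(new HashSet<String>(Arrays.asList(dictionary)),
             SKETCH_MIN_WORD_SIZE, SKETCH_MIN_SUBWORD_SIZE, SKETCH_MAX_SUBWORD_SIZE,
             onlyLongestMatch);
    }

    // Fullest form: the only constructor that actually assigns the fields.
    SketchFilterConfig(Set<String> dictionary, int minWordSize, int minSubwordSize,
                       int maxSubwordSize, boolean onlyLongestMatch) {
        this.dictionary = dictionary;
        this.minWordSize = minWordSize;
        this.minSubwordSize = minSubwordSize;
        this.maxSubwordSize = maxSubwordSize;
        this.onlyLongestMatch = onlyLongestMatch;
    }
}
```

Keeping all field assignments in one constructor is what allows the fields to be declared `final`, as they are in the Field Detail section.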
Method Detail |
---|
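public static final java.util.Set makeDictionary(java.lang.String[] dictionary)
Creates a set of words from an array. The resulting Set does case-insensitive matching.
Parameters:
dictionary - the array of words to put into the set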
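Case-insensitive matching can be approximated with the standard library alone by lower-casing every entry on insertion and lower-casing each probe before lookup. The sketch below illustrates that idea only. `DictionarySketch`, `makeLowerCaseSet`, and `containsIgnoreCase` are hypothetical names, and the real class stores its dictionary in a `CharArraySet`, which this sketch does not reproduce.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; NOT the Lucene implementation of makeDictionary.
class DictionarySketch {
    // Stores every word in lower case so that lookups can ignore case.
    static Set<String> makeLowerCaseSet(String[] words) {
        Set<String> set = new HashSet<String>();
        for (String word : words) {
            set.add(word.toLowerCase(Locale.ROOT));
        }
        return set;
    }

    // Lower-cases the probe before the lookup, matching how entries were stored.
    static boolean containsIgnoreCase(Set<String> lowerCaseSet, String probe) {
        return lowerCaseSet.contains(probe.toLowerCase(Locale.ROOT));
    }
}
```

A hash-based set like this creates a new lower-cased `String` per lookup, which is one reason a faster lookup structure, as the TODO suggests, may be worth exploring.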
public Token next(Token reusableToken) throws java.io.IOException
Description copied from class: TokenStream
Returns the next token in the stream, or null at EOS. Passing a reusable Token implicitly defines a "contract" between consumers (callers of this method) and producers (implementations of this method that are the source for tokens): a producer must call Token.clear() before setting the fields in it and returning it. A TokenFilter is considered a consumer.
Overrides:
next in class TokenStream
Parameters:
reusableToken - a Token that may or may not be used to return; this parameter should never be null (the callee is not required to check for null before using it, but it is a good idea to assert that it is not null)
Throws:
java.io.IOException
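The producer side of this contract can be sketched with a simplified token type. `MiniToken`, `MiniProducer`, and `ReuseDemo` below are hypothetical classes written for illustration; they are not the Lucene `Token` or `TokenStream`, and the offset arithmetic assumes words separated by single spaces.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified token; NOT org.apache.lucene.analysis.Token.
class MiniToken {
    String term = "";
    int startOffset;
    int endOffset;

    // Resets every field so no state leaks from the previous use.
    void clear() {
        term = "";
        startOffset = 0;
        endOffset = 0;
    }
}

// Producer: the source of tokens.
class MiniProducer {
    private final Iterator<String> words;
    private int offset = 0;

    MiniProducer(List<String> words) {
        this.words = words.iterator();
    }

    // Returns the next token, or null at end of stream.
    MiniToken next(MiniToken reusableToken) {
        assert reusableToken != null;
        if (!words.hasNext()) {
            return null;
        }
        String word = words.next();
        reusableToken.clear(); // producer clears before setting any field
        reusableToken.term = word;
        reusableToken.startOffset = offset;
        reusableToken.endOffset = offset + word.length();
        offset = reusableToken.endOffset + 1; // assumption: one separator character
        return reusableToken;
    }
}

// Consumer: reuses one token instance and copies out what it needs
// before asking for the next token.
class ReuseDemo {
    static String joinTerms(List<String> words) {
        MiniProducer producer = new MiniProducer(words);
        MiniToken reusable = new MiniToken();
        StringBuilder sb = new StringBuilder();
        for (MiniToken t = producer.next(reusable); t != null; t = producer.next(reusable)) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(t.term).append('@').append(t.startOffset);
        }
        return sb.toString();
    }
}
```

Because the same instance is handed back on every call, the consumer in this sketch appends the term and offset to its own buffer immediately rather than holding on to the token.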
protected static final void addAllLowerCase(java.util.Set target, java.util.Collection col)
protected static char[] makeLowerCaseCopy(char[] buffer)
protected final Token createToken(int offset, int length, Token prototype)
protected void decompose(Token token)
protected abstract void decomposeInternal(Token token)
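Since `decomposeInternal` is abstract, each subclass decides how a token is split into subwords. The sketch below shows one plausible dictionary-based strategy that uses the four tuning fields listed above. It works on plain strings rather than `Token` objects, `DecomposeSketch` is a hypothetical name, and the algorithm is an assumption for illustration rather than the implementation of any Lucene subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of dictionary-based decomposition; NOT Lucene code.
class DecomposeSketch {
    static List<String> decompose(String word, Set<String> lowerCaseDictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<String>();
        // Words shorter than minWordSize are not decomposed at all.
        if (word.length() < minWordSize) {
            return subwords;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        // Try every start position that still leaves room for a minimal subword.
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            for (int len = minSubwordSize; len <= maxSubwordSize; len++) {
                if (start + len > lower.length()) {
                    break;
                }
                String candidate = lower.substring(start, start + len);
                if (!lowerCaseDictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // len only grows, so the last hit is the longest
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }
}
```

With a dictionary of `fuss`, `ball`, and `fussball`, the input `Fussball` yields all three entries when `onlyLongestMatch` is false, and only `fussball` and `ball` when it is true, because `fuss` is shadowed by the longer match at the same start position.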