Classes derived from org.apache.lucene.analysis.Tokenizer | |
class | A grammar-based tokenizer constructed with JavaCC. |
Classes derived from org.apache.lucene.analysis.Tokenizer | |
class | A RussianLetterTokenizer is a tokenizer that extends LetterTokenizer by additionally accepting as letters any characters found in a given "Russian charset". |
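A minimal sketch of the lookup rule, assuming it works as the row describes: a character belongs to a token if `Character.isLetter` accepts it or if it appears in the supplied charset array. The class `RussianLetterSketch` and its `tokenize` method are hypothetical names, and the sketch reads a `String` rather than a `Reader`; it is not Lucene's implementation. The charset matters for legacy 8-bit text: byte 0xA3 is "ё" in KOI8-R but decodes to the non-letter "£" under ISO-8859-1, so a plain letter test would split the word at that point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Lucene's API): a letter tokenizer that also accepts
// characters listed in a caller-supplied "charset" array.
final class RussianLetterSketch {
    private final char[] charset;

    RussianLetterSketch(char[] charset) {
        this.charset = charset.clone();
    }

    // A character is part of a token if it is a Unicode letter or appears in the charset.
    boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char allowed : charset) {
            if (c == allowed) {
                return true;
            }
        }
        return false;
    }

    // Splits the text into maximal runs of token characters.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With an empty charset, the text "ab£cd" splits into "ab" and "cd"; with a charset containing '£' it stays a single token.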
Classes derived from org.apache.lucene.analysis.Tokenizer | |
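class | Title: ChineseTokenizer
Description: Extracts tokens from the stream using Character.getType().
Rule: each Chinese character becomes a single token.
Copyright: Copyright (c) 2001
The difference between the ChineseTokenizer and the CJKTokenizer is their token-parsing logic: the ChineseTokenizer emits each Chinese character as its own token, whereas the CJKTokenizer emits overlapping two-character tokens. For the text C1C2C3C4, the ChineseTokenizer returns C1, C2, C3, C4, and the CJKTokenizer returns C1C2, C2C3, C3C4. |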
Classes derived from org.apache.lucene.analysis.Tokenizer | |
class | CJKTokenizer was modified from StopTokenizer, which does a decent job for most European languages. |
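What distinguishes CJKTokenizer is that it emits overlapping two-character tokens for runs of CJK characters: a run C1C2C3C4 yields C1C2, C2C3, C3C4. A minimal sketch of that bigram step, assuming the input is already a single run of CJK characters within the Basic Multilingual Plane. The class `CjkBigramSketch` and method `bigrams` are hypothetical, and splitting mixed Latin/CJK text into runs is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: overlapping bigrams over one run of CJK characters.
// Works on UTF-16 chars, so it assumes no supplementary (surrogate-pair) characters.
final class CjkBigramSketch {
    static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            // Assumption: a lone character is kept as a single token.
            tokens.add(run);
            return tokens;
        }
        // Each adjacent pair becomes one token; neighbouring tokens overlap by one character.
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }
}
```

Because every adjacent pair is indexed, a two-character query word can match wherever it occurs in the text without a dictionary to locate word boundaries.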
Classes derived from org.apache.lucene.analysis.Tokenizer | |
class | An abstract base class for simple, character-oriented tokenizers. |
class | Emits the entire input as a single token. |
class | A LetterTokenizer is a tokenizer that divides text at non-letters. |
class | LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. |
class | A WhitespaceTokenizer is a tokenizer that divides text at whitespace. |
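The rows in this last group share one pattern: an abstract base decides, character by character, whether a character belongs to a token, and each subclass supplies that rule. The sketch below imitates the pattern on a plain `String` rather than a `Reader`. All class names (`CharTokenizerSketch`, `LetterSketch`, `LowerCaseSketch`, `WhitespaceSketch`, `KeywordSketch`) are hypothetical stand-ins modeled on the descriptions above, not Lucene's API, and details such as maximum token length and offsets are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, Reader-free sketch of the character-oriented tokenizer pattern.
abstract class CharTokenizerSketch {
    // True if c belongs inside a token; any other character ends the current token.
    protected abstract boolean isTokenChar(char c);

    // Hook for per-character normalization; the default leaves c unchanged.
    protected char normalize(char c) {
        return c;
    }

    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Divides text at non-letters.
class LetterSketch extends CharTokenizerSketch {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// Letter splitting plus lower-casing in a single pass.
class LowerCaseSketch extends LetterSketch {
    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}

// Divides text at whitespace only.
class WhitespaceSketch extends CharTokenizerSketch {
    @Override
    protected boolean isTokenChar(char c) {
        return !Character.isWhitespace(c);
    }
}

// Returns the entire input, unchanged, as a single token.
final class KeywordSketch {
    static List<String> tokenize(String text) {
        return Collections.singletonList(text);
    }
}
```

On the input "Hello, World 42 foo-bar", the letter variant yields `[Hello, World, foo, bar]`, the lower-case variant `[hello, world, foo, bar]`, and the whitespace variant `[Hello,, World, 42, foo-bar]`. Folding the case in `normalize` is what lets one pass do the work of a letter tokenizer followed by a separate lower-casing filter.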