Package org.apache.lucene.analysis

API and code to convert text into indexable tokens.

Class Summary

Analyzer - An Analyzer builds TokenStreams, which analyze text.
CharTokenizer - An abstract base class for simple, character-oriented tokenizers.
ISOLatin1AccentFilter - A filter that replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents.
KeywordAnalyzer - "Tokenizes" the entire stream as a single token.
KeywordTokenizer - Emits the entire input as a single token.
LengthFilter - Removes words that are too long or too short from the stream.
LetterTokenizer - A tokenizer that divides text at non-letters.
LowerCaseFilter - Normalizes token text to lower case.
LowerCaseTokenizer - Performs the function of LetterTokenizer and LowerCaseFilter together.
PerFieldAnalyzerWrapper - An analyzer used when different fields require different analysis techniques.
PorterStemFilter - Transforms the token stream according to the Porter stemming algorithm.
SimpleAnalyzer - An Analyzer that filters LetterTokenizer with LowerCaseFilter.
StopAnalyzer - Filters LetterTokenizer with LowerCaseFilter and StopFilter.
StopFilter - Removes stop words from a token stream.
Token - An occurrence of a term from the text of a field.
TokenFilter - A TokenStream whose input is another token stream.
Tokenizer - A TokenStream whose input is a Reader.
TokenStream - Enumerates the sequence of tokens, either from the fields of a document or from query text.
WhitespaceAnalyzer - An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer - A tokenizer that divides text at whitespace.
WordlistLoader - Loader for text files that contain a list of stopwords.
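The classes above compose as a decorator chain: a Tokenizer produces tokens from raw text, and each TokenFilter wraps another TokenStream and transforms its output. A minimal plain-Java sketch of that pattern (simplified stand-in interfaces and classes, not Lucene's actual API) might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the TokenStream/Tokenizer/TokenFilter decorator
// pattern. The interfaces and classes here are simplified stand-ins,
// not Lucene's real classes.
public class AnalysisSketch {

    // A TokenStream enumerates a sequence of tokens; null signals the end.
    interface TokenStream {
        String next();
    }

    // A Tokenizer is a TokenStream over raw text (here, split on whitespace).
    static class WhitespaceTokenizer implements TokenStream {
        private final String[] parts;
        private int pos = 0;

        WhitespaceTokenizer(String text) {
            this.parts = text.trim().split("\\s+");
        }

        public String next() {
            return pos < parts.length ? parts[pos++] : null;
        }
    }

    // A TokenFilter is a TokenStream whose input is another token stream.
    static class LowerCaseFilter implements TokenStream {
        private final TokenStream input;

        LowerCaseFilter(TokenStream input) {
            this.input = input;
        }

        public String next() {
            String t = input.next();
            return t == null ? null : t.toLowerCase();
        }
    }

    // An Analyzer-style helper: build the chain, then drain the stream.
    static List<String> analyze(String text) {
        TokenStream stream = new LowerCaseFilter(new WhitespaceTokenizer(text));
        List<String> tokens = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick BROWN Fox"));
        // prints [the, quick, brown, fox]
    }
}
```

Because each filter only holds a reference to its input stream, filters can be stacked in any order; this is the same composition that lets, for example, a stop-word filter wrap a lower-casing filter wrapping a tokenizer.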

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.