Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable tokens. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean. |
org.apache.lucene.analysis.cn | Analyzer for Chinese. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.standard | A grammar-based tokenizer constructed with JavaCC. |
org.apache.lucene.index.memory | High-performance single-document main-memory Apache Lucene fulltext search index. |
org.apache.lucene.search.highlight | Classes that provide "keyword in context" features, typically used to highlight search terms in the text of results pages. |
Modifier and Type | Class and Description |
---|---|
class | CharTokenizer: An abstract base class for simple, character-oriented tokenizers. |
class | ISOLatin1AccentFilter: A filter that replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalents. |
class | KeywordTokenizer: Emits the entire input as a single token. |
class | LengthFilter: Removes words that are too long or too short from the stream. |
class | LetterTokenizer: A tokenizer that divides text at non-letters. |
class | LowerCaseFilter: Normalizes token text to lower case. |
class | LowerCaseTokenizer: Performs the function of LetterTokenizer and LowerCaseFilter together. |
class | PorterStemFilter: Transforms the token stream as per the Porter stemming algorithm. |
class | StopFilter: Removes stop words from a token stream. |
class | TokenFilter: A TokenStream whose input is another token stream. |
class | Tokenizer: A TokenStream whose input is a Reader. |
class | WhitespaceTokenizer: A tokenizer that divides text at whitespace. |
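The classes above compose as decorators: a Tokenizer reads characters from a Reader, and each TokenFilter wraps another TokenStream (its `input` field) and transforms the tokens passing through. A minimal plain-Java sketch of that chaining pattern, with illustrative names that are not the real Lucene API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Sketch of the Tokenizer/TokenFilter decorator pattern; the interface and
// class names here are stand-ins, not Lucene's TokenStream API.
public class ChainSketch {

    // Stands in for TokenStream: a source of token strings.
    interface TokenSource {
        String next(); // null when the stream is exhausted
    }

    // Stands in for WhitespaceTokenizer: divides text at whitespace.
    static class WhitespaceSource implements TokenSource {
        private final Iterator<String> it;
        WhitespaceSource(String text) {
            List<String> parts = new ArrayList<>();
            for (String p : text.split("\\s+")) if (!p.isEmpty()) parts.add(p);
            it = parts.iterator();
        }
        public String next() { return it.hasNext() ? it.next() : null; }
    }

    // Stands in for LowerCaseFilter: wraps another source, like TokenFilter.input.
    static class LowerCase implements TokenSource {
        private final TokenSource input;
        LowerCase(TokenSource input) { this.input = input; }
        public String next() {
            String t = input.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        }
    }

    // Drains a chain into a list of tokens.
    static List<String> run(TokenSource ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        // Chain: tokenizer at the bottom, filter wrapping it.
        System.out.println(run(new LowerCase(new WhitespaceSource("Fast Lucene Search"))));
    }
}
```

Further filters (stop-word removal, stemming) stack the same way, each wrapping the previous stream.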
Modifier and Type | Field and Description |
---|---|
protected TokenStream | TokenFilter.input: The source of tokens for this filter. |
Modifier and Type | Method and Description |
---|---|
TokenStream | Analyzer.tokenStream(Reader reader): Deprecated. Use tokenStream(String, Reader) instead. |
TokenStream | Analyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |
TokenStream | KeywordAnalyzer.tokenStream(String fieldName, Reader reader) |
TokenStream | PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader) |
TokenStream | SimpleAnalyzer.tokenStream(String fieldName, Reader reader) |
TokenStream | StopAnalyzer.tokenStream(String fieldName, Reader reader): Filters LowerCaseTokenizer with StopFilter. |
TokenStream | WhitespaceAnalyzer.tokenStream(String fieldName, Reader reader) |
Constructor and Description |
---|
ISOLatin1AccentFilter(TokenStream input) |
LengthFilter(TokenStream in, int min, int max): Build a filter that removes words that are too long or too short from the text. |
LowerCaseFilter(TokenStream in) |
PorterStemFilter(TokenStream in) |
StopFilter(TokenStream in, Hashtable stopTable): Deprecated. Use StopFilter(TokenStream, Set) instead. |
StopFilter(TokenStream in, Hashtable stopTable, boolean ignoreCase): Deprecated. Use StopFilter(TokenStream, Set) instead. |
StopFilter(TokenStream in, Set stopWords): Constructs a filter which removes words from the input TokenStream that are named in the Set. |
StopFilter(TokenStream input, Set stopWords, boolean ignoreCase): Construct a token stream filtering the given input. |
StopFilter(TokenStream input, String[] stopWords): Construct a token stream filtering the given input. |
StopFilter(TokenStream in, String[] stopWords, boolean ignoreCase): Constructs a filter which removes words from the input TokenStream that are named in the array of words. |
TokenFilter(TokenStream input): Construct a token stream filtering the given input. |
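The effect of the StopFilter(TokenStream, Set, boolean ignoreCase) variant can be sketched with plain lists standing in for the stream; this illustrates the behavior only, not Lucene's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Behavioral sketch of StopFilter: tokens named in the stop set are dropped,
// optionally matching case-insensitively. Names here are illustrative.
public class StopSketch {
    static List<String> removeStops(List<String> tokens, Set<String> stopWords, boolean ignoreCase) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            // With ignoreCase, compare against a lowercase stop set.
            String key = ignoreCase ? t.toLowerCase(Locale.ROOT) : t;
            if (!stopWords.contains(key)) out.add(t); // keep only non-stop tokens
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(removeStops(List.of("The", "quick", "fox"), Set.of("the", "a"), true));
    }
}
```

With ignoreCase false, "The" would survive the lowercase stop set above, which is why StopAnalyzer pairs this filter with LowerCaseTokenizer.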
Modifier and Type | Class and Description |
---|---|
class | BrazilianStemFilter: Based on GermanStemFilter. |

Modifier and Type | Method and Description |
---|---|
TokenStream | BrazilianAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
BrazilianStemFilter(TokenStream in) |
BrazilianStemFilter(TokenStream in, Hashtable exclusiontable): Deprecated. |
BrazilianStemFilter(TokenStream in, Set exclusiontable) |
Modifier and Type | Class and Description |
---|---|
class | CJKTokenizer: Modified from StopTokenizer, which does a decent job for most European languages. |

Modifier and Type | Method and Description |
---|---|
TokenStream | CJKAnalyzer.tokenStream(String fieldName, Reader reader): Gets a token stream from the input. |
Modifier and Type | Class and Description |
---|---|
class | ChineseFilter: A filter with a stop word table. Rule: no digits are allowed. |
class | ChineseTokenizer: Extracts tokens from the stream using Character.getType(). Rule: a Chinese character is a single token. The difference between the ChineseTokenizer and the CJKTokenizer (id=23545) is that they have different token-parsing logic. |

Modifier and Type | Method and Description |
---|---|
TokenStream | ChineseAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
ChineseFilter(TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenStream | CzechAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |
Modifier and Type | Class and Description |
---|---|
class | GermanStemFilter: A filter that stems German words. |

Modifier and Type | Method and Description |
---|---|
TokenStream | GermanAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
GermanStemFilter(TokenStream in) |
GermanStemFilter(TokenStream in, Hashtable exclusiontable): Deprecated. |
GermanStemFilter(TokenStream in, Set exclusionSet): Builds a GermanStemFilter that uses an exclusion table. |
Modifier and Type | Class and Description |
---|---|
class | GreekLowerCaseFilter: Normalizes token text to lower case using the given ("greek") charset. |

Modifier and Type | Method and Description |
---|---|
TokenStream | GreekAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
GreekLowerCaseFilter(TokenStream in, char[] charset) |
Modifier and Type | Class and Description |
---|---|
class | FrenchStemFilter: A filter that stems French words. |

Modifier and Type | Method and Description |
---|---|
TokenStream | FrenchAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
FrenchStemFilter(TokenStream in) |
FrenchStemFilter(TokenStream in, Hashtable exclusiontable): Deprecated. |
FrenchStemFilter(TokenStream in, Set exclusiontable) |
Modifier and Type | Class and Description |
---|---|
class | DutchStemFilter: A filter that stems Dutch words. |

Modifier and Type | Method and Description |
---|---|
TokenStream | DutchAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
DutchStemFilter(TokenStream _in) |
DutchStemFilter(TokenStream _in, Set exclusiontable): Builds a DutchStemFilter that uses an exclusion table. |
DutchStemFilter(TokenStream _in, Set exclusiontable, Map stemdictionary) |
Modifier and Type | Class and Description |
---|---|
class | RussianLetterTokenizer: Extends LetterTokenizer by additionally looking up letters in a given "russian charset". |
class | RussianLowerCaseFilter: Normalizes token text to lower case using the given ("russian") charset. |
class | RussianStemFilter: A filter that stems Russian words. |

Modifier and Type | Method and Description |
---|---|
TokenStream | RussianAnalyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Constructor and Description |
---|
RussianLowerCaseFilter(TokenStream in, char[] charset) |
RussianStemFilter(TokenStream in, char[] charset) |
Modifier and Type | Class and Description |
---|---|
class | SnowballFilter: A filter that stems words using a Snowball-generated stemmer. |

Modifier and Type | Method and Description |
---|---|
TokenStream | SnowballAnalyzer.tokenStream(String fieldName, Reader reader) |

Constructor and Description |
---|
SnowballFilter(TokenStream in, String name): Construct the named stemming filter. |
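SnowballFilter applies an algorithmic suffix-stripping stemmer to each token. As a rough illustration of the idea (this implements only step 1a of the Porter algorithm, not a full Snowball stemmer):

```java
// Tiny suffix-stripping sketch: Porter step 1a only. A real Snowball stemmer
// has many more rules and measure conditions; this just shows the flavor.
public class StemSketch {
    static String step1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // sses -> ss
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2); // ies  -> i
        if (w.endsWith("ss"))   return w;                              // ss   -> ss
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1); // s    -> ""
        return w;
    }

    public static void main(String[] args) {
        System.out.println(step1a("caresses") + " " + step1a("ponies") + " " + step1a("cats"));
    }
}
```

The `name` argument of SnowballFilter selects which generated stemmer (e.g. for a particular language) performs this kind of rewriting.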
Modifier and Type | Class and Description |
---|---|
class | StandardFilter: Normalizes tokens extracted with StandardTokenizer. |
class | StandardTokenizer: A grammar-based tokenizer constructed with JavaCC. |

Modifier and Type | Method and Description |
---|---|
TokenStream | StandardAnalyzer.tokenStream(String fieldName, Reader reader) |

Constructor and Description |
---|
StandardFilter(TokenStream in): Constructs a filter over the given input stream. |
Modifier and Type | Class and Description |
---|---|
class | SynonymTokenFilter: Injects additional tokens for synonyms of token terms fetched from the underlying child stream; the child stream must deliver lowercase tokens for synonyms to be found. |

Modifier and Type | Method and Description |
---|---|
TokenStream | MemoryIndex.keywordTokenStream(Collection keywords): Convenience method; creates and returns a token stream that generates a token for each keyword in the given collection, "as is", without any transforming text analysis. |
TokenStream | PatternAnalyzer.tokenStream(String fieldName, Reader reader): Creates a token stream that tokenizes all the text in the given Reader; this implementation forwards to tokenStream(String, String) and is less efficient than it. |
TokenStream | PatternAnalyzer.tokenStream(String fieldName, String text): Creates a token stream that tokenizes the given string into token terms (aka words). |

Modifier and Type | Method and Description |
---|---|
void | MemoryIndex.addField(String fieldName, TokenStream stream): Iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field. |

Constructor and Description |
---|
SynonymTokenFilter(TokenStream input, SynonymMap synonyms, int maxSynonyms): Creates an instance for the given underlying stream and synonym table. |
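SynonymTokenFilter's effect on the token sequence can be sketched with plain lists and a Map standing in for the stream and the SynonymMap; the names below are illustrative, not the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Behavioral sketch of synonym injection: each input token is passed through,
// and up to maxSynonyms mapped synonyms are injected after it. In Lucene the
// injected tokens share the original token's position; a flat list cannot
// show that, so this illustrates ordering only.
public class SynonymSketch {
    static List<String> inject(List<String> tokens, Map<String, List<String>> synonyms, int maxSynonyms) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t); // the original token always survives
            List<String> syns = synonyms.getOrDefault(t, List.of());
            for (int i = 0; i < Math.min(maxSynonyms, syns.size()); i++) out.add(syns.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inject(List.of("fast", "car"),
                Map.of("fast", List.of("quick", "rapid")), 1));
    }
}
```

Note the lowercase requirement stated above: lookups here are exact-match, which is why the child stream must deliver lowercase tokens for synonyms to be found.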
Modifier and Type | Method and Description |
---|---|
static TokenStream | TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer): A convenience method that tries a number of approaches to getting a token stream. |
static TokenStream | TokenSources.getTokenStream(IndexReader reader, int docId, String field) |
static TokenStream | TokenSources.getTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer) |
static TokenStream | TokenSources.getTokenStream(TermPositionVector tpv) |
static TokenStream | TokenSources.getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous): Low-level API. |

Modifier and Type | Method and Description |
---|---|
String | Highlighter.getBestFragment(TokenStream tokenStream, String text): Highlights chosen terms in a text, extracting the most relevant section. |
String[] | Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments): Highlights chosen terms in a text, extracting the most relevant sections. |
String | Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator): Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). |
TextFragment[] | Highlighter.getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments): Low-level API to get the most relevant (formatted) sections of the document. |
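The "keyword in context" idea behind Highlighter.getBestFragment can be sketched without the library: break the text into fragments, score each by query-term hits, and mark the hits in the winner. The fragmenting, scoring, and formatting below are simplified stand-ins for Lucene's Fragmenter, Scorer, and Formatter abstractions:

```java
import java.util.Locale;
import java.util.Set;

// Toy best-fragment highlighter: naive sentence fragments, hit-count scoring,
// and <B>...</B> markup. Illustrative only; not the Highlighter implementation.
public class HighlightSketch {
    static String bestFragment(String text, Set<String> queryTerms) {
        String best = "";
        int bestScore = -1;
        for (String fragment : text.split("\\.\\s*")) {   // split on sentence ends
            int score = 0;
            for (String w : fragment.split("\\s+"))
                if (queryTerms.contains(w.toLowerCase(Locale.ROOT))) score++;
            if (score > bestScore) { bestScore = score; best = fragment; }
        }
        // Wrap each query-term hit in the winning fragment with <B> tags.
        StringBuilder sb = new StringBuilder();
        for (String w : best.split("\\s+")) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(queryTerms.contains(w.toLowerCase(Locale.ROOT)) ? "<B>" + w + "</B>" : w);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bestFragment("Cats sleep. Lucene indexes text fast", Set.of("lucene", "fast")));
    }
}
```

getBestFragments generalizes this by keeping the top maxNumFragments fragments and joining them with a separator such as "...".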
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.