org.apache.lucene.wikipedia.analysis
Class WikipediaTokenizer
java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.Tokenizer
          extended by org.apache.lucene.wikipedia.analysis.WikipediaTokenizer
public class WikipediaTokenizer extends Tokenizer
Extension of StandardTokenizer that is aware of Wikipedia syntax. It is based on the Wikipedia tutorial available at http://en.wikipedia.org/wiki/Wikipedia:Tutorial, but its coverage of Wikipedia syntax may not be complete.
EXPERIMENTAL
NOTE: This Tokenizer is considered experimental, and its grammar is subject to change in trunk and in follow-up releases.
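As a rough illustration of the kind of markup this class is aware of, the toy classifier below maps a single, whole Wikipedia markup fragment to the name of one of the token-type constants documented below. It is a hypothetical sketch built on simple prefix and suffix checks, not the tokenizer's JFlex grammar. The markup-to-type pairings (for example === for SUB_HEADING and <ref> for CITATION) are assumptions, and the real tokenizer emits the constants' string values rather than their names.

```java
/**
 * Hypothetical toy classifier: maps one whole Wikipedia markup fragment to the
 * NAME of a WikipediaTokenizer type constant. This is not the tokenizer's JFlex
 * grammar, and the real tokenizer emits the constants' values, not their names.
 */
public class WikiMarkupSketch {

    static String classify(String fragment) {
        // Category links are a special form of internal link, so test them first.
        if (fragment.startsWith("[[Category:") && fragment.endsWith("]]")) return "CATEGORY";
        if (fragment.startsWith("[[") && fragment.endsWith("]]")) return "INTERNAL_LINK";
        if (fragment.startsWith("[http") && fragment.endsWith("]")) return "EXTERNAL_LINK";
        // Longest quote run first: 5 quotes = bold italics, 3 = bold, 2 = italics.
        if (fragment.startsWith("'''''") && fragment.endsWith("'''''")) return "BOLD_ITALICS";
        if (fragment.startsWith("'''") && fragment.endsWith("'''")) return "BOLD";
        if (fragment.startsWith("''") && fragment.endsWith("''")) return "ITALICS";
        // Assumed pairing: === marks a sub heading, == a heading (test === first).
        if (fragment.startsWith("===") && fragment.endsWith("===")) return "SUB_HEADING";
        if (fragment.startsWith("==") && fragment.endsWith("==")) return "HEADING";
        // Assumed pairing: <ref>...</ref> marks a citation.
        if (fragment.startsWith("<ref>") && fragment.endsWith("</ref>")) return "CITATION";
        return "PLAIN";
    }

    public static void main(String[] args) {
        String[] samples = {
            "[[Apache Lucene]]", "[[Category:Search engines]]",
            "[http://lucene.apache.org Lucene]", "'''bold'''", "''italic''",
            "'''''both'''''", "==History==", "===Releases===",
            "<ref>a source</ref>", "plain"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The checks run from most to least specific (category before internal link, five quotes before three before two, === before ==) because the shorter markers are prefixes of the longer ones.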
Fields inherited from class org.apache.lucene.analysis.Tokenizer: input
Methods inherited from class org.apache.lucene.analysis.Tokenizer: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INTERNAL_LINK
public static final java.lang.String INTERNAL_LINK
- See Also:
- Constant Field Values
EXTERNAL_LINK
public static final java.lang.String EXTERNAL_LINK
- See Also:
- Constant Field Values
EXTERNAL_LINK_URL
public static final java.lang.String EXTERNAL_LINK_URL
- See Also:
- Constant Field Values
CITATION
public static final java.lang.String CITATION
- See Also:
- Constant Field Values
CATEGORY
public static final java.lang.String CATEGORY
- See Also:
- Constant Field Values
BOLD
public static final java.lang.String BOLD
- See Also:
- Constant Field Values
ITALICS
public static final java.lang.String ITALICS
- See Also:
- Constant Field Values
BOLD_ITALICS
public static final java.lang.String BOLD_ITALICS
- See Also:
- Constant Field Values
HEADING
public static final java.lang.String HEADING
- See Also:
- Constant Field Values
SUB_HEADING
public static final java.lang.String SUB_HEADING
- See Also:
- Constant Field Values
WikipediaTokenizer
public WikipediaTokenizer(java.io.Reader input)
- Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.
- Parameters:
input - the input Reader
next
public Token next(Token result)
throws java.io.IOException
- Overrides:
next in class TokenStream
- Throws:
java.io.IOException
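With next(Token), the caller passes in a Token that the stream may refill and return, and the stream returns null once it is exhausted. The sketch below shows that drain loop and keeps only the terms of one token type. To stay self-contained it uses hypothetical stand-ins (MiniToken, MiniTokenizer) instead of Lucene classes, and placeholder type strings; with the real tokenizer you would compare Token.type() against constants such as WikipediaTokenizer.INTERNAL_LINK rather than literals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Self-contained sketch of the next(Token) reuse loop. MiniToken and
 * MiniTokenizer are hypothetical stand-ins for Lucene's Token and TokenStream.
 */
public class ReuseLoopSketch {

    /** Hypothetical stand-in for org.apache.lucene.analysis.Token. */
    static final class MiniToken {
        String term;
        String type;
    }

    /** Hypothetical stream that replays pre-split (term, type) pairs. */
    static final class MiniTokenizer {
        private final Iterator<String[]> pairs;

        MiniTokenizer(List<String[]> pairs) {
            this.pairs = pairs.iterator();
        }

        /** Refills and returns the caller's token, or null when exhausted. */
        MiniToken next(MiniToken reusable) {
            if (!pairs.hasNext()) return null;
            String[] p = pairs.next();
            reusable.term = p[0];
            reusable.type = p[1];
            return reusable;
        }
    }

    /** Drains the stream, keeping the terms whose type equals wantedType. */
    static List<String> termsOfType(MiniTokenizer stream, String wantedType) {
        List<String> out = new ArrayList<>();
        MiniToken reusable = new MiniToken();
        for (MiniToken t = stream.next(reusable); t != null; t = stream.next(reusable)) {
            // Read what is needed now: the same token object is overwritten on the next call.
            if (wantedType.equals(t.type)) out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniTokenizer stream = new MiniTokenizer(Arrays.asList(
            new String[] {"Lucene", "INTERNAL_LINK"},
            new String[] {"is", "<ALPHANUM>"},
            new String[] {"fast", "BOLD"}));
        System.out.println(termsOfType(stream, "INTERNAL_LINK")); // prints [Lucene]
    }
}
```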
reset
public void reset()
throws java.io.IOException
- Overrides:
reset in class TokenStream
- Throws:
java.io.IOException
reset
public void reset(java.io.Reader reader)
throws java.io.IOException
- Overrides:
reset in class Tokenizer
- Throws:
java.io.IOException
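reset(Reader) lets one tokenizer instance be pointed at new input instead of constructing a fresh tokenizer for every document. The sketch below shows that reuse pattern with a hypothetical whitespace-splitting MiniTokenizer. It does not reproduce how WikipediaTokenizer resets its JFlex scanner, and, unlike the real method, its reset does not declare IOException (read errors are wrapped in UncheckedIOException to keep the sketch short).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical whitespace tokenizer used only to show the reset(Reader)
 * pattern: one instance is created once and re-pointed at each new document.
 */
public class ResetSketch {

    static final class MiniTokenizer {
        private Reader input;

        MiniTokenizer(Reader input) {
            this.input = input;
        }

        /** Points this instance at a new Reader; a stand-in for Tokenizer.reset(Reader). */
        void reset(Reader reader) {
            this.input = reader;
        }

        /** Returns the next whitespace-delimited word, or null at end of input. */
        String next() {
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (sb.length() > 0) break; // end of the current word
                    } else {
                        sb.append((char) c);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    /** Collects every remaining word from the tokenizer's current input. */
    static List<String> drain(MiniTokenizer t) {
        List<String> out = new ArrayList<>();
        for (String w = t.next(); w != null; w = t.next()) out.add(w);
        return out;
    }

    public static void main(String[] args) {
        MiniTokenizer tokenizer = new MiniTokenizer(new StringReader("first doc"));
        System.out.println(drain(tokenizer)); // prints [first, doc]
        tokenizer.reset(new StringReader("second  document here"));
        System.out.println(drain(tokenizer)); // prints [second, document, here]
    }
}
```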
Copyright © 2000-2009 Apache Software Foundation. All Rights Reserved.