org.apache.lucene.analysis.ru
Class RussianLetterTokenizer

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.Tokenizer
          extended by org.apache.lucene.analysis.CharTokenizer
              extended by org.apache.lucene.analysis.ru.RussianLetterTokenizer

public class RussianLetterTokenizer
extends CharTokenizer

A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally looking up letters in a given "Russian charset". The problem with LetterTokenizer is that it uses the Character.isLetter() method, which cannot detect letters in encodings like CP1252 and KOI8 (well-known problems with the 0xD7 and 0xF7 chars).
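
The following is a minimal sketch of one plausible reading of the 0xD7 and 0xF7 remark. It assumes the bytes reach Java decoded one-to-one as ISO-8859-1, so that they become U+00D7 (multiplication sign) and U+00F7 (division sign), which Character.isLetter() rejects. The class name IsLetterDemo and the helper decodeLatin1 are hypothetical and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

public class IsLetterDemo {
    // Hypothetical helper: decode one byte value as ISO-8859-1,
    // which maps byte value N directly to code point N.
    static char decodeLatin1(int byteValue) {
        byte[] raw = {(byte) byteValue};
        return new String(raw, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        // 0xD6 is included as a neighbouring byte that does decode to a letter.
        int[] samples = {0xD6, 0xD7, 0xF7};
        for (int b : samples) {
            char c = decodeLatin1(b);
            System.out.printf("0x%02X -> U+%04X isLetter=%b%n",
                    b, (int) c, Character.isLetter(c));
        }
        // For comparison: a genuine Cyrillic code point is accepted.
        System.out.println("U+0427 isLetter=" + Character.isLetter('\u0427'));
    }
}
```

Under that assumption, the two bytes named above are the ones reported as non-letters, which is why a tokenizer relying only on Character.isLetter() would split words at those positions.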

Version:
$Id: RussianLetterTokenizer.java 472959 2006-11-09 16:21:50Z yonik $
Author:
Boris Okner, b.okner@rogers.com

Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
RussianLetterTokenizer(Reader in, char[] charset)
           
 
Method Summary
protected  boolean isTokenChar(char c)
          Collects characters which satisfy Character.isLetter(char) or which appear in the given charset.
 
Methods inherited from class org.apache.lucene.analysis.CharTokenizer
next, normalize
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RussianLetterTokenizer

public RussianLetterTokenizer(Reader in,
                              char[] charset)
Method Detail

isTokenChar

protected boolean isTokenChar(char c)
Collects characters which satisfy Character.isLetter(char) or which appear in the charset passed to the constructor.

Specified by:
isTokenChar in class CharTokenizer
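
The class description says the tokenizer accepts letters and additionally looks characters up in a supplied charset. A standalone sketch of that check, assuming a simple linear scan over the char[] array, might look as follows. This is not the Lucene source: the class CharsetLetterSketch and its methods isTokenChar(char, char[]) and tokenize are hypothetical names chosen for illustration, and the sketch does not depend on Lucene at all.

```java
import java.util.ArrayList;
import java.util.List;

public class CharsetLetterSketch {
    // Hypothetical check: a char belongs to a token if Character.isLetter
    // accepts it, or if it appears in the caller-supplied charset.
    static boolean isTokenChar(char c, char[] charset) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char allowed : charset) {
            if (c == allowed) {
                return true;
            }
        }
        return false;
    }

    // Split text into maximal runs of token chars, in the spirit of a
    // character-based tokenizer (assumed behaviour, simplified).
    static List<String> tokenize(String text, char[] charset) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c, charset)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "ab\u00D7c d";
        // Without the extra charset, the char at 0xD7 breaks the word.
        System.out.println(tokenize(text, new char[0]).size());
        // With 0xD7 and 0xF7 listed, the word stays in one piece.
        System.out.println(tokenize(text, new char[] {'\u00D7', '\u00F7'}).size());
    }
}
```

With an empty charset the sample text splits into three tokens. With 0xD7 and 0xF7 in the charset it splits into two, because the first word is no longer broken at the 0xD7 character.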


Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.