Class ThaiTokenizer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class ThaiTokenizer
    extends SegmentingTokenizerBase
    Tokenizer that uses BreakIterator to tokenize Thai text.

    WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
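
    The portability concern above can be probed at runtime with nothing but java.text.BreakIterator. The following is a minimal sketch, not this class's actual code: the class ThaiBreakCheck and both of its methods are hypothetical names introduced for illustration. It assumes that a JRE with a working dictionary-based Thai word breaker reports a boundary between the two words of a short Thai phrase, while a JRE without one treats the phrase as a single run.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Lucene API.
class ThaiBreakCheck {

    // Returns true if the word BreakIterator for the given locale
    // reports a boundary at the given char offset of the text.
    static boolean isWordBoundary(String text, int offset, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        return words.isBoundary(offset);
    }

    // The Thai phrase for "Thai language", written with Unicode escapes.
    // It consists of two words that are 4 and 3 chars long, so a
    // dictionary-based breaker is expected to report a boundary at offset 4.
    static boolean hasDictionaryBasedThaiBreaks() {
        String phrase = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        return isWordBoundary(phrase, 4, Locale.forLanguageTag("th"));
    }

    public static void main(String[] args) {
        System.out.println("Dictionary-based Thai breaking available: "
                + hasDictionaryBasedThaiBreaks());
    }
}
```

    The result depends on the JRE and on which locale data it ships, so an application would typically run such a check once at startup and fall back to ICUTokenizer when it fails.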

    • Field Detail

      • DBBI_AVAILABLE

        public static final boolean DBBI_AVAILABLE
        True if the JRE supports a working dictionary-based BreakIterator for Thai. If this is false, this tokenizer will not work at all!
      • proto

        private static final java.text.BreakIterator proto
      • sentenceProto

        private static final java.text.BreakIterator sentenceProto
        Used for breaking the text into sentences.
      • wordBreaker

        private final java.text.BreakIterator wordBreaker
      • sentenceStart

        int sentenceStart
      • sentenceEnd

        int sentenceEnd
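
    The fields above suggest a two-level scheme: one BreakIterator splits the text into sentences, and a second one splits each sentence into words. The following is a minimal sketch of that idea using only java.text.BreakIterator, not this class's actual code. The class TwoLevelSegmenter and its words method are hypothetical, and the rule of keeping only spans that start with a letter or digit is an assumption made here to drop whitespace and punctuation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of sentence-then-word segmentation.
class TwoLevelSegmenter {

    static List<String> words(String text, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        BreakIterator wordBreaker = BreakIterator.getWordInstance(locale);
        List<String> out = new ArrayList<>();

        sentences.setText(text);
        int sentenceStart = sentences.first();
        for (int sentenceEnd = sentences.next();
             sentenceEnd != BreakIterator.DONE;
             sentenceStart = sentenceEnd, sentenceEnd = sentences.next()) {

            // Re-target the word breaker at the current sentence only.
            String sentence = text.substring(sentenceStart, sentenceEnd);
            wordBreaker.setText(sentence);

            int start = wordBreaker.first();
            for (int end = wordBreaker.next();
                 end != BreakIterator.DONE;
                 start = end, end = wordBreaker.next()) {
                String candidate = sentence.substring(start, end);
                // Assumption: keep only spans that begin with a letter or digit.
                if (Character.isLetterOrDigit(candidate.codePointAt(0))) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world. Good day.", Locale.ENGLISH));
    }
}
```

    Working sentence by sentence keeps the span handed to the word breaker small, which matters when the word breaker is dictionary-based and comparatively expensive.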
    • Constructor Detail

      • ThaiTokenizer

        public ThaiTokenizer()
        Creates a new ThaiTokenizer
      • ThaiTokenizer

        public ThaiTokenizer(AttributeFactory factory)
        Creates a new ThaiTokenizer that uses the supplied AttributeFactory