Segmenters for Chinese and Japanese languages

Chinese and Japanese writing traditionally has no spaces between the words of a phrase, unlike Western languages. Indexing documents in these languages therefore requires an additional step that segments phrases into words.

Japanese language phrase segmenter

For Japanese phrase segmentation, mnoGoSearch uses ChaSen, a morphological analysis system for the Japanese language. ChaSen must therefore be installed before you configure and build mnoGoSearch.

To enable Japanese phrase segmentation, pass the --enable-chasen switch to configure.

Chinese language phrase segmenter

Chinese phrase segmentation uses a frequency dictionary of Chinese words. The segmentation itself is done by dynamic programming, choosing the split that maximizes the cumulative frequency of the resulting words.
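
The dynamic programming idea can be sketched roughly as follows. This is an illustrative Python sketch, not mnoGoSearch's actual implementation: the function name segment, the max_word_len limit, the zero score given to single characters missing from the dictionary, and the sample frequencies are all assumptions made for the example.

```python
def segment(text, freq, max_word_len=4):
    """Split text into words so that the sum of word frequencies is maximal.

    freq is a dict mapping known words to their frequencies (hypothetical
    data here). A single character that is not in freq is accepted as a
    word with score 0, so every input can always be segmented.
    """
    n = len(text)
    # best[i] is the highest cumulative frequency for the prefix text[:i].
    best = [float("-inf")] * (n + 1)
    # back[i] is the start index of the last word in that best split.
    back = [0] * (n + 1)
    best[0] = 0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in freq:
                score = freq[word]
            elif end - start == 1:
                score = 0  # assumed fallback for unknown characters
            else:
                continue
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start

    # Walk the back-pointers from the end to recover the words.
    words = []
    pos = n
    while pos > 0:
        start = back[pos]
        words.append(text[start:pos])
        pos = start
    words.reverse()
    return words


# Hypothetical frequencies, chosen only to illustrate the behaviour.
sample_freq = {"中国": 100, "中": 30, "国": 20, "人民": 80, "人": 40, "民": 10, "国人": 5}
print(segment("中国人民", sample_freq))
```

With the sample frequencies above, the split into "中国" and "人民" has a cumulative frequency of 180, which is higher than any other split of the same phrase, so that split is chosen.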

To enable Chinese phrase segmentation, enable GB2312 charset support when configuring mnoGoSearch, and specify the frequency dictionary of Chinese words with the LoadChineseList command in the indexer.conf file.