Downloading tokenizer

Caution: building tokenizer requires lexed >= 4.3.3.
To install under Unix, run:
./configure [--prefix=<directory>] [--with-amalgam] [--with-composition]
(run ./configure --help for the full list of options)
make
make install
make clean
For a usage summary, run tokenizer -h.
lexed [ -d <directory> ] [ -p <filename prefix> ] <lexicon1> <lexicon2> ...
Each line of a lexicon contains a word followed by its associated
information, the two being separated by a separator character (a
tabulation or a space by default).
The default directory is "." and the default filename prefix is
"lexicon".
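As a sketch of the lexicon format described above (the words, tags, and
the file name lexicon.txt are invented examples, not part of the
distribution):

```shell
# Build a tiny two-entry lexicon: one word per line, then a tab,
# then the associated information (the default separator is a tab
# or a space).
printf 'cat\tNOUN\nrun\tVERB\n' > lexicon.txt
cat lexicon.txt
# The index could then be built with, e.g.:
#   lexed -d . -p lexicon lexicon.txt
```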
You have to edit tokenizer.ll and rebuild.
tokenizer [ -d <directory> ] [ -p <filename prefix> ] [ --encode <encoding> ] < inputfile > outputfile
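Putting the synopsis above together, a typical run might look as
follows (the file names input.txt and output.txt are illustrative; the
-d and -p values must match those used when building the index):

```shell
# Prepare some input text, then feed it to tokenizer on stdin and
# collect the result on stdout.
printf 'hello world\n' > input.txt
tokenizer -d . -p lexicon --encode UTF-8 < input.txt > output.txt
```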