

                Stu's Optical Character Recogniser
    
                             S O C R


                           -----------
                

             (it's pronounced "soccer" in a kiwi accent)


Introduction
------------

This is the first release of what will hopefully become a full-blown
OCR system.  This release contains:

1. A few example training characters
2. A Naive Bayes classifier
3. The CFS (correlation feature selection) method for
   reducing the number of attributes
4. Routines for reading and writing .arff files
   (used for the machine learning methods)
5. Lots of code for doing lots of things

Granted, this version is pretty sad, but we have to start
somewhere!

License
-------

All this code is released under the GNU General Public License (GPL).
If you contribute code, note that it will be licensed the same way as
the original SOCR code.

How it all works
----------------
Characters are stored in a plain ASCII format. To see it, run
	zcat data/english/marks/demo1.marks.gz

This is converted to .arff, the standard machine learning input
file format, by bin/marks_to_arff, e.g.
	marks_to_arff <demo1.marks >demo1.arff
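
For readers who have not met .arff before, here is a rough Python
sketch of what such a file looks like. It is not the marks_to_arff
code. The helper write_arff and the attribute names (width, height,
class) are made up for illustration; the real attributes that
marks_to_arff emits are not described in this README.

```python
def write_arff(relation, attributes, rows):
    """Return ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either
    the string "numeric" or a list of nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append("@attribute %s %s" % (name, kind))
    lines.append("")
    lines.append("@data")
    for row in rows:
        # One comma-separated line per instance.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical features for two characters (assumed, not SOCR's own).
attributes = [("width", "numeric"), ("height", "numeric"),
              ("class", ["a", "b"])]
rows = [(12, 20, "a"), (9, 22, "b")]
arff_text = write_arff("demo1", attributes, rows)
print(arff_text)
```

The header declares each attribute once, and every line after @data
is one instance, with the class as the last column.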

There are three arff manipulation programs: arffinfo, arffcols
and arffsplit. Use -h for help.
	arffsplit -p 50 <demo1.arff train test

creates two files, train and test, with a 50/50 split.
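
The idea of a percentage split can be sketched in a few lines of
Python. This is only an illustration: whether arffsplit shuffles the
instances, keeps them in order, or balances the classes is not stated
here, so the shuffle and the seed below are assumptions, and
split_rows is a made-up name.

```python
import random


def split_rows(rows, percent, seed=0):
    """Split rows into two lists, the first holding percent% of them."""
    rows = list(rows)
    # Shuffle with a fixed seed so the split is repeatable (assumed
    # behaviour, not necessarily what arffsplit does).
    random.Random(seed).shuffle(rows)
    cut = len(rows) * percent // 100
    return rows[:cut], rows[cut:]


rows = list(range(10))          # stand-ins for the @data lines
train, test = split_rows(rows, 50)
print(len(train), len(test))
```

Every instance ends up in exactly one of the two files, so the test
set is never seen during training.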

nbayes is a Naive Bayes classifier. Run nbayes -h for help.
	nbayes -t train -T test
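
For anyone curious about what a Naive Bayes classifier does, here is a
minimal Python sketch. It is not the nbayes source. It assumes numeric
attributes modelled with one Gaussian per class and attribute, which
may or may not match what nbayes does, and the names train_nb and
classify and the toy data are invented for the example.

```python
import math
from collections import defaultdict


def train_nb(rows, labels):
    """Estimate a prior, and a mean and variance per attribute, per class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance away from zero.
        variances = [sum((x - m) ** 2 for x in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def classify(model, row):
    """Return the class with the highest log posterior for row."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # "Naive": attributes are treated as independent given the
        # class, so their log likelihoods simply add up.
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data: two attributes, two classes (made up).
train_rows = [(1.0, 2.0), (1.2, 1.8), (5.0, 6.0), (5.2, 6.1)]
train_labels = ["a", "a", "b", "b"]
model = train_nb(train_rows, train_labels)
prediction = classify(model, (1.1, 2.1))
print(prediction)
```

Training only counts and averages, so it is a single pass over the
training file, which makes this kind of classifier a cheap baseline.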

Usage
-----

What can you do with this first version?

1. Run the demo:
	make demo

2. Join the mailing list and talk to everyone!

	Send a "subscribe ocr" message to
		majordomo@icemark.ch

	then post a message to
		ocr@icemark.ch

cheers
Stuart Inglis (singlis@cs.waikato.ac.nz)

