Leptonica 1.83.1
Image processing and image analysis suite
recog.h File Reference


Data Structures

struct  L_Recog
 
struct  L_Rch
 
struct  L_Rcha
 
struct  L_Rdid
 

Macros

#define RECOG_VERSION_NUMBER   2
 

Typedefs

typedef struct L_Recog L_RECOG
 
typedef struct L_Rch L_RCH
 
typedef struct L_Rcha L_RCHA
 
typedef struct L_Rdid L_RDID
 

Enumerations

enum  {
  L_UNKNOWN = 0 , L_ARABIC_NUMERALS = 1 , L_LC_ROMAN_NUMERALS = 2 , L_UC_ROMAN_NUMERALS = 3 ,
  L_LC_ALPHA = 4 , L_UC_ALPHA = 5
}
 
enum  { L_USE_ALL_TEMPLATES = 0 , L_USE_AVERAGE_TEMPLATES = 1 }
 

Detailed Description

    This is a simple utility for training and recognizing individual
    machine-printed text characters.  It is designed to be adapted
    to a particular set of character images; e.g., from a book.

    There are two methods of training the recognizer.  In the simplest,
    a set of bitmaps has already been labeled by some means, such as
    a generic OCR program.  These are input, either one template at a
    time or as a pixa of templates, to a function that creates a recog.
    If input as a pixa, the text string label must be embedded in the
    text field of each pix.
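
    For example, given a pixa in which each pix carries its label in
    the text field, a recog can be built in a few calls.  This is a
    minimal sketch; the file names and the scaling, threshold, and
    shift values passed to recogCreateFromPixa() are illustrative
    assumptions, not required choices.

        #include "allheaders.h"

        int main(void)
        {
            PIXA     *pixa;
            L_RECOG  *recog;

                /* Labeled templates; each pix has its text field set,
                 * e.g., with pixSetText().  File name is hypothetical. */
            pixa = pixaRead("templates.pa");
            if (!pixa) return 1;

                /* scalew = 0, scaleh = 40: scale templates to 40 px high;
                 * linew = 0: use the scanned (non-line) representation;
                 * threshold = 128 for binarization; maxyshift = 1 */
            recog = recogCreateFromPixa(pixa, 0, 40, 0, 128, 1);
            pixaDestroy(&pixa);
            if (!recog) return 1;

            recogWrite("book.rec", recog);   /* serialize for later use */
            recogDestroy(&recog);
            return 0;
        }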

    If labeled data is not available, we start with a bootstrap
    recognizer (BSR) that has labeled data from a variety of sources.
    Its templates are scaled, typically to a fixed height.  Unlabeled
    images from the source (e.g., a book) are scaled the same way and
    fed to the BSR, which attempts to identify them.  All images that
    have a high enough correlation score with one of the templates in
    the BSR are emitted in a pixa, which then holds unscaled, labeled
    templates from the source.  This pixa is the generator for a
    book-adapted recognizer (BAR).
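
    A rough sketch of this BSR-to-BAR path is shown below.  It assumes
    recogTrainFromBoot() takes (recogboot, pixas, minscore, threshold,
    debug) and returns a pixa of labeled, unscaled templates, as in
    recogtrain.c; the file names and parameter values are placeholders.

        #include "allheaders.h"

        int main(void)
        {
            PIXA     *pixau, *pixal;
            L_RECOG  *recogboot, *bar;

            recogboot = recogRead("boot.rec");   /* bootstrap recognizer */
            pixau = pixaRead("unlabeled.pa");    /* unscaled source chars */
            if (!recogboot || !pixau) return 1;

                /* Keep only samples whose best correlation score against
                 * a BSR template is at least 0.75; the result is a pixa
                 * of unscaled, now-labeled templates from the source. */
            pixal = recogTrainFromBoot(recogboot, pixau, 0.75, 128, 0);

                /* That labeled pixa is the generator for the BAR */
            bar = recogCreateFromPixa(pixal, 0, 40, 0, 128, 1);
            recogWrite("bar.rec", bar);

            recogDestroy(&recogboot);
            recogDestroy(&bar);
            pixaDestroy(&pixau);
            pixaDestroy(&pixal);
            return 0;
        }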

    The pixa should always be thought of as the primary structure.
    It is the generator for the recog, because a recog is built
    from a pixa of unscaled images.

    New image templates can be added to a recog as long as it is
    in training mode.  Once training is finished, adding templates
    requires extracting the generating pixa, adding templates to
    that pixa, and making a new recog.  Similarly, we do not join
    two recogs directly; instead, we simply join their generating
    pixa and make a recog from that.
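
    For example, assuming recogExtractPixa() returns the unscaled
    generating pixa, two recogs can be combined roughly as follows;
    the creation parameters are again illustrative.

        #include "allheaders.h"

        L_RECOG *joinTwoRecogs(L_RECOG *recog1, L_RECOG *recog2)
        {
            PIXA     *pixa1, *pixa2;
            L_RECOG  *recogd;

            pixa1 = recogExtractPixa(recog1);   /* unscaled, labeled */
            pixa2 = recogExtractPixa(recog2);
            pixaJoin(pixa1, pixa2, 0, -1);      /* append all of pixa2 */
            recogd = recogCreateFromPixa(pixa1, 0, 40, 0, 128, 1);
            pixaDestroy(&pixa1);
            pixaDestroy(&pixa2);
            return recogd;
        }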

    To remove outliers from a pixa of labeled pix, make a recog,
    determine the outliers, and generate a new pixa with the
    outliers removed.  The outliers are determined by building
    special templates for each character set that are scaled averages
    of the individual templates.  Then a correlation score is found
    between each template and the averaged templates.  There are
    two implementations; outliers are determined as either:
     (1) a template having a correlation score with its class average
         that is below a threshold, or
     (2) a template having a correlation score with its class average
         that is smaller than the correlation score with the average
         of another class.
    Outliers are removed from the generating pixa.  Scaled averaging
    is only performed for determining outliers and for splitting
    characters; it is never used in a trained recognizer for identifying
    unlabeled samples.
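
    The decision rules themselves are simple.  The following standalone
    sketch (not the library code) shows both tests, given one template's
    correlation scores against each class average.

        /* scores[i] = correlation of one template with the average of
         * class i; iclass = the template's own class; minscore = the
         * threshold used by test (1).  Returns 1 if it is an outlier. */
        int isOutlier(const float *scores, int nclasses, int iclass,
                      float minscore)
        {
            int  i;

                /* (1) score with its own class average is too low */
            if (scores[iclass] < minscore)
                return 1;

                /* (2) some other class average matches it better */
            for (i = 0; i < nclasses; i++) {
                if (i != iclass && scores[i] > scores[iclass])
                    return 1;
            }
            return 0;
        }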

    Two methods using averaged templates are provided for splitting
    touching characters:
     (1) greedy matching
     (2) document image decoding (DID)
    The DID method is the default.  It is about 5x faster and
    possibly more accurate.

    Once a BAR has been made, unlabeled sample images are identified
    by finding the individual template in the BAR with highest
    correlation.  The input images and images in the BAR can be
    represented in two ways:
     (1) as scanned, binarized to 1 bpp
     (2) as a width-normalized outline formed by thinning to a
         skeleton and then dilating by a fixed amount.
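
    A minimal identification sketch follows.  It assumes that
    recogIdentifyPix() stores its best result in the recog's rch field
    and that rchExtract() unpacks it, with roughly the signatures used
    in recogident.c; treat both as assumptions rather than a definitive
    usage pattern.

        #include "allheaders.h"

        void identifyOne(L_RECOG *recog, PIX *pixs)
        {
            char       *text;
            l_float32   score;

                /* Correlate against every template in the recog; the
                 * best match is stored in recog->rch */
            if (recogIdentifyPix(recog, pixs, NULL) != 0)
                return;
            rchExtract(recog->rch, NULL, &score, &text,
                       NULL, NULL, NULL, NULL);
            fprintf(stderr, "best match: %s (score %5.3f)\n", text, score);
            lept_free(text);
        }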

    The recog can be serialized to file and read back.  The serialized
    version holds the templates used for correlation (which may have
    been derived from the unscaled templates by scaling and conversion
    to lines), plus, for arbitrary character sets, the UTF-8
    representation and the lookup table mapping from the character
    representation to its index.
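
    A round trip through the serializer is a single call in each
    direction; the file name below is a placeholder.

        #include "allheaders.h"

        int saveAndReload(L_RECOG *recog)
        {
            L_RECOG  *recog2;

            if (recogWrite("/tmp/book.rec", recog) != 0)
                return 1;
            recog2 = recogRead("/tmp/book.rec");  /* rebuilt from the
                                                   * serialized templates */
            if (!recog2) return 1;
            recogDestroy(&recog2);
            return 0;
        }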

    Why do we not use averaged templates for recognition?
    Letterforms for the same character can take on significantly
    different shapes (e.g., the single- and double-story forms of the
    letters 'a' and 'g'), and it makes no sense to average these.
    The previous version of this utility allowed multiple recognizers
    to exist, but this is an unnecessary complication if recognition
    is done on all samples instead of on averages.

Definition in file recog.h.

Enumeration Type Documentation

◆ anonymous enum

anonymous enum

Character Set

Enumerator
L_UNKNOWN 

character set type is not specified

L_ARABIC_NUMERALS 

10 digits

L_LC_ROMAN_NUMERALS 

7 lower-case letters (i,v,x,l,c,d,m)

L_UC_ROMAN_NUMERALS 

7 upper-case letters (I,V,X,L,C,D,M)

L_LC_ALPHA 

26 lower-case letters

L_UC_ALPHA 

26 upper-case letters

Definition at line 245 of file recog.h.

◆ anonymous enum

anonymous enum

Template Select

Enumerator
L_USE_ALL_TEMPLATES 

use all templates; default

L_USE_AVERAGE_TEMPLATES 

use average templates; special cases

Definition at line 259 of file recog.h.