[mmdasp] [Up] [mmdbeef] Demonstrations

mmdlabeltext
Segmenting letters, words and paragraphs.

Description

In this example, a digitized text is processed to identify the letters, words and paragraphs. This demonstration uses only the mmlabel function with different connectivity parameters.

Demo Script

Reading

The text image is read.

>>> f = mmreadgray('stext.tif')
>>> mmshow(f)

Figure: f (the input text image)

First, label the letters.

The letters are the main connected components in the image, so we use the classical 8-connectivity criterion to identify each letter.

>>> fl=mmlabel(f,mmsebox())
>>> mmlblshow(fl)

Figure: fl (the labeled letters)
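To make the 8-connectivity criterion concrete, here is a minimal pure-Python sketch of connected-component labeling by flood fill. It is not the toolbox's implementation of mmlabel; the function name `label_8` and the toy image are assumptions made for illustration only.

```python
from collections import deque

def label_8(img):
    """Label the 8-connected foreground components of a binary image.

    img is a list of rows of 0/1 values. Returns (labels, count), where
    labels has the same shape as img and background pixels stay 0.
    """
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and labels[r][c] == 0:
                # Unlabeled foreground pixel: start a new component.
                current += 1
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    # Visit the 3x3 neighborhood (includes the diagonals).
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and labels[ny][nx] == 0):
                                labels[ny][nx] = current
                                queue.append((ny, nx))
    return labels, current

# Toy image: a diagonal pair, a vertical pair, and a horizontal pair.
img = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
labels, n = label_8(img)
print(n)  # number of components found
```

The diagonal pair in the top-left corner ends up in a single component, which is what distinguishes 8-connectivity from 4-connectivity.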

Second, label the words.

Words are made of letters that lie close together. In this case we use a connectivity specified by a rectangular structuring element 7 pixels high and 11 pixels wide, so any two pixels that can be hit by this rectangle belong to the same connected component. The values 7 and 11 were chosen experimentally and depend on the font size.

>>> from Numeric import ones
>>> sew = mmimg2se(mmbinary(ones((7,11))))
>>> mmseshow(sew)
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],'1')
>>> fw=mmlabel(f,sew)
>>> mmlblshow(fw)

Figure: fw (the labeled words)
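The idea of a connectivity given by a rectangle can be sketched in pure Python as follows. This is an illustration only, not the toolbox's algorithm: the function `label_rect` is hypothetical, and it assumes odd rectangle sizes with the rectangle centered on the current pixel.

```python
from collections import deque

def label_rect(img, height, width):
    """Label foreground pixels using a rectangular neighborhood.

    Two foreground pixels are treated as neighbors when one lies inside a
    height x width rectangle centered on the other (odd sizes assumed).
    Components are the chains of such neighbors.
    """
    rows, cols = len(img), len(img[0])
    ry, rx = height // 2, width // 2  # vertical and horizontal reach
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or labels[r][c]:
                continue
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Scan the rectangle around (y, x), clipped to the image.
                for ny in range(max(0, y - ry), min(rows, y + ry + 1)):
                    for nx in range(max(0, x - rx), min(cols, x + rx + 1)):
                        if img[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

# Toy one-row "text": three letters, a wide gap, then two letters.
line = [[1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1]]
_, n_letters = label_rect(line, 3, 3)  # 3x3 box, i.e. 8-connectivity
_, n_words = label_rect(line, 1, 5)    # horizontal reach of 2 pixels
print(n_letters, n_words)
```

Widening the rectangle bridges the narrow gaps between letters but not the wide gap between words, which is why the rectangle size has to be tuned to the font size.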

Finally, label the paragraphs.

Similarly, paragraphs are made of words that lie close together. In this case the connectivity is given by a rectangle 35 pixels wide and 20 pixels high.

>>> sep = mmimg2se(mmbinary(ones((20,35))))
>>> fp=mmlabel(f,sep)
>>> mmlblshow(fp)

Figure: fp (the labeled paragraphs)
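Because each connectivity used here contains the previous one, every letter should fall inside exactly one word, and every word inside exactly one paragraph. The sketch below shows one way to recover that nesting from two label images. The helper `parent_labels` and the small label images are hypothetical; with the demo's results, fl and fw (or fw and fp) would play these roles, assuming they can be iterated row by row.

```python
def parent_labels(child, parent):
    """Map each nonzero child label to the parent label at the same pixel.

    child and parent are label images of the same shape (lists of rows),
    with 0 as background. The first pixel seen for a child label decides
    its parent, which is enough when the components are nested.
    """
    mapping = {}
    for child_row, parent_row in zip(child, parent):
        for c, p in zip(child_row, parent_row):
            if c:
                mapping.setdefault(c, p)
    return mapping

# Hypothetical label images for a one-row text (0 is background).
letters = [[1, 0, 2, 0, 3, 0, 0, 0, 0, 4, 0, 5]]
words = [[1, 0, 1, 0, 1, 0, 0, 0, 0, 2, 0, 2]]

letter_to_word = parent_labels(letters, words)
print(letter_to_word)
```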

See also

mmlabel: Label a binary image.