Package nltk_lite :: Package contrib :: Package toolbox :: Module text

Module text

This module provides tools for parsing and manipulating the contents of a Shoebox text without reference to its metadata.

Classes
  Word
This class defines a word object, which consists of a fixed number of attributes: a wordform, a gloss, a part of speech, and a list of morphemes.
  Morpheme
This class defines a morpheme object, which consists of a fixed number of attributes: a surface form, an underlying form, a gloss, and a part of speech.
  Line
This class defines a line of interlinear glossing.
  Paragraph
This class defines a unit of analysis above the line and below the text.
  Text
This class defines an interlinearized text, which consists of a collection of Paragraph objects.
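
The attribute lists given above for Word and Morpheme can be pictured as plain records. The following is only a sketch: the class names WordSketch and MorphemeSketch, the field names, and the sample values are hypothetical and are not taken from the module's source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MorphemeSketch:
    # Hypothetical field names mirroring the four documented attributes.
    surface_form: str
    underlying_form: str
    gloss: str
    part_of_speech: str


@dataclass
class WordSketch:
    # Hypothetical field names mirroring the four documented attributes.
    wordform: str
    gloss: str
    part_of_speech: str
    morphemes: List[MorphemeSketch] = field(default_factory=list)


# Sample values are invented for illustration only.
goede = WordSketch(
    wordform="goede",
    gloss="good",
    part_of_speech="ADJECTIVE",
    morphemes=[
        MorphemeSketch("goed", "goed", "good", "ADJECTIVE"),
        MorphemeSketch("-e", "-e", "-ADJ", "-SUFF"),
    ],
)
print([m.underlying_form for m in goede.morphemes])  # ['goed', '-e']
```

The actual classes may store or expose these attributes differently.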
Functions
 
get_indices(str)
This function finds the indices of the leftmost boundaries of the units in a line of aligned text.
 
get_slices_by_indices(str, indices)
Given a string and a list of indices, this function returns a list of the substrings defined by those indices.
Function Details

get_indices(str)

This function finds the indices of the leftmost boundaries of the units in a line of aligned text.

Given the field \um, this function will find the indices identifying the leftmost word boundaries, as follows:

       0    5  8   12              <- indices
       |    |  |   |               
       |||||||||||||||||||||||||||
   \sf dit  is een goede           <- surface form
   \um dit  is een goed      -e    <- underlying morphemes
   \mg this is a   good      -ADJ  <- morpheme gloss
   \gc DEM  V  ART ADJECTIVE -SUFF <- grammatical categories
   \ft This is a good explanation. <- free translation

The function walks through the line char by char:

   c   flag.before  flag.after  index?
   --  -----------  ----------  ------
   0   1            0           yes
   1   0            1           no
   2   1            0           no
   3   0            1           no
   4   1            0           no   
   5   1            0           yes
Parameters:
  • str (string) - aligned text
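
The boundary search described above can be sketched as follows. This is a minimal illustration and not the module's own code: the name find_left_boundaries is hypothetical, and the sketch assumes that the field marker (such as \um) has already been stripped and that a unit starts wherever a non-space character follows whitespace or the start of the line.

```python
def find_left_boundaries(line):
    """Return the index of the first character of each unit in `line`.

    Hypothetical helper: assumes units are separated by whitespace and
    that the field marker has already been removed from the line.
    """
    boundaries = []
    previous_was_space = True  # the start of the line counts as a gap
    for position, char in enumerate(line):
        is_space = char.isspace()
        # A unit begins where a non-space character follows a gap.
        if previous_was_space and not is_space:
            boundaries.append(position)
        previous_was_space = is_space
    return boundaries


print(find_left_boundaries("dit  is een goede"))  # [0, 5, 8, 12]
```

Applied to the surface form in the diagram, the sketch yields the same indices shown there (0, 5, 8, 12).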

get_slices_by_indices(str, indices)

Given a string and a list of indices, this function returns a list of the substrings defined by those indices. For example, given the arguments:

   str='antidisestablishmentarianism', indices=[4, 7, 16, 20, 25]

this function returns the list:

   ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
Parameters:
  • str (string) - text
  • indices (list of integers) - positions at which the string is split
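
The slicing behaviour in the example above can be reproduced with a short sketch. The name slice_by_indices is hypothetical and the code is an assumption about the behaviour, inferred only from the documented example. It treats each index as a cut point and adds the start and end of the string as outer bounds.

```python
def slice_by_indices(text, indices):
    """Split `text` at each position in `indices` (hypothetical helper)."""
    # Add the outer bounds so the first and last pieces are included.
    bounds = [0] + list(indices) + [len(text)]
    # Pair each bound with the next one and slice between them.
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


pieces = slice_by_indices("antidisestablishmentarianism", [4, 7, 16, 20, 25])
print(pieces)  # ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

In this sketch, an index list that starts with 0 produces an empty first piece. Whether the module's function handles that case differently is not stated here.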