Package nltk_lite :: Package tag :: Module unigram :: Class Affix

Type Affix

   object --+        
            |        
         TagI --+    
                |    
SequentialBackoff --+
                    |
                   Affix

Known Subclasses:
MarshalAffix

A unigram tagger that assigns tags to tokens based on their leading or trailing substrings (note that these substrings are not necessarily true morphological affixes). Before tag.Affix can be used, it should be trained on a tagged corpus. From this training data, it finds the most likely tag for each affix, that is, each prefix or suffix of the configured length. It then tags each word with the most frequent tag recorded for that word's affix. If tag.Affix encounters a prefix or suffix for which it has no data, it assigns the tag None.
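The training and tagging idea described above can be sketched in plain Python. This is a minimal illustration, not the nltk_lite implementation: the functions affix_of, train_affix_model and tag_word are hypothetical, and the sketch ignores the cutoff and backoff parameters.

```python
from collections import Counter, defaultdict


def affix_of(word, length, minlength):
    """Prefix (length > 0) or suffix (length < 0) of word; None if word is too short."""
    if len(word) < minlength:
        return None
    return word[:length] if length > 0 else word[length:]


def train_affix_model(tagged_corpus, length, minlength):
    """Hypothetical sketch: map each affix seen in training to its most frequent tag."""
    counts = defaultdict(Counter)  # affix -> Counter({tag: frequency})
    for sentence in tagged_corpus:
        for word, tag in sentence:
            affix = affix_of(word, length, minlength)
            if affix is not None:
                counts[affix][tag] += 1
    return {affix: tags.most_common(1)[0][0] for affix, tags in counts.items()}


def tag_word(model, word, length, minlength):
    """Most frequent tag for the word's affix, or None if that affix was never seen."""
    return model.get(affix_of(word, length, minlength))
```

For example, training on [[("running", "VBG"), ("nation", "NN"), ("jumping", "VBG")]] with length=-3 and minlength=4 yields the model {"ing": "VBG", "ion": "NN"}, so tag_word tags "walking" as "VBG" and returns None for "quickly", whose suffix "kly" was never seen.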
Method Summary
  __init__(self, length, minlength, cutoff, backoff)
Construct a new affix stochastic tagger.
  __repr__(self)
  size(self)
  tag_one(self, token, history)
  train(self, tagged_corpus, verbose)
Train tag.Affix using the given training data.
Inherited from SequentialBackoff: tag, tag_sents
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Method Details

__init__(self, length, minlength, cutoff=1, backoff=None)
(Constructor)

Construct a new affix stochastic tagger. The new tagger should be trained, using the train() method, before it is used to tag data.
Parameters:
length - The length of the affix to be considered during training and tagging (negative for suffixes)
           (type=number)
minlength - The minimum length for a word to be considered during training and tagging. It must be greater than the absolute value of length.
           (type=number)
Overrides:
__builtin__.object.__init__
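The interaction between the two documented parameters can be sketched as follows. The helpers validate_affix_params and affix_of are hypothetical, not part of nltk_lite. The sketch reads the "longer than length" requirement as a comparison against the magnitude of length, since length is negative for suffixes, and it assumes a zero length is invalid.

```python
def validate_affix_params(length, minlength):
    """Hypothetical check of the constraints stated for the constructor."""
    if length == 0:
        # Assumption: zero selects neither a prefix nor a suffix.
        raise ValueError("length must be > 0 for prefixes or < 0 for suffixes")
    if minlength <= abs(length):
        raise ValueError("minlength must be greater than abs(length)")


def affix_of(word, length, minlength):
    """First `length` chars if length > 0, last `-length` chars if length < 0."""
    validate_affix_params(length, minlength)
    if len(word) < minlength:
        return None  # word too short to be considered
    return word[:length] if length > 0 else word[length:]
```

With minlength=5, affix_of("walking", 3, 5) gives the prefix "wal", affix_of("walking", -3, 5) gives the suffix "ing", and "walk" is skipped because it is shorter than minlength.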

train(self, tagged_corpus, verbose=False)

Train tag.Affix using the given training data. If this method is called multiple times, then the training data will be combined.
Parameters:
tagged_corpus - A tagged corpus. Each item should be a list of tagged tokens, where each tagged token is a pair consisting of the token's text and its tag.
           (type=list or iter(list))
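The documented behaviour that repeated calls to train() combine their training data can be illustrated by keeping running tag counts per affix. This is a hypothetical sketch (the class IncrementalAffixModel and its best_tag method are not nltk_lite names); it shows the expected corpus shape, an iterable of sentences that are each a list of (text, tag) pairs, and ignores cutoff and backoff.

```python
from collections import Counter, defaultdict


class IncrementalAffixModel:
    """Hypothetical sketch: tag counts accumulate across train() calls."""

    def __init__(self, length, minlength):
        self.length = length        # > 0: prefix length; < 0: suffix length
        self.minlength = minlength  # shorter tokens are ignored
        self._counts = defaultdict(Counter)  # affix -> Counter({tag: frequency})

    def train(self, tagged_corpus):
        # tagged_corpus: iterable of sentences; each sentence is a list of (text, tag) pairs.
        for sentence in tagged_corpus:
            for text, tag in sentence:
                if len(text) >= self.minlength:
                    affix = text[:self.length] if self.length > 0 else text[self.length:]
                    self._counts[affix][tag] += 1  # added to counts from earlier calls

    def best_tag(self, affix):
        # .get() avoids inserting unseen affixes into the defaultdict.
        tags = self._counts.get(affix)
        return tags.most_common(1)[0][0] if tags else None
```

Training first on [[("runs", "VBZ")]] and then on [[("guns", "NNS"), ("buns", "NNS")]] leaves the suffix "ns" with one VBZ and two NNS observations, so best_tag("ns") switches from "VBZ" to "NNS" after the second call.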

Generated by Epydoc 2.1 on Tue Sep 5 09:37:21 2006 http://epydoc.sf.net