Package nltk_lite :: Package tag :: Module ngram :: Class Ngram

Type Ngram

   object --+        
            |        
         TagI --+    
                |    
SequentialBackoff --+
                    |
                   Ngram

Known Subclasses:
Bigram, MarshalNgram, Trigram

An n-gram stochastic tagger. Before a tagger.Ngram can be used, it should be trained on a tagged corpus. Using this training data, it constructs a frequency distribution describing the frequencies with which each word is tagged in different contexts. The context considered consists of the word to be tagged and the tags of the n-1 previous words. Once the tagger has been trained, it uses this frequency distribution to tag words, assigning each word the tag with the maximum frequency given its context. If the tagger.Ngram encounters a word in a context for which it has no data, it assigns that word the tag None.
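The training and tagging steps described above can be illustrated with a small standalone sketch. It does not use nltk_lite and is not the class's actual implementation. The function names train_ngram and tag_sentence are hypothetical, and tagged tokens are assumed to be (text, tag) pairs.

```python
from collections import Counter, defaultdict

def _context(prev_tags, word, n):
    """Build the context: the n-1 previous tags plus the word to be tagged."""
    history = tuple(prev_tags[-(n - 1):]) if n > 1 else ()
    return (history, word)

def train_ngram(tagged_sents, n):
    """Count how often each tag occurs in each context (hypothetical helper)."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev_tags = []
        for word, tag in sent:
            counts[_context(prev_tags, word, n)][tag] += 1
            prev_tags.append(tag)
    return counts

def tag_sentence(counts, words, n):
    """Assign each word the most frequent tag for its context, or None."""
    tags = []
    for word in words:
        seen = counts.get(_context(tags, word, n))
        tags.append(seen.most_common(1)[0][0] if seen else None)
    return list(zip(words, tags))

corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("bark", "NN")],
]
model = train_ngram(corpus, n=2)
result = tag_sentence(model, ["the", "dog", "barks"], n=2)
unseen = tag_sentence(model, ["the", "cat"], n=2)
```

In this sketch a word in a context that never occurred in training, such as "cat" after a DT tag, receives the tag None, matching the behaviour described above.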
Method Summary
  __init__(self, n, cutoff, backoff)
Construct an n-gram stochastic tagger.
  __repr__(self)
  size(self)
  tag_one(self, token, history)
  train(self, tagged_corpus, verbose)
Train this tagger.Ngram using the given training data.
Inherited from SequentialBackoff: tag, tag_sents
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Method Details

__init__(self, n, cutoff=1, backoff=None)
(Constructor)

Construct an n-gram stochastic tagger. The tagger must be trained using the train() method before being used to tag data.
Parameters:
n - The order of the new tagger.Ngram.
           (type=int)
cutoff - A count-cutoff for the tagger's frequency distribution. If the tagger saw fewer than cutoff examples of a given context in training, then it will return a tag of None for that context.
           (type=int)
Overrides:
__builtin__.object.__init__
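
The effect of the cutoff parameter can be sketched as follows. This is an illustration only, not the library's code: best_tag is a hypothetical helper, and it assumes that "examples of a given context" means the total number of training observations for that context, summed over all tags.

```python
from collections import Counter

def best_tag(tag_counts, cutoff=1):
    """Return the most frequent tag, or None if the context is too rare.

    tag_counts maps each tag to the number of times it was seen in one
    context. Assumption: the cutoff is compared against the total count
    for the context, not against the count of the winning tag.
    """
    total = sum(tag_counts.values())
    if total == 0 or total < cutoff:
        return None
    return tag_counts.most_common(1)[0][0]

observed = Counter({"NN": 2, "VB": 1})   # three examples of this context
kept = best_tag(observed, cutoff=3)      # enough examples
dropped = best_tag(observed, cutoff=4)   # fewer than cutoff examples
```

Under this reading, raising cutoff trades coverage for reliability: rare contexts yield None instead of a tag supported by only a few examples.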

train(self, tagged_corpus, verbose=False)

Train this tagger.Ngram using the given training data.
Parameters:
tagged_corpus - A tagged corpus. Each item should be a list of tagged tokens, where each tagged token consists of its text and a tag.
           (type=list or iter(list))
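
As a sketch of the expected input shape, the following builds an iterator of sentences, each a list of tagged tokens. The (text, tag) tuple representation, the "word/TAG" line format, and the helper name iter_tagged_sents are all assumptions made for illustration. They are not taken from nltk_lite.

```python
def iter_tagged_sents(lines):
    """Yield one list of (text, tag) pairs per non-empty line.

    Assumes well-formed whitespace-separated 'word/TAG' items; the tag
    is taken to be everything after the last slash.
    """
    for line in lines:
        sent = []
        for item in line.split():
            text, _, tag = item.rpartition("/")
            sent.append((text, tag))
        if sent:
            yield sent

raw = ["the/DT dog/NN", "", "runs/VBZ"]
tagged_corpus = list(iter_tagged_sents(raw))
```

Either the materialized list or the generator itself has the shape list or iter(list) given above for tagged_corpus.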

Generated by Epydoc 2.1 on Tue Sep 5 09:37:21 2006 http://epydoc.sf.net