
Package nltk_lite.tag


Classes and interfaces for tagging each token of a document with supplementary information, such as its part of speech or its WordNet synset tag. This task, which is known as tagging, is defined by the TagI interface.

Submodules
  • nltk_lite.tag.brill: Brill's transformational rule-based tagger.
  • nltk_lite.tag.hmm: Hidden Markov Models (HMMs), which are widely used to assign the most likely label sequence to sequential data or to assess the probability of a given label and data sequence.
  • nltk_lite.tag.ngram: Classes and interfaces for tagging each token of a document with supplementary information, such as its part of speech or its WordNet synset tag.
  • nltk_lite.tag.unigram: Classes and interfaces for tagging each token of a document with supplementary information, such as its part of speech or its WordNet synset tag.
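To illustrate the kind of decoding an HMM tagger performs, here is a minimal Viterbi sketch over a toy model. It is not the nltk_lite.tag.hmm API: the function name `viterbi`, the dict-based probability tables, the tag set and the probabilities themselves are hypothetical, chosen only so the example runs.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p):
    """Return [(word, tag), ...] for the most probable tag sequence.

    Hypothetical helper, not the nltk_lite.tag.hmm API. Probability tables
    are plain dicts; log probabilities avoid numeric underflow.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(words[0], 0.0)) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + log(trans_p[prev][s])
                          + log(emit_p[s].get(words[t], 0.0)))
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return list(zip(words, path))


# Toy model (hypothetical numbers, for illustration only)
states = ("DT", "NN", "VB")
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
emit_p = {
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"barks": 0.8, "dog": 0.05},
}
print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'VB')]
```

Working in log space turns products of many small probabilities into sums, which keeps long sentences from underflowing to zero.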

Classes
  • Default: A tagger that assigns the same tag to every token.
  • SequentialBackoff: A tagger that tags words sequentially, left to right.
  • TagI: A processing interface for assigning a tag to each token in a list.
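The relationship between a default tagger and a sequential backoff tagger can be sketched as follows, assuming a tagger exposes a `tag(tokens)` method that returns (token, tag) pairs, plus a per-token hook that returns None when it cannot decide so that the next tagger in the backoff chain is consulted. The names `SequentialBackoffTagger`, `DefaultTagger`, `LookupTagger`, `tag_one` and `tag_with_backoff` are hypothetical and are not the nltk_lite signatures.

```python
class SequentialBackoffTagger:
    """Hypothetical stand-in for SequentialBackoff: tags left to right,
    deferring to `backoff` whenever tag_one() returns None."""

    def __init__(self, backoff=None):
        self.backoff = backoff

    def tag_one(self, token, history):
        """Return a tag for `token`, or None; `history` holds the tags so far."""
        raise NotImplementedError

    def tag_with_backoff(self, token, history):
        tag = self.tag_one(token, history)
        if tag is None and self.backoff is not None:
            return self.backoff.tag_with_backoff(token, history)
        return tag

    def tag(self, tokens):
        tokens = list(tokens)
        tags = []
        for token in tokens:  # strictly left to right
            tags.append(self.tag_with_backoff(token, tags))
        return list(zip(tokens, tags))


class DefaultTagger(SequentialBackoffTagger):
    """Hypothetical analogue of Default: the same tag for every token."""

    def __init__(self, tag):
        super().__init__(backoff=None)
        self._tag = tag

    def tag_one(self, token, history):
        return self._tag


class LookupTagger(SequentialBackoffTagger):
    """Hypothetical tagger that looks each word up in a fixed dictionary."""

    def __init__(self, table, backoff=None):
        super().__init__(backoff)
        self.table = table

    def tag_one(self, token, history):
        return self.table.get(token)  # None for unknown words -> backoff


tagger = LookupTagger({"the": "DT", "dog": "NN"}, backoff=DefaultTagger("NN"))
print(tagger.tag(["the", "dog", "barks"]))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'NN')]
```

Putting a default tagger at the end of the chain guarantees that every token receives some tag, even words the more specific taggers have never seen.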
Functions
  • tag2tuple(s, sep='/')
  • accuracy(tagger, gold): Score the accuracy of the tagger against the gold standard. Returns a float.
  • tags2string(t, sep='/')
  • untag(tagged_sentence)
  • string2words(s, sep='/')
  • string2tags(s, sep='/')
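The string helpers above are documented only by their signatures. One plausible reading is sketched below, assuming tagged text such as `the/DT dog/NN`, with `sep` between word and tag and whitespace between tokens. Splitting on the last separator, joining with single spaces, and returning `None` as the tag of an untagged token are assumptions, not confirmed nltk_lite behavior.

```python
def tag2tuple(s, sep="/"):
    """'dog/NN' -> ('dog', 'NN'); assumed to split on the LAST separator."""
    word, found, tag = s.rpartition(sep)
    if not found:  # no separator at all: treat as an untagged token
        return (s, None)
    return (word, tag)


def tags2string(t, sep="/"):
    """[('dog', 'NN'), ...] -> 'dog/NN ...' (assumed space-separated)."""
    return " ".join(word + sep + str(tag) for word, tag in t)


def untag(tagged_sentence):
    """Drop the tags, keeping only the words."""
    return [word for word, _ in tagged_sentence]


def string2tags(s, sep="/"):
    """'the/DT dog/NN' -> [('the', 'DT'), ('dog', 'NN')]."""
    return [tag2tuple(token, sep) for token in s.split()]


def string2words(s, sep="/"):
    """'the/DT dog/NN' -> ['the', 'dog']."""
    return untag(string2tags(s, sep))


text = "the/DT dog/NN barks/VBZ"
print(string2tags(text))   # [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
print(string2words(text))  # ['the', 'dog', 'barks']
print(tags2string(string2tags(text)) == text)  # True
```

Splitting on the last separator keeps words that themselves contain the separator intact, for example `1/2/CD` becomes `('1/2', 'CD')`.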
Function Details

accuracy(tagger, gold)


Score the accuracy of the tagger against the gold standard: strip the tags from the gold-standard text, retag it using the tagger, and return the fraction of tokens whose new tag matches the gold tag.

Parameters:
  • tagger (TagI) - The tagger being evaluated.
  • gold (list of Token) - The list of tagged tokens to score the tagger on.
Returns: float - The proportion of correctly tagged tokens, between 0 and 1.
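A minimal sketch of the procedure described for accuracy, assuming `gold` is a flat list of (word, tag) pairs and that the tagger has a `tag(words)` method returning (word, tag) pairs in the same order. `ConstantTagger` is a hypothetical toy tagger used only for the example; this is not the nltk_lite source.

```python
def accuracy(tagger, gold):
    """Fraction of tokens whose retagged tag matches the gold tag."""
    if not gold:
        raise ValueError("gold standard is empty")
    words = [word for word, _ in gold]        # 1. strip the gold tags
    predicted = list(tagger.tag(words))       # 2. retag with the tagger
    if len(predicted) != len(gold):
        raise ValueError("tagger returned a different number of tokens")
    correct = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return float(correct) / len(gold)         # 3. score in [0, 1]


class ConstantTagger:
    """Hypothetical toy tagger: tags every word as NN."""

    def tag(self, words):
        return [(w, "NN") for w in words]


gold = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("cat", "NN")]
print(accuracy(ConstantTagger(), gold))  # 0.5
```

Comparing tags position by position is only meaningful because the tagger receives exactly the gold words, in order; the length check guards that assumption.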