Module nltk_lite.contrib.marshalbrill

Brill's transformational rule-based tagger.
Classes
Brill Brill's transformational rule-based tagger.
BrillRuleI An interface for tag transformations on a tagged corpus, as performed by brill taggers.
BrillTemplateI An interface for generating lists of transformational rules that apply at given corpus positions.
BrillTrainer A trainer for brill taggers.
FastBrillTrainer A faster trainer for brill taggers.
ProximateTagsRule A rule which examines the tags of nearby tokens.
ProximateTokensRule An abstract base class for brill rules whose condition checks for the presence of tokens with given properties at given ranges of positions, relative to the token.
ProximateTokensTemplate A brill template that generates a list of ProximateTokensRules that apply at a given corpus position.
ProximateWordsRule A rule which examines the base types of nearby tokens.
SymmetricProximateTokensTemplate Simulates two ProximateTokensTemplates which are symmetric across the location of the token.
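
To illustrate the kind of transformation a rule such as ProximateTagsRule describes, here is a minimal standalone sketch. It does not use the classes above: the function name apply_proximate_tag_rule and its parameters are hypothetical, and the choice to evaluate every position against the tags as they stood before the rule ran is an assumption about typical Brill-style rule application, not a statement about this module's code.

```python
def apply_proximate_tag_rule(tagged, original_tag, replacement_tag,
                             condition_tag, start, end):
    """Retag tokens in place and return the list of changed positions.

    Hypothetical helper for illustration; not part of marshalbrill.
    A token at position i is retagged from original_tag to replacement_tag
    if condition_tag occurs anywhere in positions i+start .. i+end.
    """
    # Snapshot the tags first, so that one replacement cannot trigger
    # another replacement within the same pass (an assumption, see above).
    tags = [tag for _, tag in tagged]
    changed = []
    for i, tag in enumerate(tags):
        if tag != original_tag:
            continue
        lo = max(0, i + start)               # clip the window at the left edge
        hi = min(len(tags), i + end + 1)     # clip the window at the right edge
        if condition_tag in tags[lo:hi]:
            changed.append(i)
    for i in changed:
        word, _ = tagged[i]
        tagged[i] = (word, replacement_tag)
    return changed


# Rule: NN -> VB if the preceding tag is TO.
sentence = [("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
changed_positions = apply_proximate_tag_rule(sentence, "NN", "VB", "TO", -1, -1)
```

Here only "race" is retagged; "tomorrow" keeps NN because the tag before it was NN when the rule was evaluated.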

Function Summary
  demo(num_sents, max_rules, min_score, error_output, rule_output, randomize, train, trace)
Brill Tagger Demonstration
  errorList(train_tokens, tokens, radius)
Returns a list of human-readable strings indicating the errors in the given tagging of the corpus.
  _errorPositions(train_tokens, tokens)

Function Details

demo(num_sents=100, max_rules=200, min_score=2, error_output='errors.out', rule_output='rules.out', randomize=False, train=0.8, trace=3)

Brill Tagger Demonstration
Parameters:
num_sents - how many sentences of training and testing data to use
           (type=int)
max_rules - maximum number of rule instances to create
           (type=int)
min_score - the minimum score for a rule in order for it to be considered
           (type=int)
error_output - the file where errors will be saved
           (type=string)
rule_output - the file where rules will be saved
           (type=string)
randomize - whether the training data should be a random subset of the corpus
           (type=boolean)
train - the fraction of the corpus to be used for training (1=all)
           (type=float)
trace - the level of diagnostic tracing output to produce (0-3)
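
The interaction of num_sents, train and randomize can be pictured with the following sketch. It is an assumption about how such a split might work, not the code of demo itself: the function split_corpus and its seed parameter are hypothetical, and the sentences are stand-in values.

```python
import random


def split_corpus(sents, num_sents=100, train=0.8, randomize=False, seed=None):
    """Return (training, testing) lists drawn from the first num_sents items.

    Hypothetical helper for illustration; not part of marshalbrill.
    """
    sents = list(sents)                      # copy so the caller's list is untouched
    if randomize:
        random.Random(seed).shuffle(sents)   # random subset instead of a fixed prefix
    cutoff = int(num_sents * train)          # train=1 puts everything in training
    return sents[:cutoff], sents[cutoff:num_sents]


# Ten stand-in "sentences", 80% of them used for training.
training, testing = split_corpus(range(10), num_sents=10, train=0.8)
```

With these values the first eight items go to training and the remaining two to testing.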

errorList(train_tokens, tokens, radius=2)

Returns a list of human-readable strings indicating the errors in the given tagging of the corpus.
Parameters:
train_tokens - The correct tagging of the corpus
           (type=list of tuple)
tokens - The tagged corpus
           (type=list of tuple)
radius - How many tokens on either side of a wrongly-tagged token to include in the error string. For example, if radius=2, each error string will show the incorrect token plus two tokens on either side.
           (type=int)
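
The effect of radius can be shown with a small standalone sketch. The function error_list_sketch and the layout of the strings it returns are hypothetical; the actual output format of errorList is not given on this page. Both arguments are assumed to be parallel lists of (word, tag) tuples.

```python
def error_list_sketch(train_tokens, tokens, radius=2):
    """Return one string per position where the two taggings disagree.

    Hypothetical helper for illustration; not part of marshalbrill.
    train_tokens is the correct tagging, tokens is the tagging under test.
    """
    errors = []
    for i, ((word, gold_tag), (_, test_tag)) in enumerate(zip(train_tokens, tokens)):
        if gold_tag == test_tag:
            continue
        # Up to `radius` tokens of context on each side, clipped at the edges.
        left = ["%s/%s" % (w, t) for w, t in tokens[max(0, i - radius):i]]
        right = ["%s/%s" % (w, t) for w, t in tokens[i + 1:i + 1 + radius]]
        context = left + ["**%s/%s**" % (word, test_tag)] + right
        errors.append("%s -> %s: %s" % (test_tag, gold_tag, " ".join(context)))
    return errors


gold = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]
tagged = [("the", "DT"), ("dog", "NN"), ("barks", "NNS")]
errors = error_list_sketch(gold, tagged, radius=1)
```

With radius=1 the single error string shows "barks" with one token of left context; there is no right context because the error is at the end of the sentence.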

Generated by Epydoc 2.1 on Tue Sep 5 09:37:21 2006 http://epydoc.sf.net