Module nltk_lite.contrib.mit.six863.kimmo.kimmo

Class KimmoRuleSet

     object --+    
              |    
yaml.YAMLObject --+
                  |
                 KimmoRuleSet

An object that represents the morphological rules for a language.

A KimmoRuleSet stores a list of rules, all of which must succeed on a given string. The rules can be used either to generate a surface form from a lexical form or to recognize a lexical form from a surface form.

Nested Classes

Inherited from yaml.YAMLObject: __metaclass__, yaml_dumper, yaml_loader

Instance Methods

__init__(self, subsets, defaults, rules, morphology=None, null='0', boundary='#')
Creates a KimmoRuleSet.

rules(self)
The list of rules in this ruleset.

subsets(self)
The dictionary defining subsets of characters of the language.

is_subset(self, key)
Is this string a subset representing other strings?

null(self)
The null symbol for this ruleset.

morphology(self)
The morphological lexicon (as a KimmoMorphology).

_pairtext(self, char)

_generate(self, pairs, state_list, morphology_state=None, word='', lexical=None, surface=None, features='', log=None)

generate(self, lexical, log=None)
Given a lexical form, return all possible surface forms that fit these rules.

recognize(self, surface, log=None)
Given a surface form, return all possible lexical forms that fit these rules.

_advance_rule(self, rule, state, pair)

_test_case(self, input, outputs, arrow, method)

batch_test(self, filename)
Test a rule set by reading lines from a file.

gui(self, startTk=True)

draw_graphs(self, startTk=True)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods

from_yaml(cls, loader, node)
Loads a KimmoRuleSet from a parsed YAML node.

load(cls, filename)
Loads a KimmoRuleSet from a YAML file.

_from_yaml_dict(cls, map)

Inherited from yaml.YAMLObject: to_yaml

Class Variables
  yaml_tag = '!KimmoRuleSet'

Inherited from yaml.YAMLObject: yaml_flow_style

Properties

Inherited from object: __class__

Method Details

__init__(self, subsets, defaults, rules, morphology=None, null='0', boundary='#')
(Constructor)

Creates a KimmoRuleSet. You will usually not call this directly; instead,
use KimmoRuleSet.load to load one from a YAML file.

A KimmoRuleSet takes these parameters:
subsets: a dictionary mapping strings to lists of strings. The strings
  in the map become subsets representing all of the strings in the
  list.
defaults: a list of KimmoPairs that can appear without being
  specifically mentioned by a rule.
rules: a list of KimmoFSARules or KimmoArrowRules that define the
  two-level morphology rules.
morphology: a KimmoMorphology object that defines a lexicon of word
  roots and affixes.
null: the symbol representing the empty string in rules.
boundary: the symbol that will always appear at the end of lexical and
  surface forms.

Overrides: object.__init__
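The null and boundary parameters are easiest to see on an aligned lexical/surface pair. The sketch below is a minimal illustration, assuming one character per symbol and forms that have already been aligned to the same length; align_pairs and strip_null are hypothetical helpers, not part of KimmoRuleSet.

```python
# Hypothetical helpers (not part of the KimmoRuleSet API) showing what the
# `null` and `boundary` parameters mean for an aligned lexical/surface pair.
# Assumption: every symbol is a single character.
NULL = '0'       # the constructor's default null symbol
BOUNDARY = '#'   # the constructor's default boundary symbol


def align_pairs(lexical, surface):
    """Zip two pre-aligned forms of equal length into lexical:surface pairs."""
    if len(lexical) != len(surface):
        raise ValueError('forms must be aligned to the same length')
    return [f'{lex}:{surf}' for lex, surf in zip(lexical, surface)]


def strip_null(form, null=NULL, boundary=BOUNDARY):
    """Remove null symbols and the final boundary to get the readable text."""
    text = form.replace(null, '')
    return text[:-len(boundary)] if text.endswith(boundary) else text


print(align_pairs('cat+s#', 'cat0s#'))   # ['c:c', 'a:a', 't:t', '+:0', 's:s', '#:#']
print(strip_null('cat0s#'))              # cats
```

Here the lexical morpheme boundary + corresponds to a surface null, and both forms end in the boundary symbol, as the parameter descriptions above require.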

morphology(self)

The morphological lexicon (as a KimmoMorphology). May be None if the ruleset is used only for generation.

generate(self, lexical, log=None)

Given a lexical form, return all possible surface forms that fit these rules.

Optionally, pass a log object such as TextTrace(1); it displays every step of the Kimmo algorithm to the user.
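Generation can be pictured as a search that maps each lexical symbol to each permitted surface symbol, advances every rule in parallel, and prunes a path as soon as any rule rejects. The sketch below is a minimal, standalone illustration of that idea, not the code behind generate(); generate_surface, the rule dictionary layout, and the FEASIBLE table are hypothetical, and surface insertions such as 0:e are not handled.

```python
# A minimal sketch of two-level generation, not the actual KimmoRuleSet code.
# Assumptions (all hypothetical): a rule is a dict with a start state, a set of
# states that reject if the input ends there, and arcs {state: {pair: next}};
# an exact pair is tried before the '@' wildcard, and a missing arc rejects.
# Surface insertions such as 0:e are not handled, and '0' is the null symbol.

def step(rule, state, pair):
    """Advance one rule by one pair; returns None if the rule rejects."""
    row = rule['arcs'][state]
    return row.get(pair, row.get('@'))


def generate_surface(lexical, feasible, rules):
    """Return every surface form of `lexical` that all rules accept."""
    results = []

    def search(i, states, surface):
        if i == len(lexical):
            # Valid only if no rule ends in a rejecting state.
            if all(s not in r['rejecting'] for r, s in zip(rules, states)):
                results.append(surface.replace('0', ''))
            return
        # Try each surface symbol this lexical symbol may map to (default: itself).
        for surf in feasible.get(lexical[i], [lexical[i]]):
            pair = f'{lexical[i]}:{surf}'
            nxt = [step(r, s, pair) for r, s in zip(rules, states)]
            if None not in nxt:   # prune as soon as any rule rejects
                search(i + 1, nxt, surface + surf)

    search(0, [r['start'] for r in rules], '')
    return results


I_Y_RULE = {   # the i-y-spelling rule from the load() documentation, re-encoded
    'start': 'start',
    'rejecting': {'step1', 'step2', 'step3'},
    'arcs': {
        'start': {'i:y': 'step1', '@': 'start'},
        'step1': {'e:0': 'step2'},   # any other pair rejects
        'step2': {'+:0': 'step3'},
        'step3': {'i:i': 'start'},
    },
}
FEASIBLE = {'i': ['i', 'y'], 'e': ['e', '0'], '+': ['0']}   # hypothetical pairs
print(generate_surface('die+ing', FEASIBLE, [I_Y_RULE]))    # ['dieing', 'diing', 'dying']
```

With only this one rule the sketch overgenerates: 'dieing' and 'diing' also pass, so a fuller rule set would need further rules to exclude them.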

recognize(self, surface, log=None)

Given a surface form, return all possible lexical forms that fit these rules. Because the components of a lexical form can include features such as the grammatical part of speech, each lexical form is returned as a 2-tuple of (text, features).

Optionally, pass a log object such as TextTrace(1); it displays every step of the Kimmo algorithm to the user.

batch_test(self, filename)

Test a rule set by reading lines from a file.

Each line contains lexical forms on the left and surface forms on the
right, separated by commas if there is more than one. The left side may
be empty, which asserts that the surface form should not be recognized
at all. Between the two sides is an arrow indicating what to test:
generation (=>), recognition (<=), or both (<=>). Comments begin with ;.

Each form should produce the exact list of forms on the other side of
the arrow; if one is missing, or an extra one is produced, the test
will fail.

Examples of test lines:
  cat+s => cats             ; test generation only
  conoc+o <=> conozco       ; test generation and recognition
   <= conoco                ; this string should fail to be recognized
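The line format above can be parsed with a few lines of Python. This is a minimal sketch, assuming ; always starts a comment and forms never contain commas; parse_test_line, outputs_match, and DIRECTIONS are hypothetical names, not the batch_test implementation.

```python
import re

# Hypothetical helpers for the batch_test file format described above; they
# are not part of the KimmoRuleSet API.
ARROW = re.compile(r'<=>|=>|<=')   # try the longest arrow first
DIRECTIONS = {'=>': ('generate',), '<=': ('recognize',),
              '<=>': ('generate', 'recognize')}


def split_forms(text):
    """Split a comma-separated list of forms; an empty side gives []."""
    return [form.strip() for form in text.split(',') if form.strip()]


def parse_test_line(line):
    """Return (lexical_forms, arrow, surface_forms), or None for a blank/comment line."""
    line = line.split(';', 1)[0].strip()   # everything after ';' is a comment
    if not line:
        return None
    match = ARROW.search(line)
    if match is None:
        raise ValueError(f'no arrow in test line: {line!r}')
    return (split_forms(line[:match.start()]), match.group(),
            split_forms(line[match.end():]))


def outputs_match(produced, expected):
    """Pass only if nothing is missing and nothing extra was produced.

    Comparing sorted lists (ignoring order) is an assumption of this sketch.
    """
    return sorted(produced) == sorted(expected)


for text in ['cat+s => cats             ; test generation only',
             'conoc+o <=> conozco       ; test generation and recognition',
             ' <= conoco                ; this string should fail to be recognized']:
    lexical, arrow, surface = parse_test_line(text)
    print(lexical, DIRECTIONS[arrow], surface)
```

An empty list on one side, as in the last line, is what makes "should fail" expressible: the expected output list is simply empty.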

from_yaml(cls, loader, node)
Class Method

Loads a KimmoRuleSet from a parsed YAML node.

Overrides: yaml.YAMLObject.from_yaml

load(cls, filename)
Class Method

Loads a KimmoRuleSet from a YAML file.

The YAML file should contain a dictionary with the following keys:
  lexicon: the filename of the lexicon to load.
  subsets: a dictionary mapping subset characters to space-separated
    lists of symbols. One of these should usually be '@', mapping
    to the entire alphabet.
  defaults: a space-separated list of KimmoPairs that should be allowed
    without a rule explicitly mentioning them.
  null: the symbol that will be used to represent 'null' (usually '0').
  boundary: the symbol that represents the end of the word
    (usually '#').
  rules: a dictionary mapping rule names to YAML representations of
    those rules.
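Because subsets and defaults are given as space-separated strings, they need to be split after the YAML is loaded. A minimal sketch, assuming the file has already been read into a plain dict (for example with PyYAML's yaml.safe_load); the helper names and sample values are hypothetical, and treating a bare symbol in defaults as an identity pair is an assumption.

```python
# A minimal sketch, assuming the YAML file has already been read into a plain
# dict. parse_subsets, parse_defaults and the sample values are hypothetical
# illustrations, not real data or KimmoRuleSet code.

def parse_subsets(raw):
    """Map each subset name to its list of symbols: 'a e i' -> ['a', 'e', 'i']."""
    return {name: symbols.split() for name, symbols in raw.items()}


def parse_defaults(raw):
    """Split 'a +:0 ...' into (lexical, surface) tuples.

    Assumption: a bare symbol such as 'a' stands for the identity pair a:a.
    """
    pairs = []
    for token in raw.split():
        lexical, _, surface = token.partition(':')
        pairs.append((lexical, surface or lexical))
    return pairs


config = {
    'lexicon': 'english.lex',                 # hypothetical filename
    'subsets': {'@': 'a c e i s t x y z +', 'Csib': 's x z'},
    'defaults': 'a c e i s t x y z +:0',
    'null': '0',
    'boundary': '#',
    'rules': {},                              # rule bodies omitted here
}
subsets = parse_subsets(config['subsets'])
defaults = parse_defaults(config['defaults'])
print('Csib' in subsets, subsets['Csib'])     # True ['s', 'x', 'z']
print(defaults[-1])                           # ('+', '0')
```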
  
A rule can take these forms:
* a dictionary of states, where each state is a dictionary mapping
  input pairs to following states. The start state is named 'start',
  the state named 'reject' instantly rejects, and state names can be
  prefixed with the word 'rejecting' so that they reject if the machine
  ends in that state.

  i-y-spelling: 
    start:
      'i:y': step1
      '@': start
    rejecting step1:
      'e:0': step2
      '@': reject
    rejecting step2:
      '+:0': step3
      '@': reject
    rejecting step3:
      'i:i': start
      '@': reject

  
* a block of text with a DFA table in it, of the form used by
  PC-KIMMO. The text should begin with a | so that YAML keeps your
  line breaks, and the next line should be 'FSA'. State 0 instantly
  rejects, and states with a period instead of a colon reject if the
  machine ends in that state.
  Examples:

  i-y-spelling: |        # this is the same rule as above
    FSA
        i  e  +  i      @
        y  0  0  i      @
    1:  2  1  1  1      1
    2.  0  3  0  0      0
    3.  0  0  4  0      0
    4.  0  0  0  1      0

  epenthesis: |
    FSA
       c h s Csib y + # 0 @
       c h s Csib i 0 # e @
    1: 2 1 4 3    3 1 1 0 1
    2: 2 3 3 3    3 1 1 0 1
    3: 2 1 3 3    3 5 1 0 1
    4: 2 3 3 3    3 5 1 0 1
    5: 2 1 2 2    2 1 1 6 1
    6. 0 0 7 0    0 0 0 0 0
    7. 0 0 0 0    0 1 1 0 0
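Both rule forms describe the same kind of finite-state machine, so they can be read into one representation and run over a sequence of lexical:surface pairs. The sketch below is a minimal illustration under stated assumptions (exact pairs take priority over '@', a missing arc rejects, and subset symbols such as Csib are matched literally rather than expanded); from_state_dict, from_fsa_table, and accepts are hypothetical helpers, not KimmoRuleSet internals. It checks that the two i-y-spelling encodings agree.

```python
# Hypothetical helpers (not KimmoRuleSet internals). Assumptions: an exact pair
# is tried before the '@' wildcard, a missing arc rejects, and subset symbols
# such as 'Csib' are matched literally instead of being expanded.

def from_state_dict(states):
    """Dictionary form: a 'rejecting ' prefix marks states that fail at the end."""
    arcs, rejecting = {}, set()
    for name, row in states.items():
        if name.startswith('rejecting '):
            name = name[len('rejecting '):]
            rejecting.add(name)
        arcs[name] = dict(row)
    return {'start': 'start', 'dead': 'reject', 'arcs': arcs, 'rejecting': rejecting}


def from_fsa_table(text):
    """PC-KIMMO form: 'FSA', two header rows, then 'N:' or 'N.' state rows."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    if rows[0] != ['FSA'] or len(rows[1]) != len(rows[2]):
        raise ValueError('expected FSA followed by two equal-length header rows')
    # Column k is the pair lexical[k]:surface[k]; '@' over '@' stays '@'.
    pairs = ['@' if (lex, surf) == ('@', '@') else f'{lex}:{surf}'
             for lex, surf in zip(rows[1], rows[2])]
    arcs, rejecting = {}, set()
    for row in rows[3:]:
        label = row[0]                  # e.g. '1:' (final) or '2.' (rejecting)
        state = int(label[:-1])
        if label.endswith('.'):
            rejecting.add(state)
        arcs[state] = dict(zip(pairs, (int(t) for t in row[1:])))
    return {'start': 1, 'dead': 0, 'arcs': arcs, 'rejecting': rejecting}


def accepts(rule, pairs):
    """Fail on reaching the dead state, or on ending in a rejecting state."""
    state = rule['start']
    for pair in pairs:
        row = rule['arcs'][state]
        state = row.get(pair, row.get('@', rule['dead']))
        if state == rule['dead']:
            return False
    return state not in rule['rejecting']


DICT_RULE = {
    'start': {'i:y': 'step1', '@': 'start'},
    'rejecting step1': {'e:0': 'step2', '@': 'reject'},
    'rejecting step2': {'+:0': 'step3', '@': 'reject'},
    'rejecting step3': {'i:i': 'start', '@': 'reject'},
}
TABLE_RULE = """
FSA
    i  e  +  i  @
    y  0  0  i  @
1:  2  1  1  1  1
2.  0  3  0  0  0
3.  0  0  4  0  0
4.  0  0  0  1  0
"""
dying = ['d:d', 'i:y', 'e:0', '+:0', 'i:i', 'n:n', 'g:g']   # die+ing : dying
bad = ['d:d', 'i:y', 'e:e', '+:0', 'i:i', 'n:n', 'g:g']     # i:y not before e:0
for rule in (from_state_dict(DICT_RULE), from_fsa_table(TABLE_RULE)):
    print(accepts(rule, dying), accepts(rule, bad))         # True False
```

Keeping the "dead" state as data ('reject' for the dictionary form, 0 for the table form) lets one runner serve both encodings.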