object --+
         |
   yaml.YAMLObject --+
                     |
                    KimmoRuleSet
An object that represents the morphological rules for a language.
The KimmoRuleSet stores a list of rules which must all succeed when they process a given string. These rules can be used for generating a surface form from a lexical form, or recognizing a lexical form from a surface form.
yaml_tag =
Creates a KimmoRuleSet. You will usually not want to do this directly; use KimmoRuleSet.load to load one from a YAML file instead.

A KimmoRuleSet takes these parameters:
* subsets: a dictionary mapping strings to lists of strings. Each key becomes a subset that stands for all of the strings in its list.
* defaults: a list of KimmoPairs that can appear without being specifically mentioned by a rule.
* rules: a list of KimmoFSARules or KimmoArrowRules that define the two-level morphology rules.
* morphology: a KimmoMorphology object that defines a lexicon of word roots and affixes.
* null: the symbol representing the empty string in rules.
* boundary: the symbol that will always appear at the end of lexical and surface forms.
The morphological lexicon (as a KimmoMorphology). May be None if the ruleset is used only for generation.
Given a lexical form, return all possible surface forms that fit these rules. Optionally, a 'log' object such as TextTrace(1) can be provided; this object will display to the user all the steps of the Kimmo algorithm.
Given a surface form, return all possible lexical forms that fit these rules. Because the components of a lexical form can include features such as the grammatical part of speech, each lexical form is returned as a 2-tuple of (lexical text, features). Optionally, a 'log' object such as TextTrace(1) can be provided; this object will display to the user all the steps of the Kimmo algorithm.
Test a rule set by reading lines from a file. Each line contains one or more lexical forms on the left and one or more surface forms on the right (separated by commas if there is more than one). Between them is an arrow (=>, <=, or <=>) indicating whether generation, recognition, or both should be tested. Comments are marked with ;. Each form should produce the exact list of forms on the other side of the arrow; if one is missing, or an extra one is produced, the test will fail.

Examples of test lines:

    cat+s => cats            ; test generation only
    conoc+o <=> conozco      ; test generation and recognition
    <= conoco                ; this string should fail to be recognized
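The test-line format described above can be illustrated with a small standalone parser. This is only a sketch of the syntax, not the library's own code: the names parse_test_line and ARROWS are hypothetical, and the mapping of each arrow to a direction is inferred from the example comments (=> for generation, <= for recognition).

```python
import re

# Hypothetical mapping from arrow to the direction(s) being tested,
# inferred from the example lines in the documentation.
ARROWS = {
    "=>": ("generate",),
    "<=": ("recognize",),
    "<=>": ("generate", "recognize"),
}


def split_forms(text):
    """Split a comma-separated list of forms, dropping empty entries."""
    return [form.strip() for form in text.split(",") if form.strip()]


def parse_test_line(line):
    """Return (lexical_forms, directions, surface_forms).

    Returns None for blank or comment-only lines.
    """
    # Everything after the first ';' is a comment.
    line = line.split(";", 1)[0].strip()
    if not line:
        return None
    # Try '<=>' before the shorter arrows so it is not split apart.
    match = re.search(r"<=>|=>|<=", line)
    if match is None:
        raise ValueError("no arrow found in test line: %r" % line)
    lexical = split_forms(line[:match.start()])
    surface = split_forms(line[match.end():])
    return lexical, ARROWS[match.group()], surface


print(parse_test_line("conoc+o <=> conozco ; test generation and recognition"))
```

An empty list on one side, as in the `<= conoco` example, means that no form is expected in that direction.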
Loads a KimmoRuleSet from a parsed YAML node.
Loads a KimmoRuleSet from a YAML file.

The YAML file should contain a dictionary, with the following keys:
* lexicon: the filename of the lexicon to load.
* subsets: a dictionary mapping subset characters to space-separated lists of symbols. One of these should usually be '@', mapping to the entire alphabet.
* defaults: a space-separated list of KimmoPairs that should be allowed without a rule explicitly mentioning them.
* null: the symbol that will be used to represent 'null' (usually '0').
* boundary: the symbol that represents the end of the word (usually '#').
* rules: a dictionary mapping rule names to YAML representations of those rules.

A rule can take these forms:

* a dictionary of states, where each state is a dictionary mapping input pairs to following states. The start state is named 'start', the state named 'reject' instantly rejects, and state names can be prefixed with the word 'rejecting' so that they reject if the machine ends in that state.

    i-y-spelling:
      start:
        'i:y': step1
        '@': start
      rejecting step1:
        'e:0': step2
        '@': reject
      rejecting step2:
        '+:0': step3
        '@': reject
      rejecting step3:
        'i:i': start
        '@': reject

* a block of text with a DFA table in it, of the form used by PC-KIMMO. The text should begin with a | so that YAML keeps your line breaks, and the next line should be 'FSA'. State 0 instantly rejects, and states with a period instead of a colon reject if the machine ends in that state. Examples:

    i-y-spelling: |      # this is the same rule as above
      FSA
         i  e  +  i  @
         y  0  0  i  @
      1: 2  1  1  1  1
      2. 0  3  0  0  0
      3. 0  0  4  0  0
      4. 0  0  0  1  0

    epenthesis: |
      FSA
         c  h  s  Csib y  +  #  0  @
         c  h  s  Csib i  0  #  e  @
      1: 2  1  4  3    3  1  1  0  1
      2: 2  3  3  3    3  1  1  0  1
      3: 2  1  3  3    3  5  1  0  1
      4: 2  3  3  3    3  5  1  0  1
      5: 2  1  2  2    2  1  1  6  1
      6. 0  0  7  0    0  0  0  0  0
      7. 0  0  0  0    0  1  1  0  0
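To make the dictionary-of-states form concrete, here is a minimal standalone sketch that runs the i-y-spelling rule over a sequence of lexical:surface pairs. It does not use the library: compile_rule and accepts are hypothetical helper names, the rule is written as a plain Python dict rather than loaded from YAML, and the matching policy (an exact pair first, then the '@' wildcard) is a simplifying assumption, not necessarily how KimmoRuleSet resolves pairs.

```python
# The i-y-spelling rule from the text, as a plain dict instead of YAML.
I_Y_SPELLING = {
    "start": {"i:y": "step1", "@": "start"},
    "rejecting step1": {"e:0": "step2", "@": "reject"},
    "rejecting step2": {"+:0": "step3", "@": "reject"},
    "rejecting step3": {"i:i": "start", "@": "reject"},
}

PREFIX = "rejecting "


def compile_rule(rule):
    """Split 'rejecting NAME' keys into (transitions, rejecting_states)."""
    transitions, rejecting = {}, set()
    for key, arcs in rule.items():
        name = key
        if key.startswith(PREFIX):
            name = key[len(PREFIX):]
            rejecting.add(name)
        transitions[name] = arcs
    return transitions, rejecting


def accepts(rule, pairs):
    """Return True if the rule accepts the given sequence of pairs."""
    transitions, rejecting = compile_rule(rule)
    state = "start"
    for pair in pairs:
        arcs = transitions[state]
        # Assumption: try the exact pair, then the '@' wildcard;
        # a pair with no matching arc is treated as a rejection.
        state = arcs.get(pair, arcs.get("@", "reject"))
        if state == "reject":
            return False
    # A machine that stops in a 'rejecting' state also rejects.
    return state not in rejecting


# die+ing realized as dy00ing: i:y is followed by e:0, +:0, i:i.
print(accepts(I_Y_SPELLING, ["d:d", "i:y", "e:0", "+:0", "i:i", "n:n", "g:g"]))
```

In this sketch the rule accepts a sequence only when every i:y pair is followed by e:0, +:0, and i:i, which is what the three 'rejecting' states enforce.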
Generated by Epydoc 3.0beta1 on Wed May 16 22:47:24 2007 | http://epydoc.sourceforge.net |