Package nltk_lite :: Package contrib :: Package fst :: Module fst
[hide private]
[frames] | no frames]

Source Code for Module nltk_lite.contrib.fst.fst

   1  """ 
   2  Finite state transducers. 
   3   
   4  A finite state trasducer, or FST, is a directed graph that is used to 
   5  encode a mapping from a set of I{input strings} to a set of I{output 
   6  strings}.  An X{input string} is a sequence of immutable values (such 
   7  as integers, characters, or strings) called X{input symbols}. 
   8  Similarly, an C{output string} is a sequence of immutable values 
   9  called X{output symbols}.  Collectively, input strings and output 
  10  strings are called X{symbol strings}, or simply X{strings} for short. 
  11  Note that this notion of I{string} is different from the python string 
  12  type -- symbol strings are always encoded as tuples of input or output 
  13  symbols, even if those symbols are characters.  Also, note that empty 
  14  sequences are valid symbol strings. 
  15   
  16  The nodes of an FST are called X{states}, and the edges are called 
  17  X{transition arcs} or simply X{arcs}.  States may be marked as 
  18  X{final}, and each final state is annotated with an output string, 
  19  called the X{finalizing string}.  Each arc is annotated with an input 
  20  string and an output string.  An arc with an empty input string is 
  21  called an I{epsilon-input arc}; and an arc with an empty output 
  22  string is called an I{epsilon-output arc}. 
  23   
  24  The set of mappings encoded by the FST are defined by the set of paths 
  25  through the graph, starting at a special state known as the X{initial 
  26  state}, and ending at a final state.  In particular, the FST maps an 
  27  input string X to an output string Y iff there exists a path from the 
  28  initial state to a final state such that: 
  29   
  30    - The input string X is formed by concatenating the input strings 
  31      of the arcs along the path (in order). 
  32    - The output string Y is formed by concatenating the output strings 
  33      of the arcs along the path (in order), plus the final state's 
  34      output string. 
  35   
  36  The following list defines some terms that apply to finite state 
  37  transducers. 
  38   
  39    - The X{transduction} defined by a FST is the mapping from input 
  40      strings to output strings. 
  41       
  42    - An FST X{encodes a deterministic transduction} if each input 
  43      string maps to at most one output string.  An FST X{encodes a 
  44      nondeterministic transduction} if any input string maps to more 
  45      than one output string. 
  46   
  47    - An FST is X{deterministic} if it every state contains at most one 
  48      outgoing arc that is consistent with any input string; otherwise, 
  49      the FST is X{nondeterministic}.  If an FST is deterministic, then 
  50      it necessarily encodes a deterministic transduction; however, it 
  51      is possible to define an FST that is nondeterministic but that 
  52      encodes a deterministic transduction. 
  53   
  54    - An FST is X{sequential} if each arc is labeled with exactly one 
  55      input symbol, no two outgoing arcs from any state have the same 
  56      input symbol, and all finalizing strings are empty.  (Sequential 
  57      implies deterministic). 
  58   
  59    - An FST is I{subsequential} if each arc is labeled with exactly 
  60      one input symbol, and no two outgoing arcs from any state have 
  61      the same input symbol.  (Finalizing strings may be non-empty.) 
  62   
  63  An FSA can be represented as an FST that generates no output symbols. 
  64   
  65  The current FST class does not provide support for: 
  66   
  67    - Weighted arcs.  (However, weights can be used as, or included 
  68      in, the output symbols.  The total weight of a path can then 
  69      be found after transduction by combining the weights.  But 
  70      there's no support for e.g., finding the path with the minimum 
  71      weight. 
  72       
  73    - Multiple initial states. 
  74     
  75    - Initializing strings (an output string associated with the initial 
  76      state, which is always generated when the FST begins). 
  77   
  78  Possible future changes: 
  79   
  80    - Define several classes, in a class hierarchy?  E.g., FSA is a base 
  81      class, FST inherits from it.  And maybe a further subclass to add 
  82      finalizing sequences.  I would need to be more careful to only 
  83      access the private variables when necessary, and to usually go 
  84      through the accessor functions. 
  85  """ 
  86   
  87  import re, os, random, tempfile 
  88  from subprocess import Popen, PIPE 
  89  from nltk_lite.draw import * 
  90  from nltk_lite.contrib.fst.draw_graph import * 
  91   
  92  ###################################################################### 
  93  # CONTENTS 
  94  ###################################################################### 
  95  # 1. Finite State Transducer 
  96  #    - State information 
  97  #    - Transition Arc Information 
  98  #    - FST Information 
  99  #    - State Modification 
 100  #    - Transition Arc Modification 
 101  #    - Transformations 
 102  #    - Misc 
 103  #    - Transduction 
 104  # 2. AT&T fsmtools support 
 105  # 3. Graphical Display 
 106  #    - FSTDisplay 
 107  #    - FSTDemo 
 108  ###################################################################### 
 109   
 110  ###################################################################### 
 111  #{ Finite State Transducer 
 112  ###################################################################### 
 113   
114 -class FST(object):
115 """ 116 A finite state transducer. Each state is uniquely identified by a 117 label, which is typically a string name or an integer id. A 118 state's label is used to access and modify the state. Similarly, 119 each arc is uniquely identified by a label, which is used to 120 access and modify the arc. 121 122 The set of arcs pointing away from a state are that state's 123 I{outgoing} arcs. The set of arcs pointing to a state are that 124 state's I{incoming} arcs. The state at which an arc originates is 125 that arc's I{source} state (or C{src}), and the state at which it 126 terminates is its I{destination} state (or C{dst}). 127 128 It is possible to define an C{FST} object with no initial state. 129 This is represented by assigning a value of C{None} to the 130 C{initial_state} variable. C{FST}s with no initial state are 131 considered to encode an empty mapping. I.e., transducing any 132 string with such an C{FST} will result in failure. 133 """
134 - def __init__(self, label):
135 """ 136 Create a new finite state transducer, containing no states. 137 """ 138 self.label = label 139 """A label identifying this FST. This is used for display & 140 debugging purposes only.""" 141 142 #{ State Information 143 self._initial_state = None 144 """The label of the initial state, or C{None} if this FST 145 does not have an initial state.""" 146 147 self._incoming = {} 148 """A dictionary mapping state labels to lists of incoming 149 transition arc labels.""" 150 151 self._outgoing = {} 152 """A dictionary mapping state labels to lists of outgoing 153 transition arc labels.""" 154 155 self._is_final = {} 156 """A dictionary mapping state labels to boolean values, 157 indicating whether the state is final.""" 158 159 self._finalizing_string = {} 160 """A dictionary mapping state labels of final states to output 161 strings. This string should be added to the output 162 if the FST terminates at this state.""" 163 164 self._state_descr = {} 165 """A dictionary mapping state labels to (optional) state 166 descriptions.""" 167 #} 168 169 #{ Transition Arc Information 170 self._src = {} 171 """A dictionary mapping each transition arc label to the label of 172 its source state.""" 173 174 self._dst = {} 175 """A dictionary mapping each transition arc label to the label of 176 its destination state.""" 177 178 self._in_string = {} 179 """A dictionary mapping each transition arc label to its input 180 string, a (possibly empty) tuple of input symbols.""" 181 182 self._out_string = {} 183 """A dictionary mapping each transition arc label to its output 184 string, a (possibly empty) tuple of input symbols.""" 185 186 self._arc_descr = {} 187 """A dictionary mapping transition arc labels to (optional) 188 arc descriptions."""
189 #} 190 191 #//////////////////////////////////////////////////////////// 192 #{ State Information 193 #//////////////////////////////////////////////////////////// 194
195 - def states(self):
196 """Return an iterator that will generate the state label of 197 each state in this FST.""" 198 return iter(self._incoming)
199
200 - def has_state(self, label):
201 """Return true if this FST contains a state with the given 202 label.""" 203 return label in self._incoming
204
205 - def _get_initial_state(self):
206 return self._initial_state
207 - def _set_initial_state(self, label):
208 if label is not None and label not in self._incoming: 209 raise ValueError('Unknown state label %r' % label) 210 self._initial_state = label
211 initial_state = property(_get_initial_state, _set_initial_state, 212 doc="The label of the initial state (R/W).") 213
214 - def incoming(self, state):
215 """Return an iterator that will generate the incoming 216 transition arcs for the given state. The effects of modifying 217 the FST's state while iterating are undefined, so if you plan 218 to modify the state, you should copy the incoming transition 219 arcs into a list first.""" 220 return iter(self._incoming[state])
221
222 - def outgoing(self, state):
223 """Return an iterator that will generate the outgoing 224 transition arcs for the given state. The effects of modifying 225 the FST's state while iterating are undefined, so if you plan 226 to modify the state, you should copy the outgoing transition 227 arcs into a list first.""" 228 return iter(self._outgoing[state])
229
230 - def is_final(self, state):
231 """Return true if the state with the given state label is 232 final.""" 233 return self._is_final[state]
234
235 - def finalizing_string(self, state):
236 """Return the output string associated with the given final 237 state. If the FST terminates at this state, then this string 238 will be emitted.""" 239 #if not self._is_final[state]: 240 # raise ValueError('%s is not a final state' % state) 241 return self._finalizing_string.get(state, ())
242
243 - def state_descr(self, state):
244 """Return the description for the given state, if it has one; 245 or None, otherwise.""" 246 return self._state_descr.get(state)
247 248 #//////////////////////////////////////////////////////////// 249 #{ Transition Arc Information 250 #//////////////////////////////////////////////////////////// 251
252 - def arcs(self):
253 """Return an iterator that will generate the arc label of 254 each transition arc in this FST.""" 255 return iter(self._src)
256
257 - def src(self, arc):
258 """Return the state label of this transition arc's source 259 state.""" 260 return self._src[arc]
261
262 - def dst(self, arc):
263 """Return the state label of this transition arc's destination 264 state.""" 265 return self._dst[arc]
266
267 - def in_string(self, arc):
268 """Return the given transition arc's input string, a (possibly 269 empty) tuple of input symbols.""" 270 return self._in_string[arc]
271
272 - def out_string(self, arc):
273 """Return the given transition arc's output string, a 274 (possibly empty) tuple of output symbols.""" 275 return self._out_string[arc]
276
277 - def arc_descr(self, arc):
278 """Return the description for the given transition arc, if it 279 has one; or None, otherwise.""" 280 return self._arc_descr.get(arc)
281
282 - def arc_info(self, arc):
283 """Return a tuple (src, dst, in_string, out_string) for the 284 given arc, where: 285 - C{src} is the label of the arc's source state. 286 - C{dst} is the label of the arc's destination state. 287 - C{in_string} is the arc's input string. 288 - C{out_string} is the arc's output string. 289 """ 290 return (self._src[arc], self._dst[arc], 291 self._in_string[arc], self._out_string[arc])
292 293 #//////////////////////////////////////////////////////////// 294 #{ FST Information 295 #//////////////////////////////////////////////////////////// 296
297 - def is_sequential(self):
298 """ 299 Return true if this FST is sequential. 300 """ 301 for state in self.states(): 302 if self.finalizing_string(state): return False 303 return self.is_subsequential()
304
305 - def is_subsequential(self):
306 """ 307 Return true if this FST is subsequential. 308 """ 309 for state in self.states(): 310 out_syms = set() 311 for arc in self.outgoing(state): 312 out_string = self.out_string(arc) 313 if len(out_string) != 1: return False 314 if out_string[0] in out_syms: return False 315 out_syms.add(out_string) 316 return True
317 318 #//////////////////////////////////////////////////////////// 319 #{ State Modification 320 #//////////////////////////////////////////////////////////// 321
322 - def add_state(self, label=None, is_final=False, 323 finalizing_string=(), descr=None):
324 """ 325 Create a new state, and return its label. The new state will 326 have no incoming or outgoing arcs. If C{label} is specified, 327 then it will be used as the state's label; otherwise, a new 328 unique label value will be chosen. The new state will be 329 final iff C{is_final} is true. C{descr} is an optional 330 description string for the new state. 331 332 Arguments should be specified using keywords! 333 """ 334 label = self._pick_label(label, 'state', self._incoming) 335 336 # Add the state. 337 self._incoming[label] = [] 338 self._outgoing[label] = [] 339 self._is_final[label] = is_final 340 self._state_descr[label] = descr 341 self._finalizing_string[label] = tuple(finalizing_string) 342 343 # Return the new state's label. 344 return label
345
346 - def del_state(self, label):
347 """ 348 Delete the state with the given label. This will 349 automatically delete any incoming or outgoing arcs attached to 350 the state. 351 """ 352 if label not in self._incoming: 353 raise ValueError('Unknown state label %r' % label) 354 355 # Delete the incoming/outgoing arcs. 356 for arc in self._incoming[label]: 357 del (self._src[arc], self._dst[arc], self._in_string[arc], 358 self._out_string[arc], self._arc_descr[arc]) 359 for arc in self._outgoing[label]: 360 del (self._src[arc], self._dst[arc], self._in_string[arc], 361 self._out_string[arc], self._arc_descr[arc]) 362 363 # Delete the state itself. 364 del (self._incoming[label], self._otugoing[label], 365 self._is_final[label], self._state_descr[label], 366 self._finalizing_string[label]) 367 368 # Check if we just deleted the initial state. 369 if label == self._initial_state: 370 self._initial_state = None
371
372 - def set_final(self, state, is_final=True):
373 """ 374 If C{is_final} is true, then make the state with the given 375 label final; if C{is_final} is false, then make the state with 376 the given label non-final. 377 """ 378 if state not in self._incoming: 379 raise ValueError('Unknown state label %r' % state) 380 self._is_final[state] = is_final
381
382 - def set_finalizing_string(self, state, finalizing_string):
383 """ 384 Set the given state's finalizing string. 385 """ 386 if not self._is_final[state]: 387 raise ValueError('%s is not a final state' % state) 388 if state not in self._incoming: 389 raise ValueError('Unknown state label %r' % state) 390 self._finalizing_string[state] = tuple(finalizing_string)
391
392 - def set_descr(self, state, descr):
393 """ 394 Set the given state's description string. 395 """ 396 if state not in self._incoming: 397 raise ValueError('Unknown state label %r' % state) 398 self._state_descr[state] = descr
399
400 - def dup_state(self, orig_state, label=None):
401 """ 402 Duplicate an existing state. I.e., create a new state M{s} 403 such that: 404 - M{s} is final iff C{orig_state} is final. 405 - If C{orig_state} is final, then M{s.finalizing_string} 406 is copied from C{orig_state} 407 - For each outgoing arc from C{orig_state}, M{s} has an 408 outgoing arc with the same input string, output 409 string, and destination state. 410 411 Note that if C{orig_state} contained self-loop arcs, then the 412 corresponding arcs in M{s} will point to C{orig_state} (i.e., 413 they will I{not} be self-loop arcs). 414 415 The state description is I{not} copied. 416 417 @param label: The label for the new state. If not specified, 418 a unique integer will be used. 419 """ 420 if orig_state not in self._incoming: 421 raise ValueError('Unknown state label %r' % src) 422 423 # Create a new state. 424 new_state = self.add_state(label=label) 425 426 # Copy finalization info. 427 if self.is_final(orig_state): 428 self.set_final(new_state) 429 self.set_finalizing_string(new_state, 430 self.finalizing_string(orig_state)) 431 432 # Copy the outgoing arcs. 433 for arc in self._outgoing[orig_state]: 434 self.add_arc(src=new_state, dst=self._dst[arc], 435 in_string=self._in_string[arc], 436 out_string=self._out_string[arc]) 437 438 return new_state
439 440 #//////////////////////////////////////////////////////////// 441 #{ Transition Arc Modification 442 #//////////////////////////////////////////////////////////// 443
444 - def add_arc(self, src, dst, in_string, out_string, 445 label=None, descr=None):
446 """ 447 Create a new transition arc, and return its label. 448 449 Arguments should be specified using keywords! 450 451 @param src: The label of the source state. 452 @param dst: The label of the destination state. 453 @param in_string: The input string, a (possibly empty) tuple of 454 input symbols. Input symbols should be hashable 455 immutable objects. 456 @param out_string: The output string, a (possibly empty) tuple 457 of output symbols. Output symbols should be hashable 458 immutable objects. 459 """ 460 label = self._pick_label(label, 'arc', self._src) 461 462 # Check that src/dst are valid labels. 463 if src not in self._incoming: 464 raise ValueError('Unknown state label %r' % src) 465 if dst not in self._incoming: 466 raise ValueError('Unknown state label %r' % dst) 467 468 # Add the arc. 469 self._src[label] = src 470 self._dst[label] = dst 471 self._in_string[label] = tuple(in_string) 472 self._out_string[label] = tuple(out_string) 473 self._arc_descr[label] = descr 474 475 # Link the arc to its src/dst states. 476 self._incoming[dst].append(label) 477 self._outgoing[src].append(label) 478 479 # Return the new arc's label. 480 return label
481
482 - def del_arc(self, label):
483 """ 484 Delete the transition arc with the given label. 485 """ 486 if label not in self._src: 487 raise ValueError('Unknown arc label %r' % src) 488 489 # Disconnect the arc from its src/dst states. 490 self._incoming[self._dst[label]].remove(label) 491 self._outgoing[self._src[label]].remove(label) 492 493 # Delete the arc itself. 494 del (self._src[label], self._dst[label], self._in_string[label], 495 self._out_string[label], self._arc_descr[label])
496 497 #//////////////////////////////////////////////////////////// 498 #{ Transformations 499 #//////////////////////////////////////////////////////////// 500
501 - def inverted(self):
502 """Swap all in_string/out_string pairs.""" 503 fst = self.copy() 504 fst._in_string, fst._out_string = fst._out_string, fst._in_string 505 return fst
506
507 - def reversed(self):
508 """Reverse the direction of all transition arcs.""" 509 fst = self.copy() 510 fst._incoming, fst._outgoing = fst._outgoing, fst._incoming 511 fst._src, fst._dst = fst._dst, fst._src 512 return fst
513
514 - def trimmed(self):
515 fst = self.copy() 516 517 if fst.initial_state is None: 518 raise ValueError("No initial state!") 519 520 # Determine whether there is a path from the initial node to 521 # each node. 522 queue = [fst.initial_state] 523 path_from_init = set(queue) 524 while queue: 525 state = queue.pop() 526 dsts = [fst.dst(arc) for arc in fst.outgoing(state)] 527 queue += [s for s in dsts if s not in path_from_init] 528 path_from_init.update(dsts) 529 530 # Determine whether there is a path from each node to a final 531 # node. 532 queue = [s for s in fst.states() if fst.is_final(s)] 533 path_to_final = set(queue) 534 while queue: 535 state = queue.pop() 536 srcs = [fst.src(arc) for arc in fst.incoming(state)] 537 queue += [s for s in srcs if s not in path_to_final] 538 path_to_final.update(srcs) 539 540 # Delete anything that's not on a path from the initial state 541 # to a final state. 542 for state in list(fst.states()): 543 if not (state in path_from_init and state in path_to_final): 544 fst.del_state(state) 545 546 return fst
547
548 - def relabeled(self, label=None, relabel_states=True, relabel_arcs=True):
549 """ 550 Return a new FST that is identical to this FST, except that 551 all state and arc labels have been replaced with new labels. 552 These new labels are consecutive integers, starting with zero. 553 554 @param relabel_states: If false, then don't relabel the states. 555 @param relabel_arcs: If false, then don't relabel the arcs. 556 """ 557 if label is None: label = '%s (relabeled)' % self.label 558 fst = FST(label) 559 560 # This will ensure that the state relabelling is canonical, *if* 561 # the FST is subsequential. 562 state_ids = self._relabel_state_ids(self.initial_state, {}) 563 if len(state_ids) < len(self._outgoing): 564 for state in self.states(): 565 if state not in state_ids: 566 state_ids[state] = len(state_ids) 567 568 # This will ensure that the arc relabelling is canonical, *if* 569 # the state labelling is canonical. 570 arcs = sorted(self.arcs(), key=self.arc_info) 571 arc_ids = dict([(a,i) for (i,a) in enumerate(arcs)]) 572 573 for state in self.states(): 574 if relabel_states: label = state_ids[state] 575 else: label = state 576 fst.add_state(label, is_final=self.is_final(state), 577 finalizing_string=self.finalizing_string(state), 578 descr=self.state_descr(state)) 579 580 for arc in self.arcs(): 581 if relabel_arcs: label = arc_ids[arc] 582 else: label = arc 583 src, dst, in_string, out_string = self.arc_info(arc) 584 if relabel_states: 585 src = state_ids[src] 586 dst = state_ids[dst] 587 fst.add_arc(src=src, dst=dst, in_string=in_string, 588 out_string=out_string, 589 label=label, descr=self.arc_descr(arc)) 590 591 if relabel_states: 592 fst.initial_state = state_ids[self.initial_state] 593 else: 594 fst.initial_state = self.initial_state 595 596 return fst
597
598 - def _relabel_state_ids(self, state, ids):
599 """ 600 A helper function for L{relabel()}, which decides which new 601 label should be assigned to each state. 602 """ 603 if state in ids: return 604 ids[state] = len(ids) 605 for arc in sorted(self.outgoing(state), 606 key = lambda a:self.in_string(a)): 607 self._relabel_state_ids(self.dst(arc), ids) 608 return ids
609
610 - def determinized(self, label=None):
611 """ 612 Return a new FST which defines the same mapping as this FST, 613 but is determinized. 614 615 The algorithm used is based on [...]. 616 617 @require: All arcs in this FST must have exactly one input 618 symbol. 619 @require: The mapping defined by this FST must be 620 deterministic. 621 @raise ValueError: If the determinization algorithm was unable 622 to determinize this FST. Typically, this happens because 623 a precondition is not met. 624 """ 625 # Check preconditions.. 626 for arc in self.arcs(): 627 if len(self.in_string(arc)) != 1: 628 raise ValueError("All arcs must have exactly one " 629 "input symbol.") 630 631 # State labels have the form: 632 # frozenset((s1,w1),(s2,w2),...(sn,wn)) 633 # Where si is a state and wi is a string of output symbols. 634 if label is None: label = '%s (determinized)' % self.label 635 new_fst = FST(label) 636 637 initial_state = frozenset( [(self.initial_state,())] ) 638 new_fst.add_state(initial_state) 639 new_fst.initial_state = initial_state 640 641 queue = [initial_state] 642 while queue: 643 new_fst_state = queue.pop() 644 645 # For each final state from the original FSM that's 646 # contained in the new FST's state, compute the finalizing 647 # string. If there is at least one finalizing string, 648 # then the new state is a final state. However, if the 649 # finalizing strings are not all identical, then the 650 # transduction defined by this FST is nondeterministic, so 651 # fail. 652 finalizing_strings = [w+self.finalizing_string(s) 653 for (s,w) in new_fst_state 654 if self.is_final(s)] 655 if len(set(finalizing_strings)) > 0: 656 if not self._all_equal(finalizing_strings): 657 # multiple conflicting finalizing strings -> bad! 658 raise ValueError("Determinization failed") 659 new_fst.set_final(new_fst_state) 660 new_fst.set_finalizing_string(new_fst_state, 661 finalizing_strings[0]) 662 663 # sym -> dst -> [residual] 664 # nb: we checked above that len(in_string)==1 for all arcs. 665 arc_table = {} 666 for (s,w) in new_fst_state: 667 for arc in self.outgoing(s): 668 sym = self.in_string(arc)[0] 669 dst = self.dst(arc) 670 residual = w + self.out_string(arc) 671 arc_table.setdefault(sym,{}).setdefault(dst,set()) 672 arc_table[sym][dst].add(residual) 673 674 # For each symbol in the arc table, we need to create a 675 # single edge in the new FST. This edge's input string 676 # will be the input symbol; its output string will be the 677 # shortest common prefix of strings that can be generated 678 # by the original FST in response to the symbol; and its 679 # destination state will encode the set of states that the 680 # original FST can go to when it sees this symbol, paired 681 # with the residual output strings that would have been 682 # generated by the original FST, but have not yet been 683 # generated by the new FST. 684 for sym in arc_table: 685 for dst in arc_table[sym]: 686 if len(arc_table[sym][dst]) > 1: 687 # two arcs w/ the same src, dst, and insym, 688 # but different residuals -> bad! 689 raise ValueError("Determinization failed") 690 691 # Construct a list of (destination, residual) pairs. 692 dst_residual_pairs = [(dst, arc_table[sym][dst].pop()) 693 for dst in arc_table[sym]] 694 695 # Find the longest common prefix of all the residuals. 696 # Note that it's ok if some of the residuals disagree, 697 # but *only* if the states associated with those 698 # residuals can never both reach a final state with a 699 # single input string. 700 residuals = [res for (dst, res) in dst_residual_pairs] 701 prefix = self._common_prefix(residuals) 702 703 # Construct the new arc's destination state. The new 704 # arc's output string will be `prefix`, so the new 705 # destination state should be the set of all pairs 706 # (dst, residual-prefix). 707 new_arc_dst = frozenset([(dst, res[len(prefix):]) 708 for (dst,res) in dst_residual_pairs]) 709 710 # If the new arc's destination state isn't part of 711 # the FST yet, then add it; and add it to the queue. 712 if not new_fst.has_state(new_arc_dst): 713 new_fst.add_state(new_arc_dst) 714 queue.append(new_arc_dst) 715 716 # Create the new arc. 717 new_fst.add_arc(src=new_fst_state, dst=new_arc_dst, 718 in_string=(sym,), out_string=prefix) 719 return new_fst
720
721 - def _all_equal(self, lst):
722 """Return true if all elements in the list are equal""" 723 for item in lst[1:]: 724 if item != lst[0]: return False 725 return True
726
727 - def _common_prefix(self, sequences):
728 """Return the longest sequence that is a prefix of all of the 729 given sequences.""" 730 prefix = sequences[0] 731 for seq in sequences[1:]: 732 # If the sequence is longer then the prefix, then truncate 733 # the prefix to the length of the sequence. 734 prefix = prefix[:len(seq)] 735 # If the prefix doesn't match item i of the sequence, then 736 # truncate the prefix to include everything up to (but not 737 # including) element i. 738 for i in range(len(prefix)): 739 if seq[i] != prefix[i]: 740 prefix = prefix[:i] 741 break 742 return prefix
743 744 #//////////////////////////////////////////////////////////// 745 #{ Misc 746 #//////////////////////////////////////////////////////////// 747
748 - def copy(self, label=None):
749 # Choose a label & create the FST. 750 if label is None: label = '%s-copy' % self.label 751 fst = FST(label) 752 753 # Copy all state: 754 fst._initial_state = self._initial_state 755 fst._incoming = self._incoming.copy() 756 fst._outgoing = self._outgoing.copy() 757 fst._is_final = self._is_final.copy() 758 fst._finalizing_string = self._finalizing_string.copy() 759 fst._state_descr = self._state_descr.copy() 760 fst._src = self._src.copy() 761 fst._dst = self._dst.copy() 762 fst._in_string = self._in_string.copy() 763 fst._out_string = self._out_string.copy() 764 fst._arc_descr = self._arc_descr.copy() 765 return fst
766
767 - def __str__(self):
768 lines = ['FST %s' % self.label] 769 for state in sorted(self.states()): 770 # State information. 771 if state == self.initial_state: 772 line = '-> %s' % state 773 lines.append(' %-40s # Initial state' % line) 774 if self.is_final(state): 775 line = '%s ->' % state 776 if self.finalizing_string(state): 777 line += ' [%s]' % ' '.join(self.finalizing_string(state)) 778 lines.append(' %-40s # Final state' % line) 779 # List states that would otherwise not be listed. 780 if (state != self.initial_state and not self.is_final(state) 781 and not self.outgoing(state) and not self.incoming(state)): 782 lines.append(' %-40s # State' % state) 783 # Outgoing edge information. 784 for arc in sorted(self.arcs()): 785 src, dst, in_string, out_string = self.arc_info(arc) 786 line = ('%s -> %s [%s:%s]' % 787 (src, dst, ' '.join(in_string), ' '.join(out_string))) 788 lines.append(' %-40s # Arc' % line) 789 return '\n'.join(lines)
790 791 @staticmethod
792 - def load(filename):
793 label = os.path.split(filename)[-1] 794 return FST.parse(label, open(filename).read())
795 796 @staticmethod
797 - def parse(label, s):
798 fst = FST(label) 799 prev_src = None 800 lines = s.split('\n')[::-1] 801 while lines: 802 line = lines.pop().split('#')[0].strip() # strip comments 803 if not line: continue 804 805 # Initial state 806 m = re.match(r'->\s*(\S+)$', line) 807 if m: 808 label = m.group(1) 809 if not fst.has_state(label): fst.add_state(label) 810 fst.initial_state = label 811 continue 812 813 # Final state 814 m = re.match('(\S+)\s*->\s*(?:\[([^\]]*)\])?$', line) 815 if m: 816 label, finalizing_string = m.groups() 817 if not fst.has_state(label): fst.add_state(label) 818 fst.set_final(label) 819 if finalizing_string is not None: 820 finalizing_string = finalizing_string.split() 821 fst.set_finalizing_string(label, finalizing_string) 822 continue 823 824 # State 825 m = re.match('(\S+)$', line) 826 if m: 827 label = m.group(1) 828 if not fst.has_state(label): fst.add_state(label) 829 continue 830 831 # State description 832 m = re.match(r'descr\s+(\S+?):\s*(.*)$', line) 833 if m: 834 label, descr = m.groups() 835 # Allow for multi-line descriptions: 836 while lines and re.match(r'\s+\S', lines[-1]): 837 descr = descr.rstrip()+' '+lines.pop().lstrip() 838 if not fst.has_state(label): fst.add_state(label) 839 fst.set_descr(label, descr) 840 continue 841 842 # Transition arc 843 m = re.match(r'(\S+)?\s*->\s*(\S+)\s*' 844 r'\[(.*?):(.*?)\]$', line) 845 if m: 846 src, dst, in_string, out_string = m.groups() 847 if src is None: src = prev_src 848 if src is None: raise ValueError("bad line: %r" % line) 849 prev_src = src 850 if not fst.has_state(src): fst.add_state(src) 851 if not fst.has_state(dst): fst.add_state(dst) 852 in_string = tuple(in_string.split()) 853 out_string = tuple(out_string.split()) 854 fst.add_arc(src, dst, in_string, out_string) 855 continue 856 857 raise ValueError("bad line: %r" % line) 858 859 return fst
860
861 - def dotgraph(self):
862 """ 863 Return an AT&T graphviz dot graph. 864 """ 865 # [xx] mark initial node?? 866 lines = ['digraph %r {' % self.label, 867 'node [shape=ellipse]'] 868 state_id = dict([(s,i) for (i,s) in enumerate(self.states())]) 869 if self.initial_state is not None: 870 lines.append('init [shape="plaintext" label=""]') 871 lines.append('init -> %s' % state_id[self.initial_state]) 872 for state in self.states(): 873 if self.is_final(state): 874 final_str = self.finalizing_string(state) 875 if len(final_str)>0: 876 lines.append('%s [label="%s\\n%s", shape=doublecircle]' % 877 (state_id[state], state, ' '.join(final_str))) 878 else: 879 lines.append('%s [label="%s", shape=doublecircle]' % 880 (state_id[state], state)) 881 else: 882 lines.append('%s [label="%s"]' % (state_id[state], state)) 883 for arc in self.arcs(): 884 src, dst, in_str, out_str = self.arc_info(arc) 885 lines.append('%s -> %s [label="%s:%s"]' % 886 (state_id[src], state_id[dst], 887 ' '.join(in_str), ' '.join(out_str))) 888 lines.append('}') 889 return '\n'.join(lines)
890 891 #//////////////////////////////////////////////////////////// 892 #{ Transduction 893 #//////////////////////////////////////////////////////////// 894
895 - def transduce_subsequential(self, input, step=True):
896 return self.step_transduce_subsequential(input, step=False).next()[1]
897
898 - def step_transduce_subsequential(self, input, step=True):
899 """ 900 This is implemented as a generator, to make it easier to 901 support stepping. 902 """ 903 if not self.is_subsequential(): 904 raise ValueError('FST is not subsequential!') 905 906 # Create a transition table that indicates what action we 907 # should take at any state for a given input symbol. In 908 # paritcular, this table maps from (src, in) tuples to 909 # (dst, out, arc) tuples. (arc is only needed in case 910 # we want to do stepping.) 911 transitions = {} 912 for arc in self.arcs(): 913 src, dst, in_string, out_string = self.arc_info(arc) 914 assert len(in_string) == 1 915 assert (src, in_string[0]) not in transitions 916 transitions[src, in_string[0]] = (dst, out_string, arc) 917 918 output = [] 919 state = self.initial_state 920 try: 921 for in_pos, in_sym in enumerate(input): 922 (state, out_string, arc) = transitions[state, in_sym] 923 if step: yield 'step', (arc, in_pos, output) 924 output += out_string 925 yield 'succeed', output 926 except KeyError: 927 yield 'fail', None
928
929 - def transduce(self, input):
930 return self.step_transduce(input, step=False).next()[1]
931
932 - def step_transduce(self, input, step=True):
933 """ 934 This is implemented as a generator, to make it easier to 935 support stepping. 936 """ 937 input = tuple(input) 938 output = [] 939 in_pos = 0 940 941 # 'frontier' is a stack used to keep track of which parts of 942 # the search space we have yet to examine. Each element has 943 # the form (arc, in_pos, out_pos), and indicates that we 944 # should try rolling the input position back to in_pos, the 945 # output position back to out_pos, and applying arc. Note 946 # that the order that we check elements in is important, since 947 # rolling the output position back involves discarding 948 # generated output. 949 frontier = [] 950 951 # Start in the initial state, and search for a valid 952 # transduction path to a final state. 953 state = self.initial_state 954 while in_pos < len(input) or not self.is_final(state): 955 # Get a list of arcs we can possibly take. 956 arcs = self.outgoing(state) 957 958 # Add the arcs to our backtracking stack. (The if condition 959 # could be eliminated if I used eliminate_multi_input_arcs; 960 # but I'd like to retain the ability to trace what's going on 961 # in the FST, as its specified.) 962 for arc in arcs: 963 in_string = self.in_string(arc) 964 if input[in_pos:in_pos+len(in_string)] == in_string: 965 frontier.append( (arc, in_pos, len(output)) ) 966 967 # Get the top element of the frontiering stack. 968 if len(frontier) == 0: 969 yield 'fail', None 970 971 # perform the operation from the top of the frontier. 972 arc, in_pos, out_pos = frontier.pop() 973 if step: 974 yield 'step', (arc, in_pos, output[:out_pos]) 975 976 # update our state, input position, & output. 977 state = self.dst(arc) 978 assert out_pos <= len(output) 979 in_pos = in_pos + len(self.in_string(arc)) 980 output = output[:out_pos] 981 output.extend(self.out_string(arc)) 982 983 # If it's a subsequential transducer, add the final output for 984 # the terminal state. 985 output += self.finalizing_string(state) 986 987 yield 'succeed', output
988 989 990 #//////////////////////////////////////////////////////////// 991 #{ Helper Functions 992 #//////////////////////////////////////////////////////////// 993
994 - def _pick_label(self, label, typ, used_labels):
995 """ 996 Helper function for L{add_state} and C{add_arc} that chooses a 997 label for a new state or arc. 998 """ 999 if label is not None and label in used_labels: 1000 raise ValueError("%s with label %r already exists" % 1001 (typ, label)) 1002 # If no label was specified, pick one. 1003 if label is not None: 1004 return label 1005 else: 1006 label = 1 1007 while '%s%d' % (typ[0], label) in used_labels: label += 1 1008 return '%s%d' % (typ[0], label)
1009 1010 ###################################################################### 1011 #{ AT&T fsmtools Support 1012 ###################################################################### 1013
1014 -class FSMTools:
1015 """ 1016 A class used to interface with the AT&T fsmtools package. In 1017 particular, L{FSMTools.transduce} can be used to transduce an 1018 input string using any subsequential transducer where each input 1019 and output arc is labelled with at most one symbol. 1020 """ 1021 EPSILON = object() 1022 """A special symbol object used to represent epsilon strings in 1023 the symbol<->id mapping (L{FSMTools._symbol_ids}).""" 1024
1025 - def __init__(self, fsmtools_path=''):
1026 self.fsmtools_path = fsmtools_path 1027 """The path of the directory containing the fsmtools binaries.""" 1028 1029 self._symbol_ids = self.IDMapping(self.EPSILON) 1030 """A mapping from symbols to unique integer IDs. We manage 1031 our own mapping, rather than using 'symbol files', since 1032 symbol files can't handle non-string symbols, symbols 1033 containing whitespace, unicode symbols, etc.""" 1034 1035 self._state_ids = self.IDMapping() 1036 """A mapping from state labels to unique integer IDs."""
1037 1038 #//////////////////////////////////////////////////////////// 1039 #{ Transduction 1040 #//////////////////////////////////////////////////////////// 1041
1042 - def transduce(self, fst, input_string):
1043 try: 1044 # Create a temporary working directory for intermediate files. 1045 tempdir = tempfile.mkdtemp() 1046 def tmp(s): return os.path.join(tempdir, s+'.fsm') 1047 1048 # Comile the FST & input file into binary fmstool format. 1049 self.compile_fst(fst, tmp('fst')) 1050 self.compile_string(input_string, tmp('in')) 1051 1052 # Transduce the input using the FST. We do this in two 1053 # steps: first, use fsmcompose to eliminate any paths that 1054 # are not consistent with the input string; and then use 1055 # fsmbestpath to choose a path through the FST. If the 1056 # FST is nondeterministic, then the path chosen is 1057 # arbitrary. Finally, print the result, so we can process 1058 # it and extract the output sequence. 1059 p1 = Popen([self._bin('fsmcompose'), tmp('in'), tmp('fst')], 1060 stdout=PIPE) 1061 p2 = Popen([self._bin('fsmbestpath')], 1062 stdin=p1.stdout, stdout=PIPE) 1063 p3 = Popen([self._bin('fsmprint')], 1064 stdin=p2.stdout, stdout=PIPE) 1065 out_string_fsm = p3.communicate()[0] 1066 finally: 1067 for f in os.listdir(tempdir): 1068 os.unlink(os.path.join(tempdir, f)) 1069 os.rmdir(tempdir) 1070 1071 # If the empty string was returned, then the input was not 1072 # accepted by the FST; return None. 1073 if len(out_string_fsm) == 0: 1074 return None 1075 1076 # Otherwise, the input was accepted, so extract the 1077 # corresponding output string. 1078 out_string = [] 1079 final_state_id = 0 1080 for line in out_string_fsm.split('\n'): 1081 words = line.split() 1082 if len(words) == 5: 1083 out_string.append(self._symbol_ids.getval(words[3])) 1084 final_state_id += int(words[4]) 1085 elif len(words) == 4: 1086 out_string.append(self._symbol_ids.getval(words[3])) 1087 elif len(words) == 2: 1088 final_state_id += int(words[1]) 1089 elif len(words) != 0: 1090 raise ValueError("Bad output line: %r" % line) 1091 1092 # Add on the finalizing string for the final state. 1093 final_state = self._state_ids.getval(final_state_id) 1094 out_string += fst.finalizing_string(final_state) 1095 return out_string
1096 1097 #//////////////////////////////////////////////////////////// 1098 #{ FSM Compilation 1099 #//////////////////////////////////////////////////////////// 1100
1101 - def compile_fst(self, fst, outfile):
1102 """ 1103 Compile the given FST to an fsmtools .fsm file, and write it 1104 to the given filename. 1105 """ 1106 if fst.initial_state is None: 1107 raise ValueError("FST has no initial state!") 1108 if not (fst.is_final(fst.initial_state) or 1109 len(fst.outgoing(fst.initial_state)) > 0): 1110 raise ValueError("Initial state is nonfinal & " 1111 "has no outgoing arcs") 1112 1113 # Put the initial state first, since that's how fsmtools 1114 # decides which state is the initial state. 1115 states = [fst.initial_state] + [s for s in fst.states() if 1116 s != fst.initial_state] 1117 1118 # Write the outgoing edge for each state, & mark final states. 1119 lines = [] 1120 for state in states: 1121 for arc in fst.outgoing(state): 1122 src, dst, in_string, out_string = fst.arc_info(arc) 1123 lines.append('%d %d %d %d\n' % 1124 (self._state_ids.getid(src), 1125 self._state_ids.getid(dst), 1126 self._string_id(in_string), 1127 self._string_id(out_string))) 1128 if fst.is_final(state): 1129 lines.append('%d %d\n' % (self._state_ids.getid(state), 1130 self._state_ids.getid(state))) 1131 1132 # Run fsmcompile to compile it. 1133 p = Popen([self._bin('fsmcompile'), '-F', outfile], stdin=PIPE) 1134 p.communicate(''.join(lines))
1135
1136 - def compile_string(self, sym_string, outfile):
1137 """ 1138 Compile the given symbol string into an fsmtools .fsm file, 1139 and write it to the given filename. This FSM will generate 1140 the given symbol string, and no other strings. 1141 """ 1142 # Create the input for fsmcompile. 1143 lines = [] 1144 for (i, sym) in enumerate(sym_string): 1145 lines.append('%d %d %d\n' % (i, i+1, self._symbol_ids.getid(sym))) 1146 lines.append('%d\n' % len(sym_string)) 1147 1148 # Run fsmcompile to compile it. 1149 p = Popen([self._bin('fsmcompile'), '-F', outfile], stdin=PIPE) 1150 p.communicate(''.join(lines))
1151 1152 #//////////////////////////////////////////////////////////// 1153 #{ Helpers 1154 #//////////////////////////////////////////////////////////// 1155
1156 - def _bin(self, command):
1157 return os.path.join(self.fsmtools_path, command)
1158
1159 - def _string_id(self, sym_string):
1160 if len(sym_string) == 0: 1161 return self._symbol_ids.getid(self.EPSILON) 1162 elif len(sym_string) == 1: 1163 return self._symbol_ids.getid(sym_string[0]) 1164 else: 1165 raise ValueError('fsmtools does not support multi-symbol ' 1166 'input or output strings on arcs.??')
1167
1168 - class IDMapping:
1169 - def __init__(self, *values):
1170 self._id_to_val = list(values) 1171 self._val_to_id = dict([(v,i) for (i,v) in enumerate(values)])
1172
1173 - def getid(self, val):
1174 if val not in self._val_to_id: 1175 self._id_to_val.append(val) 1176 self._val_to_id[val] = len(self._id_to_val)-1 1177 return self._val_to_id[val]
1178
1179 - def getval(self, identifier):
1180 return self._id_to_val[int(identifier)]
1181 1182 ###################################################################### 1183 #{ Graphical Display 1184 ###################################################################### 1185
1186 -class FSTWidget(GraphWidget):
1187 - def __init__(self, canvas, fst, **attribs):
1188 GraphWidget.__init__(self, canvas, (), (), **attribs) 1189 1190 # Create a widget for each state. 1191 self.state_widgets = state_widgets = {} 1192 for state in fst.states(): 1193 label = TextWidget(canvas, state) 1194 if fst.is_final(state) and fst.finalizing_string(state): 1195 fstr = fst.finalizing_string(state) 1196 label = StackWidget(canvas, label, 1197 SequenceWidget(canvas, 1198 SymbolWidget(canvas, 'rightarrow'), 1199 TextWidget(canvas, fstr))) 1200 w = OvalWidget(canvas, label, 1201 double=fst.is_final(state), margin=2, 1202 fill='white') 1203 if state == fst.initial_state: 1204 w = SequenceWidget(canvas, 1205 SymbolWidget(canvas, 'rightarrow'), w) 1206 w['draggable'] = True 1207 state_widgets[state] = w 1208 self.add_node(w) 1209 1210 # Create a widget for each arc. 1211 self.arc_widgets = arc_widgets = {} 1212 for arc in fst.arcs(): 1213 label = TextWidget(canvas, ' '.join(fst.in_string(arc)) + ' : ' + 1214 ' '.join(fst.out_string(arc))) 1215 w = GraphEdgeWidget(canvas, 0,0,0,0, label, color='cyan4') 1216 arc_widgets[arc] = w 1217 self.add_edge(state_widgets[fst.src(arc)], 1218 state_widgets[fst.dst(arc)], w) 1219 1220 # Arrange the graph. 1221 if fst.initial_state is not None: 1222 toplevel = [self.state_widgets[fst.initial_state]] 1223 else: 1224 toplevel = None 1225 self.arrange(toplevel=toplevel)
1226
1227 - def mark_state(self, state, color='green2'):
1228 oval = self.state_widgets[state] 1229 if isinstance(oval, SequenceWidget): 1230 oval = oval.child_widgets()[1] 1231 oval['fill'] = color
1232
1233 - def unmark_state(self, state):
1234 oval = self.state_widgets[state] 1235 if isinstance(oval, SequenceWidget): 1236 oval = oval.child_widgets()[1] 1237 oval['fill'] = 'white'
1238
1239 - def mark_arc(self, arc):
1240 edge = self.arc_widgets[arc] 1241 edge['width'] = 2 1242 edge['color'] = 'blue'
1243
1244 - def unmark_arc(self, arc):
1245 edge = self.arc_widgets[arc] 1246 edge['width'] = 1 1247 edge['color'] = 'cyan4'
1248
1249 -class FSTDisplay:
1250 - def __init__(self, *fsts):
1251 self.cf = CanvasFrame(width=580, height=600, background='#f0f0f0') 1252 self.cf._parent.geometry('+650+50') # hack.. 1253 1254 self.fst_widgets = {} 1255 for fst in fsts: 1256 self.add_fst(fst)
1257
1258 - def add_fst(self, fst):
1259 w = FSTWidget(self.cf.canvas(), fst, draggable=True, 1260 xspace=130, yspace=100) 1261 self.fst_widgets[fst] = w 1262 self.cf.add_widget(w, x=20)
1263
1264 -class FSTDemo:
1265 - def __init__(self, fst):
1266 self.top = Tk() 1267 f1 = Frame(self.top) 1268 f2 = Frame(self.top) 1269 f3 = Frame(self.top) 1270 f4 = Frame(self.top) 1271 1272 # The canvas for the FST itself. 1273 self.cf = CanvasFrame(f1, width=800, height=400, 1274 background='#f0f0f0', 1275 relief="sunken", border="2") 1276 self.cf.pack(expand=True, fill='both') 1277 1278 # The description of the current state. 1279 self.state_label = Label(f4, font=('bold', -16)) 1280 self.state_label.pack(side='top', anchor='sw') 1281 self.state_descr = Text(f4, height=3, wrap='word', border="2", 1282 relief="sunken", font='helvetica', 1283 width=10) 1284 self.state_descr.pack(side='bottom', expand=True, fill='x') 1285 1286 # The input string. 1287 font = ('courier', -16, 'bold') 1288 Label(f2,text=' Input:', font='courier').pack(side='left') 1289 self.in_text = in_text = Text(f2, height=1, wrap='none', 1290 font=font, background='#c0ffc0', 1291 #padx=0, pady=0, 1292 width=10, 1293 highlightcolor='green', 1294 highlightbackground='green', 1295 highlightthickness=1) 1296 in_text.tag_config('highlight', foreground='white', 1297 background='green4') 1298 in_text.insert('end', 'r = reset; <space> = step') 1299 in_text.tag_add('read', '1.0', '1.4') 1300 in_text.pack(side='left', expand=True, fill="x") 1301 1302 # The input string. 1303 Label(f3,text='Output:', font='courier').pack(side='left') 1304 self.out_text = out_text = Text(f3, height=1, wrap='none', 1305 font=font, background='#ffc0c0', 1306 #padx=0, pady=0, 1307 width=10, 1308 highlightcolor='red', 1309 highlightbackground='red', 1310 highlightthickness=1) 1311 out_text.tag_config('highlight', foreground='white', 1312 background='red4') 1313 out_text.pack(side='left', expand=True, fill="x") 1314 1315 f1.pack(expand=True, fill='both', side='top', padx=5, pady=5) 1316 f4.pack(expand=False, fill='x', side='bottom', padx=5, pady=5) 1317 f3.pack(expand=False, fill='x', side='bottom', padx=5, pady=5) 1318 f2.pack(expand=False, fill='x', side='bottom', padx=5, pady=5) 1319 1320 self.top.title('FST') 1321 self.top.geometry('+650+50') 1322 self.top.bind('<Control-p>', lambda e: self.cf.print_to_file()) 1323 self.top.bind('<Control-x>', self.destroy) 1324 self.top.bind('<Control-q>', self.destroy) 1325 self.top.bind('<space>', self.step) 1326 self.top.bind('r', lambda e: self.transduce(self.stepper_input)) 1327 1328 self.stepper = None 1329 1330 self.graph = None 1331 self.set_fst(fst)
1332
1333 - def transduce(self, input):
1334 if self.fst.is_subsequential(): 1335 self.stepper = self.fst.step_transduce_subsequential(input) 1336 else: 1337 self.stepper = self.fst.step_transduce(input) 1338 self.stepper_input = input 1339 # the following really duplicates code in step(), and should be 1340 # factored out. 1341 self.in_text.delete('1.0', 'end') 1342 self.out_text.delete('1.0', 'end') 1343 self.in_text.insert('1.0', ' '.join(self.stepper_input)) 1344 for state in self.fst.states(): 1345 self.graph.unmark_state(state) 1346 self.graph.mark_state(self.fst.initial_state) 1347 self.state_label['text'] = 'State = %s' % self.fst.initial_state 1348 self.state_descr.delete('1.0', 'end') 1349 state_descr = fst.state_descr(self.fst.initial_state) 1350 self.state_descr.insert('end', state_descr or '')
1351
1352 - def step(self, *e):
1353 if self.stepper is None: return 1354 1355 # Perform one step. 1356 try: result, val = self.stepper.next() 1357 except StopIteration: return 1358 1359 if result == 'fail': 1360 self.stepper = None 1361 self.out_text.insert('end', ' (Failed!)') 1362 elif result == 'succeed': 1363 self.stepper = None 1364 self.out_text.delete('1.0', 'end') 1365 self.out_text.insert('end', ' '.join(val)) 1366 self.out_text.tag_add('highlight', '1.0', 'end-1c') 1367 self.out_text.insert('end', ' (Finished!)') 1368 elif result == 'backtrack': 1369 self.out_text.insert('end', ' (Backtrack)') 1370 for state, widget in self.graph.state_widgets.items(): 1371 if state == val: self.graph.mark_state(state, '#f0b0b0') 1372 else: self.graph.unmark_state(state) 1373 else: 1374 (arc, in_pos, output) = val 1375 1376 # Update in text display 1377 in_pos += len(fst.in_string(arc)) 1378 output = list(output)+list(fst.out_string(arc)) 1379 self.in_text.delete('1.0', 'end') 1380 self.in_text.insert('end', ' '.join(self.stepper_input[:in_pos])) 1381 self.in_text.tag_add('highlight', '1.0', 'end-1c') 1382 if in_pos > 0: 1383 self.in_text.insert('end', ' ') 1384 l,r= self.in_text.xview() 1385 if (r-l) < 1: 1386 self.in_text.xview_moveto(1.0-(r-l)/2) 1387 self.in_text.insert('end', ' '.join(self.stepper_input[in_pos:])) 1388 1389 # Update out text display 1390 self.out_text.delete('1.0', 'end') 1391 self.out_text.insert('end', ' '.join(output)) 1392 self.out_text.tag_add('highlight', '1.0', 'end-1c') 1393 1394 # Update the state descr display 1395 self.state_label['text'] = 'State = %s' % fst.dst(arc) 1396 self.state_descr.delete('1.0', 'end') 1397 state_descr = fst.state_descr(fst.dst(arc)) 1398 self.state_descr.insert('end', state_descr or '') 1399 1400 # Highlight the new dst state. 1401 for state, widget in self.graph.state_widgets.items(): 1402 if state == fst.dst(arc): 1403 self.graph.mark_state(state, '#00ff00') 1404 elif state == fst.src(arc): 1405 self.graph.mark_state(state, '#b0f0b0') 1406 else: self.graph.unmark_state(state) 1407 1408 # Highlight the new arc. 1409 for a, widget in self.graph.arc_widgets.items(): 1410 if a == arc: self.graph.mark_arc(a) 1411 else: self.graph.unmark_arc(a) 1412 1413 # Make end of output visible.. 1414 l,r= self.out_text.xview() 1415 if (r-l) < 1: 1416 self.out_text.xview_moveto(1.0-(r-l)/2) 1417 self.out_text.insert('end', ' '*100)
1418
1419 - def set_fst(self, fst):
1420 self.fst = fst 1421 c = self.cf.canvas() 1422 if self.graph is not None: 1423 self.cf.remove_widget(self.graph) 1424 self.graph = FSTWidget(c, self.fst, xspace=130, yspace=100) 1425 self.cf.add_widget(self.graph, 20, 20)
1426
1427 - def destroy(self, *e):
1428 if self.top is None: return 1429 self.top.destroy() 1430 self.top = None
1431
1432 - def mainloop(self, *args, **kwargs):
1433 self.top.mainloop(*args, **kwargs)
1434 1435 ###################################################################### 1436 #{ Test Code 1437 ###################################################################### 1438 1439 if __name__ == '__main__': 1440 # This is a very contrived example. :) 1441 # Something phonetic might be better. 1442 fst = FST.parse("test", """ 1443 -> start 1444 start -> vp [john:john] 1445 start -> vp [mary:mary] 1446 1447 # Delay production of the determiner until we know the gender. 1448 start -> subj_noun [the:] 1449 subj_noun -> vp [dog:le chien] 1450 subj_noun -> vp [cow:la vache] 1451 1452 vp -> obj [eats:mange] 1453 obj -> obj_noun [the:] 1454 obj -> obj_noun [:] 1455 obj_noun -> end [grass:de l'herbe] 1456 obj_noun -> end [bread:du pain] 1457 end -> 1458 """) 1459 1460 print "john eats the bread ->" 1461 print ' '+ ' '.join(fst.transduce("john eats the bread".split())) 1462 rev = fst.inverted() 1463 print "la vache mange de l'herbe ->" 1464 print ' '+' '.join(rev.transduce("la vache mange de l'herbe".split())) 1465 1466 demo = FSTDemo(fst) 1467 demo.transduce("the cow eats the bread".split()) 1468 demo.mainloop() 1469