1 """
2 This module provides utilities for treating Python dictionaries as X{feature
3 structures}. Specifically, it contains the C{unify} function, which can be used
4 to merge the properties of two dictionaries, and the C{Variable} class, which
5 holds an unknown value to be used in unification.
6
7 A X{feature structure} is a mapping from feature names to feature values,
8 where:
9
10 - Each X{feature name} is a case sensitive string.
11 - Each X{feature value} can be a base value (such as a string), a
12 variable, or a nested feature structure.
13
14 However, feature structures are not a specialized class; they are represented
15 by dictionaries, or more generally by anything that responds to the C{has_key}
16 method. The YAML representation can be used to create and display feature
17 structures intuitively:
18
19 >>> f1 = yaml.load('''
20 ... A:
21 ... B: b
22 ... D: d
23 ... ''')
24 >>> f2 = yaml.load('''
25 ... A:
26 ... C: c
27 ... D: d
28 ... ''')
29 >>> print show(unify(f1, f2))
30 A:
31 B: b
32 C: c
33 D: d
34
35 Feature structures are typically used to represent partial information
36 about objects. A feature name that is not mapped to a value stands
37 for a feature whose value is unknown (I{not} a feature without a
38 value). Two feature structures that represent (potentially
39 overlapping) information about the same object can be combined by
40 X{unification}. When two inconsistant feature structures are unified,
41 the unification fails and raises an error.
42
43 Features can be specified using X{feature paths}, or tuples of feature names
44 that specify paths through the nested feature structures to a value.
45
46 Feature structures may contain reentrant feature values. A
47 X{reentrant feature value} is a single feature value that can be
48 accessed via multiple feature paths. Unification preserves the
49 reentrance relations imposed by both of the unified feature
50 structures. After unification, any extensions to a reentrant feature
51 value will be visible using any of its feature paths.
52
53 Feature structure variables are encoded using the L{Variable} class. The scope
54 of a variable is determined by the X{bindings} used when the structure
55 including that variable is unified. Bindings can be reused between unifications
56 to ensure that variables with the same name get the same value.
57
58 """
59
60 from copy import copy, deepcopy
61 import re
62 import yaml
63
64 import sys
65
67 """
68 An exception that is raised when two values cannot be unified.
69 """
70 pass
71
73 """
74 Given a variable representation such as C{?x}, construct a corresponding
75 Variable object.
76 """
77 return Variable(varname[1:])
78
80 """
81 Determine whether to treat a given object as a feature structure. The
82 test is whether it responds to C{has_key}. This can be overridden if the
83 object includes an attribute or method called C{_no_feature}.
84
85 @param obj: The object to be tested
86 @type obj: C{object}
87 @return: True iff the object can be treated as a feature structure
88 @rtype: C{bool}
89 """
90 return isinstance(obj, dict) or isinstance(obj, FeatureI)
91
94 raise TypeError, "FeatureI is an abstract interface"
95
97 """
98 _FORWARD is a singleton value, used in unification as a flag that a value
99 has been forwarded to another object.
100
101 This class itself is used as the singleton value. It cannot be
102 instantiated.
103 """
105 raise TypeError, "The _FORWARD class is not meant to be instantiated"
106
108 """
109 A Variable is an object that can be used in unification to hold an
110 initially unknown value. Two equivalent Variables, for example, can be used
111 to require that two features have the same value.
112
113 When a Variable is assigned a value, it will eventually be replaced by
114 that value. However, in order to make that value show up everywhere the
115 variable appears, the Variable temporarily stores its assigned value and
116 becomes a I{bound variable}. Bound variables do not appear in the results
117 of unification.
118
119 Variables are distinguished by their name, and by the dictionary of
120 I{bindings} that is being used to determine their values. Two variables can
121 have the same name but be associated with two different binding
122 dictionaries: those variables are not equal.
123 """
124 _next_numbered_id = 1
125
126 - def __init__(self, name=None, value=None):
127 """
128 Construct a new feature structure variable.
129
130 The value should be left at its default of None; it is only used
131 internally to copy bound variables.
132
133 @type name: C{string}
134 @param name: An identifier for this variable. Two C{Variable} objects
135 with the same name will be given the same value in a given dictionary
136 of bindings.
137 """
138 self._uid = Variable._next_numbered_id
139 Variable._next_numbered_id += 1
140 if name is None: name = self._uid
141 self._name = str(name)
142 self._value = value
143
145 """
146 @return: This variable's name.
147 @rtype: C{string}
148 """
149 return self._name
150
152 """
153 If this varable is bound, find its value. If it is unbound or aliased
154 to an unbound variable, returns None.
155
156 @return: The value of this variable, if any.
157 @rtype: C{object}
158 """
159 if isinstance(self._value, Variable): return self._value.value()
160 else: return self._value
162 """
163 @return: A copy of this variable.
164 @rtype: C{Variable}
165 """
166 return Variable(self.name(), self.value())
167
169 """
170 Variables are aliased to other variables by one variable _forwarding_
171 to the other. The first variable simply has the second as its value,
172 but it acts like the second variable's _value_ is its value.
173
174 forwarded_self returns the final Variable object that actually stores
175 the value.
176
177 @return: The C{Variable} responsible for storing this variable's value.
178 @rtype: C{Variable}
179 """
180 if isinstance(self._value, Variable):
181 return self._value.forwarded_self()
182 else: return self
183
184 - def bindValue(self, value, ourbindings, otherbindings):
185 """
186 Bind this variable to a value. C{ourbindings} are the bindings that
187 accompany the feature structure this variable came from;
188 C{otherbindings} are the bindings from the structure it's being unified
189 with.
190
191 @type value: C{object}
192 @param value: The value to be assigned.
193 @type ourbindings: C{dict}
194 @param ourbindings: The bindings associated with this variable.
195 @type otherbindings: C{dict}
196 @param otherbindings: The bindings associated with the value being
197 assigned. (May be identical to C{ourbindings}.)
198 """
199 if isinstance(self._value, Variable):
200
201 return self._value.bindValue(value, ourbindings, otherbindings)
202 if self._value is None:
203
204 self._value = value
205 else:
206
207
208 self._value = unify(self._value, value, ourbindings, otherbindings)
209
210 - def forwardTo(self, other, ourbindings, otherbindings):
211 """
212 A unification wants this variable to be aliased to another variable.
213 Forward this variable to the other one, and return the other.
214
215 @type other: C{Variable}
216 @param other: The variable to replace this one.
217 @type ourbindings: C{dict}
218 @param ourbindings: The bindings associated with this variable.
219 @type otherbindings: C{dict}
220 @param otherbindings: The bindings associated with the other variable.
221 (May be identical to C{ourbindings}.)
222 @return: C{other}
223 @rtype: C{Variable}
224 """
225 other.bindValue(self.value(), ourbindings, otherbindings)
226 self._value = other
227 return other
228
229 - def __hash__(self): return hash(self._uid)
231 """
232 Variables are equal if they are the same object or forward to the
233 same object. Variables with the same name may still be unequal.
234 """
235 if not isinstance(other, Variable): return -1
236 if isinstance(self._value, Variable): return cmp(self._value, other)
237 else: return cmp((self._name, self._value), (other._name, other._value))
239 if self._value is None: return '?%s' % self._name
240 else: return '?%s: %r' % (self._name, self._value)
241
243 """
244 An interface for classes that can perform substitutions for feature
245 variables.
246 """
248 """
249 @return: The object that is obtained by replacing
250 each variable bound by C{bindings} with its values.
251 @rtype: (any)
252 """
253 raise NotImplementedError
254
257 newval = self
258 for semvar in self.variables():
259 varstr = str(semvar)
260
261 if varstr.startswith('?'):
262 var = makevar(varstr)
263 if bindings.has_key(var.name()):
264 newval = newval.replace(semvar, bindings[var.name()])
265 return newval
266
268 """
269 Works like yaml.dump(), but with output suited for doctests. Flow style
270 is always off, and there is no blank line at the end.
271 """
272 return yaml.dump(data, default_flow_style=False).strip()
273
284
286 "Output variables in YAML as ?name."
287 return dumper.represent_scalar(u'!var', u'?%s' % var.name())
288 yaml.add_representer(Variable, variable_representer)
289
295 yaml.add_constructor(u'!var', variable_constructor)
296 yaml.add_implicit_resolver(u'!var', re.compile(r'^\?\w+$'))
297
299 """
300 Make a deep copy of a feature structure, preserving reentrance using the
301 C{memo} dictionary. Meanwhile, variables are replaced by their bound
302 values, if these values are already known, and variables with unknown
303 values are given placeholder bindings.
304 """
305 if memo is None: memo = {}
306 if id(feature) in memo: return memo[id(feature)]
307 if isinstance(feature, Variable) and bindings is not None:
308 if not bindings.has_key(feature.name()):
309 bindings[feature.name()] = feature.copy()
310 result = _copy_and_bind(bindings[feature.name()], None, memo)
311 else:
312 if isMapping(feature):
313
314 result = feature.__class__()
315 for (key, value) in feature.items():
316 result[key] = _copy_and_bind(value, bindings, memo)
317 elif isinstance(feature, SubstituteBindingsI):
318 if bindings is not None:
319 result = feature.substitute_bindings(bindings).simplify()
320 else:
321 result = feature.simplify()
322 else: result = feature
323 memo[id(feature)] = result
324 memo[id(result)] = result
325 return result
326
328 """
329 Replace variables in a feature structure with their bound values.
330 """
331 return _copy_and_bind(feature, bindings.copy())
332
333 apply = substitute_bindings
334
335 -def unify(feature1, feature2, bindings1=None, bindings2=None, memo=None, fail=None, trace=0):
336 """
337 In general, the 'unify' procedure takes two values, and either returns a
338 value that provides the information provided by both values, or fails if
339 that is impossible.
340
341 These values can have any type, but fall into a few general cases:
342 - Values that respond to C{has_key} represent feature structures. The
343 C{unify} procedure will recurse into them and unify their inner values.
344 - L{Variable}s represent an unknown value, and are handled specially.
345 The values assigned to variables are tracked using X{bindings}.
346 - C{None} represents the absence of information.
347 - Any other value is considered a X{base value}. Base values are
348 compared to each other with the == operation.
349
350 The value 'None' represents the absence of any information. It specifies no
351 properties and acts as the identity in unification.
352 >>> unify(3, None)
353 3
354
355 >>> unify(None, 'fish')
356 'fish'
357
358 A base value unifies with itself, but not much else.
359 >>> unify(True, True)
360 True
361
362 >>> unify([1], [1])
363 [1]
364
365 >>> unify('a', 'b')
366 Traceback (most recent call last):
367 ...
368 UnificationFailure
369
370 When two mappings (representing feature structures, and usually implemented
371 as dictionaries) are unified, any chain of keys that accesses a value in
372 either mapping will access an equivalent or more specific value in the
373 unified mapping. If this is not possible, UnificationFailure is raised.
374
375 >>> f1 = dict(A=dict(B='b'))
376 >>> f2 = dict(A=dict(C='c'))
377 >>> unify(f1, f2) == dict(A=dict(B='b', C='c'))
378 True
379
380 The empty dictionary specifies no features. It unifies with any mapping.
381 >>> unify({}, dict(foo='bar'))
382 {'foo': 'bar'}
383
384 >>> unify({}, True)
385 Traceback (most recent call last):
386 ...
387 UnificationFailure
388
389 Representing dictionaries in YAML form is useful for making feature
390 structures readable:
391
392 >>> f1 = yaml.load("number: singular")
393 >>> f2 = yaml.load("person: 3")
394 >>> print show(unify(f1, f2))
395 number: singular
396 person: 3
397
398 >>> f1 = yaml.load('''
399 ... A:
400 ... B: b
401 ... D: d
402 ... ''')
403 >>> f2 = yaml.load('''
404 ... A:
405 ... C: c
406 ... D: d
407 ... ''')
408 >>> print show(unify(f1, f2))
409 A:
410 B: b
411 C: c
412 D: d
413
414 Variables are names for unknown values. Variables are assigned values
415 that will make unification succeed. The values of variables can be reused
416 in later unifications if you provide a dictionary of _bindings_ from
417 variables to their values.
418 >>> bindings = {}
419 >>> print unify(Variable('x'), 5, bindings)
420 5
421
422 >>> print bindings
423 {'x': 5}
424
425 >>> print unify({'a': Variable('x')}, {}, bindings)
426 {'a': 5}
427
428 The same variable name can be reused in different binding dictionaries
429 without collision. In some cases, you may want to provide two separate
430 binding dictionaries to C{unify} -- one for each feature structure, so
431 their variables do not collide.
432
433 In the following examples, two different feature structures use the
434 variable ?x to require that two values are equal. The values assigned to
435 ?x are consistent within each structure, but would be inconsistent if every
436 ?x had to have the same value.
437
438 >>> f1 = yaml.load('''
439 ... a: 1
440 ... b: 1
441 ... c: ?x
442 ... d: ?x
443 ... ''')
444 >>> f2 = yaml.load('''
445 ... a: ?x
446 ... b: ?x
447 ... c: 2
448 ... d: 2
449 ... ''')
450 >>> bindings1 = {}
451 >>> bindings2 = {}
452 >>> print show(unify(f1, f2, bindings1, bindings2))
453 a: 1
454 b: 1
455 c: 2
456 d: 2
457
458 >>> print bindings1
459 {'x': 2}
460
461 >>> print bindings2
462 {'x': 1}
463
464 Feature structures can involve _reentrant_ values, where multiple feature
465 paths lead to the same value. This is represented by the features having
466 the same Python object as a value. (This kind of identity can be tested
467 using the C{is} operator.)
468
469 Unification preserves the properties of reentrance. So if a reentrant value
470 is changed by unification, it is changed everywhere it occurs, and it is
471 still reentrant. Reentrant features can even form cycles; these
472 cycles can now be printed through the current YAML library.
473
474 >>> f1 = yaml.load('''
475 ... A: &1 # &1 defines a reference in YAML...
476 ... B: b
477 ... E:
478 ... F: *1 # and *1 uses the previously defined reference.
479 ... ''')
480 >>> f1['E']['F']['B']
481 'b'
482 >>> f1['A'] is f1['E']['F']
483 True
484 >>> f2 = yaml.load('''
485 ... A:
486 ... C: c
487 ... E:
488 ... F:
489 ... D: d
490 ... ''')
491 >>> f3 = unify(f1, f2)
492 >>> print show(f3)
493 A: &id001
494 B: b
495 C: c
496 D: d
497 E:
498 F: *id001
499 >>> f3['A'] is f3['E']['F'] # Showing that the reentrance still holds.
500 True
501
502 This unification creates a cycle:
503 >>> f1 = yaml.load('''
504 ... F: &1 {}
505 ... G: *1
506 ... ''')
507 >>> f2 = yaml.load('''
508 ... F:
509 ... H: &2 {}
510 ... G: *2
511 ... ''')
512 >>> f3 = unify(f1, f2)
513 >>> print f3
514 {'G': {'H': {...}}, 'F': {'H': {...}}}
515 >>> print f3['F'] is f3['G']
516 True
517 >>> print f3['F'] is f3['G']['H']
518 True
519 >>> print f3['F'] is f3['G']['H']['H']
520 True
521
522 A cycle can also be created using variables instead of reentrance.
523 Here we supply a single set of bindings, so that it is used on both sides
524 of the unification, making ?x mean the same thing in both feature
525 structures.
526
527 >>> f1 = yaml.load('''
528 ... F:
529 ... H: ?x
530 ... ''')
531 >>> f2 = yaml.load('''
532 ... F: ?x
533 ... ''')
534 >>> f3 = unify(f1, f2, {})
535 >>> print f3
536 {'F': {'H': {...}}}
537 >>> print f3['F'] is f3['F']['H']
538 True
539 >>> print f3['F'] is f3['F']['H']['H']
540 True
541
542 Two sets of bindings can be provided because the variable names on each
543 side of the unification may be unrelated. An example involves unifying the
544 following two structures, which each require that two values are
545 equivalent, and happen to both use ?x to express that requirement.
546
547 >>> f1 = yaml.load('''
548 ... a: 1
549 ... b: 1
550 ... c: ?x
551 ... d: ?x
552 ... ''')
553 >>> f2 = yaml.load('''
554 ... a: ?x
555 ... b: ?x
556 ... c: 2
557 ... d: 2
558 ... ''')
559 >>> bindings1 = {}
560 >>> bindings2 = {}
561 >>> # We could avoid defining two empty dictionaries by simply using the
562 >>> # defaults, with unify(f1, f2) -- but we want to be able to examine
563 >>> # the bindings afterward.
564 >>> print show(unify(f1, f2, bindings1, bindings2))
565 a: 1
566 b: 1
567 c: 2
568 d: 2
569 >>> print bindings1
570 {'x': 2}
571 >>> print bindings2
572 {'x': 1}
573
574 If a variable is unified with another variable, the two variables are
575 _aliased_ to each other; they share the same value, similarly to reentrant
576 feature structures. This is represented in a set of bindings as one
577 variable having the other as its value.
578 >>> f1 = yaml.load('''
579 ... a: ?x
580 ... b: ?x
581 ... ''')
582 >>> f2 = yaml.load('''
583 ... b: ?y
584 ... c: ?y
585 ... ''')
586 >>> bindings = {}
587 >>> print show(unify(f1, f2, bindings))
588 a: &id001 ?y
589 b: *id001
590 c: *id001
591 >>> print bindings
592 {'x': ?y}
593
594 Reusing the same variable bindings ensures that appropriate bindings are
595 made after the fact:
596 >>> bindings = {}
597 >>> f1 = {'a': Variable('x')}
598 >>> f2 = unify(f1, {'a': {}}, bindings)
599 >>> f3 = unify(f2, {'b': Variable('x')}, bindings)
600 >>> print show(f3)
601 a: &id001 {}
602 b: *id001
603 >>> print bindings
604 {'x': {}}
605
606 @param feature1: The first object to be unified.
607 @type feature1: C{object} (probably a mapping)
608 @param feature2: The second object to be unified.
609 @type feature2: C{object} (probably a mapping)
610 @param bindings1: The variable bindings associated with the first object.
611 @type bindings1: C{dict} or None
612 @param bindings2: The variable bindings associated with the second object,
613 if these are distinct from C{bindings1}.
614 @type bindings2: C{dict} or None
615 @return: The result of unifying the two objects.
616 @rtype: C{object} (probably a mapping)
617 """
618 if fail is None:
619 def failerror(f1, f2):
620 raise UnificationFailure
621 fail = failerror
622
623 if bindings1 is None and bindings2 is None:
624 bindings1 = {}
625 bindings2 = {}
626 else:
627 if bindings1 is None: bindings1 = {}
628 if bindings2 is None: bindings2 = bindings1
629
630 if memo is None: memo = {}
631 copymemo = {}
632 if memo.has_key((id(feature1), id(feature2))):
633 result = memo[id(feature1), id(feature2)]
634 if result is UnificationFailure:
635 if trace > 2:
636 print '(cached) Unifying: %r + %r --> [fail]' % (feature1, feature2)
637 raise result()
638 if trace > 2:
639 print '(cached) Unifying: %r + %r --> ' % (feature1, feature2),
640 print repr(result)
641 return result
642
643 if trace > 1:
644 print 'Unifying: %r + %r --> ' % (feature1, feature2),
645
646
647
648
649 copy1 = _copy_and_bind(feature1, bindings1, copymemo)
650 copy2 = _copy_and_bind(feature2, bindings2, copymemo)
651
652 for b in (bindings1, bindings2):
653 for (vname, value) in b.items():
654 value_id = id(value)
655 if value_id in copymemo:
656 b[vname] = copymemo[value_id]
657
658
659 try:
660 unified = _destructively_unify(copy1, copy2, bindings1, bindings2, memo,
661 fail)
662 except UnificationFailure:
663 if trace > 1: print '[fail]'
664 memo[id(feature1), id(feature2)] = UnificationFailure
665 raise
666
667 _apply_forwards_to_bindings(bindings1)
668 _apply_forwards_to_bindings(bindings2)
669 _apply_forwards(unified, {})
670 unified = _lookup_values(unified, {}, remove=False)
671 _lookup_values(bindings1, {}, remove=True)
672 _lookup_values(bindings2, {}, remove=True)
673
674 if trace > 1:
675 print repr(unified)
676 elif trace > 0:
677 print 'Unifying: %r + %r --> %r' % (feature1, feature2, repr(unified))
678
679 memo[id(feature1), id(feature2)] = unified
680 return unified
681
684 """
685 Attempt to unify C{self} and C{other} by modifying them
686 in-place. If the unification succeeds, then C{self} will
687 contain the unified value, and the value of C{other} is
688 undefined. If the unification fails, then a
689 UnificationFailure is raised, and the values of C{self}
690 and C{other} are undefined.
691 """
692 if depth > 50:
693 print "Infinite recursion in this unification:"
694 print show(dict(feature1=feature1, feature2=feature2,
695 bindings1=bindings1, bindings2=bindings2, memo=memo))
696 raise ValueError, "Infinite recursion in unification"
697 if memo.has_key((id(feature1), id(feature2))):
698 result = memo[id(feature1), id(feature2)]
699 if result is UnificationFailure: raise result()
700 unified = _do_unify(feature1, feature2, bindings1, bindings2, memo, fail,
701 depth)
702 memo[id(feature1), id(feature2)] = unified
703 return unified
704
705 -def _do_unify(feature1, feature2, bindings1, bindings2, memo, fail, depth=0):
706 """
707 Do the actual work of _destructively_unify when the result isn't memoized.
708 """
709
710
711 if feature1 is None: return feature2
712 if feature2 is None: return feature1
713 if feature1 is feature2: return feature1
714
715
716 if isinstance(feature1, Variable):
717 if isinstance(feature2, Variable):
718
719
720 return feature1.forwardTo(feature2, bindings1, bindings2)
721 else:
722 feature1.bindValue(feature2, bindings1, bindings2)
723 return feature1
724 if isinstance(feature2, Variable):
725 feature2.bindValue(feature1, bindings2, bindings1)
726 return feature2
727
728
729
730 if not isMapping(feature1):
731 if feature1 == feature2: return feature1
732 else:
733 return fail(feature1, feature2)
734 if not isMapping(feature2):
735 return fail(feature1, feature2)
736
737
738
739
740 while feature2.has_key(_FORWARD): feature2 = feature2[_FORWARD]
741 if feature1 is not feature2: feature2[_FORWARD] = feature1
742 for (fname, val2) in feature2.items():
743 if fname == _FORWARD: continue
744 val1 = feature1.get(fname)
745 feature1[fname] = _destructively_unify(val1, val2, bindings1,
746 bindings2, memo, fail, depth+1)
747 return feature1
748
764
766 """
767 The unification procedure creates _bound variables_, which are Variable
768 objects that have been assigned a value. Bound variables are not useful
769 in the end result, however, so they should be replaced by their values.
770
771 This procedure takes a mapping, which may be a feature structure or a
772 binding dictionary, and replaces bound variables with their values.
773
774 If the dictionary is a binding dictionary, then 'remove' should be set to
775 True. This ensures that unbound, unaliased variables are removed from the
776 dictionary. If the variable name 'x' is mapped to the unbound variable ?x,
777 then, it should be removed. This is not done with features, because a
778 feature named 'x' can of course have a variable ?x as its value.
779 """
780 if isinstance(mapping, Variable):
781
782
783
784
785
786 var = mapping
787 if var.value() is not None:
788 return var.value()
789 else:
790 return var.forwarded_self()
791 if not isMapping(mapping): return mapping
792 if visited.has_key(id(mapping)): return mapping
793 visited[id(mapping)] = True
794
795 for fname, fval in mapping.items():
796 if isMapping(fval):
797 _lookup_values(fval, visited)
798 elif isinstance(fval, Variable):
799 if fval.value() is not None:
800 mapping[fname] = fval.value()
801 if isMapping(mapping[fname]):
802 _lookup_values(mapping[fname], visited)
803 else:
804 newval = fval.forwarded_self()
805 if remove and newval.name() == fname:
806 del mapping[fname]
807 else:
808 mapping[fname] = newval
809 return mapping
810
821
823 "Run unit tests on unification."
824 import doctest
825 doctest.testmod()
826
827 if __name__ == "__main__":
828 test()
829