1 """
2 This module provides utilities for treating Python dictionaries as X{feature
3 structures}. Specifically, it contains the C{unify} function, which can be used
4 to merge the properties of two dictionaries, and the C{Variable} class, which
5 holds an unknown value to be used in unification.
6
7 A X{feature structure} is a mapping from feature names to feature values,
8 where:
9
10 - Each X{feature name} is a case sensitive string.
11 - Each X{feature value} can be a base value (such as a string), a
12 variable, or a nested feature structure.
13
14 However, feature structures are not a specialized class; they are represented
15 by dictionaries, or more generally by anything that responds to the C{has_key}
16 method. The YAML representation can be used to create and display feature
17 structures intuitively:
18
19 >>> f1 = yaml.load('''
20 ... A:
21 ... B: b
22 ... D: d
23 ... ''')
24 >>> f2 = yaml.load('''
25 ... A:
26 ... C: c
27 ... D: d
28 ... ''')
29 >>> print show(unify(f1, f2))
30 A:
31 B: b
32 C: c
33 D: d
34
35 Feature structures are typically used to represent partial information
36 about objects. A feature name that is not mapped to a value stands
37 for a feature whose value is unknown (I{not} a feature without a
38 value). Two feature structures that represent (potentially
39 overlapping) information about the same object can be combined by
40 X{unification}. When two inconsistant feature structures are unified,
41 the unification fails and raises an error.
42
43 Features can be specified using X{feature paths}, or tuples of feature names
44 that specify paths through the nested feature structures to a value.
45
46 Feature structures may contain reentrant feature values. A
47 X{reentrant feature value} is a single feature value that can be
48 accessed via multiple feature paths. Unification preserves the
49 reentrance relations imposed by both of the unified feature
50 structures. After unification, any extensions to a reentrant feature
51 value will be visible using any of its feature paths.
52
53 Feature structure variables are encoded using the L{Variable} class. The scope
54 of a variable is determined by the X{bindings} used when the structure
55 including that variable is unified. Bindings can be reused between unifications
56 to ensure that variables with the same name get the same value.
57
58 """
59
60 from copy import copy, deepcopy
61 import re
62 import yaml
63
64 import sys
65
67 """
68 An exception that is raised when two values cannot be unified.
69 """
70 pass
71
73 """
74 Determine whether to treat a given object as a feature structure. The
75 test is whether it responds to C{has_key}. This can be overridden if the
76 object includes an attribute or method called C{_no_feature}.
77
78 @param obj: The object to be tested
79 @type obj: C{object}
80 @return: True iff the object can be treated as a feature structure
81 @rtype: C{bool}
82 """
83 return isinstance(obj, dict) or isinstance(obj, FeatureI)
84
87 raise TypeError, "FeatureI is an abstract interface"
88
90 """
91 _FORWARD is a singleton value, used in unification as a flag that a value
92 has been forwarded to another object.
93
94 This class itself is used as the singleton value. It cannot be
95 instantiated.
96 """
98 raise TypeError, "The _FORWARD class is not meant to be instantiated"
99
101 """
102 A Variable is an object that can be used in unification to hold an
103 initially unknown value. Two equivalent Variables, for example, can be used
104 to require that two features have the same value.
105
106 When a Variable is assigned a value, it will eventually be replaced by
107 that value. However, in order to make that value show up everywhere the
108 variable appears, the Variable temporarily stores its assigned value and
109 becomes a I{bound variable}. Bound variables do not appear in the results
110 of unification.
111
112 Variables are distinguished by their name, and by the dictionary of
113 I{bindings} that is being used to determine their values. Two variables can
114 have the same name but be associated with two different binding
115 dictionaries: those variables are not equal.
116 """
117 _next_numbered_id = 1
118
119 - def __init__(self, name=None, value=None):
120 """
121 Construct a new feature structure variable.
122
123 The value should be left at its default of None; it is only used
124 internally to copy bound variables.
125
126 @type name: C{string}
127 @param name: An identifier for this variable. Two C{Variable} objects
128 with the same name will be given the same value in a given dictionary
129 of bindings.
130 """
131 self._uid = Variable._next_numbered_id
132 Variable._next_numbered_id += 1
133 if name is None: name = self._uid
134 self._name = str(name)
135 self._value = value
136
138 """
139 @return: This variable's name.
140 @rtype: C{string}
141 """
142 return self._name
143
145 """
146 If this varable is bound, find its value. If it is unbound or aliased
147 to an unbound variable, returns None.
148
149 @return: The value of this variable, if any.
150 @rtype: C{object}
151 """
152 if isinstance(self._value, Variable): return self._value.value()
153 else: return self._value
155 """
156 @return: A copy of this variable.
157 @rtype: C{Variable}
158 """
159 return Variable(self.name(), self.value())
160
162 """
163 Variables are aliased to other variables by one variable _forwarding_
164 to the other. The first variable simply has the second as its value,
165 but it acts like the second variable's _value_ is its value.
166
167 forwarded_self returns the final Variable object that actually stores
168 the value.
169
170 @return: The C{Variable} responsible for storing this variable's value.
171 @rtype: C{Variable}
172 """
173 if isinstance(self._value, Variable):
174 return self._value.forwarded_self()
175 else: return self
176
177 - def bindValue(self, value, ourbindings, otherbindings):
178 """
179 Bind this variable to a value. C{ourbindings} are the bindings that
180 accompany the feature structure this variable came from;
181 C{otherbindings} are the bindings from the structure it's being unified
182 with.
183
184 @type value: C{object}
185 @param value: The value to be assigned.
186 @type ourbindings: C{dict}
187 @param ourbindings: The bindings associated with this variable.
188 @type otherbindings: C{dict}
189 @param otherbindings: The bindings associated with the value being
190 assigned. (May be identical to C{ourbindings}.)
191 """
192 if isinstance(self._value, Variable):
193
194 return self._value.bindValue(value, ourbindings, otherbindings)
195 if self._value is None:
196
197 self._value = value
198 else:
199
200
201 self._value = unify(self._value, value, ourbindings, otherbindings)
202
203 - def forwardTo(self, other, ourbindings, otherbindings):
204 """
205 A unification wants this variable to be aliased to another variable.
206 Forward this variable to the other one, and return the other.
207
208 @type other: C{Variable}
209 @param other: The variable to replace this one.
210 @type ourbindings: C{dict}
211 @param ourbindings: The bindings associated with this variable.
212 @type otherbindings: C{dict}
213 @param otherbindings: The bindings associated with the other variable.
214 (May be identical to C{ourbindings}.)
215 @return: C{other}
216 @rtype: C{Variable}
217 """
218 other.bindValue(self.value(), ourbindings, otherbindings)
219 self._value = other
220 return other
221
222 - def __hash__(self): return hash(self._uid)
224 """
225 Variables are equal if they are the same object or forward to the
226 same object. Variables with the same name may still be unequal.
227 """
228 if not isinstance(other, Variable): return -1
229 if isinstance(self._value, Variable): return cmp(self._value, other)
230 else: return cmp((self._name, self._value), (other._name, other._value))
232 if self._value is None: return '?%s' % self._name
233 else: return '?%s: %r' % (self._name, self._value)
234
236 """
237 Works like yaml.dump(), but with output suited for doctests. Flow style
238 is always off, and there is no blank line at the end.
239 """
240 return yaml.dump(data, default_flow_style=False).strip()
241
243 "Output variables in YAML as ?name."
244 return dumper.represent_scalar(u'!var', u'?%s' % var.name())
245 yaml.add_representer(Variable, variable_representer)
246
252 yaml.add_constructor(u'!var', variable_constructor)
253 yaml.add_implicit_resolver(u'!var', re.compile(r'^\?\w+$'))
254
256 """
257 Make a deep copy of a feature structure, preserving reentrance using the
258 C{memo} dictionary. Meanwhile, variables are replaced by their bound
259 values, if these values are already known, and variables with unknown
260 values are given placeholder bindings.
261 """
262 if memo is None: memo = {}
263 if id(feature) in memo: return memo[id(feature)]
264 if isinstance(feature, Variable) and bindings is not None:
265 if not bindings.has_key(feature.name()):
266 bindings[feature.name()] = feature.copy()
267 result = _copy_and_bind(bindings[feature.name()], None, memo)
268 else:
269 if isMapping(feature):
270
271 result = feature.__class__()
272 for (key, value) in feature.items():
273 result[key] = _copy_and_bind(value, bindings, memo)
274 else: result = feature
275 memo[id(feature)] = result
276 memo[id(result)] = result
277 return result
278
279 -def apply(feature, bindings):
280 """
281 Replace variables in a feature structure with their bound values.
282 """
283 return _copy_and_bind(feature, bindings.copy())
284
285 -def unify(feature1, feature2, bindings1=None, bindings2=None, memo=None, fail=None, trace=0):
286 """
287 In general, the 'unify' procedure takes two values, and either returns a
288 value that provides the information provided by both values, or fails if
289 that is impossible.
290
291 These values can have any type, but fall into a few general cases:
292 - Values that respond to C{has_key} represent feature structures. The
293 C{unify} procedure will recurse into them and unify their inner values.
294 - L{Variable}s represent an unknown value, and are handled specially.
295 The values assigned to variables are tracked using X{bindings}.
296 - C{None} represents the absence of information.
297 - Any other value is considered a X{base value}. Base values are
298 compared to each other with the == operation.
299
300 The value 'None' represents the absence of any information. It specifies no
301 properties and acts as the identity in unification.
302 >>> unify(3, None)
303 3
304
305 >>> unify(None, 'fish')
306 'fish'
307
308 A base value unifies with itself, but not much else.
309 >>> unify(True, True)
310 True
311
312 >>> unify([1], [1])
313 [1]
314
315 >>> unify('a', 'b')
316 Traceback (most recent call last):
317 ...
318 UnificationFailure
319
320 When two mappings (representing feature structures, and usually implemented
321 as dictionaries) are unified, any chain of keys that accesses a value in
322 either mapping will access an equivalent or more specific value in the
323 unified mapping. If this is not possible, UnificationFailure is raised.
324
325 >>> f1 = dict(A=dict(B='b'))
326 >>> f2 = dict(A=dict(C='c'))
327 >>> unify(f1, f2) == dict(A=dict(B='b', C='c'))
328 True
329
330 The empty dictionary specifies no features. It unifies with any mapping.
331 >>> unify({}, dict(foo='bar'))
332 {'foo': 'bar'}
333
334 >>> unify({}, True)
335 Traceback (most recent call last):
336 ...
337 UnificationFailure
338
339 Representing dictionaries in YAML form is useful for making feature
340 structures readable:
341
342 >>> f1 = yaml.load("number: singular")
343 >>> f2 = yaml.load("person: 3")
344 >>> print show(unify(f1, f2))
345 number: singular
346 person: 3
347
348 >>> f1 = yaml.load('''
349 ... A:
350 ... B: b
351 ... D: d
352 ... ''')
353 >>> f2 = yaml.load('''
354 ... A:
355 ... C: c
356 ... D: d
357 ... ''')
358 >>> print show(unify(f1, f2))
359 A:
360 B: b
361 C: c
362 D: d
363
364 Variables are names for unknown values. Variables are assigned values
365 that will make unification succeed. The values of variables can be reused
366 in later unifications if you provide a dictionary of _bindings_ from
367 variables to their values.
368 >>> bindings = {}
369 >>> print unify(Variable('x'), 5, bindings)
370 5
371
372 >>> print bindings
373 {'x': 5}
374
375 >>> print unify({'a': Variable('x')}, {}, bindings)
376 {'a': 5}
377
378 The same variable name can be reused in different binding dictionaries
379 without collision. In some cases, you may want to provide two separate
380 binding dictionaries to C{unify} -- one for each feature structure, so
381 their variables do not collide.
382
383 In the following examples, two different feature structures use the
384 variable ?x to require that two values are equal. The values assigned to
385 ?x are consistent within each structure, but would be inconsistent if every
386 ?x had to have the same value.
387
388 >>> f1 = yaml.load('''
389 ... a: 1
390 ... b: 1
391 ... c: ?x
392 ... d: ?x
393 ... ''')
394 >>> f2 = yaml.load('''
395 ... a: ?x
396 ... b: ?x
397 ... c: 2
398 ... d: 2
399 ... ''')
400 >>> bindings1 = {}
401 >>> bindings2 = {}
402 >>> print show(unify(f1, f2, bindings1, bindings2))
403 a: 1
404 b: 1
405 c: 2
406 d: 2
407
408 >>> print bindings1
409 {'x': 2}
410
411 >>> print bindings2
412 {'x': 1}
413
414 Feature structures can involve _reentrant_ values, where multiple feature
415 paths lead to the same value. This is represented by the features having
416 the same Python object as a value. (This kind of identity can be tested
417 using the C{is} operator.)
418
419 Unification preserves the properties of reentrance. So if a reentrant value
420 is changed by unification, it is changed everywhere it occurs, and it is
421 still reentrant. Reentrant features can even form cycles; these
422 cycles can now be printed through the current YAML library.
423
424 >>> f1 = yaml.load('''
425 ... A: &1 # &1 defines a reference in YAML...
426 ... B: b
427 ... E:
428 ... F: *1 # and *1 uses the previously defined reference.
429 ... ''')
430 >>> f1['E']['F']['B']
431 'b'
432 >>> f1['A'] is f1['E']['F']
433 True
434 >>> f2 = yaml.load('''
435 ... A:
436 ... C: c
437 ... E:
438 ... F:
439 ... D: d
440 ... ''')
441 >>> f3 = unify(f1, f2)
442 >>> print show(f3)
443 A: &id001
444 B: b
445 C: c
446 D: d
447 E:
448 F: *id001
449 >>> f3['A'] is f3['E']['F'] # Showing that the reentrance still holds.
450 True
451
452 This unification creates a cycle:
453 >>> f1 = yaml.load('''
454 ... F: &1 {}
455 ... G: *1
456 ... ''')
457 >>> f2 = yaml.load('''
458 ... F:
459 ... H: &2 {}
460 ... G: *2
461 ... ''')
462 >>> f3 = unify(f1, f2)
463 >>> print f3
464 {'G': {'H': {...}}, 'F': {'H': {...}}}
465 >>> print f3['F'] is f3['G']
466 True
467 >>> print f3['F'] is f3['G']['H']
468 True
469 >>> print f3['F'] is f3['G']['H']['H']
470 True
471
472 A cycle can also be created using variables instead of reentrance.
473 Here we supply a single set of bindings, so that it is used on both sides
474 of the unification, making ?x mean the same thing in both feature
475 structures.
476
477 >>> f1 = yaml.load('''
478 ... F:
479 ... H: ?x
480 ... ''')
481 >>> f2 = yaml.load('''
482 ... F: ?x
483 ... ''')
484 >>> f3 = unify(f1, f2, {})
485 >>> print f3
486 {'F': {'H': {...}}}
487 >>> print f3['F'] is f3['F']['H']
488 True
489 >>> print f3['F'] is f3['F']['H']['H']
490 True
491
492 Two sets of bindings can be provided because the variable names on each
493 side of the unification may be unrelated. An example involves unifying the
494 following two structures, which each require that two values are
495 equivalent, and happen to both use ?x to express that requirement.
496
497 >>> f1 = yaml.load('''
498 ... a: 1
499 ... b: 1
500 ... c: ?x
501 ... d: ?x
502 ... ''')
503 >>> f2 = yaml.load('''
504 ... a: ?x
505 ... b: ?x
506 ... c: 2
507 ... d: 2
508 ... ''')
509 >>> bindings1 = {}
510 >>> bindings2 = {}
511 >>> # We could avoid defining two empty dictionaries by simply using the
512 >>> # defaults, with unify(f1, f2) -- but we want to be able to examine
513 >>> # the bindings afterward.
514 >>> print show(unify(f1, f2, bindings1, bindings2))
515 a: 1
516 b: 1
517 c: 2
518 d: 2
519 >>> print bindings1
520 {'x': 2}
521 >>> print bindings2
522 {'x': 1}
523
524 If a variable is unified with another variable, the two variables are
525 _aliased_ to each other; they share the same value, similarly to reentrant
526 feature structures. This is represented in a set of bindings as one
527 variable having the other as its value.
528 >>> f1 = yaml.load('''
529 ... a: ?x
530 ... b: ?x
531 ... ''')
532 >>> f2 = yaml.load('''
533 ... b: ?y
534 ... c: ?y
535 ... ''')
536 >>> bindings = {}
537 >>> print show(unify(f1, f2, bindings))
538 a: &id001 ?y
539 b: *id001
540 c: *id001
541 >>> print bindings
542 {'x': ?y}
543
544 Reusing the same variable bindings ensures that appropriate bindings are
545 made after the fact:
546 >>> bindings = {}
547 >>> f1 = {'a': Variable('x')}
548 >>> f2 = unify(f1, {'a': {}}, bindings)
549 >>> f3 = unify(f2, {'b': Variable('x')}, bindings)
550 >>> print show(f3)
551 a: &id001 {}
552 b: *id001
553 >>> print bindings
554 {'x': {}}
555
556 @param feature1: The first object to be unified.
557 @type feature1: C{object} (probably a mapping)
558 @param feature2: The second object to be unified.
559 @type feature2: C{object} (probably a mapping)
560 @param bindings1: The variable bindings associated with the first object.
561 @type bindings1: C{dict} or None
562 @param bindings2: The variable bindings associated with the second object,
563 if these are distinct from C{bindings1}.
564 @type bindings2: C{dict} or None
565 @return: The result of unifying the two objects.
566 @rtype: C{object} (probably a mapping)
567 """
568 if fail is None:
569 def failerror(f1, f2):
570 raise UnificationFailure
571 fail = failerror
572
573 if bindings1 is None and bindings2 is None:
574 bindings1 = {}
575 bindings2 = {}
576 else:
577 if bindings1 is None: bindings1 = {}
578 if bindings2 is None: bindings2 = bindings1
579
580 if memo is None: memo = {}
581 copymemo = {}
582 if memo.has_key((id(feature1), id(feature2))):
583 result = memo[id(feature1), id(feature2)]
584 if result is UnificationFailure:
585 if trace > 2:
586 print '(cached) Unifying: %r + %r --> [fail]' % (feature1, feature2)
587 raise result()
588 if trace > 2:
589 print '(cached) Unifying: %r + %r --> ' % (feature1, feature2),
590 print repr(result)
591 return result
592
593 if trace > 1:
594 print 'Unifying: %r + %r --> ' % (feature1, feature2),
595
596
597
598
599 copy1 = _copy_and_bind(feature1, bindings1, copymemo)
600 copy2 = _copy_and_bind(feature2, bindings2, copymemo)
601
602 for b in (bindings1, bindings2):
603 for (vname, value) in b.items():
604 value_id = id(value)
605 if value_id in copymemo:
606 b[vname] = copymemo[value_id]
607
608
609 try:
610 unified = _destructively_unify(copy1, copy2, bindings1, bindings2, memo,
611 fail)
612 except UnificationFailure:
613 if trace > 1: print '[fail]'
614 memo[id(feature1), id(feature2)] = UnificationFailure
615 raise
616
617 _apply_forwards_to_bindings(bindings1)
618 _apply_forwards_to_bindings(bindings2)
619 _apply_forwards(unified, {})
620 unified = _lookup_values(unified, {}, remove=False)
621 _lookup_values(bindings1, {}, remove=True)
622 _lookup_values(bindings2, {}, remove=True)
623
624 if trace > 1:
625 print repr(unified)
626 elif trace > 0:
627 print 'Unifying: %r + %r --> %r' % (feature1, feature2, repr(unified))
628
629 memo[id(feature1), id(feature2)] = unified
630 return unified
631
634 """
635 Attempt to unify C{self} and C{other} by modifying them
636 in-place. If the unification succeeds, then C{self} will
637 contain the unified value, and the value of C{other} is
638 undefined. If the unification fails, then a
639 UnificationFailure is raised, and the values of C{self}
640 and C{other} are undefined.
641 """
642 if depth > 50:
643 print "Infinite recursion in this unification:"
644 print show(dict(feature1=feature1, feature2=feature2,
645 bindings1=bindings1, bindings2=bindings2, memo=memo))
646 raise ValueError, "Infinite recursion in unification"
647 if memo.has_key((id(feature1), id(feature2))):
648 result = memo[id(feature1), id(feature2)]
649 if result is UnificationFailure: raise result()
650 unified = _do_unify(feature1, feature2, bindings1, bindings2, memo, fail,
651 depth)
652 memo[id(feature1), id(feature2)] = unified
653 return unified
654
655 -def _do_unify(feature1, feature2, bindings1, bindings2, memo, fail, depth=0):
656 """
657 Do the actual work of _destructively_unify when the result isn't memoized.
658 """
659
660
661 if feature1 is None: return feature2
662 if feature2 is None: return feature1
663 if feature1 is feature2: return feature1
664
665
666 if isinstance(feature1, Variable):
667 if isinstance(feature2, Variable):
668
669
670 return feature1.forwardTo(feature2, bindings1, bindings2)
671 else:
672 feature1.bindValue(feature2, bindings1, bindings2)
673 return feature1
674 if isinstance(feature2, Variable):
675 feature2.bindValue(feature1, bindings2, bindings1)
676 return feature2
677
678
679
680 if not isMapping(feature1):
681 if feature1 == feature2: return feature1
682 else:
683 return fail(feature1, feature2)
684 if not isMapping(feature2):
685 return fail(feature1, feature2)
686
687
688
689
690 while feature2.has_key(_FORWARD): feature2 = feature2[_FORWARD]
691 if feature1 is not feature2: feature2[_FORWARD] = feature1
692 for (fname, val2) in feature2.items():
693 if fname == _FORWARD: continue
694 val1 = feature1.get(fname)
695 feature1[fname] = _destructively_unify(val1, val2, bindings1,
696 bindings2, memo, fail, depth+1)
697 return feature1
698
714
716 """
717 The unification procedure creates _bound variables_, which are Variable
718 objects that have been assigned a value. Bound variables are not useful
719 in the end result, however, so they should be replaced by their values.
720
721 This procedure takes a mapping, which may be a feature structure or a
722 binding dictionary, and replaces bound variables with their values.
723
724 If the dictionary is a binding dictionary, then 'remove' should be set to
725 True. This ensures that unbound, unaliased variables are removed from the
726 dictionary. If the variable name 'x' is mapped to the unbound variable ?x,
727 then, it should be removed. This is not done with features, because a
728 feature named 'x' can of course have a variable ?x as its value.
729 """
730 if isinstance(mapping, Variable):
731
732
733
734
735
736 var = mapping
737 if var.value() is not None:
738 return var.value()
739 else:
740 return var.forwarded_self()
741 if not isMapping(mapping): return mapping
742 if visited.has_key(id(mapping)): return mapping
743 visited[id(mapping)] = True
744
745 for fname, fval in mapping.items():
746 if isMapping(fval):
747 _lookup_values(fval, visited)
748 elif isinstance(fval, Variable):
749 if fval.value() is not None:
750 mapping[fname] = fval.value()
751 if isMapping(mapping[fname]):
752 _lookup_values(mapping[fname], visited)
753 else:
754 newval = fval.forwarded_self()
755 if remove and newval.name() == fname:
756 del mapping[fname]
757 else:
758 mapping[fname] = newval
759 return mapping
760
771
773 "Run unit tests on unification."
774 import doctest
775 doctest.testmod()
776
777 if __name__ == "__main__":
778 test()
779