1 """
2 This module provides utilities for treating Python dictionaries as X{feature
3 structures}. Specifically, it contains the C{unify} function, which can be used
4 to merge the properties of two dictionaries, and the C{Variable} class, which
5 holds an unknown value to be used in unification.
6
7 A X{feature structure} is a mapping from feature names to feature values,
8 where:
9
10 - Each X{feature name} is a case sensitive string.
11 - Each X{feature value} can be a base value (such as a string), a
12 variable, or a nested feature structure.
13
14 However, feature structures are not a specialized class; they are represented
15 by dictionaries, or more generally by anything that responds to the C{has_key}
16 method. The YAML representation can be used to create and display feature
17 structures intuitively:
18
19 >>> f1 = yaml.load('''
20 ... A:
21 ... B: b
22 ... D: d
23 ... ''')
24 >>> f2 = yaml.load('''
25 ... A:
26 ... C: c
27 ... D: d
28 ... ''')
29 >>> print show(unify(f1, f2))
30 A:
31 B: b
32 C: c
33 D: d
34
35 Feature structures are typically used to represent partial information
36 about objects. A feature name that is not mapped to a value stands
37 for a feature whose value is unknown (I{not} a feature without a
38 value). Two feature structures that represent (potentially
39 overlapping) information about the same object can be combined by
40 X{unification}. When two inconsistant feature structures are unified,
41 the unification fails and raises an error.
42
43 Features can be specified using X{feature paths}, or tuples of feature names
44 that specify paths through the nested feature structures to a value.
45
46 Feature structures may contain reentrant feature values. A
47 X{reentrant feature value} is a single feature value that can be
48 accessed via multiple feature paths. Unification preserves the
49 reentrance relations imposed by both of the unified feature
50 structures. After unification, any extensions to a reentrant feature
51 value will be visible using any of its feature paths.
52
53 Feature structure variables are encoded using the L{Variable} class. The scope
54 of a variable is determined by the X{bindings} used when the structure
55 including that variable is unified. Bindings can be reused between unifications
56 to ensure that variables with the same name get the same value.
57
58 """
59
60 from copy import copy, deepcopy
61 import re
62 import yaml
63
64 import sys
65
67 """
68 An exception that is raised when two values cannot be unified.
69 """
70 pass
71
73 """
74 Determine whether to treat a given object as a feature structure. The
75 test is whether it responds to C{has_key}. This can be overridden if the
76 object includes an attribute or method called C{_no_feature}.
77
78 @param obj: The object to be tested
79 @type obj: C{object}
80 @return: True iff the object can be treated as a feature structure
81 @rtype: C{bool}
82 """
83 return ('has_key' in dir(obj)) and ('_no_feature' not in dir(obj))
84
86 """
87 _FORWARD is a singleton value, used in unification as a flag that a value
88 has been forwarded to another object.
89
90 This class itself is used as the singleton value. It cannot be
91 instantiated.
92 """
94 raise TypeError, "The _FORWARD class is not meant to be instantiated"
95
97 """
98 A Variable is an object that can be used in unification to hold an
99 initially unknown value. Two equivalent Variables, for example, can be used
100 to require that two features have the same value.
101
102 When a Variable is assigned a value, it will eventually be replaced by
103 that value. However, in order to make that value show up everywhere the
104 variable appears, the Variable temporarily stores its assigned value and
105 becomes a I{bound variable}. Bound variables do not appear in the results
106 of unification.
107
108 Variables are distinguished by their name, and by the dictionary of
109 I{bindings} that is being used to determine their values. Two variables can
110 have the same name but be associated with two different binding
111 dictionaries: those variables are not equal.
112 """
113 _next_numbered_id = 1
114
115 - def __init__(self, name=None, value=None):
116 """
117 Construct a new feature structure variable.
118
119 The value should be left at its default of None; it is only used
120 internally to copy bound variables.
121
122 @type name: C{string}
123 @param name: An identifier for this variable. Two C{Variable} objects
124 with the same name will be given the same value in a given dictionary
125 of bindings.
126 """
127 self._uid = Variable._next_numbered_id
128 Variable._next_numbered_id += 1
129 if name is None: name = self._uid
130 self._name = str(name)
131 self._value = value
132
134 """
135 @return: This variable's name.
136 @rtype: C{string}
137 """
138 return self._name
139
141 """
142 If this varable is bound, find its value. If it is unbound or aliased
143 to an unbound variable, returns None.
144
145 @return: The value of this variable, if any.
146 @rtype: C{object}
147 """
148 if isinstance(self._value, Variable): return self._value.value()
149 else: return self._value
151 """
152 @return: A copy of this variable.
153 @rtype: C{Variable}
154 """
155 return Variable(self.name(), self.value())
156
158 """
159 Variables are aliased to other variables by one variable _forwarding_
160 to the other. The first variable simply has the second as its value,
161 but it acts like the second variable's _value_ is its value.
162
163 forwarded_self returns the final Variable object that actually stores
164 the value.
165
166 @return: The C{Variable} responsible for storing this variable's value.
167 @rtype: C{Variable}
168 """
169 if isinstance(self._value, Variable):
170 return self._value.forwarded_self()
171 else: return self
172
173 - def bindValue(self, value, ourbindings, otherbindings):
174 """
175 Bind this variable to a value. C{ourbindings} are the bindings that
176 accompany the feature structure this variable came from;
177 C{otherbindings} are the bindings from the structure it's being unified
178 with.
179
180 @type value: C{object}
181 @param value: The value to be assigned.
182 @type ourbindings: C{dict}
183 @param ourbindings: The bindings associated with this variable.
184 @type otherbindings: C{dict}
185 @param otherbindings: The bindings associated with the value being
186 assigned. (May be identical to C{ourbindings}.)
187 """
188 if isinstance(self._value, Variable):
189
190 return self._value.bindValue(value, ourbindings, otherbindings)
191 if self._value is None:
192
193 self._value = value
194 else:
195
196
197 self._value = unify(self._value, value, ourbindings, otherbindings)
198
199 - def forwardTo(self, other, ourbindings, otherbindings):
200 """
201 A unification wants this variable to be aliased to another variable.
202 Forward this variable to the other one, and return the other.
203
204 @type other: C{Variable}
205 @param other: The variable to replace this one.
206 @type ourbindings: C{dict}
207 @param ourbindings: The bindings associated with this variable.
208 @type otherbindings: C{dict}
209 @param otherbindings: The bindings associated with the other variable.
210 (May be identical to C{ourbindings}.)
211 @return: C{other}
212 @rtype: C{Variable}
213 """
214 other.bindValue(self.value(), ourbindings, otherbindings)
215 self._value = other
216 return other
217
218 - def __hash__(self): return hash(self._uid)
220 """
221 Variables are equal if they are the same object or forward to the
222 same object. Variables with the same name may still be unequal.
223 """
224 if not isinstance(other, Variable): return -1
225 if isinstance(self._value, Variable): return cmp(self._value, other)
226 else: return cmp(self._uid, other._uid)
228 if self._value is None: return '?%s' % self._name
229 else: return '?%s: %r' % (self._name, self._value)
230
232 """
233 Works like yaml.dump(), but with output suited for doctests. Flow style
234 is always off, and there is no blank line at the end.
235 """
236 return yaml.dump(data, default_flow_style=False).strip()
237
239 "Output variables in YAML as ?name."
240 return dumper.represent_scalar(u'!var', u'?%s' % var.name())
241 yaml.add_representer(Variable, variable_representer)
242
248 yaml.add_constructor(u'!var', variable_constructor)
249 yaml.add_implicit_resolver(u'!var', re.compile(r'^\?\w+$'))
250
252 """
253 Make a deep copy of a feature structure, preserving reentrance using the
254 C{memo} dictionary. Meanwhile, variables are replaced by their bound
255 values, if these values are already known, and variables with unknown
256 values are given placeholder bindings.
257 """
258 if memo is None: memo = {}
259 if id(feature) in memo: return memo[id(feature)]
260 if isinstance(feature, Variable) and bindings is not None:
261 if not bindings.has_key(feature.name()):
262 bindings[feature.name()] = feature.copy()
263 result = _copy_and_bind(bindings[feature.name()], None, memo)
264 else:
265 if isMapping(feature):
266
267 result = feature.__class__()
268 for (key, value) in feature.items():
269 result[key] = _copy_and_bind(value, bindings, memo)
270 else: result = feature
271 memo[id(feature)] = result
272 memo[id(result)] = result
273 return result
274
275 -def unify(feature1, feature2, bindings1=None, bindings2=None, fail=None):
276 """
277 In general, the 'unify' procedure takes two values, and either returns a
278 value that provides the information provided by both values, or fails if
279 that is impossible.
280
281 These values can have any type, but fall into a few general cases:
282 - Values that respond to C{has_key} represent feature structures. The
283 C{unify} procedure will recurse into them and unify their inner values.
284 - L{Variable}s represent an unknown value, and are handled specially.
285 The values assigned to variables are tracked using X{bindings}.
286 - C{None} represents the absence of information.
287 - Any other value is considered a X{base value}. Base values are
288 compared to each other with the == operation.
289
290 The value 'None' represents the absence of any information. It specifies no
291 properties and acts as the identity in unification.
292 >>> unify(3, None)
293 3
294
295 >>> unify(None, 'fish')
296 'fish'
297
298 A base value unifies with itself, but not much else.
299 >>> unify(True, True)
300 True
301
302 >>> unify([1], [1])
303 [1]
304
305 >>> unify('a', 'b')
306 Traceback (most recent call last):
307 ...
308 UnificationFailure
309
310 When two mappings (representing feature structures, and usually implemented
311 as dictionaries) are unified, any chain of keys that accesses a value in
312 either mapping will access an equivalent or more specific value in the
313 unified mapping. If this is not possible, UnificationFailure is raised.
314
315 >>> f1 = dict(A=dict(B='b'))
316 >>> f2 = dict(A=dict(C='c'))
317 >>> unify(f1, f2) == dict(A=dict(B='b', C='c'))
318 True
319
320 The empty dictionary specifies no features. It unifies with any mapping.
321 >>> unify({}, dict(foo='bar'))
322 {'foo': 'bar'}
323
324 >>> unify({}, True)
325 Traceback (most recent call last):
326 ...
327 UnificationFailure
328
329 Representing dictionaries in YAML form is useful for making feature
330 structures readable:
331
332 >>> f1 = yaml.load("number: singular")
333 >>> f2 = yaml.load("person: 3")
334 >>> print show(unify(f1, f2))
335 number: singular
336 person: 3
337
338 >>> f1 = yaml.load('''
339 ... A:
340 ... B: b
341 ... D: d
342 ... ''')
343 >>> f2 = yaml.load('''
344 ... A:
345 ... C: c
346 ... D: d
347 ... ''')
348 >>> print show(unify(f1, f2))
349 A:
350 B: b
351 C: c
352 D: d
353
354 Variables are names for unknown values. Variables are assigned values
355 that will make unification succeed. The values of variables can be reused
356 in later unifications if you provide a dictionary of _bindings_ from
357 variables to their values.
358 >>> bindings = {}
359 >>> print unify(Variable('x'), 5, bindings)
360 5
361
362 >>> print bindings
363 {'x': 5}
364
365 >>> print unify({'a': Variable('x')}, {}, bindings)
366 {'a': 5}
367
368 The same variable name can be reused in different binding dictionaries
369 without collision. In some cases, you may want to provide two separate
370 binding dictionaries to C{unify} -- one for each feature structure, so
371 their variables do not collide.
372
373 In the following examples, two different feature structures use the
374 variable ?x to require that two values are equal. The values assigned to
375 ?x are consistent within each structure, but would be inconsistent if every
376 ?x had to have the same value.
377
378 >>> f1 = yaml.load('''
379 ... a: 1
380 ... b: 1
381 ... c: ?x
382 ... d: ?x
383 ... ''')
384 >>> f2 = yaml.load('''
385 ... a: ?x
386 ... b: ?x
387 ... c: 2
388 ... d: 2
389 ... ''')
390 >>> bindings1 = {}
391 >>> bindings2 = {}
392 >>> print show(unify(f1, f2, bindings1, bindings2))
393 a: 1
394 b: 1
395 c: 2
396 d: 2
397
398 >>> print bindings1
399 {'x': 2}
400
401 >>> print bindings2
402 {'x': 1}
403
404 Feature structures can involve _reentrant_ values, where multiple feature
405 paths lead to the same value. This is represented by the features having
406 the same Python object as a value. (This kind of identity can be tested
407 using the C{is} operator.)
408
409 Unification preserves the properties of reentrance. So if a reentrant value
410 is changed by unification, it is changed everywhere it occurs, and it is
411 still reentrant. Reentrant features can even form cycles; these
412 cycles can now be printed through the current YAML library.
413
414 >>> f1 = yaml.load('''
415 ... A: &1 # &1 defines a reference in YAML...
416 ... B: b
417 ... E:
418 ... F: *1 # and *1 uses the previously defined reference.
419 ... ''')
420 >>> f1['E']['F']['B']
421 'b'
422 >>> f1['A'] is f1['E']['F']
423 True
424 >>> f2 = yaml.load('''
425 ... A:
426 ... C: c
427 ... E:
428 ... F:
429 ... D: d
430 ... ''')
431 >>> f3 = unify(f1, f2)
432 >>> print show(f3)
433 A: &id001
434 B: b
435 C: c
436 D: d
437 E:
438 F: *id001
439 >>> f3['A'] is f3['E']['F'] # Showing that the reentrance still holds.
440 True
441
442 This unification creates a cycle:
443 >>> f1 = yaml.load('''
444 ... F: &1 {}
445 ... G: *1
446 ... ''')
447 >>> f2 = yaml.load('''
448 ... F:
449 ... H: &2 {}
450 ... G: *2
451 ... ''')
452 >>> f3 = unify(f1, f2)
453 >>> print f3
454 {'G': {'H': {...}}, 'F': {'H': {...}}}
455 >>> print f3['F'] is f3['G']
456 True
457 >>> print f3['F'] is f3['G']['H']
458 True
459 >>> print f3['F'] is f3['G']['H']['H']
460 True
461
462 A cycle can also be created using variables instead of reentrance.
463 Here we supply a single set of bindings, so that it is used on both sides
464 of the unification, making ?x mean the same thing in both feature
465 structures.
466
467 >>> f1 = yaml.load('''
468 ... F:
469 ... H: ?x
470 ... ''')
471 >>> f2 = yaml.load('''
472 ... F: ?x
473 ... ''')
474 >>> f3 = unify(f1, f2, {})
475 >>> print f3
476 {'F': {'H': {...}}}
477 >>> print f3['F'] is f3['F']['H']
478 True
479 >>> print f3['F'] is f3['F']['H']['H']
480 True
481
482 Two sets of bindings can be provided because the variable names on each
483 side of the unification may be unrelated. An example involves unifying the
484 following two structures, which each require that two values are
485 equivalent, and happen to both use ?x to express that requirement.
486
487 >>> f1 = yaml.load('''
488 ... a: 1
489 ... b: 1
490 ... c: ?x
491 ... d: ?x
492 ... ''')
493 >>> f2 = yaml.load('''
494 ... a: ?x
495 ... b: ?x
496 ... c: 2
497 ... d: 2
498 ... ''')
499 >>> bindings1 = {}
500 >>> bindings2 = {}
501 >>> # We could avoid defining two empty dictionaries by simply using the
502 >>> # defaults, with unify(f1, f2) -- but we want to be able to examine
503 >>> # the bindings afterward.
504 >>> print show(unify(f1, f2, bindings1, bindings2))
505 a: 1
506 b: 1
507 c: 2
508 d: 2
509 >>> print bindings1
510 {'x': 2}
511 >>> print bindings2
512 {'x': 1}
513
514 If a variable is unified with another variable, the two variables are
515 _aliased_ to each other; they share the same value, similarly to reentrant
516 feature structures. This is represented in a set of bindings as one
517 variable having the other as its value.
518 >>> f1 = yaml.load('''
519 ... a: ?x
520 ... b: ?x
521 ... ''')
522 >>> f2 = yaml.load('''
523 ... b: ?y
524 ... c: ?y
525 ... ''')
526 >>> bindings = {}
527 >>> print show(unify(f1, f2, bindings))
528 a: &id001 ?y
529 b: *id001
530 c: *id001
531 >>> print bindings
532 {'x': ?y}
533
534 Reusing the same variable bindings ensures that appropriate bindings are
535 made after the fact:
536 >>> bindings = {}
537 >>> f1 = {'a': Variable('x')}
538 >>> f2 = unify(f1, {'a': {}}, bindings)
539 >>> f3 = unify(f2, {'b': Variable('x')}, bindings)
540 >>> print show(f3)
541 a: &id001 {}
542 b: *id001
543 >>> print bindings
544 {'x': {}}
545
546 @param feature1: The first object to be unified.
547 @type feature1: C{object} (probably a mapping)
548 @param feature2: The second object to be unified.
549 @type feature2: C{object} (probably a mapping)
550 @param bindings1: The variable bindings associated with the first object.
551 @type bindings1: C{dict} or None
552 @param bindings2: The variable bindings associated with the second object,
553 if these are distinct from C{bindings1}.
554 @type bindings2: C{dict} or None
555 @return: The result of unifying the two objects.
556 @rtype: C{object} (probably a mapping)
557 """
558
559 if fail is None:
560 def failerror(f1, f2):
561 raise UnificationFailure
562 fail = failerror
563
564 if bindings1 is None and bindings2 is None:
565 bindings1 = {}
566 bindings2 = {}
567 else:
568 if bindings1 is None: bindings1 = {}
569 if bindings2 is None: bindings2 = bindings1
570
571
572
573
574 copymemo = {}
575 copy1 = _copy_and_bind(feature1, bindings1, copymemo)
576 copy2 = _copy_and_bind(feature2, bindings2, copymemo)
577
578 for b in (bindings1, bindings2):
579 for (vname, value) in b.items():
580 value_id = id(value)
581 if value_id in copymemo:
582 b[vname] = copymemo[value_id]
583
584
585 unified = _destructively_unify(copy1, copy2, bindings1, bindings2, {}, fail)
586
587 _apply_forwards_to_bindings(bindings1)
588 _apply_forwards_to_bindings(bindings2)
589 _apply_forwards(unified, {})
590 unified = _lookup_values(unified, {}, remove=False)
591 _lookup_values(bindings1, {}, remove=True)
592 _lookup_values(bindings2, {}, remove=True)
593
594 return unified
595
597 """
598 Attempt to unify C{self} and C{other} by modifying them
599 in-place. If the unification succeeds, then C{self} will
600 contain the unified value, and the value of C{other} is
601 undefined. If the unification fails, then a
602 UnificationFailure is raised, and the values of C{self}
603 and C{other} are undefined.
604 """
605 if memo.has_key((id(feature1), id(feature2))):
606 return memo[id(feature1), id(feature2)]
607 unified = _do_unify(feature1, feature2, bindings1, bindings2, memo, fail)
608 memo[id(feature1), id(feature2)] = unified
609 return unified
610
611 -def _do_unify(feature1, feature2, bindings1, bindings2, memo, fail):
612 """
613 Do the actual work of _destructively_unify when the result isn't memoized.
614 """
615
616
617 if feature1 is None: return feature2
618 if feature2 is None: return feature1
619 if feature1 is feature2: return feature1
620
621
622 if isinstance(feature1, Variable):
623 if isinstance(feature2, Variable):
624
625
626 return feature1.forwardTo(feature2, bindings1, bindings2)
627 else:
628 feature1.bindValue(feature2, bindings1, bindings2)
629 return feature1
630 if isinstance(feature2, Variable):
631 feature2.bindValue(feature1, bindings2, bindings1)
632 return feature2
633
634
635
636 if not isMapping(feature1):
637 if feature1 == feature2: return feature1
638 else:
639 return fail(feature1, feature2)
640 if not isMapping(feature2):
641 return fail(feature1, feature2)
642
643
644
645
646 while feature2.has_key(_FORWARD): feature2 = feature2[_FORWARD]
647 feature2[_FORWARD] = feature1
648 for (fname, val2) in feature2.items():
649 if fname == _FORWARD: continue
650 val1 = feature1.get(fname)
651 feature1[fname] = _destructively_unify(val1, val2, bindings1,
652 bindings2, memo, fail)
653 return feature1
654
670
672 """
673 The unification procedure creates _bound variables_, which are Variable
674 objects that have been assigned a value. Bound variables are not useful
675 in the end result, however, so they should be replaced by their values.
676
677 This procedure takes a mapping, which may be a feature structure or a
678 binding dictionary, and replaces bound variables with their values.
679
680 If the dictionary is a binding dictionary, then 'remove' should be set to
681 True. This ensures that unbound, unaliased variables are removed from the
682 dictionary. If the variable name 'x' is mapped to the unbound variable ?x,
683 then, it should be removed. This is not done with features, because a
684 feature named 'x' can of course have a variable ?x as its value.
685 """
686 if isinstance(mapping, Variable):
687
688
689
690
691
692 var = mapping
693 if var.value() is not None:
694 return var.value()
695 else:
696 return var.forwarded_self()
697 if not isMapping(mapping): return mapping
698 if visited.has_key(id(mapping)): return mapping
699 visited[id(mapping)] = True
700
701 for fname, fval in mapping.items():
702 if isMapping(fval):
703 _lookup_values(fval, visited)
704 elif isinstance(fval, Variable):
705 if fval.value() is not None:
706 mapping[fname] = fval.value()
707 if isMapping(mapping[fname]):
708 _lookup_values(mapping[fname], visited)
709 else:
710 newval = fval.forwarded_self()
711 if remove and newval.name() == fname:
712 del mapping[fname]
713 else:
714 mapping[fname] = newval
715 return mapping
716
727
729 "Run unit tests on unification."
730 import doctest
731 doctest.testmod()
732
733 if __name__ == "__main__":
734 test()
735