Source Code for Package nltk_lite.parse

# Natural Language Toolkit: Parsers
#
# Copyright (C) 2001-2007 University of Pennsylvania
# Author: Steven Bird <sb@csse.unimelb.edu.au>
#         Edward Loper <edloper@gradient.cis.upenn.edu>
# URL: <http://nltk.sf.net>
# For license information, see LICENSE.TXT
#

"""
Classes and interfaces for producing tree structures that represent
the internal organization of a text.  This task is known as X{parsing}
the text, and the resulting tree structures are called the text's
X{parses}.  Typically, the text is a single sentence, and the tree
structure represents the syntactic structure of the sentence.
However, parsers can also be used in other domains.  For example,
parsers can be used to derive the morphological structure of the
morphemes that make up a word, or to derive the discourse structure
for a set of utterances.

Sometimes, a single piece of text can be represented by more than one
tree structure.  Texts represented by more than one tree structure are
called X{ambiguous} texts.  Note that there are actually two ways in
which a text can be ambiguous:

    - The text has multiple correct parses.
    - There is not enough information to decide which of several
      candidate parses is correct.

However, this package does I{not} distinguish these two types of
ambiguity.

This package defines C{ParseI}, a standard interface for parsing
texts; and two simple implementations of that interface,
C{ShiftReduce} and C{RecursiveDescent}.  It also contains
sub-modules for specialized kinds of parsing:

  - C{nltk_lite.parse.chart} defines chart parsing, which uses dynamic
    programming to efficiently parse texts.
  - C{nltk_lite.parse.pchart} and C{nltk_lite.parse.viterbi} define
    probabilistic parsing, which associates a probability with each
    parse.
"""


##//////////////////////////////////////////////////////
##  Parser Interface
##//////////////////////////////////////////////////////
class ParseI(object):
    """
    A processing class for deriving trees that represent possible
    structures for a sequence of tokens.  These tree structures are
    known as X{parses}.  Typically, parsers are used to derive syntax
    trees for sentences.  But parsers can also be used to derive other
    kinds of tree structure, such as morphological trees and discourse
    structures.
    """
    def parse(self, sent):
        """
        Derive a parse tree that represents the structure of the given
        sentence's words, and return a C{Tree}.  If no parse is found,
        then return C{None}.  If multiple parses are found, then
        return the best parse.

        The parse tree derives a structure for the tokens, but does
        not modify them.  In particular, the leaves of the tree
        should be equal to the list of tokens.

        @param sent: The sentence to be parsed
        @type sent: L{list} of L{string}
        """
        raise NotImplementedError()

    def get_parse(self, sent):
        """
        @return: A parse tree that represents the structure of the
        sentence.  If no parse is found, then return C{None}.

        @rtype: L{Tree}
        @param sent: The sentence to be parsed
        @type sent: L{list} of L{string}
        """

    def get_parse_list(self, sent):
        """
        @return: A list of the parse trees for the sentence.  When possible,
        this list should be sorted from most likely to least likely.

        @rtype: C{list} of L{Tree}
        @param sent: The sentence to be parsed
        @type sent: L{list} of L{string}
        """

    def get_parse_probs(self, sent):
        """
        @return: A probability distribution over the parse trees for the sentence.

        @rtype: L{ProbDistI}
        @param sent: The sentence to be parsed
        @type sent: L{list} of L{string}
        """

    def get_parse_dict(self, sent):
        """
        @return: A dictionary mapping from the parse trees for the
        sentence to numeric scores.

        @rtype: C{dict}
        @param sent: The sentence to be parsed
        @type sent: L{list} of L{string}
        """

##//////////////////////////////////////////////////////
##  Abstract Base Class for Parsers
##//////////////////////////////////////////////////////
class AbstractParse(ParseI):
    """
    An abstract base class for parsers.  C{AbstractParse} provides
    a default implementation for:

      - L{parse} (based on C{get_parse})
      - L{get_parse_list} (based on C{get_parse})
      - L{get_parse} (based on C{get_parse_list})

    Note that subclasses must override either C{get_parse} or
    C{get_parse_list} (or both), to avoid infinite recursion.
    """
    def __init__(self):
        """
        Construct a new parser.
        """
        # Make sure we're not directly instantiated:
        if self.__class__ == AbstractParse:
            raise AssertionError("Abstract classes can't be instantiated")

    def parse(self, tokens):
        return self.get_parse(list(tokens))

    def grammar(self):
        return self._grammar

    def get_parse(self, tokens):
        trees = self.get_parse_list(list(tokens))
        if len(trees) == 0: return None
        else: return trees[0]

    def get_parse_list(self, tokens):
        tree = self.get_parse(tokens)
        if tree is None: return []
        else: return [tree]

from cfg import *
from tree import *
from category import *
from chart import *
from featurechart import *
from treetransforms import *
from pcfg import *
from sr import *
from rd import *
from pchart import *
from viterbi import *
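
The C{AbstractParse} docstring says that a subclass must override C{get_parse} or C{get_parse_list}, because each default is defined in terms of the other. The following is a minimal sketch of that contract, not code from this package. C{AbstractParseSketch} is a hypothetical stand-in that copies only the three default methods so the example runs without nltk_lite installed, C{FlatParse} is an invented toy parser, and nested tuples are assumed in place of the real C{Tree} class.

```python
# Hypothetical stand-in that mirrors the default methods of AbstractParse,
# so this sketch runs without nltk_lite installed.
class AbstractParseSketch(object):
    def parse(self, tokens):
        return self.get_parse(list(tokens))

    def get_parse(self, tokens):
        # Default: the first tree from get_parse_list, or None if there is none.
        trees = self.get_parse_list(list(tokens))
        if len(trees) == 0:
            return None
        return trees[0]

    def get_parse_list(self, tokens):
        # Default: wrap the single tree from get_parse in a list.
        tree = self.get_parse(tokens)
        if tree is None:
            return []
        return [tree]


class FlatParse(AbstractParseSketch):
    """Toy parser (hypothetical): puts all tokens under one flat 'S' node."""

    def get_parse_list(self, tokens):
        # Overriding this one method breaks the mutual recursion between
        # the two defaults; parse() and get_parse() then work unchanged.
        tokens = list(tokens)
        if not tokens:
            return []  # no parse for an empty sentence
        # A nested tuple stands in for a Tree (assumption for this sketch).
        return [("S", tuple(tokens))]


parser = FlatParse()
flat_tree = parser.parse("the dog barks".split())
empty_result = parser.parse([])
```

Here C{flat_tree} is the single flat tree and C{empty_result} is C{None}, which matches the "no parse found" behaviour described for C{ParseI.parse}. A subclass that overrides neither method would recurse until Python's recursion limit is reached.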