Package nltk_lite :: Package chunk :: Module regexp :: Class Regexp

Class Regexp

     object --+        
              |        
   parse.ParseI --+    
                  |    
        ChunkParseI --+
                      |
     object --+       |
              |       |
   parse.ParseI --+   |
                  |   |
parse.AbstractParse --+
                      |
                     Regexp


A grammar-based chunk parser.  C{chunk.Regexp} uses a set of
regular expression patterns to specify the behavior of the parser.
The chunking of the text is encoded using a C{ChunkString}, and
each rule acts by modifying the chunking in the C{ChunkString}.
The rules are all implemented using regular expression matching
and substitution.

A grammar contains one or more clauses in the following form:

NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # chink any tag beginning with V, I, or .
  <.*>}{<DT>         # split a chunk at a determiner
  <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj with one starting with a noun

The patterns of a clause are executed in order.  An earlier
pattern may introduce a chunk boundary that prevents a later
pattern from executing.  Sometimes an individual pattern will
match on multiple, overlapping extents of the input.  As with
regular expression substitution more generally, the chunker will
identify the first match possible, then continue looking for matches
after this one has ended.
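
The leftmost-match-then-continue behaviour described above is the same as that of ordinary regular expression substitution, so it can be illustrated with Python's standard C{re} module alone. This is a minimal sketch, not the chunker's own code: the flat tag string and the pattern C{pair} are assumptions made up for the illustration, loosely modelled on how a C{ChunkString} encodes tags.

```python
import re

# Tag sequence encoded as a flat string (an assumed, simplified encoding).
tags = "<DT><JJ><JJ><NN>"

# A pattern that could match overlapping extents: any two adjacent tags.
pair = re.compile(r"(<[^<>]+><[^<>]+>)")

# re.sub takes the leftmost match, then resumes scanning after that match
# has ended, so the overlapping candidate <JJ><JJ> is never chunked.
chunked = pair.sub(r"{\1}", tags)
print(chunked)  # {<DT><JJ>}{<JJ><NN>}
```

Three overlapping pairs are possible in this input, but only the first and third are bracketed, because the second overlaps a match that was already consumed.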

The clauses of a grammar are also executed in order.  A cascaded
chunk parser is one having more than one clause.  The maximum depth
of a parse tree created by this chunk parser is the same as the
number of clauses in the grammar.

When tracing is turned on, the comment portion of a line is displayed
each time the corresponding pattern is applied.
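
To make the clause layout concrete, the sketch below splits grammar text into labelled clauses, each holding its patterns together with the comment that tracing would display. The helper C{split_clauses} is hypothetical and is not part of this module; it assumes that a clause label ends in a colon and that C{#} never occurs inside a pattern.

```python
def split_clauses(grammar_text):
    """Group pattern lines under their clause label, keeping comments."""
    clauses = []  # [(label, [(pattern, comment), ...]), ...] in grammar order
    for raw in grammar_text.splitlines():
        # Split at the first '#': pattern on the left, comment on the right.
        pattern, _, comment = raw.partition("#")
        pattern, comment = pattern.strip(), comment.strip()
        if not pattern:
            continue  # blank line or comment-only line
        if pattern.endswith(":") and not pattern.startswith(("{", "}", "<")):
            clauses.append((pattern[:-1], []))  # start of a new clause
        elif clauses:
            clauses[-1][1].append((pattern, comment))
    return clauses


GRAMMAR = r"""
NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # chink any tag beginning with V, I, or .
"""

clauses = split_clauses(GRAMMAR)
for label, patterns in clauses:
    for pattern, comment in patterns:
        print(label, pattern, "->", comment)
```

Because the clauses are kept in a list, their order in the grammar is preserved, which matters since both clauses and patterns are executed in order.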

@type _start: C{string}
@ivar _start: The start symbol of the grammar (the root node of resulting trees)
@type _stages: C{list}
@ivar _stages: The list of parsing stages corresponding to the grammar
    

Instance Methods

__init__(self, grammar, top_node='S', loop=1, trace=0)
    Create a new chunk parser, from the given start state and set of chunk patterns.
parse(self, chunk_struct, trace=None)
    Apply the chunk parser to this input. Returns: Tree.
__repr__(self)
    Returns: string, a concise string representation of this chunk.Regexp.
__str__(self)
    Returns: string, a verbose string representation of this chunk.Regexp.

Inherited from parse.AbstractParse: get_parse, get_parse_list, grammar

Inherited from parse.ParseI: get_parse_dict, get_parse_probs

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__

Properties

Inherited from object: __class__

Method Details

__init__(self, grammar, top_node='S', loop=1, trace=0)
(Constructor)

Create a new chunk parser, from the given start state and set of chunk patterns.

Parameters:
  • grammar (list of string) - The list of patterns that defines the grammar
  • top_node (string or Nonterminal) - The top node of the tree being created
  • loop (int) - The number of times to run through the patterns
  • trace (int) - The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output.
Overrides: parse.AbstractParse.__init__
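
One way to picture the C{loop} parameter is as an outer loop around the ordered list of patterns. The sketch below is a simplified stand-in, not this class's implementation: C{run_rules} is a hypothetical helper, and it rewrites a flat tag string with C{re.sub} instead of building a tree.

```python
import re

def run_rules(text, rules, loop=1):
    """Apply each (pattern, replacement) rule in order, `loop` times over."""
    for _ in range(loop):
        for pattern, repl in rules:
            text = re.sub(pattern, repl, text)
    return text


# The PP rule comes first but needs an NP that only the second rule creates.
rules = [
    (r"<IN><NP>", "<PP>"),  # preposition followed by NP becomes PP
    (r"<DT><NN>", "<NP>"),  # determiner followed by noun becomes NP
]

once = run_rules("<IN><DT><NN>", rules, loop=1)
twice = run_rules("<IN><DT><NN>", rules, loop=2)
print(once)   # <IN><NP>
print(twice)  # <PP>
```

With a single pass the first rule finds nothing to match; a second pass lets it see the output of the later rule.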

parse(self, chunk_struct, trace=None)

Apply the chunk parser to this input.

Parameters:
  • chunk_struct (Tree) - the chunk structure to be (further) chunked (this tree is modified, and is also returned)
  • trace (int) - The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output. This value overrides the trace level value that was given to the constructor.
Returns: Tree
the chunked output.
Overrides: ChunkParseI.parse
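
The relationship between the two C{trace} arguments can be sketched as follows. C{TraceDemo} and C{effective_trace} are hypothetical names used only for this illustration; the sketch assumes that the default of C{None} means "use the level given to the constructor".

```python
class TraceDemo:
    """Hypothetical stand-in for a parser; not the real chunk.Regexp."""

    def __init__(self, trace=0):
        self._trace = trace  # trace level given to the constructor

    def effective_trace(self, trace=None):
        # None means "no override": fall back to the constructor's level.
        # Test against None, not truthiness, so an explicit 0 still overrides.
        return self._trace if trace is None else trace


demo = TraceDemo(trace=1)
print(demo.effective_trace())   # 1
print(demo.effective_trace(0))  # 0
print(demo.effective_trace(2))  # 2
```

Comparing against C{None} matters here because 0 is a meaningful trace level that silences output for one call.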

__repr__(self)
(Representation operator)

repr(x)

Returns: string
a concise string representation of this chunk.Regexp.
Overrides: object.__repr__

__str__(self)
(Informal representation operator)

str(x)

Returns: string
a verbose string representation of this chunk.Regexp.
Overrides: object.__str__