Package nltk_lite :: Package corpora :: Module ycoe

Module ycoe


Reads tokens from the York-Toronto-Helsinki Parsed Corpus of
Old English Prose (YCOE), a 1.5-million-word, syntactically
annotated corpus of Old English prose texts. The corpus is
distributed by the Oxford Text Archive: http://www.ota.ahds.ac.uk/

The YCOE corpus is divided into 100 files, each representing
an Old English prose text. The tags used within each text comply
with the YCOE standard: http://www-users.york.ac.uk/~lang22/YCOE/YcoeHome.htm

Output of the reader is as follows:

Raw:
['+D+atte',
  'on',
  'o+dre',
  'wisan',
  'sint',
  'to',
  'manianne',
  '+da',
  'unge+dyldegan',
  ',',
  '&',
  'on',
  'o+dre',
  '+da',
  'ge+dyldegan',
  '.']
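
The `+` sequences in the tokens above are ASCII transliterations of Old English letters. The sketch below shows one way to turn them back into the original characters. It assumes the Helsinki-style convention (`+a` for æ, `+d` for ð, `+t` for þ, plus uppercase variants); the mapping table and the helper name `decode_ycoe` are illustrative assumptions and are not part of this module.

```python
# Assumed transliteration table; the ycoe module itself does not define it.
YCOE_LETTERS = {
    '+a': 'æ', '+A': 'Æ',
    '+d': 'ð', '+D': 'Ð',
    '+t': 'þ', '+T': 'Þ',
}

def decode_ycoe(token):
    """Replace each known '+x' sequence with its Old English letter."""
    for ascii_form, letter in YCOE_LETTERS.items():
        token = token.replace(ascii_form, letter)
    return token

# First few tokens from the Raw output above.
raw_tokens = ['+D+atte', 'on', 'o+dre', 'wisan', 'sint', 'to', 'manianne']
decoded = [decode_ycoe(t) for t in raw_tokens]
print(decoded)
```

Tokens without a `+` sequence pass through unchanged, so the helper can be applied to a whole token list.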

Tagged:
[('+D+atte', 'C'),
  ('on', 'P'),
  ('o+dre', 'ADJ'),
  ('wisan', 'N'),
  ('sint', 'BEPI'),
  ('to', 'TO'),
  ('manianne', 'VB^D'),
  ('+da', 'D^N'),
  ('unge+dyldegan', 'ADJ^N'),
  (',', ','),
  ('&', 'CONJ'),
  ('on', 'P'),
  ('o+dre', 'ADJ'),
  ('+da', 'D^N'),
  ('ge+dyldegan', 'ADJ^N'),
  ('.', '.')]

Bracket Parse:
(CP-THT: (C: '+D+atte') (IP-SUB: (IP-SUB-0: (PP: (P: 'on') (NP: (ADJ: 'o+dre') (N: 'wisan'))) 
(BEPI: 'sint') (IP-INF: (TO: 'to') (VB^D: 'manianne') (NP: '*-1')) (NP-NOM-1: (D^N: '+da') 
(ADJ^N: 'unge+dyldegan'))) (,: ',') (CONJP: (CONJ: '&') (IPX-SUB-CON=0: (PP: (P: 'on') 
(NP: (ADJ: 'o+dre'))) (NP-NOM: (D^N: '+da') (ADJ^N: 'ge+dyldegan'))))) (.: '.')),
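
To illustrate how the Tagged output relates to the Bracket Parse, the following sketch parses a bracketed string into nested tuples and then collects the (word, tag) pairs at the leaves. It is not the module's own `_parse` code. The input syntax `(LABEL child ...)` is an assumption about the corpus files, and the names `parse_brackets` and `tagged_leaves` are hypothetical. Because the Tagged output above contains no `*-1` token, the sketch also assumes that leaves beginning with `*` are traces and skips them.

```python
import re

def parse_brackets(s):
    """Parse a '(LABEL child ...)' string into nested (label, children) tuples."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # consume '('
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(node())   # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                          # consume ')'
        return (label, children)

    return node()

def tagged_leaves(tree):
    """Yield (word, tag) pairs, skipping leaves assumed to be traces."""
    label, children = tree
    for child in children:
        if isinstance(child, tuple):
            yield from tagged_leaves(child)
        elif not child.startswith('*'):
            yield (child, label)

sample = "(IP-INF (TO to) (VB^D manianne) (NP *-1))"
tree = parse_brackets(sample)
pairs = list(tagged_leaves(tree))
print(pairs)
```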

Chunk Parse:
[(S: 
    ('C', '+D+atte') 
    (PP: ('P', 'on') ('ADJ', 'o+dre') ('N', 'wisan')) 
    ('BEPI', 'sint') ('TO', 'to') ('VB^D', 'manianne') 
    (NP: ('NP', '*-1')) ('D^N', '+da') ('ADJ^N', 'unge+dyldegan') (',', ',') ('CONJ', '&') 
    (PP: ('P', 'on') ('ADJ', 'o+dre')) ('D^N', '+da') ('ADJ^N', 'ge+dyldegan') ('.', '.'))]

Functions

_read(files, conversion_function)

raw(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23',...)

tagged(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23',...)

chunked(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23',..., chunk_types=('NP'), top_node='S', partial_match=True, collapse_partials=True, cascade=True)

bracket_parse(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23',...)

_parse(s)

_strip_spaces(s)

_chunk_parse(files, chunk_types, top_node, partial_match, collapse_partials, cascade)

demo()
Variables
  item_name = {'coadrian.o34': 'Adrian and Ritheus', 'coaelhom.o...
  items = ['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarv...
Reads files from a given list, and converts them via the conversion_function.
Variables Details

item_name

Value:
{'coadrian.o34': 'Adrian and Ritheus',
 'coaelhom.o3': '\xc6lfric, Supplemental Homilies',
 'coaelive.o3': '\xc6lfrics Lives of Saints',
 'coalcuin': 'Alcuin De virtutibus et vitiis',
 'coalex.o23': 'Alexanders Letter to Aristotle',
 'coapollo.o3': 'Apollonius of Tyre',
 'coaugust': 'Augustine',
 'cobede.o2': 'Bedes History of the English Church',
...
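
The `\xc6` escapes in the titles above are single bytes rather than Unicode characters. A minimal sketch of looking up a readable title follows, assuming the values are Latin-1 encoded byte strings (byte 0xC6 is Æ in Latin-1). The dictionary below copies only a few of the entries shown, and the helper name `title_for` is hypothetical.

```python
# A few entries copied from the item_name value shown above,
# written as byte strings (assumed Latin-1 encoding).
item_name = {
    'coadrian.o34': b'Adrian and Ritheus',
    'coaelhom.o3': b'\xc6lfric, Supplemental Homilies',
    'coapollo.o3': b'Apollonius of Tyre',
}

def title_for(file_id):
    """Return the human-readable title for a corpus file identifier."""
    return item_name[file_id].decode('latin-1')

titles = [title_for(f) for f in ('coaelhom.o3', 'coapollo.o3')]
print(titles)
```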

items

Reads files from a given list and converts them via the conversion_function. Can return the files as raw or tagged tokens.

Value:
['coprefcura.o2',
 'cosolsat2',
 'coprefsolilo',
 'comarvel.o23',
 'cochdrul',
 'coalex.o23',
 'colawwllad.o4',
 'cocathom1.o3',
...