SLASHEDTEXTIMPORT
Section: User Commands (1)
Updated: August 9, 2006
NAME
slashedtextimport - A tool to convert slashed text to Emdros MQL
SYNOPSIS
slashedtextimport [ options ] [input_filename ...]
DESCRIPTION
slashedtextimport is a command-line tool that converts "slashed
text" to Emdros MQL for later import into Emdros. "Slashed text"
is used here to describe text that: a) has whitespace-separated
words, and b) has analyses of each word appended to the word, each
separated by some substring, usually the character '/'.
slashedtextimport can separate text into paragraphs if
paragraphs are always delimited by the same string. The default
paragraph-delimiter is "\n\n", i.e., two newlines.
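The paragraph and word splitting described above can be sketched in Python as follows. This is a minimal illustration, not the tool's source: the function name split_paragraphs is hypothetical, the sample text is made up, and dropping paragraphs that contain no words (for example, after a trailing separator) is an assumption.

```python
from __future__ import annotations


def split_paragraphs(text: str, para_sep: str = "\n\n") -> list[list[str]]:
    """Split a document on para_sep, then split each paragraph into
    whitespace-separated words (still carrying their '/' analyses)."""
    paragraphs = []
    for chunk in text.split(para_sep):
        words = chunk.split()  # any run of whitespace separates words
        if words:              # assumption: empty paragraphs are dropped
            paragraphs.append(words)
    return paragraphs


doc = "The/DT dog/NN/dog barked/VBD/bark\n\nIt/PRP slept/VBD/sleep\n"
for i, para in enumerate(split_paragraphs(doc), start=1):
    print(i, para)
# 1 ['The/DT', 'dog/NN/dog', 'barked/VBD/bark']
# 2 ['It/PRP', 'slept/VBD/sleep']
```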
OPTIONS
slashedtextimport supports the following command-line switches:
- --help
    Show help, then quit.
- -V, --version
    Show version, then quit.
- -s, --schema
    Show the MQL schema on stdout, then quit (can be used with -d).
- -d, --dbname dbname
    Set the database name. If used with -s, the string "CREATE DATABASE
- -o, --output filename
    Write the output to the file filename. The default is "-", which
    means standard output.
- --start-monad monad
    The start monad to use. Must be >= 1. The default is 1.
- --start-id_d id_d
    The start id_d to use. Must be >= 1. The default is 1.
- --word-sep string
    Use the given string as the word-internal separator. The default
    is '/'. All standard C escapes such as '\n' and '\r' can be used,
    including '\xYZ' for hexadecimal characters.
- --para-sep string
    Use the given string as the paragraph separator. The default is two
    newlines ("\n\n"). All standard C escapes such as '\n' and '\r'
    can be used, including '\xYZ' for hexadecimal characters.
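The escape handling accepted by --word-sep and --para-sep can be illustrated with a small decoder. This is a sketch under assumptions: the name decode_c_escapes is hypothetical, it supports only the common single-character C escapes plus '\xYZ' with exactly two hex digits (octal escapes are not handled), and the real tool may accept more or fewer forms.

```python
import re

# Single-character C escapes handled by this sketch (assumed subset).
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "a": "\a", "b": "\b",
           "f": "\f", "v": "\v", "\\": "\\", "'": "'", '"': '"', "?": "?"}

# Either \x plus exactly two hex digits, or a backslash plus any one character.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})|\\(.)", re.DOTALL)


def decode_c_escapes(s):
    """Expand C-style escapes in a separator given on the command line."""
    def repl(m):
        if m.group(1) is not None:      # \xYZ: hexadecimal character code
            return chr(int(m.group(1), 16))
        ch = m.group(2)
        if ch in _SIMPLE:
            return _SIMPLE[ch]
        raise ValueError(f"unsupported escape: \\{ch}")
    return _ESCAPE.sub(repl, s)


print(repr(decode_c_escapes(r"\n\n")))  # '\n\n'
print(repr(decode_c_escapes(r"\x7c")))  # '|'
```

Rejecting unknown escapes with ValueError, rather than passing them through, is a design choice of this sketch; it makes typos in a separator argument fail loudly.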
OPERATION
slashedtextimport reads slashed text and converts it to MQL
statements for later import into Emdros.
Each filename given after the options on the command line is taken
to contain one document, consisting of a whitespace-separated
sequence of words. If no filenames are given, the input is read from
stdin.
If no -o switch is given, the output is printed on stdout.
If an error occurs, the string "FAILURE" or the string "ERROR" is
printed on stderr, along with an error message.
If no error occurs, a string of the form "SUCCESS: next_monad is X
next_id_d is Y" is printed on stderr, where X and Y are positive
integers denoting, respectively, the next monad and the next id_d to
be used by the next invocation of the program (that is, the values to
pass to --start-monad and --start-id_d). This is useful when importing
several directories' worth of documents in separate runs.
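Chaining runs in this way might look like the following Python sketch. It assumes the SUCCESS line has exactly the quoted wording (any whitespace, including a line break, is allowed between its parts); the *.txt directory layout, the output file names, and the helper names parse_success and import_directories are hypothetical.

```python
import glob
import os
import re
import subprocess

# Assumed shape of the SUCCESS line; \s+ also tolerates a line break.
_SUCCESS = re.compile(
    r"SUCCESS:\s*next_monad\s+is\s+(\d+)\s+next_id_d\s+is\s+(\d+)")


def parse_success(stderr_text):
    """Return (next_monad, next_id_d) parsed from the program's stderr."""
    m = _SUCCESS.search(stderr_text)
    if m is None:
        raise ValueError("no SUCCESS line found on stderr")
    return int(m.group(1)), int(m.group(2))


def import_directories(dirs, out_prefix="import"):
    """Run slashedtextimport once per directory (hypothetical *.txt layout),
    starting each run where the previous one stopped."""
    monad, id_d = 1, 1
    for n, d in enumerate(dirs):
        files = sorted(glob.glob(os.path.join(d, "*.txt")))
        if not files:
            continue  # with no filenames the tool would read stdin instead
        cmd = ["slashedtextimport",
               "--start-monad", str(monad),
               "--start-id_d", str(id_d),
               "-o", f"{out_prefix}{n}.mql",
               *files]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"{d}: exit status {result.returncode}\n"
                               f"{result.stderr}")
        monad, id_d = parse_success(result.stderr)
    return monad, id_d


print(parse_success("SUCCESS: next_monad is 1201 next_id_d is 1450\n"))
# (1201, 1450)
```

Only parse_success runs in the example above; import_directories needs slashedtextimport on the PATH.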
SCHEMA
The schema can be seen by giving the program the -s switch,
optionally together with the -d switch.
A "Document" corresponds to one top-level file.
A "Paragraph" corresponds to a contiguous stretch of words delimited
from other paragraphs by the argument of --para-sep.
A "Word" is a single whitespace-separated string in a document,
split into "surface", "tag", and "lemma" (in that order) by the
word-internal separator ('/' by default; see the --word-sep option
above). If the tag or the lemma is not present, it is set to the
empty string.
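The word-splitting rule above can be sketched as follows. This is an illustration, not the tool's code: the names Word and parse_word are hypothetical, and splitting at most twice (so that any extra separators stay in the lemma) is an assumption, since this page does not say how such tokens are handled.

```python
from typing import NamedTuple


class Word(NamedTuple):
    surface: str
    tag: str
    lemma: str


def parse_word(token, word_sep="/"):
    """Split one whitespace-separated token into surface, tag and lemma.
    A missing tag or lemma becomes the empty string."""
    parts = token.split(word_sep, 2)   # assumption: at most two splits
    parts += [""] * (3 - len(parts))   # pad missing fields with ""
    return Word(*parts)


for tok in ["dogs/NNS/dog", "the/DT", "hello"]:
    print(parse_word(tok))
# Word(surface='dogs', tag='NNS', lemma='dog')
# Word(surface='the', tag='DT', lemma='')
# Word(surface='hello', tag='', lemma='')
```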
RETURN VALUES
- 0 Success
- 1 Wrong usage
- 2 Connection to backend server could not be established
- 3 An exception occurred (the type is printed on stderr)
- 4 Could not open file
- 5 Database error
- 6 Compiler error (internal error)
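Scripts that call slashedtextimport can map its exit status to the meanings listed above. A minimal sketch; EXIT_CODES and describe_exit are hypothetical names, and the table simply copies this section.

```python
# Exit statuses of slashedtextimport, copied from RETURN VALUES.
EXIT_CODES = {
    0: "Success",
    1: "Wrong usage",
    2: "Connection to backend server could not be established",
    3: "An exception occurred (the type is printed on stderr)",
    4: "Could not open file",
    5: "Database error",
    6: "Compiler error (internal error)",
}


def describe_exit(code):
    """Return a readable description of an exit status."""
    return EXIT_CODES.get(code, f"unknown exit status {code}")


print(describe_exit(4))  # Could not open file
```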
AUTHORS
Copyright 2001-2006 by Ulrik Petersen (ulrikp@users.sourceforge.net).
Note that this software is distributed under the GNU GPL. See the
sources for details.
This document was created by man2html, using the manual pages.
Time: 22:29:32 GMT, December 01, 2006