HALBLDDB
Section: User Commands (1)
Updated: September 23, 2006
NAME
halblddb - The first stage in building a HAL database with Emdros
SYNOPSIS
halblddb [ options ]
DESCRIPTION
halblddb is the first stage in building a HAL database with
Emdros. It builds a database from an input text which you supply.
The text must be in iso-8859-1 encoding.
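If your text is stored in another encoding, it has to be converted before halblddb can read it. The following is a minimal sketch of one way to do that, assuming the source file is UTF-8; the function names `transcode` and `transcode_file` are hypothetical helpers and are not part of Emdros.

```python
from pathlib import Path


def transcode(data: bytes, src_encoding: str = "utf-8") -> bytes:
    """Re-encode raw bytes from src_encoding to ISO-8859-1.

    Raises UnicodeEncodeError if the text contains a character that
    ISO-8859-1 cannot represent (for example the euro sign).
    """
    return data.decode(src_encoding).encode("iso-8859-1")


def transcode_file(src: str, dst: str, src_encoding: str = "utf-8") -> None:
    """Read src, convert it to ISO-8859-1 and write the result to dst."""
    Path(dst).write_bytes(transcode(Path(src).read_bytes(), src_encoding))
```

Failing loudly on unrepresentable characters is a deliberate choice here: silently replacing them would change the word-forms that end up in the database.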
HAL (Hyperspace Analogue to Language) is a method for measuring word
proximity and word frequency in a text. The basic idea is to slide a
window over the text and count how many times a given word-form appears
within a given distance of the word at the head of the window. The
counts are stored in an m x m matrix, where m is the number of unique
word-forms in the text.
For example, if word A occurs 1 place before word B, and word B is
currently at the head of a sliding window of width 5, then the matrix's
(B,A) entry gets 4 added to it, since 5-1 = 4. Similarly, if A occurs
2 places before B, then the (B,A) entry gets 3 added to it, since
5-2 = 3.
The window is moved across the text one word at a time until the text
has been exhausted.
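The weighting described above can be sketched in a few lines of Python. This is only an illustration of the rule "add width minus distance to the (head, earlier word) entry"; it is not halblddb's actual implementation. The function name `hal_counts`, the sparse dictionary used in place of an m x m matrix, and the choice to stop at distance width-1 (where the weight would otherwise reach 0) are all assumptions.

```python
from collections import defaultdict


def hal_counts(tokens, width=5):
    """Accumulate weighted co-occurrence counts for a token sequence.

    For each word at the head of the window, every word that occurs
    `distance` places before it (1 <= distance < width) adds
    `width - distance` to the (head, earlier_word) entry.
    """
    counts = defaultdict(int)
    for i, head in enumerate(tokens):
        for distance in range(1, width):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            counts[(head, tokens[j])] += width - distance
    return dict(counts)
```

With the default width of 5, `hal_counts(["A", "B"])` gives the (B,A) entry the value 4, and `hal_counts(["A", "X", "B"])` gives it the value 3, matching the example above.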
halblddb initializes and builds a HAL database from a given
text.
Typical usage would be:
halblddb -d mydb -f mytext.txt -o mywordlist.txt
If halblddb was built to use MySQL or PostgreSQL, you may also need
the options "-u username", "-p password" and "-h hostname".
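When halblddb is driven from a script, the command line can be assembled programmatically. The sketch below uses only switches documented in OPTIONS; the helper name `build_halblddb_argv` and its parameter names are hypothetical.

```python
def build_halblddb_argv(dbname, text_file, wordlist_file,
                        backend=None, host=None, user=None, password=None):
    """Return an argument list for halblddb.

    The three required values map to -d, -f and -o; the optional ones
    are appended only when given, so SQLite users can leave them out.
    """
    argv = ["halblddb", "-d", dbname, "-f", text_file, "-o", wordlist_file]
    optional = (("-b", backend), ("-h", host), ("-u", user), ("-p", password))
    for switch, value in optional:
        if value is not None:
            argv += [switch, value]
    return argv
```

Calling `build_halblddb_argv("mydb", "mytext.txt", "mywordlist.txt")` reproduces the typical usage shown above as a list suitable for `subprocess.run`.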
The next step is to run mqlhal(1). See the man page of mqlhal for
more information.
OPTIONS
halblddb supports the following command-line switches:
--help
    show help
-V, --version
    show version
-b, --backend backend
    set database backend to `backend'. Valid values are:
    For PostgreSQL: "p", "pg", "postgres", and "postgresql".
    For MySQL: "m", "my", and "mysql".
    For SQLite 2.X.X: "s", "l", "lt", "sqlite", and "sqlite2".
    For SQLite 3.X.X: "3", "s3", "lt3", and "sqlite3".
-d, --dbname dbname
    set database name
-f input text filename
    set name of input text file
-o wordlist filename
    set name of file where the word list will be stored
-h, --host hostname
    set db back-end hostname to connect to (default is 'localhost').
    Has no effect on SQLite.
-u, --user user
    set database user to connect as (default is 'emdf'). Has no effect
    on SQLite.
-p, --password password
    set password to use for the database user. Has no effect on SQLite,
    unless you have an encryption-enabled SQLite, in which case this
    gets passed as the key.
RETURN VALUES
0   Success
1   Wrong usage
2   Connection to backend server could not be established
3   An exception occurred (the type is printed on stderr)
4   Could not open file
5   Database error
6   Compiler error (error in MQL input)
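A calling script can translate these return values into messages. The following is a small sketch under the assumption that halblddb is on the PATH; `EXIT_MEANINGS`, `describe_exit` and `run_halblddb` are hypothetical names, and the message strings are copied from the list above.

```python
import subprocess

# Return values as documented in RETURN VALUES.
EXIT_MEANINGS = {
    0: "Success",
    1: "Wrong usage",
    2: "Connection to backend server could not be established",
    3: "An exception occurred (the type is printed on stderr)",
    4: "Could not open file",
    5: "Database error",
    6: "Compiler error (error in MQL input)",
}


def describe_exit(code: int) -> str:
    """Map a halblddb return value to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unknown return value {code}")


def run_halblddb(argv):
    """Run halblddb with the given argument list.

    Returns (return_value, meaning). Raises FileNotFoundError if the
    halblddb executable cannot be found.
    """
    result = subprocess.run(argv)
    return result.returncode, describe_exit(result.returncode)
```

For example, `run_halblddb(["halblddb", "-d", "mydb", "-f", "mytext.txt", "-o", "mywordlist.txt"])` returns the pair `(0, "Success")` when the build succeeds.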
AUTHORS
Copyright 2005-2006 by Ulrik Petersen (ulrikp@users.sourceforge.net).
Note that this software is distributed under the GNU GPL. See the
sources for details.
REFERENCES
Burgess, C., K. Livesay, and K. Lund (1998). "Explorations in Context
Space: Words, Sentences, Discourse". Discourse Processes, Volume 25,
pp. 211-257.
See <http://locutus.ucr.edu/abstracts/97-bll-expl.html>, from where you
can also download the paper.
Burgess, C. and K. Lund (1997). "Modelling parsing constraints with
high-dimensional context space". Language and Cognitive Processes,
Volume 12, pp. 1-34.
See <http://citeseer.nj.nec.com/context/398051/0>.
Lund, K. and C. Burgess (1996). "Producing high-dimensional semantic
spaces from lexical co-occurrence". Behavior Research Methods,
Instruments and Computers, Volume 28, number 2, pp. 203-208.
See <http://locutus.ucr.edu/abstracts/96-lb-prod.html>, from where you
can also download the paper.