HALBLDDB

Section: User Commands (1)
Updated: September 23, 2006
 

NAME

halblddb - The first stage in building a HAL database with Emdros  

SYNOPSIS

halblddb [ options ]
 

DESCRIPTION

halblddb is the first stage in building a HAL database with Emdros. It builds a database from an input text that you supply. The text must be encoded in ISO-8859-1.

HAL (Hyperspace Analogue to Language) is a method for measuring proximity and word-frequency in a text. The basic idea is to slide a fixed-width window over the text and, at each position, record which word-forms appear within a given distance of the word at the front of the window, weighting nearer words more heavily. The result is stored in an m x m matrix, where m is the number of unique word-forms in the text.

For example, if word A occurs 1 place before word B, and word B is currently at the head of a sliding window of width 5, then the (B,A) entry of the matrix gets 4 added to it, since 5-1 = 4. Similarly, if A occurs 2 places before B, then the (B,A) entry gets 3 added to it. The window is slid across the text until the text has been exhausted.
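
The weighting rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code halblddb uses: the function name hal_matrix, the whitespace tokenization, and the nested-dictionary matrix are all assumptions made for the example.

```python
from collections import defaultdict


def hal_matrix(words, width=5):
    """Return a nested dict where matrix[b][a] accumulates (width - d)
    each time word a occurs d places before word b, for 1 <= d < width.

    Hypothetical helper for illustration; halblddb stores its data in
    an Emdros database instead.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for i, b in enumerate(words):
        # b is the word at the head of the window. Look back up to
        # width - 1 places; a distance equal to width would add 0.
        for d in range(1, width):
            j = i - d
            if j < 0:
                break  # ran off the start of the text
            matrix[b][words[j]] += width - d
    return matrix


if __name__ == "__main__":
    # Assumption: tokens are separated by whitespace.
    text = "the cat sat on the mat".split()
    m = hal_matrix(text, width=5)
    print(dict(m["sat"]))
```

With a window of width 5, "cat" occurs 1 place before "sat" and contributes 4, while "the" occurs 2 places before it and contributes 3, matching the example in the text.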

halblddb initializes and builds a HAL database from a given text.

Typical usage would be:

halblddb -d mydb -f mytext.txt -o mywordlist.txt

If halblddb was built to use MySQL or PostgreSQL, you may also need options such as "-u username", "-p password" and "-h hostname".

The next step is to run mqlhal(1). See the man page of mqlhal for more information.

 

OPTIONS

halblddb supports the following command-line switches:
--help
show help
-V, --version
show version
-b, --backend backend
set database backend to `backend'. Valid values are: "p", "pg", "postgres", and "postgresql" for PostgreSQL; "m", "my", and "mysql" for MySQL; "s", "l", "lt", "sqlite", and "sqlite2" for SQLite 2.X.X; "3", "s3", "lt3", and "sqlite3" for SQLite 3.X.X.
-d, --dbname dbname
set database name
-f input text filename
set name of input text file
-o wordlist filename
set name of file where the word list will be stored
-h, --host hostname
set hostname of the database backend to connect to (default is 'localhost'; has no effect on SQLite)
-u, --user user
set database user to connect as (default is 'emdf'; has no effect on SQLite)
-p, --password password
set password to use for the database user. Has no effect on SQLite, unless you have an encryption-enabled SQLite, in which case this gets passed as the key.
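
The backend aliases accepted by -b can be summarized as a lookup table. The Python sketch below is only an illustration for scripts that wrap halblddb: the canonical names on the left, the function name canonical_backend, and the exact, case-sensitive matching are assumptions, not documented behaviour of halblddb.

```python
# Aliases as listed for the -b / --backend option. The dictionary keys
# are labels chosen for this example only.
BACKEND_ALIASES = {
    "postgresql": ("p", "pg", "postgres", "postgresql"),
    "mysql": ("m", "my", "mysql"),
    "sqlite2": ("s", "l", "lt", "sqlite", "sqlite2"),
    "sqlite3": ("3", "s3", "lt3", "sqlite3"),
}


def canonical_backend(value):
    """Map a -b argument to one of the labels above.

    Assumption: matching is exact and case-sensitive. Raises
    ValueError for a value that is not in the documented list.
    """
    for name, aliases in BACKEND_ALIASES.items():
        if value in aliases:
            return name
    raise ValueError("unknown backend: %r" % (value,))


if __name__ == "__main__":
    for arg in ("pg", "my", "lt", "s3"):
        print(arg, "->", canonical_backend(arg))
```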

 

RETURN VALUES

0 Success
1 Wrong usage
2 Connection to backend server could not be established
3 An exception occurred (the type is printed on stderr)
4 Could not open file
5 Database error
6 Compiler error (error in MQL input)
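
A calling script can translate these return values into messages. The following Python sketch assumes that halblddb is on the PATH and uses only the -d, -f and -o switches documented above; the names EXIT_CODES, describe_exit and run_halblddb are hypothetical and exist only for this example.

```python
import subprocess

# Return values as documented in this section.
EXIT_CODES = {
    0: "Success",
    1: "Wrong usage",
    2: "Connection to backend server could not be established",
    3: "An exception occurred (the type is printed on stderr)",
    4: "Could not open file",
    5: "Database error",
    6: "Compiler error (error in MQL input)",
}


def describe_exit(code):
    """Return the documented meaning of a halblddb return value."""
    return EXIT_CODES.get(code, "Unknown return value %d" % code)


def run_halblddb(dbname, text_file, wordlist_file):
    """Run halblddb and return (return value, description).

    Assumes the halblddb executable can be found on the PATH.
    """
    cmd = ["halblddb", "-d", dbname, "-f", text_file, "-o", wordlist_file]
    result = subprocess.run(cmd)
    return result.returncode, describe_exit(result.returncode)


if __name__ == "__main__":
    code, message = run_halblddb("mydb", "mytext.txt", "mywordlist.txt")
    print("halblddb returned %d: %s" % (code, message))
```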

 

AUTHORS

Copyright 2005-2006 by Ulrik Petersen (ulrikp@users.sourceforge.net). Note that this software is distributed under the GNU GPL. See the sources for details.

 

REFERENCES

Burgess, C., K. Livesay, and K. Lund (1998). "Explorations in Context Space: Words, Sentences, Discourse", Discourse Processes, Volume 25, pp. 211-257.

See <http://locutus.ucr.edu/abstracts/97-bll-expl.html> from where you can also download the paper.

Burgess, C. and K. Lund (1997). "Modelling parsing constraints with high-dimensional context space", Language and Cognitive Processes, Volume 12, pp. 1-34.

See <http://citeseer.nj.nec.com/context/398051/0>.

Lund, K. and C. Burgess (1996). "Producing high-dimensional semantic spaces from lexical co-occurrence", Behavior Research Methods, Instruments and Computers, Volume 28, number 2, pp. 203-208.

See <http://locutus.ucr.edu/abstracts/96-lb-prod.html> from where you can also download the paper.


 
