Any editor may be used to construct or modify a table, since it is a regular UNIX file, and this 'direct editing' method is occasionally used, especially for small amounts of data. However, avoid using an editor that destroys TAB characters.
To generate a new table the best plan (and usually the safest one) is to first generate a template file, then convert it to table format and add the rows of data. Any convenient editor may be used to generate a template file. To convert it to a table the command 'nosql maketable' may be used, which will produce an empty table. Next use the operator 'nosql edit' to edit in rows of data.
A typical template file is shown below:
# These are lines of table documentation. They can be of any length,
# and any number of such lines may exist.
# Each line must start correctly, e.g with "#", surrounded by any
# number of spaces and/or tabs.
Name Name of item
Type Type: 1,2,3,7,9,A,B,X
Count Number of items
K Constant modifier
SS7 Special status for type 7
Size In Kilobytes
The above template file contains the necessary elements to describe a table of six columns: table documentation (the comment lines that each start with a sharp sign '#'), column name ("Name", "Type", "Count", ...), and column documentation for each column (the text at the end of each column line).
To build the final table header, use the command:
nosql maketable < table.tpl
where table.tpl
is the template file described above.
The command will produce the correct table header to STDOUT,
(that can be redirected to a file as usual) :
Name Type Count K SS7 Size
---- ---- ----- - --- ----
I will now explain how NoSQL tables can be modified, both manually and with automated programs.
Basically there are two ways to manually modify an existing table: use either the 'nosql edit' or the 'nosql merge' commands. See the relevant sections, edit and merge.
The operator 'nosql edit' can be used to add new rows, change existing rows, or delete existing rows of data in an table. To modify a table 'nosql edit' first converts it to 'list' format, which is much more comfortable to work on, then opens an editor on it (vi by default, or whatever is set in the environment variable EDITOR).
After editing, the table is automatically checked for validity by 'edit', using 'nosql istable'. If structure errors are detected, the program prompts you for what to do (re-edit, quit, etc.).
The 'nosql merge' process actually involves other operators, like 'nosql search' and 'nosql edit', and works only when the existing table is sorted on one or more columns (which is a fairly common case). The process includes selecting rows from an existing sorted table (using 'nosql search') into a small table which is easy to edit (using 'nosql edit') and then combining the two tables again (using 'nosql merge'). Since 'nosql edit' is used modifications may include changes, additions, or deletions of rows. Also note that 'nosql merge' keeps the final table in sort order.
The difference is that 'nosql search' is much faster than 'nosql row' or 'nosql edit', the editing is done on a table of conveniently small size, and that the 'nosql merge' operation can be done in the background. Remember that whether one uses 'nosql merge' or 'nosql edit', putting the data back together after editing requires the entire original table to be passed, which can take some time if the original table is large.
The basic method is:
nosql cat table | ... some_other_stuff ... | nosql write -s -o table
for instance:
nosql cat table | nosql row 'Name != "Goofy"' | nosql write -s -o table
will delete from table table
all entries where the Name
column is equal to "Goofy" and write the resulting data back to
table
. Please refer to the documentation of the 'write'
operator in this same manual for more information.
One thing to point out here is that the only way to update a table, both manually and with a program, is to capture the SDTOUT stream of an operator and write the results back to a table file with 'write'. At the moment NoSQL does not provide a record-level update facility. This may be inefficient if you need to do frequent changes to a very large, indexed table, for instance with a CGI program behind a Web server, as the table indices must be updated after each editing operation. But don't be disappointed by this :-). There are ways to circumvent this apparently major limitation of the NoSQL paradigm.
The obvious "trick" is to try and keep your tables small. Silly a suggestion as it may sound, before laughing at it please have a look at section 2.9 of the included 4gl.ps or 4gl.txt paper. Remember that NoSQL works with UNIX, not in addition to it. The underlying UNIX file system, that most commercial databases tend to disregard, if used creatively can provide an extremely powerful way of pre-organizing your tables (relations) efficiently and keep them small (where small means roughly within a few hundred kilobytes in size).
If you really must do frequent modifications to a big indexed table, then you can still do it efficiently by applying your changes to a separate file rather than to the actual table, and merging the changes back into the big table (as well as re-build its indices) only every now and again, with a batch job that can be run in the background, overnight or when the system activity is low. The following example will try to explain this better.
Suppose we have the large indexed table bigtable
and
we need to insert/change/delete one or more records. What we can
do is:
bigtable.j
, with exactly the same header as
bigtable
, but containing only those records that we want
to insert into, remove from or append to bigtable
.
The entries in bigtable.j
will have to be in a format
suitable for the 'nosql merge' operator, see sect.
merge.bigtable
we will have to
do it in three logical steps. The first step is to use 'search'
on bigtable
to take advantage of the indices, see sect.
search. This will
produce an intermediate output that will then have to be merged
into bigtable.j
, and the final output will undergo the
original query statement again.bigtable
will rather be done
to bigtable.j
, with the sintax described in the
documentation of the 'merge' operator. You will also have to
make sure that bigtable.j
is kept sorted on its primary
key field after each update (if you write to bigtable.j using
the nosql 'write' operator, this is done by default).This may seem complicated, but it isn't really. Say bigtable
contains two columns, SSN (for Social Security No.) and SURNAME, where
the unique
table key is on SSN, and we have built a secondary index on the SURNAME
field. If we want to extract all the rows which SURNAME field is
equal to "Smith", the query necessary to take advantage of a fast
search method on bigtable
and to account for the
presence of bigtable.j
is:
echo -e "SURNAME\nSmith" | nosql search -ind bigtable.x.SURNAME | \
nosql merge SSN bigtable.j | \
nosql row 'SURNAME=="Smith"'
As you can see, the trick is:
bigtable
to quickly
obtain a much smaller (possibly empty) subset of bigtable
.bigtable.j
on-the-fly during the query.As shown, these logical steps can be performed in one single line of UNIX shell code :-)
Note 1: for simplicity, in the example I have omitted to lock
the tables before doing the changes and to unlock them afterwards, but
it should always be done,
especially in a multi-user environment. I personally would lock
bigtable
rather than bigtable.j
, as logically
we are updating the former. Of course, for the semaphore to be effective,
we will have to make sure that all the programs abide by the same
locking convention, whatever we choose it to be. See section
Data access control for more on
the NoSQL table locking protocol.
Note 2: make sure that bigtable
(and therefore also
bigtable.j
) have got a primary, unique key field, as
otherwise is may be difficult to get the 'merge' operator to
work properly.
The need to concatenate tables comes up every so often and although it is simple to do it may not be obvious. The UNIX 'cat' command can not be used as it would result in duplicating the header and thus make an invalid table. And of course, only tables with the same header should be concatenated, otherwise an invalid table would result (in this case it could be a gross inconsistency if the number of columns were different). If we have two tables, TABA and TABB, then to concatenate TABB onto the end of TABA we use the command:
nosql body < TABB >> TABA
Note that this avoids duplicating the header.
Note also that the operator 'nosql merge' may be used to merge two like tables based on a key of one or more columns. In this case however the two tables must be sorted on the key.