Chapter 8. Third Party Applications

Table of Contents
Doc-Format E-Texts
MoneyManager
XWord

Doc-Format E-Texts

Originally implemented by Rick Bram in 1996, the Doc format has become the de facto standard for electronic text distribution in the Palm Computing platform user community. At last count, the Doc format was supported by at least ten different readers for the handheld, and a plethora of conversion tools for the desktop. Thousands of documents, from reference works to original novels, are available in the Doc format.

The Doc format is conceptually very simple: a Doc database is basically just a compressed text file, broken up into records. (A detailed description of the format is provided on the Pyrite web site, at the URL < http://purl.oclc.org/net/n9mtb/cq/doc/format.html >.) Pyrite provides access to Doc databases not only in the normal manner, but also as text streams.

Doc databases contain three types of record, organized by position in the database. There is a single header record, followed by a number of text records, followed by zero or more bookmark records. When reading a Doc database, the App.Doc.Database class takes care of returning an appropriate class for each record read from the database. When writing a Doc database, care must be taken to write the proper types of records in the proper order.
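The ordering rule can be illustrated with a small check. This is a minimal sketch that does not use Pyrite: the function name check_record_order and the string labels "header", "text", and "bookmark" are hypothetical, chosen only for illustration.

```python
def check_record_order(kinds):
    """Return True if kinds lists one header, then text records, then bookmarks.

    kinds is a sequence of labels ("header", "text", "bookmark"), one per
    record, in database order. The labels are illustrative only.
    """
    if not kinds or kinds[0] != "header":
        return False  # the first record must be the single header
    seen_bookmark = False
    for kind in kinds[1:]:
        if kind == "text":
            if seen_bookmark:
                return False  # text records may not follow a bookmark
        elif kind == "bookmark":
            seen_bookmark = True
        else:
            return False  # a second header, or an unknown label
    return True
```

A writer could run a check of this kind over its planned record list before committing anything to the database.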

Header Records

Doc header records are of the App.Doc.HeaderRecord class. The header of the current database is read automatically when you open the database, and is thereafter available in the header attribute of the Doc Database object. It contains the following fields:

version (integer)

The Doc format version which this database conforms to. At present, there are two known versions: a version 1 database stores text records uncompressed, and a version 2 database compresses them.

storylen (integer)

The length of the document text, before compression.

recsize (integer)

The maximum text record size of this document, normally 4096. In a freshly created document, all text records (except the last one) will be of the maximum size. Documents that have been edited may, however, contain shorter records.

Important: Documents with record sizes other than 4096 (especially sizes larger than 4096) may not be supported by all Doc-compatible software.

spare, spare2 (integer)

Usage unknown; filled with zero in new documents, but may contain a value in existing documents.

Note: The document header actually contains more information, including a list of the exact length of each text record, and an indicator of the current viewing position. Newly created documents do not normally contain this information; it is added by the reader the first time the document is opened. Support for these additional fields will be added in a later release of Pyrite.
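To make the documented fields concrete, here is a minimal sketch of a plain data holder that mirrors them. It is not the App.Doc.HeaderRecord class: the name DocHeaderSketch and both helper methods are hypothetical, and the record-count estimate assumes a freshly created document in which every text record except the last is full.

```python
from dataclasses import dataclass


@dataclass
class DocHeaderSketch:
    """Illustrative stand-in holding the header fields described above."""

    version: int = 2      # 1 = uncompressed text records, 2 = compressed
    storylen: int = 0     # length of the document text before compression
    recsize: int = 4096   # maximum text record size
    spare: int = 0        # usage unknown; zero in new documents
    spare2: int = 0       # usage unknown; zero in new documents

    def is_compressed(self):
        # Only version 2 databases compress their text records.
        return self.version == 2

    def expected_text_records(self):
        # Ceiling division; valid only while all records but the last
        # are full, which edited documents do not guarantee.
        return -(-self.storylen // self.recsize)
```

For example, a 10000-character document with the default record size would be expected to occupy three text records.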

Text Records

Each text record contains a single field, text, which is a string no longer than the maximum record length specified in the header. There are actually two classes of text record: the App.Doc.TextRecord class is used in version 2 documents, and stores the text in compressed form, while the App.Doc.TextRecordV1 class is used in version 1 documents and stores the text as-is.
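As an illustration of how document text maps onto text records, the following sketch splits a string into pieces no longer than the record size. The function name split_into_records is hypothetical, and the sketch assumes that the size limit applies to the text before any compression.

```python
def split_into_records(text, recsize=4096):
    """Split text into chunks of at most recsize characters.

    Every chunk except possibly the last is exactly recsize long,
    matching the layout of a freshly created document.
    """
    if recsize <= 0:
        raise ValueError("recsize must be positive")
    return [text[i:i + recsize] for i in range(0, len(text), recsize)]
```

Joining the chunks in order reproduces the original text, so the split loses nothing.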

Bookmark Records

Each bookmark record has two fields: text, the name of the bookmark (up to 15 characters long), and pos, the (integer) position in the document at which the bookmark is set. The position is calculated based on the uncompressed text of the document, and is relative to the beginning of the text.
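The following sketch shows one way to prepare bookmark data and to work out which text record a bookmark falls in. Both function names are hypothetical. Truncating an over-long name is a choice made for this example rather than documented Pyrite behavior, and the record lookup assumes every text record before the target is full-sized, which may not hold for edited documents.

```python
def make_bookmark(name, pos):
    """Return a dict with the two bookmark fields, text and pos."""
    if pos < 0:
        raise ValueError("pos is relative to the start of the text")
    # Bookmark names may be up to 15 characters; longer names are cut here.
    return {"text": name[:15], "pos": pos}


def locate(pos, recsize=4096):
    """Map an uncompressed text position to (record index, offset in record).

    The record index counts text records only, starting from zero.
    """
    return divmod(pos, recsize)
```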

Streamed Access to Doc Databases

Because Doc databases are actually just reformatted and compressed text files, Pyrite includes special support for treating them as streams of text. Pyrite provides two classes, App.Doc.DOCReader and App.Doc.DOCWriter, which allow Doc databases to be read and written as if they were ordinary Python files.

The DOCReader Class

The DOCReader class has the following methods and attributes:

__init__ (database)

When creating a DOCReader object, you must pass a Database object that you have previously opened for reading.

read (nbytes=0)

Per standard Python behavior, this method reads "until no more data is available". However, because the DOCReader class maintains a one-record buffer, a single call to read returns at most the contents of one record; call it repeatedly to read a whole document.

readline (), readlines (), close (), seek (pos,whence=0), tell ()

These methods follow standard Python semantics for readable files.
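The one-record buffering described for read can be modelled without Pyrite. The class below is a hypothetical stand-in, not DOCReader itself: it takes a list of already-decoded record strings, and its handling of nbytes is an assumption about how such a buffer might behave.

```python
class OneRecordReader:
    """Illustrative reader that never returns more than one record per read."""

    def __init__(self, records):
        self._records = list(records)  # decoded text records, in order
        self._next = 0                 # index of the next record to buffer
        self._buf = ""                 # unread remainder of the current record

    def read(self, nbytes=0):
        # Refill the buffer from the next non-empty record if it is empty.
        while not self._buf and self._next < len(self._records):
            self._buf = self._records[self._next]
            self._next += 1
        if nbytes <= 0 or nbytes >= len(self._buf):
            data, self._buf = self._buf, ""
        else:
            data, self._buf = self._buf[:nbytes], self._buf[nbytes:]
        return data  # the empty string signals end of document


def read_all(reader):
    """Collect a whole document by calling read until it returns nothing."""
    chunks = []
    while True:
        chunk = reader.read()
        if not chunk:
            break
        chunks.append(chunk)
    return "".join(chunks)
```

Code that needs the full text should therefore loop, as read_all does, rather than rely on a single call.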

The DOCWriter Class

The DOCWriter class allows writing to a Doc database as if it were a text file. Because of the way Doc databases are structured, the DOCWriter object caches the entire contents of the document in memory, and writes it out in one pass at the end. While this should be no problem on a typical system, you should keep it in mind if you plan to run Pyrite on a platform with limited memory.

__init__ (title, target, compress=1, category=0, creator='REAd', type='TEXt', backup=0, version=0)

Unlike DOCReader objects, a DOCWriter object is not attached to an existing database. Instead, it creates a database when it begins writing data. The compress parameter specifies whether the document should be compressed, and the title, creator, type, backup, and version parameters map directly into the corresponding parts of the database header.

Note: The version parameter is not the document format version, but the desired value for the version field of the database header. Some Doc software uses this field as a way of specifying sub-formats of the standard Doc format.

The target parameter tells the object where to put the completed database. It can be a filename, a Database object, or a DLP object.

Important: The semantics of the target parameter may change in the near future, to accommodate the new Store API.

write (data)

Write the specified string into the document.

writelines (list)

Write a list of lines into the document. This method doesn't insert newlines; it simply calls self.write on each element of the list.

bookmark (title, pos=None)

Set a bookmark in the document. If pos is not specified, the bookmark is set at the current position. Bookmarks are stored, and shown in the reader's menu, in the order in which they are set, without regard to their actual position in the document.

set_appinfo (data)

Set the AppInfo block of the document. The data parameter is just a string, not an object, since there is no standard AppInfo format for Doc databases around which to build a class.
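The behavior of write, writelines, and bookmark described above can be sketched with an in-memory stand-in. The class WriterSketch and its getvalue method are hypothetical and write no database. They only mirror three documented points: the text is cached in memory, writelines adds no newlines, and bookmarks keep the order in which they were set.

```python
class WriterSketch:
    """Illustrative in-memory model of the documented writer methods."""

    def __init__(self):
        self._chunks = []    # cached document text, in write order
        self._length = 0     # current position, in characters
        self.bookmarks = []  # (title, pos) pairs, in the order set

    def write(self, data):
        self._chunks.append(data)
        self._length += len(data)

    def writelines(self, lines):
        # No newline is inserted; each element is written as given.
        for line in lines:
            self.write(line)

    def bookmark(self, title, pos=None):
        if pos is None:
            pos = self._length  # default to the current position
        self.bookmarks.append((title, pos))

    def getvalue(self):
        return "".join(self._chunks)
```

Setting a bookmark at position 0 after one at a later position leaves it second in the list, matching the documented ordering.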