Apache module: bwshare
documentation for developers

work in progress

/*-----------------------------------------------------------------------------
Copyright (C) 2000, 2001, 2002, Alan Kennington.
You may distribute this software under the terms of Alan Kennington's
modified Artistic Licence, as specified in the accompanying LICENCE file.
-----------------------------------------------------------------------------*/
This software package is open and free.
Please read the LICENCE file for precise details.

DISCLAIMER.
The author of this software disclaims any express or implied guarantee of the
fitness of this software for any purpose. In no event shall the author of this
software be held liable for any direct, indirect, incidental, special,
exemplary, or consequential damages (including, but not limited to, procurement
of substitute services; loss of use, data, or profits; or business interruption)
however caused and on any theory of liability, whether in contract, strict
liability, or tort (including negligence or otherwise) arising in any way out of
the use of this software, even if advised of the possibility of such damage.

the trace display

In the bwshare-trace display, if there's an exclamation mark `!' to the left of a reverse DNS translation, that means that the double-reverse lookup failed.
In other words, it means that the reverse DNS provided for the client host IP is wrong - it doesn't match the forward name lookup.
However, when this happens, the reverse DNS is still used (usually) in the Apache log file.
This feature allows you to see what reverse DNS is claimed for a client IP address even if it is wrong, so that you can identify an address in the Apache log file.

garbage disposal

The Apache module `bwshare' has a `garbage disposal' algorithm which is really an information compaction or weeding algorithm.
What happens is that there is a table with N_REMOTE_HOSTS entries to record the historical behaviour of each remote host which accesses the server.
When this table is full, some information must be thrown away to make room for new information.
Otherwise, new remote hosts could not be recorded and throttled.
Because of the way shared memory works, it is convenient to use a fixed table size for remote hosts.

When `garbage disposal' occurs, the least important information is cleared from the table until a predefined tide-mark is reached.
Many arbitrary decisions have been made in the algorithm design, including various tide-marks and especially the function which determines how `interesting' each remote host record is.
Host records are sorted in order of importance, and the least important are discarded.

The garbage disposal occurs in the function bws_garbage_disposal().
The first thing that happens is that the semaphore for the shared memory is grabbed so that other child processes of the Apache parent process cannot read or write the shared memory.
This is by far the longest-duration resource lock in the bwshare module.
Therefore this function is written fairly efficiently to avoid a performance bottleneck.

The high-water marks for the remote host table are called hwm_hard and hwm_soft.
The low water-marks are called lwm_hard and lwm_soft.
The defaults are hwm_hard = N_REMOTE_HOSTS - 1; hwm_soft = 0.75 * N_REMOTE_HOSTS; lwm_hard = hwm_soft - 1; lwm_soft = 0.5 * N_REMOTE_HOSTS.
Checks are performed to ensure that
lwm_soft <= lwm_hard < hwm_soft <= hwm_hard < N_REMOTE_HOSTS.

Next, the number of occupied hsot records n_host_records is calculated.
If this is less than the low water mark, then the garbage disposal is finished.
This should never happen because bws_garbage_disposal should only be called when some high water mark is reached.

Next, all host records are evaluated for their `interest' or importance.
This evaluation is done in a temporary array `hosts_sort' of data structures of type `host_sort_t'.
The sorting-array records look like this.

typedef struct host_sort_t {
    uint16              host_index; /* Array index of this host record. */
    uint16              keep;       /* Indication of whether to keep it. */
    int32               value;      /* How valuable the record is to keep. */
    };
The field `host_index' is the offset of the remote host record in the array of host records.
The field `keep' is an integer which indicates how important the host record is.
The field `value' is a secondary measure of importance.
Records are sort primarily in increasing order of the field `keep', and secondarily by increasing order of the field `value'.
The host record evaluation formula is the dodgiest aspect of the garbage disposal algorithm.
The evaluation algorithm makes use of the following host record information
  • The current TX1 debt of the host. (Number of files in debt.)
  • The current TX2 debt of the host. (Number of bytes in debt.)
  • The peak past TX1 debt of the host.
  • The peak past TX2 debt of the host.
  • The total number of files requrested by the host.
  • The total number of bytes downloaded by the host.
  • The idle time of the host. (The time sinced the last request.)
The value of `keep' is 5 (the maximum value of `keep') if there is no host record at all in a particular position in the remote host array.
This is because you can't delete a host record which is not even present.
The `keep' value is 0 if and only if the TX1 and TX2 debts are currently zero, the peak TX1 and TX2 debts do not exceed the specified bounds for these debts, the total numbers of requests and downloaded files do not exceed the specified warning level, and the idle time is greater than a specified maximum idle time for display of records.
For the default colours in the monitor screen, this means that keep=0 if and only if there are no yellow or red boxes for the host, the last access is not recent, and the host's debt is currently zero.
The host records which are evaluated to have keep=0 are very uninteresting indeed, and are therefore deleted before all others.

All host records with any current debt at all are given keep=4.
Host records with no current debt, but with a red-level peak debt, are given keep=3.
Host records a yellow-level warning for bytes or files, but with no curent debt or red-level peak debt, are given keep=2.
Host records with idle time less than a specified time, but with no yellow-level warning, no red-level peak debt, and no current debt, are given keep=1.
All remaining host records are given keep=0.

The value of `value' is the sum of various debt, idle-time and other factors.

Now the host records are all sorted according into increasing order of importance, primarily by `keep' and secondarily by `value'.

Then host records are successively deleted from the least to the most important until enough have been deleted.
Let N = number of records with `keep' >= 2.
Then the number of records remaining is max(lwm_soft, min(lwm_hard, N)).
In other words, the number of remaining records is N clipped to the interval [lwm_soft,lwm_hard].
This implies that a fair effort is made to keep all keep>=2 records, but only if the number of host records can be kept within lwm_hard.
No matter how uninteresting the records are, lwm_soft of them are always kept.


Go to bwshare Apache module main web page.
Go to some notes on Apache server software.
Go to Alan Kennington's home page.