The purpose of this list is to track all the things we think need doing,
and hopefully match them up to people who will do them. Perhaps
you? If you see a task you are interested in, or want to
add tasks to the list or comment on it, contact Alan Robertson.
Activity |
Description |
DependsOn |
WhoDo? |
PartCluster |
Detect and perform basic recovery from a partitioned cluster condition.
Of course, this won't unscramble shared SCSI filesystems that may have
been corrupted as a result of a partitioned cluster :-) |
|
|
CMFrame |
Create a framework for a "real" cluster manager. This comprises
the APIs and supporting code that allow a cluster manager to be written. |
NPhase |
|
NPhase |
Create an n-phase commit protocol similar to IBM's Phoenix cluster
services. Pages 424-430 in "In
Search of Clusters". See especially pages 428 and 429. |
|
|
CM1 |
Create the first cluster manager. A translation of the current
methodology into a cluster manager structure. May be a throwaway. |
CMFrame, HBProtocol (before release) |
|
CM2 |
Create the first real cluster manager. Must support an
arbitrary number of nodes. Probably a voting/quorum-based cluster
manager. |
CM1 |
|
InputCheck |
Verify and validate the system configuration rigorously before starting
up. Provide a standalone configuration validation tool or input checking
mode for heartbeat. |
|
|
SecKeeps |
Optionally allow the secondary host to keep the resources it has when
the primary comes back online. |
|
|
Restart heartbeat processes |
Heartbeat should be able to restart any of its processes that die.
This is intended to allow for the possibility that one day a bug
might be found in the code which would cause a process to die.
Heavens! Perish the thought! :-)
A little infrastructure work to support this effort is in 0.4.3.
|
|
|
HBProtocol |
Fix the protocol so that lost packets are retransmitted,
not just discovered. |
|
alanr |
syslog-rsc |
Make the cluster-wide syslog a cluster resource.
This may require a little thought to make it reliable, and keep messages
from getting lost during transitions.
Maybe have each message logged to two hosts?
|
SysLog |
|
buffers |
Should inspect the code and modify it to eliminate the possibility of
buffer overrun attacks. This is especially important for the
messaging code.
|
|
|
patchdoc |
Should document my expectations for patch submission.
This should include a little bit about coding style.
|
|
|
manpage |
Write wonderful man pages for heartbeat, heartbeat.cf and haresources
|
|
Shawn McKenzie (?) |
Reconfig |
Should be able to update configurations without shutting down the
cluster and restarting it.
This could be accomplished in many different ways, from a local kill -1 type
approach to a global synchronized cluster restart.
|
|
alanr (available in 0.4.5) |
TrimScripts |
Should trim the number of scripts and how much they control.
Scripts are fine, but heartbeat has a few too many...
A few were trimmed for 0.4.3.
In particular, the heartbeat process should initiate
resource acquisition on startup and relinquishment on
shutdown, and not rely on the scripts that call it to
do that. It's probably OK to use scripts in the process,
but starting heartbeat by itself should cause resources
to be acquired, and killing it should cause them to be
given up.
|
|
alanr (available in 0.4.5) |
SysLog |
Change ha_log() functions to use syslog. |
|
Guenther Thomsen
(Available in 0.4.5) |
debugoutput |
Heartbeat debug information doesn't go into the debug log |
|
Guenther Thomsen
(Available in 0.4.5) |
HBSec |
Authenticate intracluster packets for heartbeat, etc. This
provides both error checking and security. Could simply add an auth_packet
entry point, and then we can plug our favorite authentication method into
it, and even allow the method to be chosen in the configuration file.
This would allow people with secure networks to use CRC without encryption,
while people in a more hostile environment could use HMAC-SHA1 with a
shared secret, etc.
Current thinking is to model the key file and authentication after
the methods used by NTP. |
|
Mitja Sarp,
Neal McBurnett (consulting)
(Will be available in 0.4.5) |
DEBUG |
Make SIGUSR1 increment debug level, and SIGUSR2 decrement it.
|
|
alanr DONE. 0.4.3. |
memloss |
Memory leaks are a danger since messages are allocated dynamically.
There are a few things to do about it: 1) track buffer usage and make
stats appear in the log from time to time.
2) Document how to properly use and dispose of messages in the code.
Use SIGUSR1 or SIGUSR2 to get message allocation stats.
|
|
alanr DONE. 0.4.3. |
FHS |
Make Linux-HA file placement conform to the Linux Filesystem Hierarchy
Standard. |
|
alanr DONE. 0.4.3. |
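
The DEBUG entry above asks for SIGUSR1 to increment the debug level and SIGUSR2 to decrement it. As a rough illustration of that idea only (this is not the code that shipped in 0.4.3), a minimal sketch using POSIX sigaction() might look like the following. The names debug_level, debug_signal_handler and install_debug_signals are hypothetical, and stopping the level at zero is an assumption rather than something the list specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical global debug level; sig_atomic_t so the handler may touch it. */
static volatile sig_atomic_t debug_level = 0;

/* SIGUSR1 raises the level, SIGUSR2 lowers it (assumed floor of zero). */
static void debug_signal_handler(int sig)
{
    if (sig == SIGUSR1) {
        debug_level++;
    } else if (sig == SIGUSR2 && debug_level > 0) {
        debug_level--;
    }
}

/* Install the handler for both signals. Returns 0 on success, -1 on error. */
int install_debug_signals(void)
{
    struct sigaction sa;

    sa.sa_handler = debug_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */

    if (sigaction(SIGUSR1, &sa, NULL) < 0) {
        return -1;
    }
    if (sigaction(SIGUSR2, &sa, NULL) < 0) {
        return -1;
    }
    return 0;
}
```

With something like this in place, an administrator could send SIGUSR1 or SIGUSR2 to the running process with kill(1) to turn debugging up or down without restarting it.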