RRD-alarm


Introduction

A common risk in network management is that, after all the monitoring tools have been purchased and the software and hardware have been correctly installed and configured, nobody periodically checks whether everything is going well (for example, whether there are bottlenecks in the network, or whether the quality of some service is below expectations). It is more useful for someone to be informed when something goes wrong, so that the right action can be taken. RRD-alarm was designed to accomplish that: when some fixed threshold is exceeded (network bandwidth, number of connections per server, ping time, etc.), which could mean that the quality of the service is poor, an action is taken so that someone can solve the problem (or at least be aware of it).

RRD-alarm consists of the file rrd_alarm.c and some configuration files: it was designed to be very simple yet effective. You can use it stand-alone or as a plugin for another application (for example a network application). If you want to check the values of an RRD file periodically, you can use crontab to launch it at regular intervals.


Installing

To install RRD-alarm you need to have RRDtool on your computer. You can download it from: http://www.rrdtool.org/
After extracting rrd-alarm, type:
./configure
make
(to be revised)

How it works

First of all, you need to have RRDtool on your computer. RRDtool is a round robin database (hence the name) that keeps track of many things you may want to know. For example, if you want to monitor the number of packets per hour that go out of your Ethernet interface, you can use RRDtool to store all the values in a file and update it periodically, so that you can compute statistics from the stored results (for example, you can draw graphs showing the number of packets flowing through your interface every hour, and so on).

RRD-alarm uses the files generated by RRDtool to check whether the averaged values stored for a period of time are above or below a certain threshold. You can choose which file to check, the period of time to look at (for example, the last two hours), the kind of threshold (below or above), and the value of the threshold. Moreover, you can choose how many consecutive values (below or above the threshold) must occur before the action is performed, how many seconds must elapse before that action is repeated (useful when the action simply informs someone of what has happened, to avoid telling them the same thing twice), and the kind of action to take when those rules are satisfied.
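
The consecutive-values rule can be pictured with a short shell sketch. It only illustrates the idea and is not the code found in rrd_alarm.c: the function name count_runs and the input format (one value per line on standard input) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of RRD-alarm: reads one numeric value per line
# on stdin and prints the length of every run of consecutive values that lies
# above (or below) the threshold and is at least min_repetitions long.
# usage: count_runs above|below threshold min_repetitions
count_runs() {
    awk -v kind="$1" -v thr="$2" -v min="$3" '
        # Print the current run if it is long enough, then start a new one.
        function emit() { if (run + 0 >= min + 0) print run; run = 0 }
        {
            hit = (kind == "above") ? ($1 + 0 > thr + 0) : ($1 + 0 < thr + 0)
            if (hit) run++; else emit()
        }
        END { emit() }
    '
}

# Two runs above 10000: the first has 3 values, the second only 2.
printf '%s\n' 12000 15000 11000 9000 20000 30000 8000 | count_runs above 10000 3
```

With a minimum of 3 repetitions only the first run is reported, so the example prints 3.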

How to use it

You can use RRD-alarm by typing:
rrd_alarm filename [option:parameter]
where filename is a file containing one or more lines of configuration parameters for RRD-alarm. Each line of the configuration file must have the following syntax:

id file below|above threshold minRepetitions start end action rearm

where:
#id: identifier of the configuration line
#file: path of the RRD file
#below|above: threshold type
#threshold: threshold value
#minRepetitions: minimum number of consecutive values beyond the threshold
#start: start of the time interval
#end: end of the time interval
#action: action to perform when the threshold condition is met
#rearm: minimum time (in seconds) between two consecutive alarms

The #id must be a positive integer; #threshold is a positive value; #start and #end are times expressed in seconds since 1970-01-01 UTC, or expressions such as "now-1h" (one hour ago), "now-2d" (two days ago), "now", and so on. The action to perform is a command line: for example, it can write to a log file, send an e-mail to someone, launch a shell script, etc.
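
As an illustration of the line format, the following shell sketch splits one TAB-separated configuration line into its nine fields. It is not the parser used by rrd_alarm.c; the function name show_rule and the sample action "echo alarm" are made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch: split one TAB-separated configuration line into the
# nine fields listed above and print them by name.
# usage: show_rule "configuration line"
show_rule() {
    tab=$(printf '\t')
    # Only TAB is a field separator, so the action may contain spaces.
    printf '%s\n' "$1" |
    while IFS="$tab" read -r id file kind threshold reps start end action rearm; do
        echo "id=$id file=$file type=$kind threshold=$threshold"
        echo "minRepetitions=$reps start=$start end=$end rearm=$rearm"
        echo "action=$action"
    done
}

show_rule "$(printf '1\tisdn.rrd\tabove\t5\t3\tnow-1h\tnow\techo alarm\t86400')"
```

Because only the TAB character separates fields, the action keeps its internal spaces, which is why the file format insists on TABs rather than blanks.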

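Valid options for RRD-alarm include the following, which are used in the examples below:
-v: print, for every file, the intervals in which the threshold was exceeded
-V: also print the exact values and the times at which the threshold was exceeded
-d: followed by a directory path (see the crontab example)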

Examples

A simple example

Suppose you are using RRDtool to keep track of the percentage of frame errors on a WAN connection: when this number is too high (for example, 5%), you may want to be informed, so that you can check the line, call your ISP, etc. The first thing you need to do is to create your database using RRDtool, specifying which kind of counter you want to use, the time between two updates, and so on (see the RRDtool documentation). Then you have to update the database, for example every 10 minutes, with the current value of that percentage. You can use a manager to retrieve this information from the agent residing on the device used for the WAN connection and then use it to fill the database, or, if the ISDN adapter is installed on your computer, you can find all the information you need to compute that percentage in the /proc/net/ directory.

The next step is to use RRD-alarm: you have to prepare a configuration file storing all the information needed by RRD-alarm. This file must contain one line for every threshold you want to check, and the parameters on every line are separated by a TAB character. In particular, every line must specify the nine parameters described in "How to use it": id, file, below|above, threshold, minRepetitions, start, end, action and rearm.

So the configuration file looks like this. In this case it consists of a single line in which every parameter is separated by a TAB; the \\ only marks a line break added for display and must be ignored:
1	isdn.rrd	above	5	3	now-1h	now \\
echo "Errors above 5% in ISDN line"	86400
(The meaning of this line is: "Line 1: if the values contained in the file isdn.rrd are above 5 for at least 3 consecutive times in the last hour (now-1h, now), then echo Errors..., unless the same action was done in the last 86400 seconds.")

If this file is saved as alarm.conf, you can use it with RRD-alarm simply by typing:

rrd_alarm alarm.conf
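
Since many editors silently replace TAB characters with spaces, one possible way to produce the configuration line is to let printf insert the TABs. The sketch below is only a suggestion: the function name write_isdn_conf and the temporary file used in the demonstration are assumptions, not part of RRD-alarm.

```shell
#!/bin/sh
# Sketch: write the one-line configuration of the simple example with real TAB
# characters between the nine fields. The output path is the only argument.
write_isdn_conf() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        1 isdn.rrd above 5 3 now-1h now \
        'echo "Errors above 5% in ISDN line"' 86400 > "$1"
}

# Demonstration on a temporary file: count the TAB-separated fields.
conf=${TMPDIR:-/tmp}/alarm.conf.$$
write_isdn_conf "$conf"
awk -F '\t' '{ print NF }' "$conf"
rm -f "$conf"
```

The awk call prints 9, one for each parameter of the configuration line.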

A complete example

Suppose you want to check the Ethernet connection used by a computer on the LAN, and you want to know when the output bandwidth exceeds 10000 bytes per second (a COUNTER data source stores a per-second rate). First, you must create an RRD file using RRDtool; to do this you can use a shell script like this:
#!/bin/sh

rrdfiles="/home/daniele/rrd-alarm/tables/"
rrdfiles_int="interfaces/"
filename="ipBytes"
rrdtool="/usr/local/rrdtool-1.0.44/bin/rrdtool"

# Create the database: one COUNTER data source, one update every 60 seconds
"$rrdtool" create "${rrdfiles}${rrdfiles_int}${filename}.rrd" \
	     --step 60 \
	     DS:output:COUNTER:100:U:U  \
	     RRA:AVERAGE:0.5:1:24
This script creates the RRD file "ipBytes.rrd" in the directory /home/daniele/rrd-alarm/tables/interfaces/. The database expects an update every 60 seconds (--step 60); the name of the data source (DS) is "output" and its type is COUNTER; 100 seconds is the maximum time that may pass between two updates of this data source before its value is assumed to be unknown; U:U means that the minimum and maximum expected values are unknown. The database stores the AVERAGE of a single value per row (so there is nothing to average in this case) and keeps 24 samples of that average.

The next step is to update the database periodically; again, you can use a script to do that:

#!/bin/sh

rrdfiles="/home/daniele/rrd-alarm/tables/"
rrdfiles_int="interfaces/"
filename="ipBytes"
rrdtool="/usr/local/rrdtool-1.0.44/bin/rrdtool"

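# Take the eth0 line of /proc/net/dev and strip the interface name prefix
line=`/bin/grep eth0 /proc/net/dev | cut -b8-`
# Field 9 of the remaining counters is the number of transmitted bytes
outbytes=`echo $line | /bin/awk '{print $9}'`

# Store the counter using the current time (N) as timestamp
"$rrdtool" update "${rrdfiles}${rrdfiles_int}${filename}.rrd" \
                 N:$outbytes > /dev/null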
This script simply updates the database with the current number of output bytes of the eth0 interface.
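
Note that the script stores an ever-growing byte counter, while the threshold is compared with a rate: for a COUNTER data source RRDtool keeps the per-second difference between consecutive readings. The arithmetic can be sketched as follows; the function name counter_rate is made up for this example, and counter wrap-around, which RRDtool handles on its own, is ignored.

```shell
#!/bin/sh
# Sketch of the value kept for a COUNTER data source: the per-second rate
# between two consecutive readings (counter wrap-around is ignored here).
# usage: counter_rate old_value new_value seconds_between_readings
counter_rate() {
    awk -v old="$1" -v new="$2" -v dt="$3" \
        'BEGIN { printf "%.2f\n", (new - old) / dt }'
}

# 600000 bytes sent in 60 seconds give a rate of 10000 bytes per second.
counter_rate 1000000 1600000 60
```

The example prints 10000.00, which is exactly the threshold used in this section.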

Then you have to write the configuration file for RRD-alarm, which could look like this:

1	/home/daniele/rrd-alarm/tables/interfaces/ipBytes.rrd \\
above	10000	3	now-1h	now \\
/home/daniele/rrd-alarm/action/mail.sh	86400
(This file contains one line, i.e. "1 /home/.../ipBytes.rrd above 10000 3 now-1h now /home/.../mail.sh 86400", so simply skip all of the \\ characters.)
Line 1 tells RRD-alarm to look in the file ipBytes.rrd for 3 consecutive values above 10000 in the last hour; in that case the action to execute is "mail.sh". If the last alarm was sent less than 86400 seconds ago, the action is not repeated. Suppose that the file is saved as alarm.conf, and that mail.sh is a simple script that sends a mail to the network administrator:
#!/bin/sh
# Build the message body: current date and host name
/bin/date > /home/daniele/rrd-alarm/action/message.txt
/bin/hostname >> /home/daniele/rrd-alarm/action/message.txt
# Send it to the administrator
mail -s "Too many output bytes" admin < \
/home/daniele/rrd-alarm/action/message.txt
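The rearm value of line 1 (no new alarm within 86400 seconds of the previous one) amounts to a simple time comparison. How RRD-alarm actually records the time of the last alarm is not described here, so the sketch below is purely illustrative: the function name rearm_elapsed and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the rearm rule: succeed (exit status 0) only when at
# least rearm_seconds have passed since the last alarm.
# usage: rearm_elapsed last_alarm_epoch now_epoch rearm_seconds
rearm_elapsed() {
    [ $(( $2 - $1 )) -ge "$3" ]
}

# Assume the last alarm was sent one hour ago and the rearm time is one day.
now=$(date +%s)
if rearm_elapsed $(( now - 3600 )) "$now" 86400; then
    echo "run the action"
else
    echo "alarm suppressed"
fi
```

With the last alarm only one hour old and a rearm time of one day, the example prints "alarm suppressed".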
Now you are ready to use RRD-alarm! If you use the option -v, the output could look like this:
[daniele rrd-alarm]$ rrd_alarm alarm.conf -v

File: /home/daniele/rrd-alarm/tables/interfaces/ipBytes.rrd
Interval:
        Start = Sat Aug 16 01:26:00 2003
        End =  Sat Aug 16 01:31:00 2003
        6 time/s above 10000.00
Interval:
        Start = Sat Aug 16 01:33:00 2003
        End =  Sat Aug 16 01:35:00 2003
        3 time/s above 10000.00
Interval:
        Start = Sat Aug 16 01:37:00 2003
        End =  Sat Aug 16 01:40:00 2003
        4 time/s above 10000.00
In this case, there are three intervals in which the threshold has been exceeded for at least 3 consecutive times: 6 times in the first interval, from Sat Aug 16 01:26:00 2003 to Sat Aug 16 01:31:00 2003, 3 times from Sat Aug 16 01:33:00 2003 to Sat Aug 16 01:35:00 2003, and 4 times from Sat Aug 16 01:37:00 2003 to Sat Aug 16 01:40:00 2003.
Using the option -V, you can see the exact values and the time when the threshold was exceeded.
[daniele rrd-alarm]$ rrd_alarm alarm.conf -V
------------------------------------------
   65473.770492 - Sat Aug 16 01:26:00 2003
   27044.846175 - Sat Aug 16 01:27:00 2003
   60681.688525 - Sat Aug 16 01:28:00 2003
   14004.744809 - Sat Aug 16 01:29:00 2003
   31297.295082 - Sat Aug 16 01:30:00 2003
   36483.104918 - Sat Aug 16 01:31:00 2003
------------------------------------------
   12712.213115 - Sat Aug 16 01:33:00 2003
   11612.833552 - Sat Aug 16 01:34:00 2003
   31477.370000 - Sat Aug 16 01:35:00 2003
------------------------------------------
   42710.704918 - Sat Aug 16 01:37:00 2003
   18260.445082 - Sat Aug 16 01:38:00 2003
   17160.147541 - Sat Aug 16 01:39:00 2003
   16435.152459 - Sat Aug 16 01:40:00 2003
------------------------------------------

File: /home/daniele/rrd-alarm/tables/interfaces/ipBytes.rrd
Interval:
        Start = Sat Aug 16 01:26:00 2003
        End =  Sat Aug 16 01:31:00 2003
        6 time/s above 10000.00
Interval:
        Start = Sat Aug 16 01:33:00 2003
        End =  Sat Aug 16 01:35:00 2003
        3 time/s above 10000.00
Interval:
        Start = Sat Aug 16 01:37:00 2003
        End =  Sat Aug 16 01:40:00 2003
        4 time/s above 10000.00
The last step is to use crontab to update the database at regular intervals and to launch RRD-alarm:
0-59 * * * * /home/daniele/rrd-alarm/update.sh
0 0-23  * * * /home/daniele/rrd-alarm/rrd_alarm \\
/home/daniele/rrd-alarm/conf/alarm.conf -d \\
/home/daniele/rrd-alarm/saved/
This file consists of two lines, so ignore the \\. Save the file (for example, as crontab), then type:
crontab crontab
Now, every minute the update.sh script is executed, and every hour RRD-alarm is launched.

Using RRD-alarm with a MIB

If you want to define an SNMP MIB to use with RRD-alarm, you must define a table containing all the parameters needed by the configuration file. You can make a table with one row for every line of the configuration file and one column for every parameter.
(to be revised)
Daniele Sgandurra: sgandurr@cli.di.unipi.it