Plot Files
One of collectl's main features is its ability to generate files in a ready-to-plot format that gnuplot
can read directly, and it generates 2 main types of these files. The first, which has an extension of tab,
is a table of all the summary data. What makes this file unique is that all data elements are in a fixed
set of columns: some columns may get added over time, but for all intents and purposes the set of columns
for, say, CPUs does not change regardless of how many CPUs are in the system. The second type of file deals
with detail data, the amount of which changes with the number of instances, so a 4 CPU system will have
half the data an 8 CPU system has. There is one file for each type of detail data.
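Because the summary file keeps a fixed set of columns, a script can read it by column name without knowing how many CPUs the system had. The following Python sketch assumes a layout in which header lines start with '#' and the last header line before the data names the space-separated columns; the sample header and column names are illustrative only, so check them against your own files.

```python
from typing import Dict, List, Optional


def parse_plot_file(text: str) -> Dict[str, List[str]]:
    """Return {column name: [values...]} for a gnuplot-style plot file.

    Assumed layout: header lines start with '#', the last header line before
    the first data line lists the column names, and every data line holds
    space-separated values in the same order.
    """
    last_comment: Optional[str] = None
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            last_comment = line  # remember the most recent header line
            continue
        if header is None:
            if last_comment is None:
                raise ValueError("data found before any header line")
            header = last_comment.lstrip("#").split()
        values = line.split()
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(values)}")
        rows.append(values)
    if header is None:
        return {}  # no data lines at all
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}


# Hypothetical summary file contents, for illustration only.
sample = """# hypothetical header line
#Date Time [CPU]User% [CPU]Sys%
20240229 00:00:10 2 1
20240229 00:00:20 5 3
"""
print(parse_plot_file(sample)["[CPU]User%"])  # ['2', '5']
```

The same function would also read a detail file; it would simply return more columns on a system with more instances.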
Plot files can be generated in 2 ways, each with its own advantages and disadvantages:
- Post processing - collectl runs with minimal overhead, writing all its data to raw files.
Those files are then converted to plot files at some later time.
- During collectl - in this mode, collectl writes directly to plot files, eliminating the need
to convert raw files.
At first glance, it sounds like you'd always want to generate plot files directly since you avoid
the conversion step, but you should also keep a few things in mind about this methodology:
- When plot files are generated directly, you no longer have access to the original data. This means
you can't play back the data over selected periods of time, nor can you select different data to examine;
for example, if you chose to record CPU summary data but later decide you want to see CPU detail data.
- In some cases you may want to look at data in unnormalized form, which plot files do not allow
- Some data is never converted to plot format and therefore is lost forever
- With raw data you can always see the data in its original format if any questions arise as to its accuracy
- You can play back data multiple times, generating different views as well as plot files as often as you like
- You can select different time intervals over which to play back the data
- Playing back raw data allows you to display it in multiple formats such as --export vmstat or use
--top to look at process data in different ways
- Most important, if you really want the best of both worlds, you can record data in plot format and,
with the use of --rawtoo, also record the data in raw form.
Generating Plot Files On-The-Fly
While generating files this way is as easy as appending -P to the collectl command, either
when run interactively or in /etc/collectl.conf, there are a couple of things to keep in mind:
- If you want immediate access to the data while collectl is running, be sure to always flush
the buffers and turn off compression by including the switches: -F0 -oz
- On the other hand, compressed data takes about 90% less storage, so if you don't need immediate
access, leaving compression on may be the better option
- Be sure to explicitly list all the subsystems you want plots for. In other words, if you want CPU
detail data, be sure to include C in the subsystem selection. If you want both summary and
detail CPU data, you'll need cC.
- If you're afraid you'll lose critical data, consider using --rawtoo
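The points above can be combined into a single command line. This Python sketch only builds the argument list (for example, to hand to subprocess or to paste into a cron entry); the -s subsystem switch and the example destination path are assumptions added for illustration, while -P, -F0, -oz, --rawtoo and -f come from this page.

```python
from typing import List


def onthefly_command(subsystems: str, dest: str, immediate: bool = True,
                     rawtoo: bool = False) -> List[str]:
    """Build a collectl command that writes plot files directly (-P).

    subsystems: letters for -s (assumed switch), e.g. "cC" for CPU summary + detail.
    dest: value for -f (the path used below is a hypothetical example).
    immediate: add -F0 (flush every interval) and -oz (no compression).
    rawtoo: also record raw files with --rawtoo.
    """
    if not subsystems:
        raise ValueError("list every subsystem you want plot files for")
    cmd = ["collectl", "-s" + subsystems, "-f", dest, "-P"]
    if immediate:
        cmd += ["-F0", "-oz"]
    if rawtoo:
        cmd.append("--rawtoo")
    return cmd


print(" ".join(onthefly_command("cC", "/var/log/collectl", rawtoo=True)))
# collectl -scC -f /var/log/collectl -P -F0 -oz --rawtoo
```

Passing immediate=False leaves compression on, trading immediate access for the storage savings mentioned above.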
Generating Plot Files from RAW Files
Collectl can play back a single file or multiple ones, but in either case
the first thing it does is examine the raw file header to get the
source host name and creation date. A new set of data is always
generated for each unique combination of host and creation date, which
also means a single raw file that spans multiple dates will result in a
single set of data. Note that depending on the subsystems chosen, each set
may consist of multiple output files.
By default, the name of the plot file contains only the date, and a test is made
to see if a file with that name already exists. If not, it is created in
append mode. This means that multiple raw data files for the same
host on the same date will result in a single set of data. However, if that
file already exists, collectl will NOT process any data and will ask you to
specify -oc, which tells it to perform the first open in create mode
so that subsequent files can be appended. If you specify -oa,
all files will be appended to the original one, which may not be what you want.
Collectl cannot read your mind, so to be safe, be explicit. If you want to
generate a unique set of data files for each raw file, use -ou,
which causes the time to be included in file names, resulting in a unique output
file name for each raw file.
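The grouping and open-mode rules described above can be summarized as a small model. This is a simplified Python sketch of the documented behavior, not collectl's own code; the host-date[-time] set names and both helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (host, creation date, creation time) as read from a raw file's header
Header = Tuple[str, str, str]


def plot_sets(headers: List[Header], unique: bool = False) -> Dict[str, List[Header]]:
    """Group raw files into output sets: per (host, date) by default,
    per (host, date, time), i.e. one per raw file, when unique (-ou) is set."""
    sets: Dict[str, List[Header]] = defaultdict(list)
    for host, day, hhmmss in headers:
        key = f"{host}-{day}-{hhmmss}" if unique else f"{host}-{day}"
        sets[key].append((host, day, hhmmss))
    return dict(sets)


def open_mode(exists_before_run: bool, first_in_run: bool, option: str = "") -> str:
    """Return 'create', 'append' or 'refuse' for one raw file of a set.

    option: "" (default), "c" (-oc) or "a" (-oa). 'create' means a new file is
    opened, which later raw files of the same set are then appended to.
    """
    if option == "a":       # -oa: always append, even to a file from an earlier run
        return "append"
    if not first_in_run:    # later raw files of the same set within one run
        return "append"
    if option == "c":       # -oc: first open is done in create mode
        return "create"
    return "refuse" if exists_before_run else "create"


headers = [("node1", "20240229", "000000"), ("node1", "20240229", "120000")]
print(list(plot_sets(headers)))               # ['node1-20240229']
print(len(plot_sets(headers, unique=True)))   # 2
print(open_mode(exists_before_run=True, first_in_run=True))  # refuse
```
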
Generating plot files from raw files certainly maximizes your flexibility for all the reasons listed
earlier. However, it also puts the responsibility for managing your data more squarely on your shoulders.
Some of the questions you need to answer include:
- Do you want to convert the raw files to plot files every day or just when needed?
- Where do you want to store the plot files and how will you get them there?
- Will you automate the file copies/conversion via a cron job or do it manually when needed?
- Should you always convert everything to plot files or just do summary data, only generating detail
data when needed?
- As with on-the-fly generation, should the plot files be compressed or not?
Having answered these questions and perhaps others, it now just becomes a matter of executing
the appropriate copy and/or collectl commands, which can be relatively easily scripted.
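As one possible answer to the summary-versus-detail question, a cron job might convert only summary data and generate detail data on demand. This Python sketch builds such a playback command; the default letters cdn (CPU, disk and network summary), the paths, and combining -o option letters into one argument (as in -ocz) are assumptions to verify against your collectl version.

```python
from typing import List

SUMMARY_SUBSYSTEMS = "cdn"  # assumed: CPU, disk and network summary


def playback_command(raw_pattern: str, dest: str,
                     subsystems: str = SUMMARY_SUBSYSTEMS,
                     compress: bool = True, open_option: str = "") -> List[str]:
    """Build a collectl playback command (-p) that writes plot files (-P).

    open_option: "" (default), "c" for -oc or "a" for -oa.
    compress=False adds the z letter (-oz) so the output stays uncompressed.
    """
    if open_option not in ("", "a", "c"):
        raise ValueError("open_option must be '', 'a' or 'c'")
    letters = open_option + ("" if compress else "z")
    cmd = ["collectl", "-p", raw_pattern, "-s" + subsystems, "-f", dest, "-P"]
    if letters:
        cmd.append("-o" + letters)  # assumed: -o letters may be combined
    return cmd


# Nightly summary-only run (hypothetical paths):
print(" ".join(playback_command("/var/log/collectl/*.raw.gz", "/plots")))
# Detail data only when needed, uncompressed, recreating any existing file:
print(" ".join(playback_command("/var/log/collectl/*.raw.gz", "/plots",
                                "CDN", compress=False, open_option="c")))
```

Building an argument list rather than a shell string also keeps the shell from expanding the wildcard before collectl sees it.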
TIP - If you rsync raw files to another server and then process
them using a wildcard in your playback command, you will probably end up processing some of today's files too!
If you later copy over the rest of today's file(s), you will need to recreate today's plot file, since
collectl will not overwrite an existing file by default. But if you specify the -oc switch with a
wildcard, you will end up recreating all the plot files, which will result in a lot more processing
than you planned on. Collectl supports a special syntax that allows you to play back just
yesterday's files: include the string YESTERDAY in the file name pattern and collectl replaces it
with yesterday's date, as in the following:
collectl -p "YESTERDAY*" etc...
Note that the string must be all uppercase, and you can include other characters in the pattern,
such as a host name, if need be.
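To check which files a YESTERDAY pattern would select, the substitution can be emulated. This Python sketch assumes raw file names embed the date as YYYYMMDD (for example node1-20240229-000000.raw.gz) and that the token is replaced by yesterday's date in that same format; both are assumptions rather than documented behavior, and the helper names are hypothetical.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase
from typing import List


def expand_yesterday(pattern: str, today: date) -> str:
    """Replace the all-uppercase token YESTERDAY with yesterday's date (YYYYMMDD assumed)."""
    yesterday = today - timedelta(days=1)
    return pattern.replace("YESTERDAY", yesterday.strftime("%Y%m%d"))


def select_raw_files(names: List[str], pattern: str, today: date) -> List[str]:
    """Keep only the names matching the expanded pattern (case-sensitive match)."""
    expanded = expand_yesterday(pattern, today)
    return [name for name in names if fnmatchcase(name, expanded)]


files = ["node1-20240229-000000.raw.gz", "node1-20240301-000000.raw.gz",
         "node2-20240229-000000.raw.gz"]
print(select_raw_files(files, "node1-YESTERDAY*", date(2024, 3, 1)))
# ['node1-20240229-000000.raw.gz']
```

A lowercase "yesterday" is left untouched, mirroring the requirement that the string be all uppercase.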
TIP - If you want to create multiple sets of plot files from the same raw file, you can always
include a unique qualifier along with the directory name in the -f switch to give each set a different
prefix.
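For example, one set could hold summary data and another detail data from the same raw file. This Python sketch builds one playback command per qualifier; the qualifier names, the -s letters and the way the qualifier is joined to the directory are illustrative assumptions.

```python
from typing import Dict, List


def commands_per_qualifier(raw_file: str, outdir: str,
                           sets: Dict[str, str]) -> List[List[str]]:
    """Return one playback command per {qualifier: subsystem letters} entry.

    The qualifier is appended to the -f directory so each set of plot files
    gets its own prefix (this joining scheme is an assumption).
    """
    base = outdir.rstrip("/")
    return [
        ["collectl", "-p", raw_file, "-s" + letters, "-f", f"{base}/{qualifier}", "-P"]
        for qualifier, letters in sets.items()
    ]


for cmd in commands_per_qualifier("node1-20240229-000000.raw.gz", "/plots/",
                                  {"summary": "cdn", "detail": "CDN"}):
    print(" ".join(cmd))
# collectl -p node1-20240229-000000.raw.gz -scdn -f /plots/summary -P
# collectl -p node1-20240229-000000.raw.gz -sCDN -f /plots/detail -P
```
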