Stats Re-Averaging Utility Help


NOTE! You can only re-average one interface at a time. I'm very sorry about that, but selecting more than one confuses the script. Perhaps in a later version...

This is the other page where you can trim your database size. Unlike the deletion page, this utility leaves some of the entries intact, and it makes a compressed backup of what it removes so you can change your mind later. It works by removing the extra entries in your database: the ones that occur more frequently than you would like. Because the stored entries are raw counters rather than averages, all it takes to re-average them over a longer time period is to remove the extras, which thins out your database and reduces its disk space usage.

Mind you, this will leave you with slightly less accurate data; there's no way around that. Averaging over a longer period flattens short peaks, so the reported rates come out lower, and that can be misleading. Just about everyone's bandwidth billing is done using one month of data averaged into 5-minute periods. If you re-average a month to, say, 1-hour periods, the maximum reported transfer rate goes way down and it just gets sloppy. But if you need disk space more than you need the accuracy, it's a handy way to trim some space out of the really old, never-looked-at-anymore stuff. Alternatively, you could remove all the old stuff with the Stats Deletion Utility, which leaves its accuracy completely intact should you need to restore it.
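
To see why the maximum reported rate drops, here is a minimal sketch in Python. The sample data is made up for illustration (one hour of 5-minute counter readings with a single busy burst), and the `rates` helper is hypothetical, not part of this utility.

```python
def rates(samples):
    """Turn (timestamp_seconds, counter_bytes) samples into bytes/second rates."""
    return [
        (c1 - c0) / (t1 - t0)
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]

# Hypothetical data: one hour of 5-minute samples with one busy 5-minute burst.
samples = [(0, 0)]
for i in range(1, 13):
    moved = 3_000_000 if i == 6 else 300_000  # bytes moved in this interval
    samples.append((i * 300, samples[-1][1] + moved))

# Peak rate when every 5-minute entry is kept.
five_minute_peak = max(rates(samples))

# Peak rate when only the first and last entries of the hour are kept.
hourly_peak = max(rates([samples[0], samples[-1]]))

print(five_minute_peak, hourly_peak)  # 10000.0 1750.0
```

The total for the hour is the same either way; only the burst gets smeared across the whole hour.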

And please, remember: the data in the database is raw counters that just keep getting bigger and bigger as time goes on (assuming the monitored machine doesn't reboot). If you remove fifteen entries between two times that are five minutes apart, all you've done is remove the intricate detail of how the counter got from the one time to the other. The end result is still precisely as accurate in terms of aggregate totals. The only exception is when the machine resets its counters. Then you have no way of knowing how high the counter got before it reset, and you'll lose data... but only on that one time interval. It's unavoidable in any event, no matter how short your collection interval. So! Now that we've gotten the statistics lesson over with...
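
The point about aggregate totals can be checked with a few lines of Python. This is only a sketch: the sample values are invented, and treating any decrease in the counter as a reset is an assumption about how one might handle it, not a description of what this utility's code does.

```python
def total_transferred(samples):
    """Add up counter increases between consecutive (time, counter) samples.

    Assumption: a counter that goes down means the machine reset it, so the
    traffic for that one interval is unknown and gets skipped.
    """
    total = 0
    for (_, before), (_, after) in zip(samples, samples[1:]):
        if after >= before:
            total += after - before
    return total

# Made-up one-minute samples: (seconds, raw byte counter).
full = [(0, 1000), (60, 1400), (120, 1900), (180, 2100), (240, 2600), (300, 3000)]
thinned = [full[0], full[-1]]  # drop everything in between

print(total_transferred(full), total_transferred(thinned))  # 2000 2000

# Same idea with a reset after the second sample: the 1400 -> 50 step is lost.
with_reset = [(0, 1000), (60, 1400), (120, 50), (180, 300)]
print(total_transferred(with_reset))  # 650
```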

Some things to keep in mind while using this script. Because the backed-up statistics are written to disk as a compressed stream, the archive will never take up more space than the finished compressed file. You also don't have to decompress it to disk to restore it. And every entry is written to the backup before it is deleted from the database. Oh, and data is stored as full SQL commands, including the column names, so it should always be capable of restoring your data even if the "stats" table's format changes.
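
For the curious, here is roughly what "full SQL commands written to a compressed stream" could look like in Python. It is a sketch under assumptions: the column names (`host`, `sampled_at`, `in_octets`), the file name, and the helper functions are all hypothetical, and the real archive files may be laid out differently.

```python
import gzip
import os
import tempfile

def sql_literal(value):
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("\\", "\\\\").replace("'", "''") + "'"

def archive_rows(path, table, rows):
    """Stream one INSERT per row, column names included, into a gzip file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for row in rows:
            columns = ", ".join(row)
            values = ", ".join(sql_literal(v) for v in row.values())
            out.write(f"INSERT INTO {table} ({columns}) VALUES ({values});\n")

# Hypothetical rows; the real "stats" table columns are not shown on this page.
rows = [
    {"host": "router1", "sampled_at": 300, "in_octets": 123456},
    {"host": "router1", "sampled_at": 360, "in_octets": 130000},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.sql.gz")
    archive_rows(path, "stats", rows)
    # Read it back without ever writing an uncompressed copy to disk.
    with gzip.open(path, "rt", encoding="utf-8") as archive:
        restored_text = archive.read()

print(restored_text)
```

Naming the columns in every statement is what lets a file like this survive a later change to the table's column order.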

Oh. Compressed files typically take up only 10% to 15% of the space that the uncompressed file would take up... a much more efficient means of storing old data that you might need once sometime next year.

Just click on the radio button next to the interface whose values you want to re-average. Then enter the From and To times between which you want to trim the collected stats on the interface you selected, following these guidelines:

The Time Amount fields are used to select the interval you want to leave between entries. Let's say you have entries collected every 1 minute and you want to re-average them to every 5 minutes. In that case you're in luck, because that's how the controls come pre-set. If you only want one entry every day, enter "1" in the text box and select "days" in the dropdown list. This will leave just one entry for every day, effectively re-averaging your data accordingly.
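
The thinning itself can be pictured with a small Python sketch. It assumes entries are identified by a timestamp in seconds and that an entry is kept only if it is at least one full interval after the last kept entry; the `thin` function and the unit table (including the "hours" unit) are illustrative guesses, not the script's actual code.

```python
# Seconds per unit; "minutes" and "days" appear on this page, "hours" is assumed.
UNIT_SECONDS = {"minutes": 60, "hours": 3600, "days": 86400}

def thin(timestamps, amount, unit):
    """Split sorted timestamps into (kept, removed).

    An entry is kept when it is at least amount * unit seconds after the
    previously kept entry; everything in between is marked for removal.
    """
    interval = amount * UNIT_SECONDS[unit]
    kept, removed = [], []
    for ts in timestamps:
        if not kept or ts - kept[-1] >= interval:
            kept.append(ts)
        else:
            removed.append(ts)
    return kept, removed

# Sixteen entries collected every minute, re-averaged to every 5 minutes.
one_minute = list(range(0, 901, 60))
kept, removed = thin(one_minute, 5, "minutes")

print(kept)          # [0, 300, 600, 900]
print(len(removed))  # 12
```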

Then submit the form. You can always do this safely, because the next page will simply show you a summary of what would have been done as a result of your request; you then have the option of going ahead with it or aborting back to the re-averaging page. And going through with the re-averaging is safe too, because a backup file containing the necessary SQL statements to recreate the deleted entries will be created in your "archive" directory, compressed to reduce its space usage, and you can keep it around just in case you need more detailed statistics for some time in the distant past.

To restore one of the archives (should you suddenly need more-detailed statistics for some point in the distant past), you have to do it from the command line on the machine where your MySQL server is running. Go into the "archive" directory, locate the file you want to restore using the header information in it (you can use "zcat <filename> | head -12" to see which hostname, community, and interface this file was created for), and then issue this command to restore it:
zcat <filename> | mysql stats
(this assumes your database is named "stats", of course). Once you've restored the archive and verified that the restore worked, you might want to delete it, because it's of no further use.
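
If zcat isn't handy on the machine you're on, the same peek at the first 12 lines can be done from Python. This is just a sketch: `archive_header` is a hypothetical helper, and the demo file below contains placeholder lines, not the real header format.

```python
import gzip
import os
import tempfile
from itertools import islice

def archive_header(path, lines=12):
    """Return the first `lines` lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as archive:
        return [line.rstrip("\n") for line in islice(archive, lines)]

# Demo on a throwaway file with placeholder content (not the real header format).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for n in range(1, 21):
            out.write(f"placeholder line {n}\n")
    header = archive_header(path)

print(len(header))  # 12
```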

