WatchFull

Unix Tools for Monitoring File System Capacity and Averting Crises

by John Walker


Introduction

Inside every Unix shop with more than a handful of machines, odds are there's a file system slowly growing toward 100% capacity, at which point all kinds of unpleasant events begin to transpire. System administrators who value the serenity and rejuvenation of a good night's sleep over the flattering feeling of being needed which comes from having your pager go off at four in the morning need tools which anticipate little problems before they mature into full-on screaming crises. This page presents a suite of such tools, all Perl programs, which monitor impending file system full disasters and aid in remediation of impending problems.

WatchFull consists of three independent Perl scripts which address different aspects of file system capacity management. In order to use these tools you must have Perl installed on your system. The programs were tested with Perl v5.22.1 and should work on any version of Perl from 5.003 onward. Note that while Perl has been ported to non-Unix platforms, these utilities are Unix-specific as they require other standard Unix programs not present on other systems. If you install these programs as cron jobs, be sure to verify whether Perl can be found from the abbreviated search path used for such jobs and, if not, either add it to the path or explicitly specify the full pathname in the crontab entry.

WatchFull
Examines mounted file systems and reports, usually by E-mail to the administrator, any which exceed a designated percentage of their capacity.
LogJam
One frequent culprit insidiously filling up critical file systems is log files, such as console message logs and FTP and HTTP transfer logs, which grow without bound as entries are appended. LogJam monitors a list of files and reports any which exceed individually designated threshold sizes in an E-mail to the administrator.
Top40
Unlike WatchFull and LogJam, which are usually run automatically as cron jobs to alert the administrator of incipient problems, Top40 helps resolve them by producing a list of the 40 (or any other number you like) largest files in one or more directory trees. With appropriate options you can scan everything from a single directory to all file systems mounted on a machine.

WatchFull: The Cop on the Beat

Documentation for WatchFull follows, in Unix manual page style.

NAME

WatchFull - monitor file systems approaching capacity

SYNOPSIS

perl WatchFull.pl [ -d ] [ -g name ] [ -m address ] [ -s /path=thresh,… ] [ -t thresh ] [ -u ] [ -x /path,… ]

DESCRIPTION

WatchFull examines mounted file systems on the machine on which it is run and generates a report listing those which exceed a given capacity threshold. If any file systems are found to exceed their thresholds, a warning is mailed to the designated system administrator.

Here is an example of a warning message mailed by WatchFull to the administrator of a host named “pallas”.

Greetings, carbon-based lifeform.  The following file systems on
pallas are approaching capacity.

/dev/dsk/dks0d2s7       xfs  1960472  1804112   156360  93  /files1
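
As a rough illustration of the check described above, here is a Python sketch (not the WatchFull Perl source) which scans the text of a “df -k” report and picks out file systems above a threshold. It assumes the capacity percentage is the next-to-last field on each line and the mount point is the last, which holds for the sample line above but may not for every df. The function name over_threshold, the header line, and the /dev/root row are invented for the example.

```python
# Illustrative sketch only; the real WatchFull is a Perl program.
# The header and the /dev/root row are made-up sample data; the
# /files1 row is the one shown in the warning message above.
SAMPLE_DF = """\
Filesystem             Type  kbytes     use    avail  %use  Mounted on
/dev/root               xfs  1962880  1177796   785084  61  /
/dev/dsk/dks0d2s7       xfs  1960472  1804112   156360  93  /files1
"""


def over_threshold(df_text, threshold=90):
    """Return (mount point, percent used) for lines above the threshold."""
    hits = []
    for line in df_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 2:
            continue
        # Assumption: percent used is next to last, mount point is last.
        percent = fields[-2].rstrip("%")
        if percent.isdigit() and int(percent) > threshold:
            hits.append((fields[-1], int(percent)))
    return hits


for mount, percent in over_threshold(SAMPLE_DF):
    print(f"{mount} is at {percent}% of capacity")
```

Reading fields from the end of the line sidesteps differences in the leading columns between df variants, though a mount point containing spaces would still confuse this simple split.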

OPTIONS

-d
Debug mode: output (if any) is written to standard output rather than being E-mailed. If you specify this option when running WatchFull as a cron job, the output will usually be mailed to the owner of the job as a cron report.
-g name
Name by which to greet the system administrator. By default this is “carbon-based lifeform”.
-m address
Mail warning messages to the designated address, which can be any E-mail address accessible by the mailer on the host system. This defaults to “root@localhost”.
-s /path=thresh,…
The -s option allows you to specify warning thresholds for individual file systems. The argument is a comma-separated list of file system mount points (as shown in a “df -k” report) and warning thresholds as a percentage of capacity.
-t thresh
The default warning threshold, as a percentage of file system capacity, is set to thresh. This threshold may be overridden for individual file systems by the -s option. The default warning threshold is 90% of the file system's capacity.
-u
Print how-to-call information.
-x /path,…
File systems mounted at the comma-separated list of mount points are excluded from those checked. You may want to exclude NFS-mounted file systems on other machines, read-only media such as CD-ROMs, and removable backup media which are routinely written until full.
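
The way the -t, -s, and -x options combine can be sketched in Python as follows. This is an assumption about the obvious reading of the option descriptions, not a transcription of the Perl source, and the function names parse_thresholds, parse_excludes, and needs_warning are hypothetical.

```python
def parse_thresholds(spec):
    """Turn '/path=thresh,/path=thresh' into {'/path': thresh}."""
    thresholds = {}
    for item in spec.split(","):
        if not item:
            continue
        path, _, value = item.partition("=")
        thresholds[path] = int(value)
    return thresholds


def parse_excludes(spec):
    """Turn '/path,/path' into a set of mount points to skip."""
    return {path for path in spec.split(",") if path}


def needs_warning(mount, percent, default=90, overrides=None, excluded=()):
    """Apply exclusions first, then a per-mount threshold, then the default."""
    if mount in excluded:
        return False
    limit = (overrides or {}).get(mount, default)
    return percent > limit


overrides = parse_thresholds("/=95,/files1=80")
excluded = parse_excludes("/cdrom,/backup")
print(needs_warning("/files1", 85, overrides=overrides, excluded=excluded))
```

With these hypothetical settings, /files1 at 85% draws a warning because its individual threshold of 80 takes precedence over the default of 90.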

BUGS

WatchFull assumes the “df -k” command produces output in the format it expects and that the Mail command can be used to send mail to the designated recipient. If this isn't the case, you'll have to modify the Perl program accordingly. On some systems you'll have to replace Mail with mailx.

SEE ALSO

df(1), Mail(1)

LogJam: The Usual Suspects

One of the most common causes of file system exhaustion is system and server log files which grow without bound as entries are appended to them. If you don't keep an eye on these files, they can eat your disc alive. For example, let's take a peek at the HTTP log file directory on the www.fourmilab.ch server right now:

/files/server/logs/http> ls -lt
total 5012848
-rw-r--r--    1 root     sys       477730174 Aug 22 15:21 agent_log
-rw-r--r--    1 root     sys       957131968 Aug 22 15:21 referer_log
-rw-r--r--    1 root     sys      1113544113 Aug 22 15:21 access_log
-rw-r--r--    1 root     sys        18168734 Aug 22 15:19 error_log

Yikes! That's two and a half G- G- Gigabytes of log files—time to clean house! (Actually, the file system on which these files are kept has a capacity of 17 Gb and is only about 25% full, so I can go a long time before taking the garbage out….)

Anyway, LogJam will keep an eye on the log files on your system and E-mail warnings when one or more exceed size thresholds you define on a file-by-file basis. Amid the daily chaos of system administration, it's easy to overlook log files ratcheting up to absurd dimensions. LogJam lets you know when they need attending to.

Documentation for LogJam follows, in Unix manual page style.

NAME

LogJam - monitor size of continuously growing files

SYNOPSIS

perl LogJam.pl [ -d ] [ -g name ] [ -m address ] [ -t ] [ -u ] filename threshold

DESCRIPTION

A common cause of file system full crises is system and server log files which grow without bound as transactions are added. Most modern Unix systems incorporate mechanisms to limit the space consumed by system files such as console message transcripts and login histories, but many server logs such as FTP and HTTP access logs must be manually “cycled” when they grow too large. This is a task easily overlooked amidst the quotidian alarums and diversions of system administration. LogJam keeps an eye on these log files and sends a warning when one or more exceeds a given size threshold.

On the command line, list one or more “filename threshold” pairs which specify a file to be checked and the size threshold which, when exceeded, will generate a warning for that file. The size may be specified in bytes, or with a suffix of “K” for kilobytes, “M” for megabytes, “G” for gigabytes, or “T” for terabytes. Suffixes denote powers of 1000 and may be either upper or lower case. For example, to generate a warning when an HTTP access log exceeds 500 megabytes, one would use:
perl LogJam.pl /files/server/logs/http/access_log 500M
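
The threshold notation can be made concrete with a short Python sketch. It is not LogJam's Perl code; parse_size and pair_up are hypothetical names, and the sketch assumes thresholds are whole numbers (the text above does not say whether fractions such as “1.5G” are accepted).

```python
# Suffixes denote powers of 1000, as stated above, not 1024.
SUFFIXES = {"K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def parse_size(text):
    """Convert '500M', '2g', or a plain byte count into a number of bytes."""
    text = text.strip()
    suffix = text[-1:].upper()          # suffix may be upper or lower case
    if suffix in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[suffix]
    return int(text)


def pair_up(args):
    """Group a flat 'filename threshold ...' list into (filename, bytes) pairs."""
    if len(args) % 2:
        raise ValueError("arguments must be filename/threshold pairs")
    return [(args[i], parse_size(args[i + 1])) for i in range(0, len(args), 2)]


print(pair_up(["/files/server/logs/http/access_log", "500M"]))
```

Under this reading, “500M” stands for 500,000,000 bytes.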

OPTIONS

-d
Debug mode: output (if any) is written to standard output rather than being E-mailed. If you specify this option when running LogJam as a cron job, the output will usually be mailed to the owner of the job as a cron report.
-g name
Name by which to greet the system administrator. By default this is “carbon-based lifeform”.
-m address
Mail warning messages to the designated address, which can be any E-mail address accessible by the mailer on the host system. This defaults to “root@localhost”.
-t
Print size quantities with thousands separators. The number 1269259614 will be displayed as “1,269,259,614” with this option specified. If you prefer a different character as the thousands separator, change the assignment to the $Thousands variable in the source code.
-u
Print how-to-call information.
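
The effect of the -t option is easy to reproduce. The following Python sketch, which is not the Perl implementation, formats a number with a configurable separator; the constant THOUSANDS merely plays the role the text assigns to the $Thousands variable, and the function name commas is made up.

```python
THOUSANDS = ","   # stand-in for the $Thousands setting described above


def commas(number, separator=THOUSANDS):
    """Insert a separator between each group of three digits."""
    return f"{number:,}".replace(",", separator)


print(commas(1269259614))   # prints 1,269,259,614
```

Passing a different separator, for instance commas(1269259614, "."), corresponds to editing the variable in the source.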

BUGS

The size of a file is deemed to be whatever the Perl -s operator says it is. On systems which support and contain “holey” files—files in which all logical addresses do not correspond to allocated storage—the size reported may not correspond to the amount of storage actually occupied by the file.

Sizes of the files named on the command line are determined with the “du -sk” command. If this command does not produce the expected format on your system, you'll have to modify the Perl program to specify the appropriate command and/or parse the results it returns.

LogJam assumes the Mail command can be used to send mail to the designated recipient. If this isn't the case, you'll have to modify the source code accordingly. On some systems you'll have to replace Mail with mailx.

SEE ALSO

du(1), Mail(1)

Top40: Most Wanted List

Once WatchFull has alerted you to a file system approaching capacity and you've dealt with any oversized log files fingered by LogJam, it's time to unleash the witch hunt for huge files lurking in less obvious locations. You know—those 275 megabyte core dumps from Netscrape in each of your users' home directories, the fellow with half a gigabyte of, shall we say, “non-work related” MPEG files, system crash core dumps and kernel images dating back to 1994, patch back-out directories from three operating system releases ago, etc. This is where Top40 comes in.

Top40 scans one or more directory trees and prepares a list of the 40 largest files found in them. (You can specify the number of files to be shown with a command line option.) These files are prime candidates for clean-up campaigns.

Documentation for Top40 follows, in Unix manual page style.

NAME

Top40 - show largest files in directory trees

SYNOPSIS

Top40 [ -f ] [ -h ] [ -n count ] [ -s size ] [ -t ] [ -u ] rootdir

DESCRIPTION

Top40 scans one or more directory trees starting at its rootdir and displays a list of the largest files found, 40 by default, in descending order by size. The rootdir arguments need not be file system mount points—any directory may be scanned. If no rootdir is specified, the current directory is scanned.
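
The scan can be approximated in a few lines of Python. Note that this sketch walks the tree with os.walk rather than the Unix find command that Top40 itself relies on, so it illustrates the idea rather than the program's method. The function names and parameters (count, min_size, follow_mounts) are hypothetical stand-ins for the -n, -s, and -f options documented below.

```python
import heapq
import os
import tempfile


def _file_sizes(roots, min_size, follow_mounts):
    """Yield (size, path) for each regular file found under the roots."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            if not follow_mounts:
                # Prune subdirectories which are mount points of other file systems.
                dirnames[:] = [d for d in dirnames
                               if not os.path.ismount(os.path.join(dirpath, d))]
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.islink(path):
                        continue
                    size = os.path.getsize(path)
                except OSError:
                    continue            # file vanished or is unreadable
                if size >= min_size:
                    yield size, path


def largest_files(roots, count=40, min_size=0, follow_mounts=False):
    """Return the count largest files, biggest first, as (size, path) pairs."""
    return heapq.nlargest(count, _file_sizes(roots, min_size, follow_mounts))


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name, size in (("a", 10), ("sub/b", 300), ("sub/c", 200)):
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(b"x" * size)
    demo_sizes = [size for size, _ in largest_files([tmp], count=2)]
    demo_big = [size for size, _ in largest_files([tmp], min_size=250)]

print(demo_sizes)
```

Feeding a generator to heapq.nlargest means only the current leaders are held in memory, which matters when a scan covers millions of files.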

OPTIONS

-f
By default, Top40 does not follow mount point directories onto other file systems. For example, if you scan the root file system, “/”, a directory “/usr” which is the mount point of a different physical file system will not be examined. The -f option overrides this and causes mount points to be followed just like regular directories. Specifying the -f option and the root file system will cause every file system on the machine to be scanned. This can take a long time!
-h
File sizes are displayed in “human readable” form: a number of three or fewer digits followed by a suffix, “b” for bytes, “K” for kilobytes, “M” for megabytes, “G” for gigabytes, and “T” for terabytes. Each of these units denotes a power of 1000, not 1024. Values less than 10 units are shown with a single decimal place; if you wish to change the decimal character, modify the definition of $Decimal in the program source.
-n count
The count largest files are displayed. The default value is 40.
-s size
Files smaller than the specified size are excluded from the scan. This reduces the time required to scan large file systems with many relatively small files. The size may be given in bytes, or with a suffix of “K” for kilobytes, “M” for megabytes, “G” for gigabytes, and “T” for terabytes. Suffixes denote powers of 1000 and may be either upper or lower case. By default all files are scanned, regardless of size.
-t
File sizes are edited with a thousands separator, for example “16,297,944” instead of “16297944”. If you prefer a different thousands separator character, change the definition of $Thousands in the program source.
-u
Print how-to-call information.
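
The “human readable” format selected by -h might be sketched like this in Python. It is a guess at the rule from the description above, not the Perl code: it truncates rather than rounds so that results never exceed three digits, and it assumes plain byte counts are shown without a decimal place. DECIMAL stands in for the $Decimal setting; human_readable is a made-up name.

```python
DECIMAL = "."      # stand-in for the $Decimal setting described above
UNITS = "bKMGT"    # bytes, kilobytes, megabytes, gigabytes, terabytes


def human_readable(size):
    """Format a byte count with a power-of-1000 suffix."""
    scale, index = 1, 0
    while size >= scale * 1000 and index < len(UNITS) - 1:
        scale *= 1000
        index += 1
    if index > 0 and size < 10 * scale:
        # Fewer than 10 units: show one decimal place (truncated, an assumption).
        tenths = size * 10 // scale
        return f"{tenths // 10}{DECIMAL}{tenths % 10}{UNITS[index]}"
    return f"{size // scale}{UNITS[index]}"


for value in (999, 1000, 16297944, 1113544113):
    print(value, human_readable(value))
```

Integer arithmetic is used throughout to avoid surprises such as 999,999 bytes rounding up to a four-digit “1000K”.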

BUGS

The Unix find command is used to traverse the specified directory trees and pre-filter the files found therein. If the find command on your system behaves in a non-standard manner, you may have to modify the options supplied to it in the program source.

Directories which contain a multitude of small files will escape scrutiny by Top40. A separate scan which sums the size of top-level directory contents would be required to identify such perpetrators and Top40 does not presently do this.
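
For readers who want the separate scan mentioned above, a minimal Python sketch follows. It is not part of Top40; the names tree_size and directory_totals are invented, and symbolic links are skipped by assumption.

```python
import os
import tempfile


def tree_size(path):
    """Sum the sizes of all regular files beneath a directory."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass                    # skip files we cannot examine
    return total


def directory_totals(root):
    """Return (total bytes, path) for each top-level directory, largest first."""
    totals = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                totals.append((tree_size(entry.path), entry.path))
    return sorted(totals, reverse=True)


# Demonstration: 100 tiny files outweigh one moderately large file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "many"))
    os.makedirs(os.path.join(tmp, "few"))
    for i in range(100):
        with open(os.path.join(tmp, "many", f"f{i}"), "wb") as f:
            f.write(b"x" * 10)
    with open(os.path.join(tmp, "few", "big"), "wb") as f:
        f.write(b"x" * 500)
    demo = [(size, os.path.basename(path)) for size, path in directory_totals(tmp)]

print(demo)
```

In the demonstration the directory of 100 ten-byte files ranks above the directory holding a single 500-byte file, although no individual file in it would appear near the top of a largest-files list.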

The size of a file is taken to be whatever the Perl -s operator says it is. On systems which support and contain “holey” files—files in which all logical addresses do not correspond to allocated storage—the size reported may not correspond to the amount of storage actually occupied by the file.
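
The gap between logical size and allocated storage can be observed directly. The Python sketch below (an illustration, not anything Top40 does) compares the two figures for a file created with a hole in it. It relies on the Unix-only st_blocks field, counted in 512-byte units, and whether a hole is actually left unallocated depends on the file system in use.

```python
import os
import tempfile


def logical_and_allocated(path):
    """Return (logical size in bytes, bytes actually allocated) for a file."""
    info = os.stat(path)
    # st_blocks counts 512-byte units on Unix; it is not available on Windows.
    return info.st_size, info.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    holey = os.path.join(tmp, "holey.dat")
    with open(holey, "wb") as f:
        f.truncate(1_000_000)          # extend the file without writing any data
    logical, allocated = logical_and_allocated(holey)

print(f"logical size {logical} bytes, allocated {allocated} bytes")
```

On file systems which support holes, the allocated figure will typically be far smaller than the logical size.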

SEE ALSO

find(1)

Download WatchFull.tar.gz (Gzipped TAR archive)

To install WatchFull, download the archive using the link above, uncompress with gunzip, then extract the contents with tar. The resulting directory will contain Perl source code for each of the components of the package, this document in HTML format, and a Makefile used to create the release archive.

Copying and Support Information

This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided “as is” without express or implied warranty.

Absolutely no support or assistance of any kind whatsoever is available for WatchFull—you are entirely on your own.


by John Walker
August, MM