"Yikes! Who ate my hard drive?", you exclaim, having installed a Linux distribution update and discovered your 4 Gb root partition, created a year ago that absurdly large (you thought) so you'd never have to worry about it filling up, is now 98% full of stuff you've never heard of and are unlikely to ever use: Pig Latin spelling checkers, six different ways to type Kanji on your ASCII keyboard, and a different desktop environment for every day of the week.

Unless you're ready to surrender, repartition your drive and reload everything in the forlorn hope that the next update won't fill up whatever ridiculous size you make the new root partition, there's nothing for it but to embark upon an old-fashioned system administrator's witch hunt--find out what's using all the space and get rid of the fat targets you don't need. If your system uses RPM software packages, the obvious way to proceed is to make a list of all packages sorted by the disc space they consume, then investigate whether the larger ones are really necessary. While several RPM administration tools, such as rpm and GnoRPM, show the size of individual packages, you're forced to query them one at a time to find the big ones. This is a lot of typing or pointing and clicking for grizzled sysadmins like me who would rather just look at a list.

This page presents two small, simple Perl programs: rpmsize.pl calculates the size in bytes of all the files in a given RPM package, and rpmhogs.pl uses rpmsize to prepare a list of all packages installed on a system and the size of each, sorted in descending order. Both programs use the command line rpm program to query the database of installed software. As these programs perform only query operations, they do not require super-user (root) privilege to run.


To determine the disc space occupied by a given RPM package, use:

    	perl rpmsize.pl package_name

If the specified package is installed on the system, the sum of the sizes of all files within the package will be printed in bytes. For example:

    	$ perl rpmsize.pl emacs-21.2-2

shows that the Emacs text editor and its support files consume 35.8 megabytes on your system. (If you've installed everything in a single root partition, all the files will be there. If your system is configured with separate /, /usr, /var, etc. partitions, you'll have to investigate further to determine where the files are actually installed; these tools do not address that question, although most application packages install most of their files in the /usr filesystem.) If the package_name is not installed on the system, rpmsize will report its size as zero.
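For readers curious how such a total can be obtained from the rpm command alone, here is a minimal sketch. It assumes rpm's FILESIZES query tag, which reports the file sizes recorded in the package database; rpmsize.pl itself may well go about it differently. The function names sum_sizes and package_size are made up for this illustration.

```shell
#!/bin/sh
# Sketch: total the sizes of the files belonging to one installed package.
# Assumption: the FILESIZES query tag; this is not taken from rpmsize.pl.

# Sum the first field of every input line. Lines which do not start
# with a number (such as an error message) count as zero.
sum_sizes() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Print the total size in bytes of the files in the named package.
# The square brackets make rpm emit one line per file in the package.
package_size() {
    rpm --query --queryformat '[%{FILESIZES}\n]' "$1" 2>/dev/null | sum_sizes
}
```

After loading these definitions into a shell, package_size emacs-21.2-2 prints a single number of bytes. A package which is not installed produces no numeric lines, so the result is zero, in keeping with the behaviour described above.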


To list the size of all packages on the system, sorted in descending order by size, run:

    	perl rpmhogs.pl

Output is written to standard output and may be redirected or piped to another program in the usual manner. Here are the first ten and last five lines from a run of this program on a system which had just been upgraded from Red Hat Linux 7.2 to 7.3:

  168,180,172   glibc-common-2.2.5-34 
  115,245,076   kernel-source-2.4.18-3 
  109,563,286   php-manual-4.1.2-7 
   65,477,844   xemacs-21.4.6-7 
   64,227,434   aspell-cs-0.2-3 
   61,776,259   rpmdb-redhat-7.3-0.20020419 
   59,441,866   jdk-1.3.1-fcs 
   49,206,083   tetex-doc-1.0.7-47 
   39,713,613   xemacs-el-21.4.6-7 
   37,629,394   kdebase-3.0.0-12 
        1,966   rootfiles-7.2-1 
          213   procps-X11-2.0.7-12 
           48   docbook-utils-pdf-0.6.9-25 
            0   basesystem-7.0-2 

3,711,330,234	Total

At a glance, it's obvious there are some promising targets for clean-up here. To obtain details for a package, use the command:

    	rpm --query --info package_name

Running this on, say, php-manual-4.1.2-7, we discover that the upgrade has installed almost 110 megabytes of documentation for the PHP Web scripting language, and this on a machine which isn't even a Web server! Then there's the 64 megabytes of spelling checker dictionaries for the Czech language in aspell-cs-0.2-3, and the list goes on and on. A couple of hours of cleanup focused exclusively on large and obviously unnecessary packages freed up more than 600 megabytes on my system partition, reducing its occupancy from 98% to 83%, and that's with leaving both the Gnome and KDE desktop environments installed simultaneously (this is a development machine, and I occasionally wish to test programs for compatibility with both of these desktop systems).
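A listing in the style of the rpmhogs.pl output can be approximated with rpm, sort, and awk. What follows is only a sketch under stated assumptions: it relies on rpm's SIZE, NAME, VERSION, and RELEASE query tags, and the function names list_sizes and rank_by_size are invented here. It is not the code in rpmhogs.pl.

```shell
#!/bin/sh
# Sketch: list packages by size, largest first, with a grand total.
# Assumption: the SIZE, NAME, VERSION and RELEASE query tags of rpm.

# Emit one "bytes name-version-release" line per installed package.
list_sizes() {
    rpm --query --all --queryformat '%{SIZE} %{NAME}-%{VERSION}-%{RELEASE}\n'
}

# Read "bytes label" lines, sort them in descending numeric order,
# print each size with thousands separators, then a blank line and total.
rank_by_size() {
    sort -rn | awk '
        # Return n as a string with a comma before each group of three digits.
        function commas(n,    s, out) {
            s = sprintf("%.0f", n)
            out = ""
            while (length(s) > 3) {
                out = "," substr(s, length(s) - 2) out
                s = substr(s, 1, length(s) - 3)
            }
            return s out
        }
        { total += $1; printf "%13s   %s\n", commas($1), $2 }
        END { printf "\n%13s   Total\n", commas(total) }
    '
}
```

With these defined, list_sizes | rank_by_size prints the whole table, and the output can be piped through head or a pager like any other. The sizes are those recorded in the RPM database, which need not match what the files occupy on disc today.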

Details and Downloading

In the examples on this page, I've shown the programs being run by explicitly invoking Perl; each contains a #! /usr/bin/perl line at the top, so if Perl is installed at this location, you may run them directly as applications. Both programs are written in "Perl Classic", and should work with any version of Perl from 4.036 to the 5.6.1 release on which I developed them.

Obviously, you shouldn't go around deleting packages from your system unless you know what you're doing and fully comprehend the consequences of your actions. RPM's dependency management will help you to avoid common pitfalls, but before you delete anything related to the kernel or system, be absolutely sure you're not inviting disaster. As long as a package is an application included on your distribution CD-ROMs, you can always re-install it if you later discover you need it.
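One way to let RPM's dependency management speak before anything is actually deleted is a test run of the erase operation. The sketch below assumes rpm's --test option in combination with --erase, which checks for conflicts without removing anything; the function name check_removal is hypothetical. Depending on the rpm version, even a test run may need root privilege, in which case rpm will say so and the package is reported as declined.

```shell
#!/bin/sh
# Sketch: ask rpm whether packages could be removed, without removing them.
# Assumption: "rpm --erase --test" performs a dry run only.

# For each package named as an argument, report whether rpm objects.
# Any explanation from rpm (failed dependencies, permissions) is left
# visible on standard error so the reason can be read.
check_removal() {
    for pkg in "$@"; do
        if rpm --erase --test "$pkg"; then
            echo "$pkg: rpm reports no obstacle to removal"
        else
            echo "$pkg: rpm declined; see its message"
        fi
    done
}
```

For example, check_removal php-manual-4.1.2-7 aspell-cs-0.2-3 examines two of the candidates from the listing. A clean report means only that no other installed package depends on the one named; whether you need it yourself remains your call.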

The programs are supplied as a Gzipped TAR archive:

rpmsize.tar.gz (1.6 Kb)

which simply extracts the two Perl programs to the current directory. You can modify the formatting of numbers, column sizes, and separators by changing declarations at the top of rpmhogs.pl; the comments explain the options available.
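If you have not unpacked a Gzipped TAR archive before, the following sketch shows one portable way to do it. The function name unpack_archive and its optional destination argument are assumptions made for this example; it also sets the execute permission on any extracted .pl files so they can be run directly through their #! line.

```shell
#!/bin/sh
# Sketch: unpack a gzipped tar archive and make its Perl programs executable.
# Assumption: unpack_archive and its arguments are names invented here.

# Usage: unpack_archive archive.tar.gz [destination_directory]
unpack_archive() {
    archive=$1
    dest=${2:-.}
    # Decompress to standard output and let tar read the stream, which
    # works even with a tar that has no built-in gzip support.
    gzip -dc "$archive" | (cd "$dest" && tar -xf -) || return 1
    # Mark extracted Perl programs as executable.
    for file in "$dest"/*.pl; do
        if [ -f "$file" ]; then
            chmod +x "$file"
        fi
    done
}
```

Calling unpack_archive rpmsize.tar.gz extracts into the current directory, after which ./rpmhogs.pl should run without naming perl explicitly, provided Perl lives at /usr/bin/perl.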

This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided "as is" without express or implied warranty.

by John Walker
30th May 2002