The World Wide Web Consortium (W3C) Markup Validation Service is an essential resource for Web developers who wish to create standards-compliant documents. This freely available service checks HTML and XHTML documents for compliance with the various versions of the relevant standards and reports any errors it finds in the markup. The validator can check documents specified by a Web URL, files uploaded from the user's computer, or text pasted directly into a text box on the validator request page.

These options suffice when you're developing a new page, but if you're generating a sizable collection of documents automatically (for example, with a content management system for a Web log), or you have a complicated existing Web tree you wish to check for standards compliance, submitting each document individually for validation can become tedious. BulkValidator is a Perl program which automates the process of validating multiple documents. It submits either all of the HTML/XHTML documents in a directory or all documents in that directory and its subdirectories to the W3C validator and reports the results. For any documents which failed validation, the error reports are saved in a “discrepancies” directory whence they can be subsequently scrutinised.

Downloading and Installation

The archive contains the Perl program BulkValidator.pl, the manual page for the program (extracted from the documentation embedded within it), and this document. You can use these files in the directory in which you extracted them or install them in your system's library directories to make them available to all users. You may wish to rename the Perl program to BulkValidator, mark it executable, and place it in a directory on your PATH so it can be run as a regular command-line program; if you do so, make sure the location of Perl in the first (#!) line of the program corresponds to where Perl is installed on your system.
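A minimal sketch of those installation steps as a shell function follows. The name install_command and the choice of destination directory are illustrative assumptions; any directory on your PATH will do. The function copies the program and prints its #! line next to the perl found on your PATH, but it does not rewrite the #! line for you.

```shell
# Hypothetical helper: install BulkValidator.pl as a command named
# BulkValidator in BINDIR (any directory on your PATH).
# Usage: install_command SOURCE BINDIR
install_command() {
    src=$1
    bindir=$2
    mkdir -p "$bindir" &&
        cp "$src" "$bindir/BulkValidator" &&
        chmod +x "$bindir/BulkValidator" || return 1

    # Print the interpreter named on the program's first (#!) line and
    # the perl actually found on PATH so you can check that they agree.
    echo "script uses: $(head -n 1 "$bindir/BulkValidator")"
    echo "perl on PATH: $(command -v perl || echo 'not found')"
}
```

For example, the following installs the program in a personal bin directory; if the two printed paths differ, edit the first line of the installed copy to name your Perl.

    install_command BulkValidator.pl "$HOME/bin"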

This program requires the Perl modules Data::Dumper, Pod::Usage, LWP, and URI::Escape. If your Perl installation lacks one or more of these modules, you will have to install them (either system-wide or for your own user account) before you can use BulkValidator. In addition, validation of files in subdirectories requires the Unix find command. While most systems which support Perl provide this command, if it is not present (for example, on a minimalist Cygwin configuration), you will have to install it if you wish to use this feature.
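If you're not sure which of these prerequisites your system has, the shell function below (the name check_prereqs is illustrative) asks Perl to load each module with perl -MModule -e 1, a standard Perl idiom that exits with a zero status only if the module can be loaded, and checks whether find is on your PATH.

```shell
# Report whether each prerequisite named above is available.
check_prereqs() {
    for module in Data::Dumper Pod::Usage LWP URI::Escape; do
        # perl -MModule -e 1 loads the module and does nothing else;
        # a nonzero exit status means it could not be loaded.
        if perl -M"$module" -e 1 2>/dev/null; then
            echo "$module: installed"
        else
            echo "$module: MISSING"
        fi
    done

    # The find command is needed only for validating subdirectories.
    if command -v find >/dev/null 2>&1; then
        echo "find: installed"
    else
        echo "find: MISSING"
    fi
}

check_prereqs
```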

Manual Page

BulkValidator

SYNOPSIS

BulkValidator [--copyright] [--density num] [--discrepancy dir] [--firstfiles num] [--help] [--man] [--pause num] [--rpause factor] [--shuffle] [--skipfile file] [--tree] [--validator url] [--verbose] [--version] [directory]

DESCRIPTION

BulkValidator submits every HTML/XHTML file in a specified directory (the current directory if none is given) to the W3C HTML validator and reports the results; with --tree, files in all subdirectories are included as well. The validation reports for any files which failed validation are saved for review.

OPTIONS

All options may be abbreviated to their shortest unambiguous prefix.
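To see what this rule allows in practice, the following sketch resolves a prefix against the option names listed in the SYNOPSIS. It illustrates the rule only; resolve_option is a hypothetical function, not BulkValidator's own option parser. For instance, --d is ambiguous (--density or --discrepancy) while --de and --di are not, and --verb and --vers distinguish --verbose from --version.

```shell
# Hypothetical: expand an abbreviated option name using the names from
# the SYNOPSIS. Prints the full name, or an error if the prefix is
# ambiguous or matches nothing. No name is a prefix of another, so a
# complete name always resolves to itself.
resolve_option() {
    prefix=${1#--}        # accept "--de" as well as "de"
    match=""
    count=0
    for opt in copyright density discrepancy firstfiles help man pause \
               rpause shuffle skipfile tree validator verbose version; do
        case $opt in
            "$prefix"*) match=$opt; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "--$match"
    elif [ "$count" -eq 0 ]; then
        printf 'unknown option: --%s\n' "$prefix" >&2
        return 1
    else
        printf 'ambiguous option: --%s\n' "$prefix" >&2
        return 1
    fi
}
```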

--copyright: Display copyright information.
--density num: A randomly chosen subset of num percent of the files will be validated. If you have a large collection of mostly similar files and do not want to spend the time or burden the validator with processing them all, specify a modest percentage of the files to test a statistical sample of them. Use the --firstfiles option if you wish to unconditionally validate some number of the first files in the list. If no --density is specified, all files will be validated (equivalent to a num specification of 100).
--discrepancy dir: The validation reports for any files which failed validation will be stored in the directory dir, which will be created if it does not already exist. If no --discrepancy directory is specified, reports will be stored in a ValidationDiscrepancies directory created within the current directory.
--firstfiles num: The first num files will always be validated regardless of the --density specification. The default is 0, which causes no files to be unconditionally validated.
--help: Display a brief summary of how to call the program.
--man: Display this complete manual page.
--pause num: After each file is validated, BulkValidator will pause for num seconds (plus an additional random delay governed by --rpause; see below). The default is 15 seconds. A modest delay after each request avoids unduly burdening the W3C validator.
--rpause factor: If --pause is nonzero, a random increment between zero and the --pause value multiplied by factor will be added to the delay after each request. The factor is a floating-point number; the default is 1, which results in a delay between the --pause value and twice that value.
--shuffle: If specified, files will be validated in random order. Otherwise, files are validated in alphabetical order.
--skipfile file: The specified file is the output from one or more previous runs of BulkValidator (which you can capture by redirecting standard output to a file or piping it to tee). All files which passed validation in previous runs will be skipped on this run. Use this option when you're chasing down validation errors in a collection of files; only the files which failed before will be re-examined in this run.
--tree: Validate all .html and .htm files in the directory specified on the command line and, recursively, in all of its subdirectories. This option requires the Unix find command.
--validator url: The specified url is used to request validation instead of the default http://validator.w3.org/check. The validator must accept file uploads with the same form fields as the W3C HTML validator and return pass/fail results in the same syntax.
--verbose: Generate verbose output describing what the program is doing.
--version: Display version number.
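The following shell sketch illustrates how the file-selection options above (--tree, --shuffle, --firstfiles, --density) and the delay options (--pause, --rpause) fit together. It is not BulkValidator's own code, and the function names select_files and pause_delay are hypothetical. It assumes that ordering (alphabetical or shuffled) happens before --firstfiles counts files, and that --density keeps each remaining file independently with probability num/100, so a run validates approximately, not exactly, num percent. The -maxdepth option of find is a widely supported extension rather than POSIX.

```shell
# Hypothetical sketch: print the files a run would validate, one per line.
# Usage: select_files DIR TREE(0|1) SHUFFLE(0|1) FIRSTFILES DENSITY
select_files() {
    dir=$1 tree=$2 shuffle=$3 first=$4 density=$5
    if [ "$tree" -eq 1 ]; then
        depth=""                # recurse into all subdirectories
    else
        depth="-maxdepth 1"     # only the named directory
    fi
    # $depth is left unquoted on purpose: it expands to zero or two words.
    find "$dir" $depth -type f \( -name '*.html' -o -name '*.htm' \) |
        sort |
        awk -v shuffle="$shuffle" -v first="$first" -v density="$density" '
            BEGIN { srand(); shuffle += 0; first += 0; density += 0 }
            { name[++n] = $0 }          # names arrive in alphabetical order
            END {
                # --shuffle: Fisher-Yates shuffle of the whole list.
                if (shuffle == 1)
                    for (i = n; i > 1; i--) {
                        j = int(rand() * i) + 1
                        t = name[i]; name[i] = name[j]; name[j] = t
                    }
                # --firstfiles: keep the first "first" names unconditionally.
                # --density: keep each later name with probability density/100.
                for (i = 1; i <= n; i++)
                    if (i <= first || rand() * 100 < density)
                        print name[i]
            }'
}

# Seconds to wait after a request: pause plus a random increment between
# 0 and factor * pause, printed with two decimal places. srand() seeds
# from the clock, so calls within the same second typically repeat a value.
# Usage: pause_delay PAUSE FACTOR
pause_delay() {
    awk -v p="$1" -v f="$2" \
        'BEGIN { srand(); printf "%.2f\n", p + rand() * f * p }'
}
```

For example, select_files . 0 0 0 100 lists every .html and .htm file in the current directory in alphabetical order, and sleep "$(pause_delay 15 1)" waits between 15 and 30 seconds (GNU and BSD sleep accept fractional seconds, although POSIX does not require it).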

EXAMPLES

Validate all HTML files in the current directory, placing discrepancy reports in a ValidationDiscrepancies subdirectory of the current directory.

    perl BulkValidator.pl

From the directory /var/www/html/recipes/ratburger and its subdirectories, validate the first 10 files in alphabetical order plus 15% of the remaining files chosen at random, placing discrepancy reports for any files which fail validation in /home/chef/goofs.

    perl BulkValidator.pl --tree --firstfiles 10 --density 15 \
                          --discrepancy /home/chef/goofs \
                          /var/www/html/recipes/ratburger

Validate files in /var/www/html/recipes/ratburger, saving the pass/fail results in /home/chef/goofs/val.log. Then, after editing, revalidate all the files which failed to validate the first time.

    perl BulkValidator.pl /var/www/html/recipes/ratburger \
            | tee /home/chef/goofs/val.log
    # . . . Edit, edit, edit . . .
    perl BulkValidator.pl --skipfile /home/chef/goofs/val.log \
            /var/www/html/recipes/ratburger

FILES

If no directory is specified on the command line, the current directory is validated.

The validation summary is written to standard output. You can redirect this to a file or make a copy with tee if you wish to use it in subsequent runs to exclude already-validated files with the --skipfile option.

The validator reports for any files which failed validation are stored in the --discrepancy directory, which defaults to ValidationDiscrepancies in the current directory. Files in this directory are named with the path name of the validated file, with all slashes replaced by underscores. Validation reports for files which previously failed validation but passed this time will be automatically deleted, and the --discrepancy directory will be removed if, at the end of the run, no files remain within it.
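The naming and clean-up rules just described can be sketched as follows. The helper names are hypothetical, and the sketch assumes the validator's report for a failing file has already been written to a local file; BulkValidator's actual implementation may differ.

```shell
# Name of the saved report for a validated file: its path with every
# slash replaced by an underscore.
report_name() {
    printf '%s\n' "$1" | tr '/' '_'
}

# Hypothetical helper: save or discard the report for one file.
# Usage: file_report DISCREPANCY_DIR FILE PASSED(0|1) REPORT_SOURCE
file_report() {
    target="$1/$(report_name "$2")"
    if [ "$3" -eq 1 ]; then
        rm -f "$target"          # passed this time: drop any stale report
    else
        mkdir -p "$1"            # create the directory on first failure
        cp "$4" "$target"        # failed: keep the validator's report
    fi
}

# At the end of a run, remove the discrepancy directory if it is empty.
# rmdir refuses to delete a non-empty directory, so this is safe.
finish_reports() {
    rmdir "$1" 2>/dev/null || true
}
```

For example, report_name recipes/ratburger/index.html prints recipes_ratburger_index.html.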

BUGS

Please report bugs to bugs@fourmilab.ch, indicating the version numbers of BulkValidator, Perl, and the Perl LWP module installed on your system.
