Saturday, September 22, 2012
Floating Point Benchmark: Haskell Language Added
I have posted an update to my trigonometry-intense
floating point benchmark which adds
Haskell to the list of languages in which the benchmark is implemented. A new release of the
benchmark collection including Haskell is now available for downloading.
The Haskell benchmark was developed and tested on
Glasgow Haskell Compiler version 7.4.1 on Ubuntu Linux 11.04; the relative performance of the various language implementations (with C taken as 1) is as follows. All implementations of the benchmark listed below produced identical results to the last (11th) decimal place.
    Language            Relative Time   Details
    ------------------  -------------   ------------------------------------------
    C                       1           GCC 3.2.3 -O3, Linux
    Visual Basic .NET       0.866       All optimisations, Windows XP
    FORTRAN                 1.008       GNU Fortran (g77) 3.2.3 -O3, Linux
    Pascal                  1.027       Free Pascal 2.2.0 -O3, Linux
                            1.077       GNU Pascal 2.1 (GCC 2.95.2) -O3, Linux
    Java                    1.121       Sun JDK 1.5.0_04-b05, Linux
    Visual Basic 6          1.132       All optimisations, Windows XP
    Haskell                 1.223       GHC 7.4.1 -O2 -funbox-strict-fields, Linux
    Ada                     1.401       GNAT/GCC 3.4.4 -O3, Linux
    Lisp                    7.41        GNU Common Lisp 2.6.7, Compiled, Linux
                           19.8         GNU Common Lisp 2.6.7, Interpreted
    Smalltalk               7.59        GNU Smalltalk 2.3.5, Linux
    Python                 17.6         Python 2.3.3 -OO, Linux
    Perl                   23.6         Perl v5.8.0, Linux
    Ruby                   26.1         Ruby 1.8.3, Linux
    JavaScript             27.6         Opera 8.0, Linux
                           39.1         Internet Explorer 6.0.2900, Windows XP
                           46.9         Mozilla Firefox 1.0.6, Linux
    QBasic                148.3         MS-DOS QBasic 1.1, Windows XP Console
Benchmarking in a purely functional language with lazy evaluation
like Haskell presents formidable challenges and requires some
judgement calls which do not occur in benchmarks of procedural
languages (although similar situations may arise in procedural
languages with strong optimisation of invariant results).
Due to Haskell's lazy evaluation, a straightforward port of the
floating point benchmark code from a procedural language (I started
with the Ada implementation for this project, since I considered it
the cleanest and best documented, using data types much in the
spirit of Haskell) will result in a benchmark program which runs in
essentially constant time. Since only the last result of the many
iterations through the benchmark computation actually contributes to
output visible to the outside world, all of the earlier computations
go ker-thunk into the memory hole and are never actually performed.
“If they can't see it, why do it?” is the Haskell motto. I've had
employees who work that way.
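To see this effect in miniature, consider the following toy program
(my own sketch, not code from the benchmark; the names naiveLoop and
expensive are invented). The “last” function walks the spine of the
list without forcing its elements, so the expensive computation is
performed exactly once, no matter how large the iteration count:

    -- Toy illustration of laziness collapsing a benchmark loop.
    naiveLoop :: Int -> Double -> Double
    naiveLoop n x = last [expensive x | _ <- [1 .. n]]
      where
        expensive y = sin y + cos y   -- stands in for the ray trace

    main :: IO ()
    main = print (naiveLoop 1000000 0.5)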
A wise manager once told me, “People don't do what you expect, but
what you inspect.” So it is with Haskell. In order to get a fair
benchmark against a procedural language, we must arrange that each
of the requested iterations in the benchmark run actually be fully
calculated. In the main
fbench.hs, we accomplish this by two
ruses. First of all, for all but the final ray trace we vary the
clear aperture of the design by adding a pseudorandom value to it.
(Since the ray tracing algorithm runs in constant time regardless of
this parameter [or the rest of the design, for that matter], this
does not affect the timing result.) Each ray trace is then forced
to be evaluated by applying the “
deepseq” function to its ultimate
result. (You may have to install the
deepseq package into your
Haskell environment—it is not, for example, installed by default on
Ubuntu Linux when you install the compiler,
ghc.) Since all we care
about is ensuring the clear aperture is different on each successive
ray trace, we use a pseudorandom generator with a constant seed.
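Here is a condensed sketch of those two ruses (the names trace and n
and the seed value are placeholders of mine; fbench.hs itself is
more elaborate). Every iteration receives a different aperture
derived from a constant seed, and deepseq guarantees that every
result is actually computed:

    import Control.DeepSeq (deepseq)    -- from the deepseq package
    import System.Random (mkStdGen, randoms)

    -- Placeholder for the ray trace; like the real one, its run
    -- time does not depend on the aperture passed to it.
    trace :: Double -> Double
    trace aperture = sin aperture + cos aperture

    main :: IO ()
    main = do
        let n = 100000
            -- Constant seed: reproducible, yet each iteration gets
            -- a different clear aperture, so none can be elided.
            apertures = map (4.0 +) (take n (randoms (mkStdGen 1234)))
            results   = map trace apertures
        -- Forcing the whole list with deepseq ensures every trace
        -- is evaluated, not just the one whose value is printed.
        results `deepseq` print (last results)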
Reference results are produced by compiling
fbench.hs into a binary
executable (see the
Makefile for details) and then running it with
an iteration count (specified by the first argument on the command
line) which results in a run time of around five minutes. If a
positive iteration count is specified, a classic
fbench run which
expects manual timing will be done. If the iteration count is
negative, the run will be in batch mode, intended to be timed by
running under the Unix “time” utility or equivalent.
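For concreteness, here is a minimal sketch of that command line
convention (the argument handling in the real fbench.hs may differ,
and the default count here is hypothetical):

    import System.Environment (getArgs)

    main :: IO ()
    main = do
        args <- getArgs
        let count = case args of
                        (a : _) -> read a :: Int
                        []      -> 1000    -- hypothetical default
        if count >= 0
            then putStrLn ("Classic run: " ++ show count ++
                           " iterations, timed manually")
            else putStrLn ("Batch run: " ++ show (negate count) ++
                           " iterations, timed externally")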
But note that this scheme does additional work: generating the
pseudorandom numbers, plus the clanking recursion machinery and
consequent garbage collection it entails. Haskell's default pseudorandom
generator is famously slow, so we must be careful that we're
measuring Haskell's performance on our own code, rather than the
wrapper which invokes the generator. This is accomplished by the
fbench_baseline.hs program. This uses precisely the same logic to
run the benchmark, but performs a null computation in place of the
ray trace. (I've left all the ray trace code in this program, in
case you want to experiment with switching all or part of it back
on.) To calculate the “true” run time of the benchmark on the ray
tracing code, we take the run time of
fbench.hs and subtract that of
fbench_baseline.hs for the same number of iterations. In comparing
this with the run time of procedural languages, we assume that the
loop overhead of those languages wrapping iterations of the
benchmark is negligible; this is almost always the case, but it is
not so in Haskell, hence the need for this correction.
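(To make the arithmetic concrete, with purely hypothetical numbers:
if fbench runs in 310 seconds for a given iteration count and
fbench_baseline runs in 10 seconds for the same count, the time
charged to the ray tracing code is 300 seconds.)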
Haskell purists may object to this whole exercise and contend that
the fairest benchmark is a straight port of the C, FORTRAN, etc.
code, measuring it against those languages. I am sympathetic to
this argument—after all, if one of the main design goals of the
language is to allow factoring out redundant computations, isn't
that worthy of being measured in a benchmark? On the other hand,
the mission of
fbench, since its origin in the 1980s, has been to
measure the performance of trigonometric function intense code, and
one hardly accomplishes that by running code where all but the last
iteration of the benchmark is optimised out by the compiler.
Understand—I am extremely impressed by, if not in awe of, Haskell's
optimisation, but I also want to get a result which people can
compare against realistic code, not a contrived benchmark like this
one which (in procedural language implementations) simply does the
same computation over and over. If you want to experience the magic
of lazy evaluation, build the
fbench_pure.hs program. It eschews both the
randomisation of the design being ray traced and the forcing of
evaluation of the results, and consequently runs in near-constant time
regardless of the iteration count requested. If you wish to explore
other strategies for fairly benchmarking Haskell, it's probably best
to start with
fbench_pure.hs, to which you can add your own code to
force evaluation as appropriate.
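If you do experiment with forcing evaluation, one pitfall is worth
knowing about: the built-in seq evaluates only to weak head normal
form (the outermost constructor), while deepseq evaluates a
structure completely. A minimal illustration of the difference (my
own example, not taken from the benchmark):

    import Control.DeepSeq (deepseq)

    main :: IO ()
    main = do
        let pair = (1 + 1 :: Int, 2 + 2 :: Int)
        -- seq stops at the outermost constructor: the pair itself
        -- is evaluated, but its components remain thunks.
        pair `seq` putStrLn "seq: pair in WHNF only"
        -- deepseq drives evaluation down to the leaves, which is
        -- why it, rather than seq, forces each ray trace result.
        pair `deepseq` putStrLn "deepseq: fully evaluated"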
Finally, let me note that this is the first Haskell program I have
ever written (I developed a deep appreciation for the language in
the process, and I trust it shall not be my last). I have tried to
do things “the Haskell way”, but I may have committed any number of
beginner blunders which elicit gnashville sounds from the teeth of
those fluent in the language. If so, please chastise me and offer
suggestions as to how I might remedy the shortcomings you perceive
in this code.
In this release, the C benchmarks, which were previously at the top level of the directory tree in the archive, have been moved into their own
c subdirectory with its own
Makefile. The C benchmark programs are unchanged.
Update: Benchmark results for Haskell revised to use timings with GHC 7.4.1 and Haskell platform 2012.2.0.0. Its results are substantially better than those of the 6.12.3 compiler I originally used. (2012-09-23 12:23 UTC)
Update: Benchmark results for Haskell revised to use timings with strict fields for primitive types and the
-funbox-strict-fields compiler option, which almost doubled execution speed. Special thanks to
Don Stewart for analysing the benchmark and recommending this. (2012-09-24 15:40 UTC)
Posted at September 22, 2012 15:47