Think of this as a mathematical intuition puzzle. You could
work it out analytically, or you might write a computer
program and beat it to death numerically (don't bother, since such a
program written in JavaScript is embedded in this page), but
the point is to first see if you can intuit the correct answer from
the mathematical properties of the components of the problem.
It's well worth a little head scratching, as the result is quite
beautiful and shows how seemingly unrelated things in mathematics
reveal a deeper unity when you look at them more closely.
Random Number Blackjack
The problem involves a game something like blackjack, but
played with random numbers between zero and one instead
of a deck of cards. To be precise, the random numbers are
uniformly distributed over the range greater than or equal to zero
and less than one, which means that all numbers in that range are
equally probable. It doesn't matter whether the numbers
are true hardware-generated
random values or a pseudorandom sequence generated by a computer.
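In a browser, for instance, JavaScript's built-in Math.random() returns
exactly such a value, so a single draw is just:

    // Math.random() yields a uniform pseudorandom number in [0, 1):
    // greater than or equal to zero, always less than one.
    const draw = Math.random();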
To play the game, start with a sum of zero. “Draw” a random number
from the generator, and add it to the sum. Since the generator always returns
numbers less than one (although they can be arbitrarily close to one), the sum
after one draw will always be less than one. Tally one number drawn. Now draw
another random number from the generator, add it to the sum, and add one to
the tally of numbers drawn. If the sum is now greater than one, you've
“gone bust”: record the tally of numbers drawn, reset the sum to zero,
and continue with another “hand”. As long as the sum remains less than
or equal to one, continue to draw numbers, add them to the sum, and
increment the tally; when the sum exceeds one, record the tally at
which you went bust.
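In code, a single hand of the game looks like this (a minimal sketch;
the function name playHand is mine, not part of the program embedded in
this page):

    // Play one hand: draw uniform random numbers in [0, 1) and add
    // them to the sum until the sum exceeds one ("bust"), then
    // return the tally of draws it took.
    function playHand() {
        let sum = 0;
        let tally = 0;
        while (sum <= 1) {
            sum += Math.random();   // draw a number and add it to the sum
            tally++;                // count the draw
        }
        return tally;               // the draw on which we went bust
    }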
Here's an example of several “hands” of the game.
Hand 1

    Tally     Draw       Sum
      0                 0.0000
      1      0.4770     0.4770
      2      0.0516     0.5286
      3      0.1793     0.7079
      4      0.2250     0.9329
      5      0.0438     0.9767
      6      0.3006     1.2774   Bust!
In this first hand, it took six “draws” from the random
number generator before the sum exceeded one. We record the final
tally of 6 as the result of the hand and try another.
Hand 2

    Tally     Draw       Sum
      0                 0.0000
      1      0.8474     0.8474
      2      0.3005     1.1479   Bust!
This time we went bust on only the second draw. Again, recording the tally
of 2, we try again.
Hand 3

    Tally     Draw       Sum
      0                 0.0000
      1      0.5535     0.5535
      2      0.2721     0.8257
      3      0.5390     1.3646   Bust!
This time we bust out on the third draw, so we note the final tally of 3.
At this point the list of tallies from the three hands so far reads:
    Hand     Tally
      1        6
      2        2
      3        3
Now let us compute the average (to be precise, the arithmetic mean) of the
number of draws it took to go bust. This is just the sum of the tallies for all
the hands played so far divided by the number of hands. After three hands, we
compute this average (rounded to four decimal places) as:
(6 + 2 + 3) / 3 = 3.6667
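The same computation, in code:

    const tallies = [6, 2, 3];   // tallies recorded for the three hands
    const mean = tallies.reduce((a, b) => a + b, 0) / tallies.length;
    console.log(mean.toFixed(4));   // "3.6667"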
Now we can pose the puzzle question:
As the number of hands increases, does the average number of draws
per hand converge to a specific value, and if so, what?
Hint 1. What is the probability distribution of the sum of a
series of uniformly distributed random variables?
Hint 2. Play several hands of the game by pressing the “Play” button
below and see if you get a sense for how the average behaves. No fair tabulating
a large number of results and computing an average, however!
Hint 3. The mean value of a uniformly distributed random
variable between zero and one is obviously 1/2. Why doesn't the mean number
of draws then trivially converge to 2?
Answer. The mean value of the number of draws per hand converges to the mathematical
constant e, base of the natural logarithms, approximately
2.71828….
You can observe the convergence of the mean value toward e over a
large number of hands by entering the number of hands to be played in
the box and pressing “Run”. The iteration count,
mean value, error (absolute value of the difference between the mean
value and e), and the error as a percentage of the value
of e are reported for each hand up to 10, every 10 from
10 to 100, every 100 from 100 through 1000, and so on. If you
request a large number of hands and/or your computer is slow, you may
have to click a box warning you of an “unresponsive script”
one or more times to permit the computation to run to completion.
The example uses the pseudorandom number generator built into your browser's
JavaScript interpreter, and will produce slightly different results every
time you run it, yet the mean will invariably converge toward e.
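Here is a minimal sketch of such a simulation, reusing the playHand
function from above (the reporting schedule is simplified; the script
embedded in the page differs in detail):

    // Play a large number of hands and watch the running mean of
    // the tallies converge toward e.
    const hands = 1000000;
    let total = 0;
    for (let i = 1; i <= hands; i++) {
        total += playHand();
        if (i === 10 || i === 100 || i === 10000 || i === hands) {
            const mean = total / i;
            const error = Math.abs(mean - Math.E);
            console.log(i, mean.toFixed(6), error.toFixed(6));
        }
    }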
Intuitive Exploration
Let's begin by answering the questions posed by Hints 1 and 3.
Hint 1. What is the probability distribution of the sum of a
series of uniformly distributed random variables?
The probability distribution of the sum of a sequence of uniformly
distributed random values rapidly approaches that of the normal distribution
as the number of values summed increases. The following are histograms
of 250,000 sums of n random values for n from 1 through 4.
The sums have been normalised so that the mean value is 0.5 in all the charts.
[Histograms of the 250,000 sums for n = 1, n = 2, n = 3, and n = 4.]
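The data behind such charts can be tabulated in a few lines (a sketch,
not the script that drew the histograms above):

    // Tabulate `trials` sums of n uniform random values into `bins`
    // equal bins; dividing each sum by n normalises it into [0, 1),
    // so every histogram has mean 0.5 regardless of n.
    function sumHistogram(n, trials = 250000, bins = 100) {
        const counts = new Array(bins).fill(0);
        for (let t = 0; t < trials; t++) {
            let sum = 0;
            for (let i = 0; i < n; i++) {
                sum += Math.random();
            }
            counts[Math.floor((sum / n) * bins)]++;
        }
        return counts;
    }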
The probability density function for a variate X in a
normal distribution with mean value μ and standard deviation
σ is:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2 \sigma^2)}
The nature of this function becomes even clearer if we assume a
standard normal distribution with mean value μ = 0
and standard deviation σ = 1:

f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2 / 2}
This is clearly a function in e (π appears as well, but simply as a
constant scale factor). Since it's unlikely in the extreme that such a
puzzle would have been posed had the answer not been a well-known
mathematical constant, this is a compelling reason to suspect the answer
might be e, a suspicion further reinforced by Hint 3 below.
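For reference, the standard normal density above is a one-liner in code:

    // Probability density of the standard normal distribution:
    // exp(-x^2 / 2) / sqrt(2 * pi), peaking at about 0.3989 at x = 0.
    const stdNormalPdf = (x) => Math.exp(-x * x / 2) / Math.sqrt(2 * Math.PI);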
Hint 3. The mean value of a uniformly distributed random
variable between zero and one is obviously 1/2. Why doesn't the mean number
of draws then trivially converge to 2?
A naïve guess might be that the mean number of draws converges to
two, because the mean value of each number drawn is 1/2, and two
times this value is 1. But, on further reflection, that can't
possibly be right. Why? Because the random values are always less
than 1, you can never go bust on the first draw. Hence the minimum
number of draws to go bust is 2. Further, it's clear that in some
hands you'll have to make more than two draws before going bust, and
hence the mean must be greater than 2. On the other hand, the
expected sum of three draws is 3 × 1/2 = 1.5, comfortably more than
1, so it doesn't seem likely that the mean will be greater than
three. Hence we have reason to suspect the mean falls somewhere
between 2 and 3. Now, what well-known constant lies between those
two numbers?
Rigorous Analysis
The following discussion is based upon the analysis in
Eric W. Weisstein's
MathWorld page,
“Uniform
Sum Distribution”, expanded to show intermediate
algebraic steps and explain some of the identities used.
To begin, we observe that the probability P(n) that a sum of n
uniformly distributed random numbers X_1, …, X_n will exceed 1
while the sum of the first n−1 numbers remains less than
or equal to 1 is:

P(n) = P\left( \sum_{i=1}^{n-1} X_i \le 1 \right) - P\left( \sum_{i=1}^{n} X_i \le 1 \right)
which is simply the difference in the integrals of the probability
distribution functions as plotted above for sums of n−1 and
n numbers. For uniformly distributed random variates there is a
closed-form solution for this integral in terms of the factorial
function: the probability that a sum of n such variates is less than
or equal to 1 is 1/n!. The expression may therefore be rewritten and
algebraically simplified as follows:

P(n) = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n}{n!} - \frac{1}{n!} = \frac{n-1}{n!}
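Working out the first few values makes the behaviour concrete:

P(2) = \frac{1}{2!} = \frac{1}{2}, \quad P(3) = \frac{2}{3!} = \frac{1}{3}, \quad P(4) = \frac{3}{4!} = \frac{1}{8}, \quad P(5) = \frac{4}{5!} = \frac{1}{30}, \ldots

Since (n−1)/n! = 1/(n−1)! − 1/n!, the series telescopes, and these
probabilities sum to exactly 1, as probabilities must.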
The mean number of random number draws to yield a sum which exceeds 1
may then be written as a sum over the number of draws multiplied by the
probability, as determined above, of that number of draws being the one
which pushes the sum past 1. Substituting in the expression for this
probability and simplifying, we find:

\bar{n} = \sum_{n=1}^{\infty} n \, P(n) = \sum_{n=1}^{\infty} \frac{n(n-1)}{n!} = \sum_{n=1}^{\infty} \frac{1}{(n-2)!} = \sum_{n=0}^{\infty} \frac{1}{n!} = e
Doesn't that final series look familiar? It ought to—it's
the well-known series expansion for the exponential function with
an argument of 1! (If you followed the above simplification
carefully, you might have noticed some fancy footwork in the last
step, where a sum from 1 to infinity of 1/(n−2)! was
replaced by a sum from 0 to infinity of 1/n!. You're
perfectly entitled to wonder what became of the first term in this
summation: 1/(−1)!. The factorial function is defined only
for integers greater than or equal to zero, so in one sense this
term doesn't mean anything, but a closer look provides insight
into what's really going on. The Gamma function, which is the
generalisation of the factorial for real and complex arguments,
has a value of infinity for an argument equivalent to the
factorial of −1, so the first term in the summation can be
taken as 1/∞, or zero. This, in turn, simply means that
since we're summing random numbers less than one, the probability
of obtaining a sum greater than one by summing just one of these
numbers is exactly zero. Hence the first term in the summation is
zero and may be discarded by setting the start of the summation to
zero.)
Hence, the mean number of random number draws to exceed 1 converges
to e. If that isn't beautiful, I don't know what is.
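As a final check, both the distribution and the mean can be verified
numerically; a sketch, again assuming the playHand function defined
earlier:

    // Compare the empirical frequency of each tally with the
    // predicted probability (n - 1) / n!, and the empirical mean
    // of the tallies with e.
    const factorial = (n) => (n <= 1 ? 1 : n * factorial(n - 1));
    const trials = 1000000;
    const counts = {};
    let sumTallies = 0;
    for (let i = 0; i < trials; i++) {
        const t = playHand();
        counts[t] = (counts[t] || 0) + 1;
        sumTallies += t;
    }
    for (let n = 2; n <= 6; n++) {
        const observed = (counts[n] || 0) / trials;
        const predicted = (n - 1) / factorial(n);
        console.log(n, observed.toFixed(5), predicted.toFixed(5));
    }
    console.log("mean:", (sumTallies / trials).toFixed(5),
                "e:", Math.E.toFixed(5));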