Imagine that you conduct an experiment which results in a "Z-score", i.e., in a
value which, if the null hypothesis is true, should vary, from replication to
replication, around a mean value of 0 with a standard deviation (amount of
"spread") of 1 and should follow a Gausian "bell-shaped curve" approximately.
Values very far from the mean will occur very rarely according to the null
hypothesis, so a Z-score which is far from 0 can be taken as evidence that the
null hypothesis is not what is going on.
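To give a feel for how quickly those probabilities shrink, here is a small
sketch (in Python, using scipy -- not part of the original post) of the chance
the null hypothesis gives to a Z-score at least this far above zero:

    from scipy.stats import norm

    # Under the null hypothesis a Z-score is a draw from a standard
    # normal distribution (mean 0, standard deviation 1).  The chance
    # of landing at least this far above 0 falls off very fast:
    for z in (1.0, 2.0, 3.0, 4.0, 5.0):
        print(f"P(Z > {z}) = {norm.sf(z):.2e}")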
Imagine now that you have a bunch of different replications of the experiment,
each with their own Z-score. Now the null hypothesis says that each of those
Z-scores is an independent sample from a Z distribution -- each of them has the
characteristics I mentioned above, and the value of each one has no influence on
the values of the others. How does one go about looking at what all of them
together say about the null and alternative hypotheses?
The ideal answer takes into account that the different Z scores are based on
experiments with different numbers of trials. The Z scores should therefore not
all be weighted equally: the usual choice is to weight each Z score by the
square root of the number of trials behind it, so that a large experiment which
rejects the null hypothesis counts for more than a small experiment which
rejects it by the same amount. The result is a weighted combination of the Z
scores which can then be rescaled so that it too is a Z score.
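The post does not spell out the arithmetic, so here is a minimal sketch of that
weighted combination (often credited to Stouffer), with made-up Z-scores and
trial counts purely for illustration:

    import math

    def weighted_stouffer_z(z_scores, n_trials):
        """Combine per-experiment Z-scores, weighting each by sqrt(trials).

        The weighted sum is rescaled by the root of the sum of squared
        weights so that the result is itself a Z-score under the null.
        """
        weights = [math.sqrt(n) for n in n_trials]
        numerator = sum(w * z for w, z in zip(weights, z_scores))
        denominator = math.sqrt(sum(w * w for w in weights))
        return numerator / denominator

    # Hypothetical example: three replications of very different sizes.
    print(weighted_stouffer_z([2.1, 0.8, 1.5], [100, 400, 2500]))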
A quick and dirty procedure -- which is valid but weak (it may fail to show
something which the above procedure shows clearly) -- is to look at the sum of
the Z-scores. If many of the separate Z-scores are high, then the sum will be
even higher. We can reasonably ask what distribution the null hypothesis
predicts for the sum of a bunch of Z-scores. The answer is pretty simple: the
sum of N independent Z-scores is normally distributed with mean 0 and a standard
deviation of sqrt(N). Therefore, if we divide the sum by sqrt(N), we get
another Z score. It looks like Dick made a simple computational error here.
The "total Z score" on your page is the result of dividing the sum of the
Z-scores by 5 (sqrt(25); Z=5.308) rather than by sqrt(26) (Z=5.205).
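The individual Z-scores from the chart are not reproduced here, so in this
sketch the raw sum (about 26.54) is simply inferred from the figures quoted
above:

    import math

    # 5.308 is the "total Z" reported on the page; it was obtained by
    # dividing the raw sum of the 26 Z-scores by sqrt(25) = 5, so the
    # raw sum must have been about 5.308 * 5 = 26.54.
    raw_sum = 5.308 * 5

    total_z_wrong = raw_sum / math.sqrt(25)   # 5.308, as on the page
    total_z_right = raw_sum / math.sqrt(26)   # ~5.205, the corrected value
    print(total_z_wrong, total_z_right)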
Imagine now, though, that you were running an experiment where sometimes there
was a large positive deviation and sometimes a large negative one. If you were
to look at the sum of the Z-scores in this case, the positive values and the
negative values would tend to cancel each other out and you would come out with
a rather small and unimpressive "total Z score", even though the large
deviations should not be there according to the null hypothesis. We would then
have a case with a small mean but a lot of variance (i.e., a lot more extreme
variation around the mean than expected). One way to create a single number
which might detect this is to sum, not the Z scores themselves, but their
squares. Large negatives and large positives would both show up as positive
additions so that a lot of variance would show up as an exceptionally large sum
of squares. Of course, the sum would always be non-negative, so the mere fact
that it comes out positive does not by itself mean that something is there. We
need to know the distribution of the sum of squares of a bunch of independently
distributed Z-scores.
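A toy illustration (with invented Z-scores) of why the plain sum can miss this
kind of pattern while the sum of squares catches it:

    # Invented Z-scores: large deviations in both directions.
    zs = [3.2, -2.9, 3.5, -3.1, 2.8, -3.4]

    plain_sum = sum(zs)                      # ~0.1 -- the signs cancel
    sum_of_squares = sum(z * z for z in zs)  # ~60 -- far above the ~6 expected
    print(plain_sum, sum_of_squares)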
The answer turns out to be easy. The sum of the squares of N independently
distributed Z scores follows a distribution called the chi-square
distribution. The chi-square distribution has a single parameter which is
called "the number of degrees of freedom" and in this case, that parameter
equals N. There is nothing profound about this, because the chi-square
distribution with N degrees of freedom is *defined* to be whatever distribution
results from the sum of the squares of N independent standard normal variables
(profundity occurs when the chi-square distribution shows up in other contexts,
as it does). The important thing, then, is simply that the chi-square
distribution is well characterized. You can find tables of it, for example, in
the back of virtually any stat text.
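A quick sanity check of that definition (a sketch, not from the original post):
simulate sums of squared standard normals and compare the observed tail
frequency with what the chi-square distribution says it should be. The cutoff
of 40 is an arbitrary choice for the comparison.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    N = 26              # degrees of freedom
    threshold = 40.0    # arbitrary cutoff for the comparison

    # Sum of squares of N independent standard normals, many times over.
    sums = (rng.standard_normal((100_000, N)) ** 2).sum(axis=1)

    print("simulated  P(sum > 40):", (sums > threshold).mean())
    print("chi-square P(sum > 40):", chi2.sf(threshold, df=N))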
The sum of the squares of the Z-scores in this chart is 111.29, which should
be compared to a chi-square distribution with 26 degrees of freedom (not 25 --
is this the same error once again? Did Dick miscount the number of Z-scores he
was working with?). How extreme is this?
The odds of the sum coming out this large are about one chance in 630 thousand
million (what we Yanks call 630 billion). This is clearly more extreme than the
"mere" one chance in 18 million which the "total-Z" indicates.
I would say that he was not exaggerating when he said that this is pretty
strong evidence that "something" (other than the null hypothesis) "is going
on". Whether that "something" is something interesting (e.g., paranormal)
depends on an analysis of the tightness of the experimental protocols.
Topher Cooper