Imagine now that you have a bunch of different replications of the experiment, each with its own Z-score. The null hypothesis says that each of those Z-scores is an independent sample from a Z distribution: each of them has the characteristics I mentioned above, and the value of each one has no influence on the values of the others. How does one go about looking at what all of them together say about the null and alternative hypotheses?

The ideal answer takes into account that the different Z-scores are based on experiments with different numbers of trials. That means that an experiment with just a few trials that rejects the null hypothesis counts for a lot more than a much larger experiment that rejects the null hypothesis by the same amount. The result is a weighted combination of the Z-scores, which can then be rescaled so that it too is a Z-score.
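The post does not say which weights to use, so the sketch below leaves them as an input; the function name `weighted_z` is hypothetical. It shows one standard way to do the rescaling, often called the weighted Stouffer method. Under the null hypothesis each Z-score is an independent standard normal, so a weighted sum sum(w_i * z_i) is normal with mean 0 and variance sum(w_i ** 2). Dividing by the square root of that variance turns it back into a Z-score.

```python
import math

def weighted_z(z_scores, weights):
    """Combine independent Z-scores into a single Z-score using fixed weights.

    Assumption: the weights are chosen in advance, not from the data.
    Under the null, sum(w * z) has mean 0 and variance sum(w ** 2),
    so dividing by sqrt(sum(w ** 2)) rescales it to a standard normal.
    """
    if len(z_scores) != len(weights):
        raise ValueError("need exactly one weight per Z-score")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))

# Hypothetical Z-scores and weights, for illustration only.
print(weighted_z([2.0, 1.0], [1.0, 1.0]))
```

With all weights equal, this reduces to the sum of the Z-scores divided by sqrt(N).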

A quick and dirty procedure (valid but weak: it may fail to show something which the above procedure shows clearly) is to look at the sum of the Z-scores. If many of the separate Z-scores are high, then the sum will be even higher. We can reasonably ask what distribution the null hypothesis predicts for the sum of a bunch of Z-scores. The answer is pretty simple. The sum of N independent Z-scores follows a normal distribution with mean 0 and standard deviation sqrt(N). Therefore if we divide the sum by sqrt(N), we get another Z-score. It looks like Dick made a simple computational error here. The "total Z score" on your page is the result of dividing the sum of the Z-scores by 5 (sqrt(25); Z=5.308) rather than by sqrt(26) (Z=5.205).
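A minimal sketch of the unweighted combination and the arithmetic above. The individual Z-scores are not given here, so the sum is recovered from the page's figure (5.308 times sqrt(25)); the helper names are hypothetical. The one-tailed tail probability uses `math.erfc` from the standard library.

```python
import math

def combined_z(sum_of_z, n):
    """Under the null, a sum of n independent Z-scores has SD sqrt(n)."""
    return sum_of_z / math.sqrt(n)

def upper_tail(z):
    """One-tailed probability that a standard normal value is at least z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Sum of the Z-scores, recovered from the page's total: 5.308 * sqrt(25).
total = 5.308 * 5
print(round(combined_z(total, 25), 3))  # 5.308 (dividing by 5)
print(round(combined_z(total, 26), 3))  # 5.205 (dividing by sqrt(26))
# One-tailed odds for the page's value of 5.308, as "1 in ..." (millions).
print(round(1 / upper_tail(5.308) / 1e6, 1), "million")
```

The last line gives the roughly one-in-18-million figure that the page's "total Z" corresponds to.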

Imagine now, though, that you were running an experiment where sometimes there was a large positive deviation and sometimes a large negative one. If you were to look at the sum of the Z-scores in this case, the positive and negative values would tend to cancel each other out, and you would come out with a rather small and unimpressive "total Z score", even though the large deviations should not be there according to the null hypothesis. We would then have a case with a small mean but a lot of variance (i.e., a lot more extreme variation around the mean than expected). One way to create a single number which might detect this is to sum, not the Z-scores themselves, but their squares. Large negatives and large positives would both show up as positive additions, so a lot of variance would show up as an exceptionally large sum of squares. Of course, a sum of squares is never negative, so a positive total does not by itself mean that something is there. We need to know the distribution of the sum of squares of a bunch of independently distributed Z-scores.
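A small illustration of the cancellation problem, using made-up Z-scores (hypothetical, not from any real experiment) that swing strongly in both directions:

```python
import math

# Hypothetical Z-scores with large deviations in both directions.
z_scores = [3.0, -3.0, 2.5, -2.5, 3.0, -3.0]

# The plain sum cancels out, so the combined Z looks unremarkable.
total_z = sum(z_scores) / math.sqrt(len(z_scores))
# Squaring first keeps every large deviation positive.
sum_sq = sum(z * z for z in z_scores)

print(total_z)  # 0.0
print(sum_sq)   # 48.5
```

Under the null hypothesis, the sum of squares of six Z-scores averages 6, so 48.5 stands out even though the combined Z is exactly zero.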

The answer turns out to be easy. The sum of the squares of N independently distributed Z-scores follows a distribution called the chi-square distribution. The chi-square distribution has a single parameter which is called "the number of degrees of freedom", and in this case that parameter equals N. There is nothing profound about this, because the chi-square distribution with N degrees of freedom is *defined* to be whatever distribution results from the sum of the squares of N independent standard normal variables (profundity occurs when the chi-square distribution shows up in other contexts, as it does). The important thing, then, is simply that the chi-square distribution is well characterized. You can find tables of it, for example, in the back of virtually any stat text.
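Because the chi-square distribution is defined this way, a quick simulation should reproduce its known properties: with N degrees of freedom it has mean N and variance 2N. The sketch below is a Monte Carlo check under that definition; the function name and the sample size are arbitrary choices, not anything from the post.

```python
import random
import statistics

def simulate_sum_of_squares(n, trials, seed=0):
    """Draw `trials` sums of squares of n independent standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
            for _ in range(trials)]

samples = simulate_sum_of_squares(n=26, trials=20_000)
# Chi-square with 26 degrees of freedom: mean 26, variance 52.
print(statistics.fmean(samples), statistics.pvariance(samples))
```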

The sum of the squares of the Z-scores in this chart is 111.29, which should be compared to a chi-square distribution with 26 degrees of freedom (not 25: is this the same error once again? Did Dick miscount the number of Z-scores he was working with?). How extreme is this? Under the null hypothesis, the chance of a sum this large or larger is one in 630 thousand million (what we Yanks call 630 billion). This is clearly more extreme than the "mere" one chance in 18 million which the "total Z" indicates.
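The tail probability can be checked without tables. For an even number of degrees of freedom df = 2m, the chi-square upper tail has the closed form exp(-x/2) times the sum over k = 0..m-1 of (x/2)^k / k!. The sketch below uses that identity (the function name is hypothetical, and it only handles even df, which covers the 26 used here):

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form for df = 2m:
        exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # k = 0 term: (x/2)**0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # build (x/2)**k / k! from the previous term
        total += term
    return math.exp(-half) * total

p = chi2_upper_tail_even_df(111.29, 26)
print(f"p = {p:.3g}, about 1 in {1 / p:.3g}")
```

With SciPy installed, `scipy.stats.chi2.sf(111.29, 26)` should give the same value and also handles odd degrees of freedom.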

I would say that he was not exaggerating when he said that this is pretty strong evidence that "something" (other than the null hypothesis) "is going on". Whether that "something" is something interesting (e.g., paranormal) depends on an analysis of the tightness of the experimental protocols.

Topher Cooper