
"I can't believe there isn't a well-defined analytical expression for that..."

Well, yes, there is some solid mathematics behind the pot stirring!

Writing out the appropriate mathematics, we will likely want all 33 billion or so combinations of 38 things taken 18 at a time.

With this math, there is no use of 'prior probabilities', and the Monte Carlo is just a fast way to replace finding all 33 billion combinations.
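The count above can be checked directly. A minimal Python sketch (with stand-in scores, since the thread gives none) of why one samples rather than enumerates:

```python
# How many ways are there to pick which 18 of the 38 scores get labeled "girls"?
import math
import random

n_total, n_girls = 38, 18
print(math.comb(n_total, n_girls))  # 33,578,000,610: the "33 billion"

# Enumerating all of these is impractical, so Monte Carlo draws random
# relabelings instead; each draw is one of the 33 billion equally likely cases.
scores = [random.gauss(70, 10) for _ in range(n_total)]  # stand-in data
girls_relabeled = random.sample(scores, n_girls)         # one random case
```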

For more, see:

E. L. Lehmann, Nonparametrics: Statistical Methods Based on Ranks, ISBN 0-8162-4994-6, Holden-Day, San Francisco, 1975.

Jaroslav Hájek and Zbyněk Šidák, Theory of Rank Tests, Academia, Prague, 1967.

Sidney Siegel, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, New York, 1956.

So, it's old material. There are many such hypothesis tests.

But the old material essentially always has, for the student case, only one number on each student. The part about what to do when each student has 4 scores can take us into the journals and maybe start some more research. Similarly for the 12 numbers from the 'system' to be monitored.



"With this math, there is no use of 'prior probabilities'"

But the full hypothesis is "Given the data, are girls better than boys at this exam?" and clearly, the prior probability is relevant. Maybe in this case one might want to use a 50-50 prior, but in general, if the hypothesis were instead "Given the [same] data, can we conclude that this paranormal event really happened?" then a healthy skeptical prior would be in order.

Anyway, regardless of the "prior" issue, I've thought some more about your original problem, and I'm not so sure about your methodology. From my perspective, if you want to reach a "girls better than boys on this test in this class: true or false" conclusion, then individual variance is a crucial issue.

If all girls and boys would always get the exact same result were they to take the same test over and over again, then you have a variance of 0, and one could simply check whether (average of boys) < (average of girls) and conclude accordingly. At the other extreme, if students show huge individual variance (e.g., their score depends on whether they had breakfast that morning), then the test results are almost meaningless.

So the outcome is crucially dependent on this variance, which your problem description makes no mention of, and which Monte Carlo methods do nothing to recover. One would have to make assumptions about it.
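The variance point can be illustrated with a toy simulation; all numbers here are invented and not from the thread:

```python
# Toy illustration: the same true gap in group averages is decisive when
# individual variance is 0 and near-meaningless when it is large.
import random
import statistics

random.seed(1)

def simulated_gap(true_gap, noise_sd, n=20):
    """Observed (girls - boys) average gap for one simulated class."""
    boys = [70 + random.gauss(0, noise_sd) for _ in range(n)]
    girls = [70 + true_gap + random.gauss(0, noise_sd) for _ in range(n)]
    return statistics.mean(girls) - statistics.mean(boys)

# With zero noise the observed gap always equals the true gap of 2 points;
# with large noise it swings widely and can even change sign.
print([round(simulated_gap(2, 0), 1) for _ in range(3)])
print([round(simulated_gap(2, 15), 1) for _ in range(3)])
```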

Maybe a better example (closer to your other "system health" problem) would be: a boy takes an exam 20 times, a girl takes the same exam 20 times, and they get such and such results (assuming they don't improve in between). Is the girl better than the boy? Then one could assume a Gaussian distribution of test results for both, estimate each average and variance, then check for overlap between the two Gaussians and conclude accordingly.
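A minimal sketch of that repeated-exam comparison, with invented scores; the "check for overlap" step is read here as comparing the gap in estimated means against its standard error (essentially a two-sample t statistic, which is one plausible reading, not necessarily the commenter's):

```python
# Fit a Gaussian to each student's 20 results, then ask how large the gap
# in means is relative to the spread of the two estimates.
import math
import statistics

boy  = [68, 72, 70, 71, 69, 73, 70, 68, 72, 71,
        70, 69, 71, 72, 70, 68, 73, 71, 70, 69]   # invented scores
girl = [74, 76, 73, 75, 77, 74, 72, 76, 75, 74,
        73, 77, 75, 74, 76, 73, 75, 74, 76, 75]   # invented scores

mb, mg = statistics.mean(boy), statistics.mean(girl)
vb, vg = statistics.variance(boy), statistics.variance(girl)
n = len(boy)

# Standard error of the difference between the two estimated means.
se = math.sqrt(vb / n + vg / n)
t = (mg - mb) / se
print(f"girl - boy = {mg - mb:.2f} points, about {t:.1f} standard errors")
```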

Maybe your MC really does boil down to something similar, but I don't see it. And sadly, I can't quite construct an argument about what I think is problematic with it. It just doesn't feel right.

Thanks for the references, I might check them eventually, though they seem too specialized for my needs. Someone recently posted this book on HN, and people had good things to say about it:

The Elements of Statistical Learning - Data Mining, Inference, and Prediction

http://www-stat.stanford.edu/~tibs/ElemStatLearn//

It's next on my reading list, but was absent from yours. It's available as a PDF, which is the determining factor for someone with no library access.


"But the full hypothesis is 'Given the data, are girls better than boys at this exam?' and clearly, the prior probability is relevant."

No, prior probabilities have nothing to do with it.

We state our 'null' hypothesis that the boys and girls do equally well. This hypothesis has nothing to do with a belief in prior probabilities, or a belief in any probabilities at all. Instead, we state this hypothesis as something that gives us mathematical assumptions we can use to do some calculations, hopefully reject it, and then conclude that it is false.

Generally in hypothesis testing we don't believe the null hypothesis as prior probabilities; indeed, likely we don't believe it at all and are stating it to reject it and conclude it is false.

In more detail, we assume that 20 boys and 18 girls are 38 independent samples from some one distribution. It turns out, we don't need to say anything about that distribution because we are being 'distribution-free'. In particular, we get to ignore the Gaussian distribution. GOOD.

Independent? Okay: Suppose we DO give you the true distribution of the data and the first 37 scores. Now you get to guess score 38. Do the 37 scores help you beyond just the distribution? No. Same for any subset of the scores. Then, we have independence.

With this null hypothesis, the average of the scores of the 20 boys and the average of the scores of the 18 girls should be 'close'. How close? Under the null hypothesis, and with the values we observed given and fixed, we can find the distribution of the difference in the averages: basically we look at all 33 billion or so differences obtained by taking all combinations of 38 things taken 18 at a time. Justification? If we work at it mathematically, then under the null hypothesis we can show that each of those 33 billion cases is equally probable.

Then we pick a small number, say, 1% for the size of our Type I error, that is, the probability of rejecting the null hypothesis when it is true.

Then we find the differences in the 1% tail of the 33 billion differences.

Then we look at the difference from our actual data. That difference will be one of the 33 billion. We see if that difference is in the 1% tail.
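The procedure above can be sketched in a few lines of Python. The scores are invented, and the full 33-billion-case enumeration is replaced by random relabelings, as the parent comment suggests:

```python
# Monte Carlo version of the two-sample permutation test described above:
# under the null, every relabeling of the 38 scores into 20 "boys" and
# 18 "girls" is equally likely, so we sample relabelings instead of
# enumerating all C(38, 18) of them.
import random
import statistics

random.seed(0)

boys  = [random.gauss(68, 8) for _ in range(20)]   # invented data
girls = [random.gauss(74, 8) for _ in range(18)]   # invented data

observed = statistics.mean(girls) - statistics.mean(boys)
pooled = boys + girls

n_samples, extreme = 20_000, 0
for _ in range(n_samples):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:18]) - statistics.mean(pooled[18:])
    if diff >= observed:
        extreme += 1

p = extreme / n_samples   # estimated upper-tail probability
print(f"observed gap {observed:.2f}, estimated p = {p:.4f}")
# Reject the null at the 1% level if p < 0.01.
```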

If the difference is in the 1% tail, then one of two things is true:

(A) The null hypothesis is true, the boys and girls are the same, that is, independent samples from the same distribution, and with our actual data the difference is relatively large, out in a tail, and we have observed something that happens only 1% of the time.

(B) The null hypothesis is false, that is, in some way the boys and girls are different. We still believe the independence assumption, so what fails is just the assumption that the boys' and girls' scores come from the same distribution; in particular, the means may differ.

If the 1% is so small we don't believe (A), then we conclude (B).

Variance has nothing to do with it.

Welcome to distribution-free 'two sample' hypothesis testing 101.


I've been reading Jaynes again this week, and he's just very, very convincing. So I'm trying to read everything you wrote through these Bayesian glasses, but sadly, I'm not successful. Jaynes is rather critical of Fisher's hypothesis testing, on the grounds that you can't accept or reject a hypothesis on its own; you need an alternative to compare it to, and that alternative needs to make definite predictions. I don't see what the alternative to your null hypothesis is (the negation of the null hypothesis does not make definite predictions).



