In Chapter 7 on statistical estimation we emphasi
Another thing that we emphasi
As with estimation, hypothesis tests always involve some uncertainty. When we draw a conclusion from a hypothesis test, we can't be certain that we are correct in doing so. And as with estimation, this doesn't mean we have made a mistake! Such errors of uncertainty are inherent to testing.
Unfair coin?
Suppose someone gives you a coin and tells you it is fair. You flip the coin 1,000 times and it comes up heads 800 times. You conclude that the person who gave you the coin was wrong: the coin is not fair. Mathematical details aside, this scenario is essentially a hypothesis test.
But you could, of course, be wrong in your conclusion. It is possible that the person was correct, that the coin was fair, and that such an extreme sample as 800 heads (and 200 tails) came from this population. It is extremely unlikely, but possible. But if this is the case, there is nothing you can do about it. The methodology of hypothesis testing would lead you to reject the hypothesis that the coin is fair.
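To get a feel for just how unlikely "extremely unlikely" is, the probability of seeing 800 or more heads in 1,000 flips of a fair coin can be computed exactly from the binomial distribution. The following is a minimal sketch in Python, offered as an illustration rather than as part of the testing procedure itself. The function name `prob_at_least_heads` is made up for this example, and it assumes a fair coin, so that every sequence of flips is equally likely.

```python
from math import comb

def prob_at_least_heads(n_flips, min_heads):
    """Exact P(X >= min_heads) for a fair coin flipped n_flips times.

    Assumes a fair coin, so each of the 2**n_flips sequences is
    equally likely. (Hypothetical helper, for illustration only.)
    """
    # Count the sequences containing at least min_heads heads.
    favourable = sum(comb(n_flips, k) for k in range(min_heads, n_flips + 1))
    # Python divides arbitrarily large integers exactly before rounding.
    return favourable / 2 ** n_flips

p = prob_at_least_heads(1000, 800)
print(f"P(at least 800 heads in 1000 fair flips) = {p:.3e}")
```

The printed probability is positive but astronomically small, which matches the description above: such a sample is possible under a fair coin, but extremely unlikely.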
In the language of hypothesis testing, the scenario just described is this: you rejected the null hypothesis (that the coin is fair) when it was in fact true.
You may recall that the probability of such an event was defined earlier in this chapter to be the level of significance, α.
For most of this chapter, we have thought of the level of significance as the number you choose at the beginning of the test to determine your critical value and region of rejection. Operationally, this is exactly what α does. But α is defined as the probability of committing a particular type of error: rejecting the null hypothesis when it is true.
In fact, because α is the probability of rejecting the null hypothesis when it is true, it is actually a conditional probability. Expressed as such, we can say:
α = P(H0 is rejected | H0 is true)
It may seem like α is two entirely different things: it is the probability of a particular error occurring but it is also a number that determines key values in our hypothesis test. Well, it is both of these things but they aren't all that different.
As an example, suppose as a statistician you choose a level of significance α = 0.05 for a two-sided test for a population proportion:
H0: π = 0.3
HA: π ≠ 0.3
This level of significance leads you to the critical values -1.96 and 1.96. The region of rejection is the set of values outside these critical values. Notice that, by design, 5% of the area under the standard normal curve lies in the region of rejection (2.5% in each tail) and 95% lies between the two critical values.
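If you would like to see where the critical values come from, they can be recovered from α with the inverse of the standard normal cumulative distribution function. Here is a short sketch using `NormalDist` from Python's standard `statistics` module. The helper name `two_sided_critical_values` is an assumption made for this illustration.

```python
from statistics import NormalDist

def two_sided_critical_values(alpha):
    """Critical values of a two-sided z-test at significance level alpha.

    (Hypothetical helper, for illustration only.)
    """
    # Put alpha/2 in each tail of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return -z, z

lower, upper = two_sided_critical_values(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96

# Area between the critical values: the complement of the rejection region.
area_between = NormalDist().cdf(upper) - NormalDist().cdf(lower)
print(round(area_between, 2))  # 0.95
```

Choosing a different α, say 0.01, moves the critical values further out and shrinks the region of rejection accordingly.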
Now, under the assumption that the null hypothesis is true, the test statistic from your sample will (for a sufficiently large sample) approximately follow the standard normal distribution. So if the null hypothesis is true, then about 95% of the time you'll get a test statistic between the two critical values (and therefore not reject the null hypothesis) and about 5% of the time you'll get a test statistic in the region of rejection (and therefore reject the null hypothesis).
That is, there is a 5% chance that you will reject the null hypothesis in this test if it is true. This is because the level of significance you chose at the beginning was α = 0.05! This is the 'connection' between the two notions of α.
Instead of thinking about 'percentage chances', you may like to look at it like this: suppose you happen to know that the null hypothesis is true in the above test. That is, suppose you happen to know that π is equal to 0.3. Now suppose 100 statisticians (who aren't lucky enough to know what you know!) decide to test this null hypothesis with a level of significance of α = 0.05. They each separately go out and collect their own sample and calculate their own test statistic. On average, 95 of these statisticians would not reject the null hypothesis. The other 5 would (incorrectly) reject it.
This situation is similar to the 100 statisticians we were discussing earlier, who were constructing 100 confidence intervals. And, just as we pointed out then, the 5 statisticians who incorrectly reject the true null hypothesis haven't made a 'mistake'. They were just 'unlucky'. Hypothesis testing is a part of statistical inference, and statistical inference is always uncertain. And, because it is uncertain, there is always the chance that an error will be made.
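The thought experiment with the statisticians can also be run as a simulation. The sketch below is illustrative only and rests on several assumptions that are not part of the discussion above. Each statistician draws a sample of 1,000 observations and uses the usual one-proportion z statistic, z = (p − π0) / √(π0(1 − π0)/n), where p is the sample proportion. To make the long-run rate visible, it simulates 4,000 statisticians rather than 100. The function name, the sample size, and the random seed are all choices made for this example.

```python
import random
from math import sqrt

def count_false_rejections(n_statisticians, sample_size,
                           pi_0=0.3, z_crit=1.96, seed=0):
    """Count how many simulated statisticians reject a TRUE null hypothesis.

    Every sample is drawn from a population where pi really equals pi_0,
    so each rejection is an error of the kind described above.
    (Hypothetical helper; sample size and seed are assumptions.)
    """
    rng = random.Random(seed)
    # Standard error of the sample proportion under H0.
    std_error = sqrt(pi_0 * (1 - pi_0) / sample_size)
    rejections = 0
    for _ in range(n_statisticians):
        # One statistician's sample: count the "successes".
        successes = sum(rng.random() < pi_0 for _ in range(sample_size))
        p_hat = successes / sample_size
        z = (p_hat - pi_0) / std_error
        # Two-sided test: reject when z falls outside the critical values.
        if abs(z) > z_crit:
            rejections += 1
    return rejections

n_statisticians = 4000
rejections = count_false_rejections(n_statisticians, sample_size=1000)
rejection_rate = rejections / n_statisticians
print(f"{rejections} of {n_statisticians} rejected H0 "
      f"(rate = {rejection_rate:.3f})")
```

The observed rejection rate should land close to 0.05, though not exactly on it: the simulation is itself random, and the normal distribution is only an approximation to the distribution of the test statistic.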