We've just been discussing the two types of errors that can occur in a hypothesis test. As we've mentioned, the probability of committing the first type of error is equal to α, the level of significance. That is, this probability is actually fixed by the statistician running the test.
What about β, the probability of committing the second type of error? This is trickier. Not only do we not set β directly, we usually cannot even determine what it is. The reason lies in the nature of the two errors and in the conditions attached to their conditional probabilities.
What happens when we commit a Type I error? Well, the null hypothesis is true (this is the condition of the conditional probability α) and we have then rejected it. The condition that the null hypothesis is true means that a specific value is assigned to the population parameter. This makes α a much more tangible probability to work with.
To take an example (and ignoring the more complex mathematical detail) suppose we are testing the null hypothesis that a population proportion π is equal to 0.5. We might design the test so that we will reject this null hypothesis if we collect a sample with a sample proportion less than 0.4 or greater than 0.6.
What is a Type I error here? A Type I error would occur if the null hypothesis is true (that is, π is actually 0.5) and we reject that null hypothesis (that is, we get a sample proportion less than 0.4 or greater than 0.6). It is important to note that the condition in this probability assigns a specific value, 0.5, to the population parameter π. The probability of a Type I error in this situation can be expressed as:
α = P(our sample proportion is less than 0.4 or greater than 0.6 | the population proportion is 0.5)
We define α!
Remember: in an actual test we define the probability of this error to be some specific value (like α = 0.05, for example) at the beginning of the test. Then, the boundary points (like 0.4 and 0.6) are calculated based on what this probability is. But the point is that this probability has a tangible relationship to the numbers in the test.
By using sampling distributions, this probability can be known! Knowing this probability is equivalent to being able to answer the question:
If a sampling distribution for the proportion has a mean of 0.5, how likely is it to collect a sample with a sample proportion less than 0.4 or more than 0.6?
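As a rough sketch of how that question could be answered numerically, the snippet below uses a normal approximation to the sampling distribution of the sample proportion. The sample size n = 100 is an assumption made purely for illustration (the text does not give one), and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed for illustration only: the text does not give a sample size.
n = 100
pi_0 = 0.5                 # value of π under the null hypothesis
lower, upper = 0.4, 0.6    # rejection boundaries from the example

# Normal approximation to the sampling distribution of the sample proportion
# when the null hypothesis is true: mean π0, standard error sqrt(π0(1 - π0)/n).
std_error = sqrt(pi_0 * (1 - pi_0) / n)
sampling_dist = NormalDist(mu=pi_0, sigma=std_error)

# α = P(sample proportion < 0.4 or > 0.6 | π = 0.5)
alpha = sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))
print(f"alpha is approximately {alpha:.4f}")
```

With these assumed numbers the boundaries sit two standard errors either side of 0.5, so α comes out at roughly 0.0455. A different sample size would give a different α for the same boundaries.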
To commit a Type II error, the null hypothesis must be false. Unlike the case where the null hypothesis is true, this condition doesn't specify a particular value for the population parameter. We will explore the complexity of finding β, the probability of a Type II error, by looking at an example.
Testing the mean I.Q. - errors
Suppose a sociologist is testing whether the students from a particular state have a mean I.Q. that is higher than the national average of 100:
H0: μ = 100
HA: μ > 100
A Type I error can only occur if the null hypothesis is true, so it can only happen if the true population mean is actually 100. If the population mean is 100, we can assert things about the sampling distribution of the mean. This indicates which observed sample means will be likely, which in turn indicates how likely we are to calculate a test statistic that will lead us to (incorrectly) reject the null hypothesis.
A Type II error can occur whenever the null hypothesis is not true. But all we can say in this situation is that the true population mean is not 100. How likely it is that we will fail to reject the null hypothesis will depend on what the true population mean actually is.
For example, let's suppose in this test the proposed value of 100 is way off the mark: suppose the true population mean is actually 150. Then in this case, we probably won't commit a Type II error. Why? Because the true population mean is so far from the proposed value, any sample mean we calculate is probably going to be very different from 100, and so we will (correctly) reject the null hypothesis.
But what if the true population mean were 101? Then we run a much higher risk of committing a Type II error. Because the true population mean (101) is so close to the proposed value (100), it is likely that we will get a sample mean relatively close to 100 and therefore (incorrectly) fail to reject the null hypothesis.
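To make the contrast between a true mean of 150 and a true mean of 101 concrete, here is a minimal sketch that computes β for several hypothetical true means. The population standard deviation (15), the sample size (36) and the level of significance (0.05) are all assumptions chosen for illustration; none of them comes from the example itself, and the helper name beta is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed for illustration only (none of these values appear in the text).
sigma = 15      # population standard deviation of I.Q. scores
n = 36          # sample size
alpha = 0.05    # level of significance
mu_0 = 100      # proposed mean in the null hypothesis

std_error = sigma / sqrt(n)

# Upper-tail test: reject H0 when the sample mean exceeds this critical value.
critical_mean = NormalDist(mu_0, std_error).inv_cdf(1 - alpha)

def beta(true_mean):
    """P(fail to reject H0 | the population mean is true_mean)."""
    return NormalDist(true_mean, std_error).cdf(critical_mean)

for true_mean in (101, 105, 110, 150):
    print(f"true mean {true_mean}: beta = {beta(true_mean):.4f}")
```

Under these assumptions β is close to 0.9 when the true mean is 101 and effectively zero when it is 150. The calculation is only possible because a true mean was supplied, which is exactly the information we lack in a real test.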
In relation to Type II errors, the point is that, since we don't know the true value of the population parameter (it is the thing we are testing!), we can't specify how likely or unlikely it is that we will fail to reject the proposed value in the null hypothesis. However, we do have some influence over what β is, even if we cannot exactly specify it.
For a fixed sample size, there is an inverse relationship between α and β. That is, the less likely it is that we commit a Type I error, the more likely it is that we commit a Type II error. And the less likely it is that we commit a Type II error, the more likely it is that we commit a Type I error.
In fact, this answers a question that you may have had by now: 'Why don't we choose a value of zero for α?' Given that we can choose to set α to be whatever we want, the temptation might exist to make α as close to 0 as possible. However, the smaller you make α, the larger you make β. In other words, the less likely you are to reject a true null hypothesis, the more likely you are to not reject a false one! So there is always a trade-off between the two error types, and your choice of the level of significance will depend on which error is worse for your test.
The situation in the following example compares the relative dangers of each error type.
Battery lifetime
Earlier in this chapter, we saw a test for the average lifetime of a battery made by a manufacturer. The manufacturer has claimed that its batteries last an average of 60 hours, but a consumer watchdog is testing to see if the true average is lower:
H0: μ = 60
HA: μ < 60
Suppose the outcome of this test will determine whether or not the manufacturer will be criminally fined millions of dollars for false advertising.
What are the two errors that can occur in the test? A Type I error occurs if the true average is actually 60 hours but a sample suggests that it is lower than this. A Type II error occurs if the true average is below 60 but a sample fails to establish this fact.
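One way to get a feel for these two errors is to simulate the test many times. The sketch below does this under loudly stated assumptions: battery lifetimes are taken to be normally distributed with a standard deviation of 5 hours, each sample contains 40 batteries, α is 0.05, and the 'false null' scenario uses a hypothetical true mean of 58 hours. None of these values comes from the example, and the function name rejection_rate is made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Assumed for illustration only (none of these values appear in the text).
sigma = 5            # standard deviation of battery lifetimes, in hours
n = 40               # batteries per sample
alpha = 0.05         # level of significance
mu_0 = 60            # claimed mean lifetime under H0
true_low_mean = 58   # hypothetical true mean for the case where H0 is false
trials = 20_000      # number of simulated samples per scenario

# Lower-tail test: reject H0 when the sample mean falls below this value.
critical_mean = NormalDist(mu_0, sigma / sqrt(n)).inv_cdf(alpha)

def rejection_rate(true_mean, rng):
    """Fraction of simulated samples whose mean leads to rejecting H0."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if fmean(sample) < critical_mean:
            rejections += 1
    return rejections / trials

rng = random.Random(1)  # fixed seed so the run is repeatable
type_1_rate = rejection_rate(mu_0, rng)               # H0 true, but rejected
type_2_rate = 1 - rejection_rate(true_low_mean, rng)  # H0 false, not rejected
print(f"estimated Type I error rate:  {type_1_rate:.3f}")
print(f"estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I rate should land near the chosen α, while the Type II rate depends entirely on the assumed true mean of 58 hours; a true mean closer to 60 would push it higher.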
Which type of error is worse? That probably depends on your perspective! One perspective is that a court would only want to hear evidence against the manufacturer if it is relatively 'certain'. That is, the court regards a Type I error as more serious than a Type II error. So they may choose a small level of significance to minimise the chance of a Type I error.
Why is there an inverse relationship between α and β? The reason should be relatively intuitive. We've said on a few occasions that α is a measure of how 'strong' the statistician wants the evidence to be before they agree to reject the null hypothesis. The smaller α is, the stronger the evidence will need to be. This could be looked at as a 'reluctance' to reject the null hypothesis. However, a consequence of doing this (that is, having a small α and requiring strong evidence) is that the null hypothesis is less likely to be rejected - even if it is false! That is, a consequence of having a smaller α is to have a larger β.
We can be more technical about this. The level of significance α determines your critical values, which then determine your region of rejection. Put simply, the lower the level of significance, the further the critical values of the test statistic lie from 0. As a result, a low level of significance leads to a smaller region of rejection (and a lower chance of committing a Type I error). But the smaller the region of rejection, the lower your chance of rejecting the null hypothesis - even if it isn't true. That is, you will be less likely to observe a test statistic in the region of rejection (regardless of what the actual population parameter is) if the region of rejection is smaller.
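The movement of the critical values can be seen directly in a short sketch. It assumes the test statistic follows a standard normal distribution (a z-test), which the passage above does not state explicitly; the function name critical_z is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, two_tailed=False):
    """Positive critical value of z for a given level of significance."""
    tail_area = alpha / 2 if two_tailed else alpha
    return standard_normal.inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01, 0.001):
    one = critical_z(alpha)
    two = critical_z(alpha, two_tailed=True)
    print(f"alpha = {alpha}: one-tailed z = {one:.3f}, two-tailed z = ±{two:.3f}")
```

Each reduction in α pushes the critical value further from 0, leaving a smaller region of rejection beyond it.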
The diagram to the right recreates the I.Q. example introduced earlier, where the proposed value for the population mean in the null hypothesis was 100. The diagram shows a situation where the true population mean is 110. That is, the null hypothesis is not true. The diagram shows the critical values for a variety of levels of significance. Notice that as the level of significance shrinks, so does the region of rejection. Therefore, the chance of failing to reject the null hypothesis increases. In other words, lowering the chance of a Type I error increases the chance of a Type II error.
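The same pattern can be checked numerically. The sketch below fixes the true mean at 110, as in the diagram, and computes the critical sample mean and β for several levels of significance. The standard deviation (15) and sample size (25) are assumptions for illustration only, so the printed figures will not necessarily match the diagram; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed for illustration only: the text gives neither of these values.
sigma = 15        # population standard deviation of I.Q. scores
n = 25            # sample size

mu_0 = 100        # proposed mean in the null hypothesis
true_mean = 110   # true population mean used in the diagram

std_error = sigma / sqrt(n)

def critical_mean_and_beta(alpha):
    """Critical sample mean for an upper-tail test, and the resulting β."""
    critical_mean = NormalDist(mu_0, std_error).inv_cdf(1 - alpha)
    # β = P(sample mean falls below the critical value | true mean is 110)
    beta = NormalDist(true_mean, std_error).cdf(critical_mean)
    return critical_mean, beta

for alpha in (0.10, 0.05, 0.01, 0.001):
    critical_mean, beta = critical_mean_and_beta(alpha)
    print(f"alpha = {alpha}: reject if mean > {critical_mean:.2f}, beta = {beta:.3f}")
```

Under these assumptions β rises from about 0.02 at α = 0.10 to about 0.40 at α = 0.001, which is the trade-off described above.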