Throughout the previous three sections of this chapter, we developed a concept and methodology for hypothesis tests. You can use the step-by-step guide from the previous section to conduct a hypothesis test in many practical situations.
In this section we look at some of the finer details of hypothesis testing. This will make the different components of the test clearer, and it will also provide some context for why statisticians make some of the decisions that they do.
To begin with, let's think about the level of significance, α. We made two things clear in the previous section:
Given the importance of α, how does a statistician decide what level to set it at? Well, as also discussed in the previous section, α is a measure of how much evidence the statistician will require before they agree to reject the null hypothesis. Remember: the sample in a hypothesis test can be used as evidence against the null hypothesis. The statistician can decide how 'strong' this evidence needs to be before they will indeed reject the null hypothesis.
For example, suppose a bank has put into place a new customer service system. The bank hopes that the average time taken to resolve a customer complaint call has dropped from the former average of 300 seconds. So they collect a sample to test the null hypothesis that the average call time is still 300 seconds. In this sample, the sample mean time is 295 seconds. Has the average time improved? Would you reject the null hypothesis? Of course, the answer will depend on things like how large the sample is and how much call times vary. But conceptually, some statisticians would consider a sample mean of 295 to be enough evidence that the null hypothesis is wrong. Others wouldn't.
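To see why the sample size matters, here is a minimal sketch in Python. It assumes a one-sided z-test and a known population standard deviation of 40 seconds; that figure, the sample sizes and the function name `z_statistic` are made up for illustration and do not come from the bank example.

```python
from math import sqrt

def z_statistic(sample_mean, null_mean, sigma, n):
    """Number of standard errors between the sample mean and the null mean."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - null_mean) / standard_error

# sigma = 40 seconds and the sample sizes below are assumed values.
for n in (25, 100, 400):
    z = z_statistic(sample_mean=295, null_mean=300, sigma=40, n=n)
    print(f"n = {n:3d}: z = {z:.3f}")
```

Under these assumptions the same sample mean of 295 gives z = -0.625, -1.25 and -2.5 as the sample grows from 25 to 100 to 400 calls. The larger the sample, the stronger the evidence that a 5-second difference represents. How far below zero z must fall before the null hypothesis is rejected is exactly what the level of significance decides.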
And it isn't simply a matter of different statisticians requiring different levels of evidence. Different situations require different levels of proof.
Suppose a pharmaceutical company releases a new tablet that contains some amount of Drug X. It is known that 300mg of Drug X is toxic and it is illegal to prescribe medication with a 300mg dose of Drug X. So government tests are conducted to establish that the new tablet contains less than 300mg of Drug X. A sample is collected to test the null hypothesis that the tablets do contain an average of 300mg of Drug X. In this sample, the sample mean amount of Drug X is 295mg. Do you release the tablet to the public? Would you reject the null hypothesis?
Hopefully, you've noticed that the numbers in the two examples above (the bank's customer service system and Drug X) are the same. But the stakes are different in the two situations. Presumably, we would want a lot of evidence before we can confidently claim that the pharmaceutical company isn't poisoning the public. So, for this test we would set a low level of significance.
Put simply: the lower the level of significance, the more the sample will have to differ from what the null hypothesis claims before we reject that null hypothesis.
There are no magic answers to what you 'should' set the level of significance at. It is common to conduct a 95% hypothesis test, meaning that the level of significance is α = 0.05. Two other common levels of significance are α = 0.1 and α = 0.01.
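The effect of the three common levels can be sketched numerically. The example below assumes a one-sided (lower-tail) z-test of a null mean of 300, with a hypothetical population standard deviation of 40 and a hypothetical sample of 100; the helper names `critical_z` and `cutoff_mean` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def critical_z(alpha):
    """Lower-tail critical value: reject H0 when z falls below this."""
    return NormalDist().inv_cdf(alpha)

def cutoff_mean(alpha, null_mean, sigma, n):
    """Sample mean below which H0 would be rejected at level alpha."""
    return null_mean + critical_z(alpha) * sigma / sqrt(n)

# sigma = 40 and n = 100 are assumed values, chosen only for illustration.
for alpha in (0.10, 0.05, 0.01):
    cutoff = cutoff_mean(alpha, null_mean=300, sigma=40, n=100)
    print(f"alpha = {alpha:.2f}: critical z = {critical_z(alpha):.3f}, "
          f"reject if sample mean < {cutoff:.2f}")
```

With these assumed numbers the cut-offs are roughly 294.9, 293.4 and 290.7. As α gets smaller, the sample mean has to sit further below 300 before the null hypothesis is rejected, and a sample mean of 295 would not be enough at any of the three levels.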
A level of significance of α = 0.01 is considered to be very low in statistical practice. It signifies that the statistician would like a lot of evidence in order to reject the null hypothesis. Another way of looking at this is to say that the statistician does not want to incorrectly reject the null hypothesis. They are saying:
If I am going to report to the public that I have rejected the null hypothesis, I want to be very, very sure that I am correct in doing so.
With a level of significance of α = 0.01, they are conducting a 99% hypothesis test. This means that, if the null hypothesis is actually true, 99% of the possible samples the statistician could collect would tell them not to reject the null hypothesis. So if the sample collected does tell the statistician to reject the null hypothesis, they can be fairly sure that they are correct in doing so.
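The 99% claim can be checked with a small simulation. This sketch assumes normally distributed data, a known standard deviation and a one-sided z-test; the null mean of 300, the standard deviation of 40, the sample size of 30 and the function name `false_rejection_rate` are all illustrative choices rather than values from the text.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def false_rejection_rate(alpha, null_mean=300.0, sigma=40.0, n=30,
                         trials=10_000, seed=1):
    """Draw samples for which H0 is true; return the fraction that reject H0."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(alpha)  # lower-tail cut-off
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population whose mean really is null_mean.
        sample = [rng.gauss(null_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - null_mean) / standard_error
        if z < critical_z:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate(alpha=0.01)
print(f"Rejected a true null hypothesis in {rate:.2%} of samples")
```

The printed rate should land close to 1%, which means about 99% of the simulated samples correctly fail to reject the true null hypothesis.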
Of course, the statistician might still be wrong. Tests, like estimation, carry a level of uncertainty that is unavoidable. Much of this section is about that uncertainty. We will be looking at the types of errors that can occur in a hypothesis test, at the ways that a statistician can avoid them, and at how a statistician can improve their ability to draw conclusions from a test.