One of the first things we do when we conduct a hypothesis test is to decide on a level of significance for the test.
Suppose a hypothesis test is to be conducted testing some null hypothesis H0, with an alternative hypothesis HA. Then the level of significance α for the test is the probability of rejecting the null hypothesis when it is true.
The above is the technical definition of the level of significance. But in a more practical sense, α plays a large role in our decision to reject or not reject the null hypothesis. In fact α can be thought of as an indicator of how much 'evidence' we require before we reject the null hypothesis. The level of significance, being a probability, is always between 0 and 1. The closer α is to 0, the 'harder' it will be to reject the null hypothesis. The closer α is to 1, the 'easier' it will be to reject the null hypothesis.
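The technical definition can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it simulates a fair coin (so the null hypothesis π = 0.5 is true), uses samples of 400 flips, and applies the usual two-sided z-test for a proportion. The function name `rejection_rate`, the number of trials and the random seed are hypothetical choices made for this example.

```python
import random
from statistics import NormalDist

def rejection_rate(alpha, n_flips=400, n_trials=2000, seed=0):
    """Estimate how often a two-sided z-test rejects H0: pi = 0.5 when H0 is true."""
    rng = random.Random(seed)                       # fixed seed so runs are repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # upper critical value for this alpha
    std_error = (0.5 * 0.5 / n_flips) ** 0.5        # standard error of the sample proportion under H0
    rejections = 0
    for _ in range(n_trials):
        # Simulate n_flips flips of a fair coin and count the heads.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        z = (heads / n_flips - 0.5) / std_error     # test statistic for this sample
        if abs(z) > z_crit:                         # falls in the region of rejection
            rejections += 1
    return rejections / n_trials

for alpha in (0.01, 0.05, 0.20):
    print(alpha, rejection_rate(alpha))
```

The estimated rejection rates should land near the chosen α, and they grow as α grows, which is the sense in which a larger α makes it 'easier' to reject the null hypothesis.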
Levels of significance and confidence
You may recall that we defined levels of significance in Chapter 7 on estimation. In estimation, the level of significance can be thought of as the probability that your confidence interval will fail to cover the population parameter. In testing, the level of significance can be thought of as the probability that you will reject the proposed value for the population parameter when it is correct.
In either case, the level of significance is related to our 'confidence' in an interval or a test. Just as we saw in Chapter 7, here a level of significance α corresponds to a level of confidence 100% × (1 - α). So, for example, a 95% hypothesis test has a level of significance of 0.05.
An important point is that α is chosen by us, at the beginning of the test, and in principle we can set it to any value we want. The level of significance then influences whether or not we reject the null hypothesis through the critical values and the region of rejection.
Conceptually, the critical values for a hypothesis test are 'boundary points' in the standard normal distribution, Z, and the region of rejection is the part of Z that falls 'outside' these boundary points. The test statistic will either fall inside the boundary points or in the region of rejection, and this determines whether we reject the null hypothesis. That is the general idea, but the exact definitions of the critical values and the region of rejection depend on whether we have a one-sided or two-sided test. We will see both in action by looking at two examples.
So far in this chapter, we've paid a lot of attention to Fred's two-sided test of the fairness of a coin. To recap, we have the null and alternative hypotheses:
H0: π = 0.5
HA: π ≠ 0.5
Level of significance. We've been implicitly using a level of significance of α = 0.05 so far in this chapter, because we've been looking at a '95% hypothesis test'. To get a more complete picture of how to run the test from the beginning, let's now suppose Fred sets the level of significance at something else: α = 0.01. (This means we are conducting a '99% hypothesis test'.)
Critical values. In a two-sided hypothesis test, there are two critical values, and they are defined similarly to the critical values in a confidence interval estimation. That is, if you use a level of significance α then the two critical values zα/2 and -zα/2 are z-scores defined such that (as a proportion) α/2 of Z falls above zα/2 and α/2 falls below -zα/2.
For example, with a level of significance α = 0.01, Fred's two critical values are z0.005 = 2.576 and -z0.005 = -2.576. These z-scores can be found using the standard normal table or statistical software, as discussed in Chapter 5.
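If statistical software is used instead of the table, the two critical values can be obtained from the inverse of the standard normal cumulative distribution. Here is a minimal sketch using Python's standard library; the helper name `two_sided_critical_values` is made up for this illustration.

```python
from statistics import NormalDist

def two_sided_critical_values(alpha):
    """Return (-z_{alpha/2}, z_{alpha/2}) for the standard normal distribution Z."""
    # The upper critical value has a proportion 1 - alpha/2 of Z below it,
    # which leaves alpha/2 above it.
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper

lower, upper = two_sided_critical_values(0.01)
print(round(lower, 3), round(upper, 3))  # prints -2.576 2.576
```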
Region of rejection. Once we have defined the critical values, the region of rejection is then the area in Z 'outside' these two critical values. That is, the region of rejection is the combination of two sets of values: the set of values greater than zα/2 and the set of values less than -zα/2. For example, in Fred's coin fairness test the region of rejection is the set of values above 2.576 and values below -2.576.
As the name suggests, the region of rejection tells us when to reject the null hypothesis. To be more precise, when we collect a sample we can calculate a sample statistic (like a sample proportion). For this sample statistic, we can calculate a test statistic. If the test statistic is in the region of rejection, we reject the null hypothesis. Otherwise, we do not reject the null hypothesis.
Earlier in this section we saw an example where a particular sample of 400 coin flips produced a sample proportion of p = 0.56. The test statistic for this sample proportion was calculated to be z = 2.4. This value is between -2.576 and 2.576, and so the null hypothesis is not rejected. That is, at a level of significance of α = 0.01, Fred does not reject the claim that the coin is fair.
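The arithmetic behind this decision can be sketched as follows, assuming the usual z statistic for a sample proportion (the difference between the sample proportion and the proposed value, divided by the standard error under the null hypothesis) and a sample of 400 flips. The function name `proportion_z_statistic` is hypothetical.

```python
from math import sqrt

def proportion_z_statistic(p_hat, pi_0, n):
    """z-score of a sample proportion, assuming the null value pi_0 is correct."""
    std_error = sqrt(pi_0 * (1 - pi_0) / n)  # standard error of the proportion under H0
    return (p_hat - pi_0) / std_error

z = proportion_z_statistic(p_hat=0.56, pi_0=0.5, n=400)

# Two-sided test at alpha = 0.01: reject only if z lies outside (-2.576, 2.576).
in_rejection_region = abs(z) > 2.576
print(round(z, 2), in_rejection_region)  # prints 2.4 False
```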
Watch out!
The fact that Fred didn't reject the null hypothesis in this case depended on the level of significance, which was quite low in this example. Remember: the level of significance defines what is considered 'enough' evidence to reject the null hypothesis. The level of significance determines the critical values, the critical values then determine the region of rejection, and the region of rejection determines your decision at the end of the test.
For example, if Fred had kept a level of significance of α = 0.05, this would have produced critical values of -1.96 and 1.96. And a test statistic of z = 2.4 is in the region of rejection in that case.
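To see how the same test statistic leads to different decisions, the following sketch applies the two-sided rule to z = 2.4 at both levels of significance. The helper `reject_two_sided` is a name invented for this illustration.

```python
from statistics import NormalDist

def reject_two_sided(z, alpha):
    """True if z falls in the two-sided region of rejection for the given alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value z_{alpha/2}
    return abs(z) > z_crit

for alpha in (0.01, 0.05):
    # Same sample, same test statistic, different level of significance.
    print(alpha, reject_two_sided(2.4, alpha))
```

With α = 0.01 the statistic sits inside the critical values, while with α = 0.05 it falls in the region of rejection.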
The level of significance chosen for a test is obviously very important. And you can't change your mind about your level of significance halfway through the test! You have to decide on a level, and stick with it.
For example, suppose Fred set the level of significance to be α = 0.01, as explained in the above discussion. This leads to critical values of -2.576 and 2.576. The region of rejection is the area outside these two values. If, as we discussed, a sample of 400 coin flips produces a sample proportion of 0.56, this gives a test statistic of z = 2.4.
At this stage, Fred might notice that the test statistic is not in the region of rejection. He is forced to conclude that he doesn't have enough evidence to reject the null hypothesis that the coin is fair. And this might disappoint Fred. He might wish that he could turn back time and set the level of significance to be α = 0.05. This would have given critical values of -1.96 and 1.96. In that case the test statistic is in the region of rejection.
But Fred cannot do this. If Fred were allowed to set the level of significance again, this would defeat the purpose of the test. In fact, any test could result in a rejection if we were allowed to adjust the level of significance after seeing the test statistic.
The principles of level of significance, critical values and region of rejection are the same for one-sided hypothesis tests. However, the different alternative hypothesis does have an effect. Put simply, in a one-sided hypothesis test there is only one critical value, and the region of rejection only occurs on one side of this critical value (hence the name one-sided hypothesis test).
Let's return to a scenario we saw at the beginning of this chapter, where a consumer watchdog was testing the average life of a battery. In particular, the consumer watchdog was testing the claim that the batteries lasted an average of 60 hours. The null and alternative hypotheses in this test are:
H0: μ = 60
HA: μ < 60
This is a one-sided hypothesis test because the alternative hypothesis is specifically claiming that the population mean battery life is less than 60 hours. For the purposes of this example, let's suppose we know that the population standard deviation is σ = 5 hours, and that a sample of n = 100 batteries is to be tested.
Assuming the null hypothesis is correct, the sampling distribution of the mean, X̄, is approximately normal with mean 60 and standard deviation 5/√100 = 0.5.
Level of significance. Let's set the level of significance at α = 0.05.
Critical value. Earlier, in a two-sided hypothesis test, two critical values were defined such that a proportion of α falls 'outside' the critical values in Z (with α/2 above and α/2 below). The same principle applies here, but there is only one critical value. Whether the critical value is positive or negative depends on the direction of the alternative hypothesis. In this example, the alternative hypothesis asserts that the population mean is below the proposed value of 60, so the critical value is the negative value -zα.
Notice that the subscript is α, not α/2. This is because the (single) critical value in a one-sided hypothesis test is defined so that α of the standard normal distribution lies to one side of it. So, for example, with a level of significance of α = 0.05, the critical value is -z0.05 = -1.645.
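As a rough sketch, the single critical value can be computed in the same way as the two-sided ones, except that the whole of α is placed in one tail. The function name `one_sided_critical_value` and its `tail` argument are assumptions made for this example.

```python
from statistics import NormalDist

def one_sided_critical_value(alpha, tail="lower"):
    """Critical value for a one-sided z-test, with all of alpha in a single tail."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # a proportion alpha of Z lies above z_alpha
    # 'lower' suits an alternative of the form 'less than'; any other value gives the upper tail.
    return -z_alpha if tail == "lower" else z_alpha

print(round(one_sided_critical_value(0.05), 3))  # prints -1.645
```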
Region of rejection. The region of rejection is the area of Z, defined by the critical value, that contains a proportion of α of the standard normal distribution. If the critical value is negative, the region of rejection is to the left of it. If the critical value is positive, the region of rejection is to the right of it.
Below is a diagram showing the region of rejection for the battery test (with a level of significance of 0.05). A comparison to a two-sided region of rejection is shown next to it. In both a one-sided and a two-sided hypothesis test, the region of rejection will contain α of the standard normal distribution Z. The difference is that in a one-sided test, this section of Z is contained in one part of the distribution. In a two-sided test, it is split into two halves (containing α/2 each).
Just as with a two-sided test, the decision in the hypothesis test rests on whether the test statistic is in the region of rejection.
Once the level of significance is chosen and the critical value and region of rejection are determined, we can collect a sample. Let's suppose the watchdog collected a sample of 100 batteries and recorded a sample mean of x̄ = 59.1 hours. The test statistic for this sample mean is the z-score of 59.1 in the sampling distribution of the mean:
z = (x̄ - μ) / (σ/√n) = (59.1 - 60) / 0.5 = -1.8
So the test statistic is equal to -1.8, which is less than the critical value of -1.645. That is, the test statistic is in the region of rejection, so the null hypothesis is rejected. At a level of significance of α = 0.05, the watchdog can reject the claim made by the battery company and conclude that the average battery life is less than 60 hours.
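The whole battery test can be reproduced in a few lines. This is only a sketch using the numbers from the example; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 60       # proposed population mean under H0 (hours)
sigma = 5       # known population standard deviation (hours)
n = 100         # sample size
x_bar = 59.1    # observed sample mean (hours)
alpha = 0.05    # level of significance, fixed before seeing the sample

std_error = sigma / sqrt(n)            # standard deviation of the sample mean: 0.5
z = (x_bar - mu_0) / std_error         # test statistic
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value, -z_alpha

# HA: mu < 60, so the region of rejection lies to the left of the critical value.
reject = z < z_crit
print(round(z, 2), round(z_crit, 3), reject)  # prints -1.8 -1.645 True
```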
While the exact methodology differs depending upon whether a hypothesis test is one-sided or two-sided, the principle is the same: the chosen level of significance determines a critical value (or two critical values), which in turn determine a region of rejection. This region of rejection then becomes a key component of the test. A sample is collected, and a test statistic is calculated. If the test statistic is in the region of rejection, the null hypothesis is rejected. If it is not in the region of rejection, the null hypothesis is not rejected.
It is important to know these steps and understand when, why and how they are completed. So, to finish this section off we will provide a complete step-by-step guide to conducting a hypothesis test.