If someone is running a hypothesis test, they probably would like to be able to draw conclusions from that test. The strongest conclusion that can be drawn is that the sample has provided enough evidence to reject the null hypothesis. Indeed many times in practice, a statistician (or the decision-maker they represent) will want to reject the null hypothesis, as that is the result they are after.
In any case, if the null hypothesis is not rejected, all that we can conclude is that there was not enough evidence to reject it. So, in a way, the power of a test can be thought of as its ability to correctly draw conclusions, and specifically its ability to correctly conclude that the null hypothesis is not true when it is, in fact, not true.
We've just used the word 'power' in its everyday sense. The term also has a technical meaning in hypothesis testing, and it pretty much means what we just said!
The power of a hypothesis test is the probability of not committing a Type II error, and is denoted (1 - β). That is, the power of a test is the probability of rejecting the null hypothesis when it is false (and therefore should be rejected).
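To make the definition concrete, the probability of rejecting a false null hypothesis can be estimated by simulation. The following is only a sketch. The function name `simulate_power`, the choice of a one-sided z-test with a known population standard deviation, and all of the numbers (a proposed mean of 100, a true mean of 105, σ = 15, n = 36) are assumptions made for illustration, not values taken from this section.

```python
import random
from statistics import NormalDist

def simulate_power(mu0, mu_true, sigma, n, alpha, trials=5000, seed=0):
    """Estimate the power of an upper-tail z-test of H0: mu = mu0 by simulation.

    Assumes a normal population with known standard deviation sigma.
    """
    rng = random.Random(seed)                  # fixed seed so runs are repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)   # reject H0 when z exceeds this
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw the sample from the TRUE population, not the one proposed by H0.
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / std_error
        if z > z_crit:
            rejections += 1
    return rejections / trials                 # share of samples that rejected H0

# Hypothetical numbers: H0 proposes a mean of 100, but the true mean is 105.
power = simulate_power(mu0=100, mu_true=105, sigma=15, n=36, alpha=0.05)
beta = 1 - power
print(f"estimated power = {power:.2f}, estimated beta = {beta:.2f}")
```

If the true mean is set equal to the proposed mean, the null hypothesis is actually true, and the same function then estimates the probability of a Type I error, which should come out close to α.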
The more powerful a test is, the greater your ability to draw a conclusion from it. However, the power of a test is somewhat out of your hands! Just as we can't completely control β, we can't completely control the power of a test. We cannot determine exactly what it is, because it depends on the true (and unknown) value of the population parameter, but we can influence it.
Earlier in this section, we discussed that the likelihood of committing a Type II error depended on what the true population parameter actually is and how different it is from the value proposed in the null hypothesis. If the true value is very close to the proposed value, we run a large chance of getting a Type II error (not rejecting the null hypothesis, even though it is false).
Equivalently, the power of a test will be low if the true value is close to the proposed value. Put simply, the test will not be very good at distinguishing between reality and the proposed reality. So one of the things influencing the power of a test is the difference between the actual value of the population parameter and the value proposed in the null hypothesis.
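The effect of this difference can be sketched numerically. The example below assumes an upper-tail z-test of a null hypothesis proposing a mean of 100, with a known population standard deviation. The helper name `power_upper_tail` and the values σ = 15, n = 25 and α = 0.05 are illustrative assumptions, not figures from this section.

```python
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of an upper-tail z-test of H0: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Gap between the true and proposed means, measured in standard errors.
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z_crit - shift)

# Assumed for illustration only: sigma = 15, n = 25, alpha = 0.05.
powers = {mu_true: power_upper_tail(100, mu_true, 15, 25, 0.05)
          for mu_true in (101, 103, 105, 110)}

for mu_true, p in powers.items():
    print(f"true mean {mu_true}: power = {p:.3f}, beta = {1 - p:.3f}")
```

Under these assumptions, the power is low when the true mean is 101 and climbs steadily as the true mean moves further from the proposed value of 100.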
Watch out!
You may have guessed that one way to increase the power of a test is to deliberately propose an extreme value for the population parameter. For example, recall the I.Q. hypothesis test we considered earlier in this section.
We discussed the fact that if, for example, the true population mean I.Q. is 101 whereas the value proposed in the null hypothesis is 100, then the test has a relatively large β. Equivalently, the test won't have much power. This makes sense: since the value being proposed is so close to the true value, it's going to be hard for a sample to distinguish the reality from the (not too different) proposed reality.
So, you might argue, let's make the null hypothesis propose that the mean I.Q. is only 10! Surely that will increase the power of the test! Well, yes and no. Yes, if you ran the following hypothesis test:
H0: μ = 10
HA: μ > 10
then your test would be, strictly speaking, powerful. Supposing the actual average I.Q. is (around) 101, then any sample you collect will probably have a mean of around 101, which would definitely allow you to reject the above null hypothesis. But so what? You can conclude that the mean I.Q. is not 10. While the test itself is powerful, this is not a terribly powerful result. So, it isn't necessarily a good idea to increase the power of your test by simply changing your null hypothesis.
Having said this, adjusting the null hypothesis is sometimes a sensible choice in practice. For example, suppose a scientist is studying the radiation levels in a particular town. They suspect that the average annual radiation level is greater than 200 rad. So an option could be to test the null hypothesis that the radiation level is 200 rad. However, the scientist knows that average annual radiation levels above 150 rad will be enough to provoke a national response, because a level of 150 rad is considered high. If the true level really is above 200 rad, it is much further from 150 than it is from 200, so a test against 150 is more likely to reject its null hypothesis, and rejecting it is still a meaningful result. So, to increase the power of their test, the scientist decides to use the following null and alternative hypotheses:
H0: μ = 150
HA: μ > 150
There are two main ways that you can increase the power of a test. They are both trade-offs and a statistician is likely to use a combination of the following two methods.
First, as we have discussed, at a fixed sample size there is an inverse relationship between α and β. So increasing the level of significance will decrease β and therefore increase the power of a test. So, put simply, you can make your test more powerful by raising the level of significance and demanding less evidence to reject the null hypothesis. (Remember: the higher α is, the less evidence you are demanding before you reject the null hypothesis.) This of course will increase the chance of committing a Type I error. A statistician will need to consider the relative importance of each error type and find a balance between the two in a test.
Second, as we have also discussed, increasing the sample size can decrease β. So increasing the sample size can increase the power of a test. The benefit of this approach is that the level of significance does not have to be increased to do this. However, as we recently discussed, increasing the sample size comes at a tangible cost in money, time and resources.
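Both methods can be seen in a short calculation. This is a sketch under stated assumptions: an upper-tail z-test of a null hypothesis proposing a mean of 100, a known population standard deviation of 15, and a true mean of 103. The helper name `power_upper_tail` and every number other than the proposed mean are hypothetical.

```python
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of an upper-tail z-test of H0: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z_crit - shift)

MU0, MU_TRUE, SIGMA = 100, 103, 15   # MU_TRUE and SIGMA are assumed values

# Method 1: raise the level of significance, keeping the sample size at n = 50.
by_alpha = {alpha: power_upper_tail(MU0, MU_TRUE, SIGMA, 50, alpha)
            for alpha in (0.01, 0.05, 0.10)}

# Method 2: raise the sample size, keeping the level of significance at 0.05.
by_n = {n: power_upper_tail(MU0, MU_TRUE, SIGMA, n, 0.05)
        for n in (50, 100, 200)}

for alpha, p in by_alpha.items():
    print(f"n = 50, alpha = {alpha:.2f}: power = {p:.3f}")
for n, p in by_n.items():
    print(f"n = {n}, alpha = 0.05: power = {p:.3f}")
```

In the first loop the extra power is paid for with a higher chance of a Type I error, since that chance is α itself. In the second loop α stays fixed and the cost is the larger sample.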
Increasing the power of a test
A health scientist is trying to determine if a particular population has a high diabetes risk. She would like to test the average blood glucose level against 100 mg/dL (considered a normal level). If it is found that the population is at great risk, programs will be implemented to increase awareness of diabetes.
The health scientist typically uses a level of significance of α = 0.05 and a sample size of n = 500. However, given the importance of the test, she would like to increase its power. The scientist manages to secure funding to collect a sample of size 1,000. Also, the scientist sacrifices some confidence in the test by increasing the level of significance to 0.1.
In the above example, the scientist has used a combination of both methods of increasing the power of her test. By increasing the size of her sample she has increased the power of the test, but this comes at a price: a test of this size requires more funding. By increasing the level of significance in the test she has also increased the power of the test, but this comes at a price too. In particular, she has increased the probability of concluding that there is a diabetes risk in the population, even if there isn't actually such a risk. She has increased the probability of committing a Type I error.
This last point deserves special mention: just because a test is more powerful, this doesn't mean that it is necessarily 'better' in an absolute sense. Indeed, as we discussed earlier, a statistician will have to judge the relative danger of the two different types of errors they can commit in a hypothesis test. The scientist in the above example may have judged that implementing an awareness campaign when the population is not actually at great risk of diabetes is a relatively minor danger. Making this judgement is part of what allowed her to make her test a more powerful one.
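The health scientist's two changes can be put in rough numbers. The example does not state the population standard deviation, the true mean, or the direction of the test, so the sketch below assumes an upper-tail z-test with σ = 25 mg/dL and a true mean of 102 mg/dL. These values, and the helper name `power_upper_tail`, are purely hypothetical. Only the sample sizes, the levels of significance and the proposed mean of 100 come from the example.

```python
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of an upper-tail z-test of H0: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z_crit - shift)

MU0 = 100       # mg/dL, the normal level proposed in the example
MU_TRUE = 102   # mg/dL, ASSUMED true mean (not given in the example)
SIGMA = 25      # mg/dL, ASSUMED population standard deviation

# Each design is (sample size, level of significance).
designs = {
    "usual design": (500, 0.05),
    "larger sample only": (1000, 0.05),
    "higher alpha only": (500, 0.10),
    "both changes": (1000, 0.10),
}

results = {name: power_upper_tail(MU0, MU_TRUE, SIGMA, n, alpha)
           for name, (n, alpha) in designs.items()}

for name, (n, alpha) in designs.items():
    print(f"{name}: n = {n}, P(Type I error) = {alpha:.2f}, "
          f"power = {results[name]:.3f}")
```

Under these assumptions each change raises the power on its own, and making both changes raises it the most. The printed Type I error column shows the price of the second change: it doubles from 0.05 to 0.10 whenever the level of significance is raised.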
The point of this section is that you should not apply statistical methods blindly. The previous section of this chapter gave the most comprehensive guide you will actually need to conduct a test. However, it is important to understand the decisions you make before and during your hypothesis test. In practice, two big decisions you will make are about the size of the sample and the level of significance of your test. This section gives an indication as to the sorts of things a statistician might think about when making these decisions.