Using bigger samples

But we don't want either error!

Of course, no one likes committing errors. Ideally, we would like both α and β to be as close to 0 as possible. The fact that they are inversely related is annoying because it means that, at a fixed sample size, decreasing one of them increases the other. But therein lies the answer: sample size! We've always talked about how samples behave 'nicely', and the bigger they are, the nicer they are. This is true in confidence interval estimation (larger samples give more precise estimates) and it is true in testing.

Increasing the sample size, n, will allow you to decrease α, or β, or both.
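To get a feel for this, here is a minimal sketch in Python using the I.Q. example (null mean 100, actual mean 110). The population standard deviation of 60 and the level of significance of 0.05 are assumptions made purely for illustration; they are not values from this lesson, and the standard deviation is chosen only so that the two sampling distributions overlap noticeably. The sketch holds α fixed and shows how β responds as n grows.

```python
from statistics import NormalDist

MU_NULL = 100.0  # mean stated by the null hypothesis (from the text)
MU_ALT = 110.0   # actual mean considered in the text
SIGMA = 60.0     # ASSUMPTION: hypothetical population standard deviation
ALPHA = 0.05     # ASSUMPTION: fixed level of significance


def beta_at_fixed_alpha(n, alpha=ALPHA, sigma=SIGMA):
    """Return beta for a right-tailed test on the sample mean with alpha held fixed."""
    standard_error = sigma / n ** 0.5
    # Critical value that leaves an area of alpha in the right tail under the null.
    critical = NormalDist(MU_NULL, standard_error).inv_cdf(1 - alpha)
    # Beta is the area to the left of that critical value when the mean is really 110.
    return NormalDist(MU_ALT, standard_error).cdf(critical)


for n in (50, 200, 800):
    print(f"n = {n:4d}  alpha = {ALPHA}  beta = {beta_at_fixed_alpha(n):.4f}")
```

Under these assumed values, the printed β gets smaller each time n increases, even though α never changes.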

Type I and Type II errors for sample size 200

Why are results like this true? The short answer is: the bigger the sample is, the more information you have, and the 'better' your tests are at describing the population. For more detailed reasoning, consider the two diagrams in this section. This is a continuation of the I.Q. example we were just looking at, where the null hypothesis stated that the average I.Q. is 100. For the purposes of studying errors, we were considering the possibility that this null hypothesis is wrong and that the actual average is 110.

The two bell curves shown above are sampling distributions for a sample of size 200. The first bell curve represents the sampling distribution under the null hypothesis (that the mean is 100) and the second bell curve is the sampling distribution when the mean is actually 110. When conducting the test, our level of significance α will produce a critical value that marks off the region of rejection.

Notice that if we get a test statistic to the right of the critical value, we will reject the null hypothesis, and if we get a test statistic to the left, we will not reject the null hypothesis. Depending on whether or not the null hypothesis is actually true, these conclusions may be correct or they may be errors. This is why, in the first bell curve (the sampling distribution when the null hypothesis is true), the area to the right of the critical value is the probability of a Type I error, α. In the second bell curve (the sampling distribution when the average is actually 110), the area to the left of the critical value is the probability of a Type II error, β.

Type I and Type II errors for sample size 800

Now suppose that, instead of collecting a sample of size 200, a sample of size 800 was collected. The two sampling distributions under this scenario are shown in the second diagram. The difference between these bell curves and the two bell curves for a sample of size 200 is that the standard deviations of the sampling distributions have shrunk, because the sample size has increased (quadrupling the sample size halves the standard deviation of the sample mean). As a result, the same critical value corresponds to a smaller level of significance (α) and a smaller β. That is, the chances of committing either a Type I or a Type II error have shrunk.
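The comparison between the two sample sizes can be sketched numerically. The population standard deviation (60) and the critical value (105) used here are hypothetical, since the lesson does not state either one; only the two means and the two sample sizes come from the example. The sketch keeps the critical value fixed and computes both error probabilities at each sample size.

```python
from statistics import NormalDist

MU_NULL = 100.0   # mean stated by the null hypothesis (from the text)
MU_ALT = 110.0    # actual mean considered in the text
SIGMA = 60.0      # ASSUMPTION: hypothetical population standard deviation
CRITICAL = 105.0  # ASSUMPTION: hypothetical critical value for the sample mean


def error_rates(n, critical=CRITICAL, sigma=SIGMA):
    """Return (alpha, beta) for a right-tailed test with a fixed critical value."""
    standard_error = sigma / n ** 0.5
    # Type I error: area to the right of the critical value when the mean is 100.
    alpha = 1 - NormalDist(MU_NULL, standard_error).cdf(critical)
    # Type II error: area to the left of the critical value when the mean is 110.
    beta = NormalDist(MU_ALT, standard_error).cdf(critical)
    return alpha, beta


for n in (200, 800):
    alpha, beta = error_rates(n)
    print(f"n = {n}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

With any critical value strictly between 100 and 110, both probabilities come out smaller at n = 800 than at n = 200, which matches what the narrower bell curves show.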

Sample size in testing

The rule that α and β are inversely related (decreasing one increases the other) is only true at a fixed sample size.

By using a larger sample size in a hypothesis test, you can decrease α, decrease β, or decrease both.

Of course, nothing comes for free.

Whenever a statistician, a polling institute, or anybody else wanting to conduct statistical inference collects a sample, they would like it to be as big as possible. However, larger samples cost more money, time and effort to collect and study. Remember: if we had all the time in the world, we would probably study the population itself! There is a limit on what is reasonable, so cost and time place practical limits on how much we can reduce errors by increasing the sample size.