Is σ known or unknown?

So far in this chapter, we have been relating sample statistics (like the sample mean) and their sampling distributions to the standard normal distribution, Z. We have been thinking about test statistics as being z-scores. We have also been thinking about critical values (the things we compare the test statistics to) as being z-scores. The region of rejection has always been described as a region of Z.

However, as with statistical estimation, it is not always appropriate to use Z in hypothesis testing for the population mean. The factor that determines this is whether or not the population standard deviation σ is known.

If σ is known

Remember, this σ is the standard deviation for X, the time taken for a sales routine. So at the moment we are supposing that the company knows this standard deviation, even though it doesn't know the population mean! This is not terribly realistic, but we'll ignore that for the moment.

Let's take the Seller Door example. For the moment, let's suppose that the company happens to know that the population standard deviation in sales routine times is σ = 24 seconds. And let's say the company intends to collect a sample of n = 36 time values.

What is μ0?

This is our first use of the notation μ0. The subscript is meant to remind you of the null hypothesis, H0. In fact, whenever we are talking about a general value that has been proposed for μ in the null hypothesis, we will call it μ0. So, in general, a null hypothesis can be written out as:

H0: μ = μ0

Assuming the null hypothesis is true (that is, assuming that the population mean of X is μ0 = 620), we can say that the sampling distribution of the sample mean, X̄, approximately follows the normal distribution with mean 620 and standard deviation 24/√36 = 4 seconds. In other words,

$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

approximately follows the standard normal distribution, Z.

This version of the transformation formula has been extremely important thus far in the hypothesis testing we've seen. For example, it is how we calculate the test statistic when a sample is collected. And it is the reason that we take critical values in Z when we choose a level of significance.
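To make the arithmetic concrete, here is a minimal sketch of the σ-known calculation in Python, using only the standard library. The values σ = 24, n = 36 and μ0 = 620 come from the text. The sample mean of 612 seconds, the 5% level of significance and the two-tailed form of the test are assumptions made purely for illustration; they are not the chapter's data.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 620      # hypothesised population mean (seconds), from H0
sigma = 24      # known population standard deviation (seconds)
n = 36          # planned sample size
x_bar = 612     # HYPOTHETICAL sample mean, for illustration only
alpha = 0.05    # ASSUMED level of significance (two-tailed test assumed)

# Standard deviation of the sampling distribution of the mean
standard_error = sigma / sqrt(n)

# Test statistic: how many standard errors x_bar lies from mu_0
z = (x_bar - mu_0) / standard_error

# Critical value taken from the standard normal distribution, Z
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"standard error   = {standard_error:.2f} seconds")
print(f"test statistic z = {z:.3f}")
print(f"critical values  = -{z_crit:.3f} and +{z_crit:.3f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```

Changing `x_bar` or `alpha` shows how the test statistic and the region of rejection respond, while the standard error stays fixed at 4 seconds because σ and n are fixed.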

If σ is unknown

But what if Seller Door doesn't know the population standard deviation, σ? Just as with estimation, we have to acknowledge that this situation is much more likely. If the population standard deviation is unknown, it is estimated with the sample standard deviation, s. Recall that the sample standard deviation is a sample statistic from the sampling distribution S. And in Chapter 7 we learned about a slightly different transformation formula involving t-distributions.

Assuming the null hypothesis is true and that the population mean of X is μ0 = 620, then:

$$\frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

approximately follows the t-distribution with (n - 1) degrees of freedom.

And so, if σ is unknown, this version of the transformation formula is used to calculate the test statistic. Also, the critical values (and therefore region of rejection) come from the appropriate t-distribution. In principle, the method for conducting the hypothesis test is the same. However, the appropriate distribution must be kept in mind and used when conducting the test.
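The σ-unknown version can be sketched in the same way. The Python standard library has no t-distribution, so this sketch builds one numerically (Simpson's rule for the cumulative probability, bisection for the critical value); in practice a statistics package or a t-table would normally be used. The helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical, and the sample mean (612), sample standard deviation (24), 5% level of significance and two-tailed form are illustrative assumptions, not the chapter's data.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(p, df):
    """Value t with P(T <= t) = p, found by bisection (assumes p > 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu_0 = 620      # hypothesised population mean (seconds), from H0
n = 36          # sample size
x_bar = 612     # HYPOTHETICAL sample mean, for illustration only
s = 24          # HYPOTHETICAL sample standard deviation
alpha = 0.05    # ASSUMED level of significance (two-tailed test assumed)

df = n - 1
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
t_crit = t_critical(1 - alpha / 2, df)

print(f"degrees of freedom = {df}")
print(f"test statistic t   = {t_stat:.3f}")
print(f"critical values    = -{t_crit:.3f} and +{t_crit:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```

With these illustrative numbers the test statistic has the same value as it would in a Z test, but the critical value from the t-distribution with 35 degrees of freedom is a little larger than the corresponding Z value, so the region of rejection sits slightly further from zero.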

What we will do now is go through the test twice: first assuming that the standard deviation is known to be σ = 24 seconds, and then without making this assumption. Other than this, we will leave everything about the two tests the same. Seller Door will use the same data and make the same decisions in both versions of the test so that we can see exactly how the two tests differ.