Step-by-step guide to the P-value approach

The lower the P-value, the more unlikely and 'unexpected' the sample is, if we are assuming the null hypothesis is true. So if we collect a sample and calculate a very low P-value for it, we take this as evidence against the null hypothesis. If it is too low, we go ahead and reject the null hypothesis.

But how low is too low? This is where the methodology of hypothesis testing comes back in! When using P-values in a hypothesis test, we don't use critical values or regions of rejection. But we do still use a level of significance. In fact, the level of significance is very important to a P-value hypothesis test. In particular, if the P-value is below the level of significance, we reject the null hypothesis. If the P-value isn't below the level of significance, we don't reject the null hypothesis.

Here is the step-by-step guide to hypothesis testing using the P-value approach.

1. State the hypotheses

Just like for any hypothesis test, stating the null and alternative hypotheses will make it clear what claim is being tested, but it will also tell us whether the test is one-sided or two-sided. As we've just seen, this is important for the P-value! Earlier in this section we were looking at two hypothesis tests, and below we will reproduce the hypotheses for them.

An example of hypotheses for a one-sided test:

H0: μ = 300

HA: μ > 300

An example of hypotheses for a two-sided test:

H0: π = 0.65

HA: π ≠ 0.65

2. Assume the null hypothesis is true

The whole point of calculating a P-value is that it tells us how unlikely the sample is under the assumption that the null hypothesis is true. So this assumption is important here as always.

3. Choose a level of significance, α

As with any hypothesis test, α is a measure of how strong you need the evidence against the null hypothesis to be before you agree to reject it. The lower the α, the more evidence you need. When using the P-value method, the value of α will be your 'cut-off' point. That is, if the P-value for your sample is lower than α, you reject the null hypothesis.

Common levels of significance α are 0.1, 0.05 and 0.01.

4. Collect a sample and calculate a sample statistic

As with any hypothesis test, the sample is the evidence against which you judge the null hypothesis. So, just like in any hypothesis test, you need to collect a sample and calculate a sample statistic so that it can be compared with the value of the population parameter claimed by the null hypothesis.

5. Calculate the test statistic

Again, the test statistic is a standard score for the sample statistic: it indicates how far the sample statistic lies from the value claimed by the null hypothesis, measured in standard errors. The test statistic is typically a z-score or t-score, and is calculated using the appropriate transformation formula.
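As a minimal sketch of this step, the Python below computes a z-score for a sample mean and for a sample proportion. The function names and all sample figures (sample mean, standard deviation, sample proportion, sample sizes) are hypothetical and chosen only for illustration; only the null values 300 and 0.65 come from the hypotheses stated in step 1. The sketch also assumes the population standard deviation is known, so that a z-score rather than a t-score applies.

```python
import math

def z_for_mean(sample_mean, null_mean, sigma, n):
    """Standard score of a sample mean under H0 (sigma assumed known)."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error

def z_for_proportion(sample_prop, null_prop, n):
    """Standard score of a sample proportion under H0."""
    # The standard error uses the null value, because H0 is assumed true.
    standard_error = math.sqrt(null_prop * (1 - null_prop) / n)
    return (sample_prop - null_prop) / standard_error

# Hypothetical sample figures, for illustration only.
z_mean = z_for_mean(sample_mean=304, null_mean=300, sigma=15, n=36)
z_prop = z_for_proportion(sample_prop=0.70, null_prop=0.65, n=200)
print(round(z_mean, 2), round(z_prop, 2))
```

In both functions the numerator is the gap between the sample statistic and the null value, and the denominator is the standard error, so the result reads as "how many standard errors away from the null hypothesis".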

6. Calculate the P-value

The test statistic will be a value in some probability distribution (typically Z or a t-distribution). The P-value is the chance that this probability distribution will assume a value at least as extreme as the observed test statistic. This chance can be determined by referring to the appropriate table or by using statistical software.

While the P-value is technically calculated in relation to the test statistic, we think of it as a measure of the likelihood (or unlikelihood as it were) of the sample that we've just observed.

As explained earlier, the P-value will be affected by whether the test is one-sided or two-sided.
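The sketch below shows one way this step might look in Python when the test statistic is a z-score. The function name `p_value_from_z` and the labels "greater", "less" and "two-sided" are assumptions made for this illustration, and the z-score of 1.6 is a hypothetical input. The tail area of the standard normal distribution is obtained from `math.erfc` in the standard library, rather than from a table.

```python
import math

def p_value_from_z(z, alternative):
    """P-value for a z test statistic under the standard normal distribution.

    alternative: "greater"   (HA: parameter > null value)
                 "less"      (HA: parameter < null value)
                 "two-sided" (HA: parameter != null value)
    """
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return 1.0 - upper_tail                     # P(Z <= z)
    if alternative == "two-sided":
        # Both tails: values at least as far from 0 as |z|.
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

z = 1.6  # hypothetical test statistic
print(round(p_value_from_z(z, "greater"), 4))
print(round(p_value_from_z(z, "two-sided"), 4))
```

For the same test statistic, the two-sided P-value is twice the one-sided P-value, which is why the form of the alternative hypothesis has to be settled before the P-value is calculated.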

7. Conclusion

The lower the P-value, the greater the evidence against the null hypothesis. If the P-value is below the level of significance, the null hypothesis is rejected. If the P-value is not below the level of significance, the null hypothesis is not rejected.

But it doesn't have to be black and white!

The point of the P-value method is that we don't simply reject or not reject the null hypothesis. We also come out of the test with a number, the P-value. Specifically, we come out of the test with a probability: the probability of observing a sample at least as extreme as the one we just observed, if the null hypothesis is true. So even if a null hypothesis is not rejected, we still have a measure of how strongly the observed sample counts against it.

For example, let's suppose that in both of the examples at the beginning of this section, a level of significance of α = 0.05 had been chosen at the start of the test. That is, the scientist in the caffeine study and the sociologist in the renewable energy study both chose this as their level of significance.

The caffeine study produced a P-value of 5.48%, or 0.0548. The renewable energy study produced a P-value of 2.08%, or 0.0208.

Comparing their respective P-values to their levels of significance, in the caffeine study the null hypothesis would not be rejected by the scientist (the P-value is not below the level of significance) and in the renewable energy study the null hypothesis would be rejected by the sociologist (the P-value is below the level of significance).

But in both studies, the evidence against the null hypothesis is quite strong. In particular, the scientist may still be suspicious of the caffeine levels in the new brand of coffee, even though she failed to reject the null hypothesis (at a level of significance of 0.05) and so could not conclude that the levels were above 300 mg.
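The comparison made for these two studies can be sketched in a few lines of Python. The P-values and the level of significance are the ones quoted above; the function name `decide` and the wording of its return values are assumptions made for this illustration.

```python
def decide(p_value, alpha):
    """Apply the rule from step 7: reject H0 only if the P-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

alpha = 0.05
studies = {"caffeine": 0.0548, "renewable energy": 0.0208}

for name, p in studies.items():
    # Report the P-value alongside the decision, not the decision alone.
    print(f"{name}: P = {p:.4f}, alpha = {alpha} -> {decide(p, alpha)}")
```

Printing the P-value next to each decision keeps the strength of the evidence visible: the caffeine study falls on the "do not reject" side of the cut-off, but only narrowly.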