N10: Hypothesis Testing: Framework & One-Sample Tests

Node N10 — Section 1

Why This Concept Exists

Confidence intervals (N8, N9) tell us a range of plausible values for a parameter. But sometimes the question is more specific: "Is the parameter equal to a particular value?" For example: "Does this batch of pills contain exactly 500mg of active ingredient?" "Has the mean wait time actually decreased?" "Is this coin fair?"

Hypothesis testing provides a formal decision-making framework for answering such questions with a quantified error rate. Unlike a CI, which always produces an interval, a hypothesis test yields a binary decision: reject or do not reject the null hypothesis. This decision is made while controlling the probability of a false positive (Type I error) at a pre-specified level \(\alpha\).

The framework is built around a six-step protocol that must be followed rigidly in exam answers. Deviating from the protocol is one of the most common reasons students lose marks — not because their mathematics is wrong, but because they fail to communicate their reasoning in the expected structure.

Leverage: Hypothesis testing is the single most important topic in the inference half of PTS2. It appears in every exam, typically worth 12-20 marks across multiple questions. The six-step framework, p-value computation, and Type I/II error concepts are all independently examinable. Mastery of N10 is essential for N11 (Power) and N12 (Two-Sample Tests).

Node N10 — Section 2

Prerequisites

Before engaging with this node, you must be comfortable with:

Standard normal and t-distributions (N6): You must be able to find critical values \(z_\alpha\) and \(t_{\nu, \alpha}\) from tables. You must understand how tail probabilities relate to these values.
Sampling distributions (N6): \(\bar{X} \sim N(\mu, \sigma^2/n)\) for normal data; \(T = \frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)\) when \(\sigma\) is unknown.
Binomial and its normal approximation: For proportion tests, you need \(\hat{p} \approx N\big(p, p(1-p)/n\big)\) for large \(n\).
Confidence intervals (N8-N9): The duality between CIs and hypothesis tests means that understanding one immediately gives you half of the other. If \(\mu_0\) is outside the 95% CI, a two-sided test rejects \(H_0: \mu = \mu_0\) at \(\alpha = 0.05\).
Algebraic manipulation: Rearranging inequalities and computing standardized test statistics.

Key concept: Null vs Alternative The null hypothesis \(H_0\) represents the status quo or the statement being challenged. The alternative \(H_a\) represents what we are trying to demonstrate. The test always begins by assuming \(H_0\) is true, and we only reject it if the data are sufficiently inconsistent with that assumption.

Node N10 — Section 3

Core Exposition

3.1 The Six-Step Protocol

Every hypothesis test in PTS2 should follow this exact structure. Examiners award marks step by step, so writing out all six steps is the safest approach.

Step 1: State the hypotheses. Write \(H_0: \theta = \theta_0\) and \(H_a: \theta > \theta_0\), \(\theta < \theta_0\), or \(\theta \neq \theta_0\). Always define \(\theta\) in words (e.g., "where \(\mu\) is the population mean weight in grams").

Step 2: Choose the significance level \(\alpha\). Commonly given in the question (5%, 1%, 10%). This is the maximum acceptable probability of a Type I error (rejecting \(H_0\) when it is true).

Step 3: Specify the test statistic and its distribution under \(H_0\). Identify whether you need a z-statistic (known \(\sigma\)), a t-statistic (unknown \(\sigma\)), or a proportion z-statistic. Write down the exact formula and distribution.

Step 4: Determine the rejection region. For \(H_a: \theta > \theta_0\): reject if the test statistic exceeds the upper-tail critical value. For \(H_a: \theta < \theta_0\): reject if the test statistic is below the lower-tail critical value. For \(H_a: \theta \neq \theta_0\): reject if the test statistic is in either tail (split \(\alpha\)).

Step 5: Compute the observed value. Substitute the sample data into the test statistic formula.

Step 6: State the decision and conclusion. Decision: "Reject \(H_0\)" or "Do not reject \(H_0\)." Never write "Accept \(H_0\)." Conclusion: A sentence in the context of the problem.

Exam warning: Never write "Accept \(H_0\)." Not rejecting means the data are consistent with \(H_0\), not that \(H_0\) has been proven true. Write "Do not reject \(H_0\)" or "There is insufficient evidence to reject \(H_0\)."

3.2 The z-Test (Known Variance)

When the population variance is known and data are normal (or \(n\) is large), the test statistic is:

\[Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) \quad \text{under } H_0: \mu = \mu_0\]

Alternative	Rejection Region	p-value
\(H_a: \mu > \mu_0\)	\(Z > z_\alpha\)	\(P(Z > z_{\text{obs}})\)
\(H_a: \mu < \mu_0\)	\(Z < -z_\alpha\)	\(P(Z < z_{\text{obs}})\)
\(H_a: \mu \neq \mu_0\)	\(|Z| > z_{\alpha/2}\)	\(2P(Z > |z_{\text{obs}}|)\)
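The table can be turned into a small calculation. The following is a minimal sketch, not a prescribed method: the function name `z_test` and its argument names are illustrative assumptions, and the critical value is computed with Python's standard-library `statistics.NormalDist` instead of being read from tables. The numbers passed in are those of the tablet example in Section 4.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # the N(0, 1) reference distribution


def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="greater"):
    """One-sample z-test with known sigma (illustrative helper).

    Returns (z_obs, crit, reject), where crit is the positive critical
    value: z_alpha for one-tailed tests, z_{alpha/2} for two-tailed.
    """
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":          # H_a: mu > mu0
        crit = STD_NORMAL.inv_cdf(1 - alpha)
        reject = z_obs > crit
    elif alternative == "less":           # H_a: mu < mu0, reject if Z < -z_alpha
        crit = STD_NORMAL.inv_cdf(1 - alpha)
        reject = z_obs < -crit
    elif alternative == "two-sided":      # H_a: mu != mu0, alpha split over both tails
        crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
        reject = abs(z_obs) > crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_obs, crit, reject


z_obs, crit, reject = z_test(500.73, 500, 2.0, 30, alpha=0.05, alternative="greater")
print(f"z_obs = {z_obs:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```

In an exam you would still read \(z_\alpha\) from tables; the code is only a way to check arithmetic.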

3.3 The t-Test (Unknown Variance)

When \(\sigma\) is estimated by \(S\):

\[T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n-1) \quad \text{under } H_0: \mu = \mu_0\]

All three cases (right-tailed, left-tailed, two-tailed) follow the same rejection-region logic as the z-test, but with t-critical values from the t-table instead of z-critical values.
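As a rough illustration of the same logic for the t-test, the sketch below computes the observed statistic and compares it with a critical value that you supply from the t-table. Python's standard library has no t-distribution, so the helper `t_test_from_table` (a hypothetical name) deliberately takes the tabulated value as an argument. The inputs are the bolt data from Example 2.

```python
from math import sqrt


def t_test_from_table(xbar, mu0, s, n, t_crit, alternative="two-sided"):
    """One-sample t-test using a critical value read from a t-table.

    t_crit must be the positive table value for n - 1 degrees of freedom:
    t_{n-1, alpha} for a one-tailed test, t_{n-1, alpha/2} for two-tailed.
    """
    t_obs = (xbar - mu0) / (s / sqrt(n))
    if alternative == "greater":
        reject = t_obs > t_crit
    elif alternative == "less":
        reject = t_obs < -t_crit
    elif alternative == "two-sided":
        reject = abs(t_obs) > t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t_obs, reject


# Bolt example: n = 9, so 8 degrees of freedom; t_{8, 0.005} = 3.355 from tables.
t_obs, reject = t_test_from_table(10.23, 10.00, 0.48, 9, t_crit=3.355)
print(f"t_obs = {t_obs:.4f}, reject H0: {reject}")
```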

3.4 The Proportion Test

For testing \(H_0: p = p_0\) with large \(n\):

\[Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \approx N(0, 1) \quad \text{under } H_0\]

Critical distinction: In the SE, we use \(p_0\) (the null value), not \(\hat{p}\) (the sample value). This is because the test statistic is evaluated under the assumption that \(H_0\) is true. This differs from a CI, where we use \(\hat{p}\) in the SE because we are not assuming any particular value of \(p\).
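The distinction is easy to see numerically. The sketch below (function and variable names are illustrative assumptions) computes the test statistic with the null standard error and, purely for contrast, the standard error a confidence interval would use. The data are the 68 supporters out of 200 from Example 3.

```python
from math import sqrt


def proportion_z(successes, n, p0):
    """Return (p_hat, se_null, z_obs) for testing H0: p = p0."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)   # SE under H0: built from p0, not p_hat
    z_obs = (p_hat - p0) / se_null
    return p_hat, se_null, z_obs


p_hat, se_null, z_obs = proportion_z(68, 200, 0.40)

# For contrast only: the SE a confidence interval would use (built from p_hat).
se_ci = sqrt(p_hat * (1 - p_hat) / 200)

print(f"p_hat = {p_hat:.2f}")
print(f"SE for the test (p0)   = {se_null:.4f}")
print(f"SE for a CI    (p_hat) = {se_ci:.4f}")
print(f"z_obs = {z_obs:.3f}")
```

The two standard errors differ in the third decimal place here, which is enough to change the reported test statistic.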

3.5 p-Values

The p-value is the probability, under \(H_0\), of observing a test statistic as extreme as or more extreme than what was observed. It tells you how surprising your data are if \(H_0\) were true.

Decision rule: If p-value \(\leq \alpha\), reject \(H_0\). If p-value \(> \alpha\), do not reject \(H_0\).
Right-tailed: p-value \(= P(Z > z_{\text{obs}})\).
Left-tailed: p-value \(= P(Z < z_{\text{obs}})\).
Two-tailed: p-value \(= 2P(Z > |z_{\text{obs}}|)\).
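The three p-value formulas can be sketched as one function. This is an illustrative helper (the name `z_p_value` is an assumption), using the standard-library normal CDF in place of tables, so the fourth decimal may differ slightly from a table-based answer.

```python
from statistics import NormalDist


def z_p_value(z_obs, alternative):
    """p-value of an observed z-statistic for the stated alternative."""
    phi = NormalDist().cdf                  # standard normal CDF
    if alternative == "greater":            # right-tailed
        return 1 - phi(z_obs)
    if alternative == "less":               # left-tailed
        return phi(z_obs)
    if alternative == "two-sided":          # both tails: double the upper tail of |z|
        return 2 * (1 - phi(abs(z_obs)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


alpha = 0.05
z_obs = -1.732                              # observed value from Example 3
p_two = z_p_value(z_obs, "two-sided")
decision = "reject H0" if p_two <= alpha else "do not reject H0"
print(f"two-tailed p-value = {p_two:.4f} -> {decision}")
```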

3.6 Type I and Type II Errors

Type I error (\(\alpha\)): Reject \(H_0\) when \(H_0\) is actually true. (False positive.) Type II error (\(\beta\)): Do not reject \(H_0\) when \(H_a\) is actually true. (False negative.) Power (\(1 - \beta\)): Correctly reject \(H_0\) when \(H_a\) is true. (Covered in depth in N11.)

Type I Error (False Positive) Rejecting a true null. Probability = \(\alpha\). You set this in Step 2. Common example: convicting an innocent person.

Type II Error (False Negative) Failing to reject a false null. Probability = \(\beta\). Depends on how far the truth is from \(H_0\), sample size, and \(\alpha\). Example: acquitting a guilty person.
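One way to make "Probability = \(\alpha\)" concrete is a simulation. The sketch below is an illustration under stated assumptions: it repeatedly draws normal samples for which \(H_0\) is true (the values \(\mu_0 = 500\), \(\sigma = 2\), \(n = 30\) are borrowed from the tablet example; the number of trials and the seed are arbitrary choices) and counts how often a right-tailed z-test rejects.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_rate(mu0=500.0, sigma=2.0, n=30, alpha=0.05, trials=20000, seed=1):
    """Estimate the Type I error rate of a right-tailed z-test by simulation."""
    rng = random.Random(seed)                   # fixed seed for reproducibility
    crit = NormalDist().inv_cdf(1 - alpha)      # upper-tail critical value z_alpha
    rejections = 0
    for _ in range(trials):
        # Data generated with the true mean equal to mu0, so H0 is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > crit:                            # every rejection here is a Type I error
            rejections += 1
    return rejections / trials


rate = type_i_rate()
print(f"simulated Type I error rate: {rate:.3f} (nominal alpha = 0.05)")
```

The simulated rate should sit close to 0.05, with small deviations due to simulation noise.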

Node N10 — Section 4

Worked Examples

Example 1: One-Sample z-Test (Right-Tailed)

A pharmaceutical company claims the mean active ingredient in a tablet is 500 mg. A regulatory agency suspects the actual mean is higher than 500 mg. From known manufacturing data, \(\sigma = 2.0\) mg. A random sample of \(n = 30\) tablets has mean \(\bar{x} = 500.73\) mg.

At the 5% significance level, test whether the mean is greater than 500 mg.

Step 1: Hypotheses \(H_0: \mu = 500\) vs \(H_a: \mu > 500\), where \(\mu\) is the true mean active ingredient in mg.

Step 2: Significance level \(\alpha = 0.05\).

Step 3: Test statistic and distribution \[Z = \frac{\bar{X} - 500}{\sigma/\sqrt{n}} \sim N(0, 1) \quad \text{under } H_0\]

Step 4: Rejection region Right-tailed test: reject \(H_0\) if \(Z > z_{0.05}\).
From tables: \(z_{0.05} = 1.645\).
Reject \(H_0\) if \(Z > 1.645\).

Step 5: Observed value \[z_{\text{obs}} = \frac{500.73 - 500}{2.0 / \sqrt{30}} = \frac{0.73}{0.3651} = 1.999\]

Step 6: Decision and conclusion Since \(z_{\text{obs}} = 1.999 > 1.645\), we reject \(H_0\).

There is sufficient evidence at the 5% level to conclude that the mean active ingredient exceeds 500 mg.

p-value: \(P(Z > 1.999) = 1 - \Phi(1.999) \approx 1 - 0.9772 = 0.0228\).
Since 0.0228 < 0.05, we reject \(H_0\). (Same conclusion.)

Example 2: One-Sample t-Test (Two-Tailed)

A bolt manufacturer claims the mean diameter is 10.00 mm. A quality inspector takes a random sample of \(n = 9\) bolts and finds \(\bar{x} = 10.23\) mm and \(s = 0.48\) mm. Assume normality.

At the 1% significance level, test whether the mean diameter differs from 10.00 mm.

Step 1: Hypotheses \(H_0: \mu = 10.00\) vs \(H_a: \mu \neq 10.00\), where \(\mu\) is the true mean bolt diameter in mm.

Step 2: Significance level \(\alpha = 0.01\).

Step 3: Test statistic and distribution \[T = \frac{\bar{X} - 10.00}{S/\sqrt{n}} \sim t(8) \quad \text{under } H_0\]

Step 4: Rejection region Two-tailed: reject \(H_0\) if \(|T| > t_{8, 0.005}\).
From tables: \(t_{8, 0.005} = 3.355\).
Reject \(H_0\) if \(|T| > 3.355\).

Step 5: Observed value \[t_{\text{obs}} = \frac{10.23 - 10.00}{0.48 / \sqrt{9}} = \frac{0.23}{0.16} = 1.438\]

Step 6: Decision and conclusion Since \(|t_{\text{obs}}| = 1.438 < 3.355\), we do not reject \(H_0\).

There is insufficient evidence at the 1% level to conclude that the mean diameter differs from 10.00 mm. The observed deviation of 0.23 mm is consistent with random sampling variation.

Example 3: One-Sample Proportion Test

A political pollster claims that 40% of voters support a candidate. A researcher samples 200 voters and finds 68 supporters.

At the 5% level, test whether the true proportion differs from 40%.

Step 1: Hypotheses \(H_0: p = 0.40\) vs \(H_a: p \neq 0.40\), where \(p\) is the true proportion of voters who support the candidate.

Step 2: Significance level \(\alpha = 0.05\).

Step 3: Test statistic \[Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \approx N(0, 1)\]

Step 4: Rejection region Two-tailed: reject if \(|Z| > z_{0.025} = 1.96\).

Step 5: Observed value \(\hat{p} = 68/200 = 0.34\).
SE under \(H_0\): \(\sqrt{0.40 \times 0.60 / 200} = \sqrt{0.0012} = 0.03464\).
\[z_{\text{obs}} = \frac{0.34 - 0.40}{0.03464} = \frac{-0.06}{0.03464} = -1.732\]

Step 6: Decision and conclusion Since \(|z_{\text{obs}}| = 1.732 < 1.96\), we do not reject \(H_0\).

There is insufficient evidence at the 5% level to conclude that the proportion differs from 40%. The observed 34% could plausibly arise from random sampling when the true proportion is 40%.

p-value: \(2P(Z > 1.732) \approx 2 \times 0.0416 = 0.0832 > 0.05\). Consistent.

Node N10 — Section 5

Pattern Recognition & Examiner Traps

Trap 1: Writing "Accept H0" The single most common linguistic error in hypothesis testing. Not rejecting \(H_0\) does NOT mean \(H_0\) is true — it only means the data are not sufficiently inconsistent with it. The phrase "accept \(H_0\)" implies proof of the null, which hypothesis testing never provides.

WRONG "We accept \(H_0\); the mean is 500 mg." This overstates the evidence. The mean might be 501 mg but with insufficient data to detect it.

RIGHT "Do not reject \(H_0\). There is insufficient evidence at the 5% level to conclude that the mean differs from 500 mg."

Trap 2: Using \(\hat{p}\) in the SE for a proportion test In hypothesis testing, the test statistic is computed under \(H_0\), so the SE must use the null value \(p_0\), not the sample proportion \(\hat{p}\). Using \(\hat{p}\) gives a different (incorrect) SE.

WRONG \(SE = \sqrt{0.34 \times 0.66 / 200} = 0.0335\), which uses \(\hat{p} = 0.34\) instead of \(p_0 = 0.40\).

RIGHT \(SE = \sqrt{0.40 \times 0.60 / 200} = 0.0346\) — uses \(p_0 = 0.40\) from \(H_0\). The test is conditional on \(H_0\) being true.

Trap 3: Using z when σ is unknown If the question gives the sample standard deviation \(s\) (not the population \(\sigma\)), you must use the t-distribution. Using \(z\) underestimates the uncertainty and inflates the false rejection rate.

WRONG Using \(z_{0.025} = 1.96\) when \(s\) is computed from the data, regardless of sample size.

RIGHT Use \(t_{n-1, \alpha/2}\) when \(\sigma\) is unknown. For small \(n\), the difference can be substantial: e.g., \(t_{8, 0.025} = 2.306\) vs \(z_{0.025} = 1.96\).

Trap 4: Forgetting to double the p-value in a two-tailed test In a two-tailed test, the p-value accounts for both tails. Students often report the one-tailed probability and forget to double it.

Examiner patterns:

"Test whether..." (no direction specified) — almost always a two-tailed test with \(H_a: \neq\).
"Test whether the mean has decreased / improved / increased" — one-tailed test. Read the direction carefully.
When the question says "using a 5% significance level" — \(\alpha = 0.05\) is given. When it says "test at the 1% level" — \(\alpha = 0.01\).
If given a p-value and asked to interpret: compare directly with \(\alpha\). If p-value \(\leq \alpha\), reject \(H_0\); otherwise do not reject.
"Write a conclusion in the context of the problem" — must include the words "insufficient evidence" or "sufficient evidence" and reference the parameter and context.

Node N10 — Section 6

Connections

N10 connects throughout the inference half of PTS2:

← N6 (Sampling Distributions): Every test statistic is a standardized sampling distribution. The z-test exists because \(\bar{X}\) is normal; the t-test exists because \((\bar{X}-\mu)/(S/\sqrt{n})\) follows a t-distribution.
← N8-N9 (CIs): The inversion principle: a two-sided test at level \(\alpha\) rejects \(H_0: \theta = \theta_0\) if and only if \(\theta_0\) falls OUTSIDE the \((1-\alpha)\) CI. This duality means you can often check your hypothesis tests against your CIs (and vice versa).
→ N11 (Power): Power analysis extends the six-step framework by quantifying the probability of correctly rejecting \(H_0\) under a specific alternative. N10 gives you the mechanism; N11 tells you how well it works.
→ N12 (Two-Sample Tests): N12 extends the six-step protocol to two-sample settings. The structure (hypotheses, test stat, rejection region, compute, decide, conclude) is identical — only the formulas change.
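The CI duality noted in the N8-N9 connection can be checked numerically. The sketch below is a minimal illustration for the known-\(\sigma\) case: the sample values (\(\bar{x} = 5.045\), \(\sigma = 0.12\), \(n = 40\)) are borrowed from the ball-bearing question at the end of this node, while the list of candidate null values and the function name are assumptions made for the demonstration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_and_ci(xbar, sigma, n, mu0, alpha=0.05):
    """Return (ci, reject, outside) for a two-sided z-test and the matching CI."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    ci = (xbar - z_crit * se, xbar + z_crit * se)   # (1 - alpha) confidence interval
    reject = abs((xbar - mu0) / se) > z_crit        # two-sided test decision
    outside = not (ci[0] <= mu0 <= ci[1])           # is mu0 outside the CI?
    return ci, reject, outside


results = []
for mu0 in (5.00, 5.03, 5.08):                      # hypothetical null values
    ci, reject, outside = two_sided_z_and_ci(5.045, 0.12, 40, mu0)
    results.append((mu0, reject, outside))
    print(f"mu0 = {mu0:.2f}: CI = ({ci[0]:.4f}, {ci[1]:.4f}), "
          f"reject = {reject}, mu0 outside CI = {outside}")
```

For every null value the two columns agree, which is exactly the inversion principle.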

Node N10 — Section 7

Summary Table

Test	\(H_0\)	Test Statistic	Distribution
z-test (mean, \(\sigma\) known)	\(\mu = \mu_0\)	\(\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\)	\(N(0, 1)\)
t-test (mean, \(\sigma\) unknown)	\(\mu = \mu_0\)	\(\dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}\)	\(t(n-1)\)
Proportion test	\(p = p_0\)	\(\dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}\)	\(\approx N(0, 1)\)

Never "Accept H0" Write "Do not reject \(H_0\)" or "There is insufficient evidence to reject \(H_0\)." Failing to reject is not the same as proving the null. This is a guaranteed mark-killer in PTS2 exams.

Six Steps, Every Time In exams, write out all six steps explicitly. Examiners award marks step by step. Skipping steps or combining them risks losing partial credit. Always define the parameter in words in Step 1.

p-value vs Critical Value Both methods always give the same decision. The p-value contains more information (you can compare against any \(\alpha\)), but the critical-value method is easier to execute in a timed exam.

Type I vs Type II \(\alpha\) = P(reject \(H_0\) | \(H_0\) true) = false positive. \(\beta\) = P(do not reject | \(H_a\) true) = false negative. Decreasing \(\alpha\) increases \(\beta\) (and vice versa) for a fixed sample size.
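The trade-off between \(\alpha\) and \(\beta\) can be illustrated for a right-tailed z-test. This is a sketch under assumptions, not part of the N10 protocol: \(\mu_0\), \(\sigma\) and \(n\) are borrowed from the ball-bearing question later in this node, and the true mean \(\mu_1 = 5.05\) is a hypothetical value chosen only to have something to compute. Under \(\mu = \mu_1\) the statistic is \(N(\delta, 1)\) with \(\delta = (\mu_1 - \mu_0)/(\sigma/\sqrt{n})\), so \(\beta = \Phi(z_\alpha - \delta)\).

```python
from math import sqrt
from statistics import NormalDist


def beta_right_tailed(mu0, mu1, sigma, n, alpha):
    """Type II error probability of a right-tailed z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # rejection threshold for Z
    shift = (mu1 - mu0) / (sigma / sqrt(n))        # mean of Z when mu = mu1
    return std_normal.cdf(z_alpha - shift)         # P(Z <= z_alpha | mu = mu1)


# mu1 = 5.05 is a hypothetical alternative, not a value from the notes.
betas = {a: beta_right_tailed(5.00, 5.05, 0.12, 40, a) for a in (0.10, 0.05, 0.01)}
for a, b in betas.items():
    print(f"alpha = {a:.2f} -> beta = {b:.3f}")
```

With the sample size held fixed, each reduction in \(\alpha\) produces a larger \(\beta\).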

Node N10 — Section 8

Self-Assessment

Test your understanding before moving to N11:

Can you do all of these?

Test \(H_0: \mu = 25\) vs \(H_a: \mu > 25\) with \(\bar{x} = 27\), \(\sigma = 4\), \(n = 36\), at \(\alpha = 0.05\). [Answer: \(z = 3.0 > 1.645\), reject \(H_0\), p-value = 0.0013.]
Test \(H_0: \mu = 100\) vs \(H_a: \mu \neq 100\) with \(\bar{x} = 96.3\), \(s = 7.2\), \(n = 16\), at \(\alpha = 0.01\). [Answer: \(t = -2.056\), \(|t| < 2.947\), do not reject.]
Test \(H_0: p = 0.30\) vs \(H_a: p < 0.30\) with \(X = 42\) successes in \(n = 200\), at \(\alpha = 0.05\). [Answer: \(\hat{p} = 0.21\), \(z = -2.78 < -1.645\), reject \(H_0\).]
Explain in words what a Type II error means in the context of a medical screening test for a disease. [Answer: The test fails to detect the disease in a patient who actually has it — a false negative.]
For \(H_0: \mu = 50\) vs \(H_a: \mu \neq 50\), suppose the 95% CI for \(\mu\) is [51.2, 55.8]. What is the decision at \(\alpha = 0.05\)? [Answer: Reject \(H_0\), since 50 is outside the CI.]
Identify which hypothesis test is appropriate: (a) \(n = 50\), \(\sigma = 3\) known, testing \(\mu\). (b) \(n = 10\), s from data, testing \(\mu\). (c) \(n = 300\), testing a proportion. [Answer: (a) z-test, (b) t-test, (c) proportion test.]

High-Leverage Questions

HLQ: Exam-Style Question with Worked Solution

16 MARKS Z-TEST + T-TEST + PROPORTION + ERRORS COMPREHENSIVE

A factory produces ball bearings with a target mean weight of 5.00 grams. The production process has a known standard deviation of \(\sigma = 0.12\) grams. A quality control engineer takes a random sample of 40 bearings.

(a) In a particular week, the sample mean was \(\bar{x} = 5.045\) grams. Test, at the 5% level, whether the mean weight has increased. (5 marks)

(b) Calculate the p-value for this test and interpret it. (3 marks)

(c) Suppose instead that \(\sigma\) was unknown and the sample standard deviation was found to be \(s = 0.14\) grams. How would the test procedure change? Carry out the test at the 5% level. (5 marks)

(d) In a separate check of the proportion of defective bearings, 12 defectives were found in a sample of 200. Test \(H_0: p = 0.04\) against \(H_a: p > 0.04\) at the 5% level. (3 marks)

Part (a): One-Sample z-Test (Right-Tailed) Step 1: \(H_0: \mu = 5.00\) vs \(H_a: \mu > 5.00\), where \(\mu\) is the true mean weight in grams.
Step 2: \(\alpha = 0.05\).
Step 3: \(Z = \dfrac{\bar{X} - 5.00}{0.12/\sqrt{40}} \sim N(0, 1)\) under \(H_0\).
Step 4: Reject \(H_0\) if \(Z > z_{0.05} = 1.645\).
Step 5: \(z_{\text{obs}} = \dfrac{5.045 - 5.00}{0.12/\sqrt{40}} = \dfrac{0.045}{0.01897} = 2.372\).
Step 6: Since \(2.372 > 1.645\), reject \(H_0\). There is sufficient evidence at the 5% level that the mean weight has increased.

Part (b): p-value and Interpretation p-value = \(P(Z > 2.372) = 1 - \Phi(2.372) \approx 1 - 0.9911 = 0.0089\).

Interpretation: If the true mean were really 5.00 grams, there is only a 0.89% chance of observing a sample mean of 5.045 grams or larger in a sample of 40 bearings. This is very unlikely, providing strong evidence against \(H_0\).

Part (c): t-Test with Unknown σ Step 1: \(H_0: \mu = 5.00\) vs \(H_a: \mu > 5.00\).
Step 2: \(\alpha = 0.05\).
Step 3: \(T = \dfrac{\bar{X} - 5.00}{S/\sqrt{40}} \sim t(39)\) under \(H_0\).
Step 4: Reject \(H_0\) if \(T > t_{39, 0.05} \approx 1.685\).
Step 5: \(t_{\text{obs}} = \dfrac{5.045 - 5.00}{0.14/\sqrt{40}} = \dfrac{0.045}{0.02214} = 2.033\).
Step 6: Since \(2.033 > 1.685\), still reject \(H_0\).

Change observed: The critical value is larger (1.685 vs 1.645) because the t-distribution has heavier tails than the normal, and the observed statistic is smaller (2.033 vs 2.372) because \(s = 0.14 > 0.12 = \sigma\). The conclusion happens to be the same, but with less evidence.

Part (d): Proportion Test Step 1: \(H_0: p = 0.04\) vs \(H_a: p > 0.04\), where \(p\) is the true proportion of defective bearings.
Step 2: \(\alpha = 0.05\).
Step 3: \(Z = \dfrac{\hat{p} - 0.04}{\sqrt{0.04(0.96)/200}} \approx N(0, 1)\).
Step 4: Reject if \(Z > 1.645\).
Step 5: \(\hat{p} = 12/200 = 0.06\). \(SE = \sqrt{0.04 \times 0.96/200} = \sqrt{0.000192} = 0.01386\).
\(z_{\text{obs}} = \dfrac{0.06 - 0.04}{0.01386} = 1.443\).
Step 6: Since \(1.443 < 1.645\), do not reject \(H_0\). Insufficient evidence at the 5% level that the defect rate exceeds 4%.

Summary of answers:
(a) Reject \(H_0\) (\(z = 2.372 > 1.645\)). Mean weight has significantly increased.
(b) p-value = 0.0089. Very strong evidence against \(H_0\).
(c) Using t-test with \(s = 0.14\): still reject (\(t = 2.033 > 1.685\)), but with weaker evidence.
(d) Do not reject \(H_0\) (\(z = 1.443 < 1.645\)). Insufficient evidence that the defect rate exceeds 4%.
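If you want to check the arithmetic of this solution, a short script along the following lines would do it. It is only a sketch: the variable names are illustrative, the normal p-value comes from the standard library instead of tables, and the two critical values are simply the table values quoted in the solution.

```python
from math import sqrt
from statistics import NormalDist

Z_CRIT = 1.645     # z_{0.05}, table value quoted in parts (a) and (d)
T_CRIT = 1.685     # t_{39, 0.05}, table value quoted in part (c)
n = 40

# (a) z-test with known sigma = 0.12
z_a = (5.045 - 5.00) / (0.12 / sqrt(n))

# (b) right-tailed p-value for the statistic in (a)
p_b = 1 - NormalDist().cdf(z_a)

# (c) t-test with sample standard deviation s = 0.14
t_c = (5.045 - 5.00) / (0.14 / sqrt(n))

# (d) proportion test: the SE uses the null value p0 = 0.04
p_hat = 12 / 200
z_d = (p_hat - 0.04) / sqrt(0.04 * 0.96 / 200)

print(f"(a) z = {z_a:.3f}, reject H0: {z_a > Z_CRIT}")
print(f"(b) p-value = {p_b:.4f}")
print(f"(c) t = {t_c:.3f}, reject H0: {t_c > T_CRIT}")
print(f"(d) z = {z_d:.3f}, reject H0: {z_d > Z_CRIT}")
```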