Why This Concept Exists
Point estimation is the bridge between probability theory and statistical practice. While probability theory asks "given a parameter, what is the distribution of the data?", estimation asks the inverse: "given the data, what is the best guess for the parameter?"
This node covers the two dominant methods for constructing estimators at the PTS2 level:
- Method of Moments (MOM): The older, simpler method. Equate population moments to sample moments and solve. Fast, intuitive, but often suboptimal.
- Maximum Likelihood Estimation (MLE): The gold standard. Find the parameter value that makes the observed data most probable. Optimal in large samples, but requires calculus and careful reasoning.
Examiners love MLE because the derivation is standardised, the answer is unique, and checking the work is straightforward. They also love to test edge cases (discrete parameters, boundary solutions, non-differentiable likelihoods) that separate students who understand the concept from those who memorise the recipe.
Prerequisites
Before engaging with this node, you should be comfortable with:
- Population moments: \(E[X^k]\) for \(k = 1, 2, \ldots\). The first moment is the mean, the second central moment is the variance.
- Sample moments: \(\bar{X} = \frac{1}{n}\sum X_i\) (first sample moment), \(m_2 = \frac{1}{n}\sum X_i^2\) (second sample raw moment). Know the difference between raw and central moments.
- Differentiation: Finding maxima of functions by setting derivatives to zero, second derivative test, chain rule, product rule.
- Logarithms: \(\ln(a^b) = b\ln(a)\), \(\ln(ab) = \ln(a) + \ln(b)\). The log-likelihood uses these properties to convert products into sums.
- Product notation: The likelihood is typically \(\prod_{i=1}^{n} f(x_i;\theta)\), so you need to be comfortable with manipulating products.
- Basic discrete distributions: Binomial, Poisson, Geometric — their PMFs, means, and variances.
- Basic continuous distributions: Uniform, Exponential, Normal — their PDFs, means, and variances.
Core Exposition
3.1 What Is an Estimator?
An estimator is a function of the sample data used to guess an unknown population parameter \(\theta\). Formally: if \(X_1, \ldots, X_n\) is a random sample, an estimator is \(\hat{\theta} = g(X_1, \ldots, X_n)\).
Properties we desire in an estimator:
Unbiasedness: \(E[\hat{\theta}] = \theta\) for all \(\theta\); the bias is \(\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta\).
Consistency: \(\hat{\theta} \xrightarrow{P} \theta\) as \(n \to \infty\).
Efficiency: Among unbiased estimators, the one with the smallest variance is most efficient.
Mean Squared Error: \(\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2\).
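The MSE decomposition above can be verified by simulation. The sketch below (Python/NumPy, an illustrative assumption rather than part of the syllabus; the estimator and parameter values are chosen arbitrarily) checks the identity for \(\hat{\theta} = X_{(n)}\) under \(U(0, \theta)\):

```python
import numpy as np

# Monte Carlo check of MSE = Var + Bias^2 for the estimator
# theta_hat = max(X_1, ..., X_n) under U(0, theta).
rng = np.random.default_rng(8)
theta, n, reps = 1.0, 5, 400_000
est = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)

mse = np.mean((est - theta) ** 2)                  # E[(theta_hat - theta)^2]
decomp = est.var() + (est.mean() - theta) ** 2     # Var + Bias^2

print(mse, decomp)   # identical up to floating-point rounding
```

The two quantities agree exactly (up to rounding), because the decomposition is an algebraic identity, not an approximation.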
3.2 Method of Moments (MOM)
The method of moments is conceptually simple:
1. Express the first \(k\) population moments \(E[X^j]\) as functions of the unknown parameter(s).
2. Compute the corresponding sample moments \(m_j = \frac{1}{n}\sum X_i^j\).
3. Set population moments equal to sample moments: \(E[X^j] = m_j\) for \(j = 1, \ldots, k\).
4. Solve the resulting system of equations for the parameter(s).
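The four steps can be sketched numerically. The snippet below (Python/NumPy is an assumption of this sketch; the node itself uses no code) applies MOM to simulated Exponential data, where \(E[X] = 1/\lambda\) so step 3 reads \(1/\lambda = \bar{X}\) and step 4 gives \(\hat{\lambda} = 1/\bar{X}\):

```python
import numpy as np

# Method of Moments for Exponential(rate = lam): equate the first
# population moment E[X] = 1/lam to the first sample moment xbar.
rng = np.random.default_rng(0)
lam_true = 2.0
x = rng.exponential(scale=1.0 / lam_true, size=100_000)

xbar = x.mean()        # step 2: first sample moment m_1
lam_mom = 1.0 / xbar   # steps 3-4: solve 1/lam = m_1 for lam

print(lam_mom)         # close to the true rate 2.0 for a large sample
```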
3.3 Maximum Likelihood Estimation (MLE)
The MLE finds the parameter value that maximises the probability (or density) of the observed data:
1. Write the likelihood function: \(L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)\) (continuous case) or \(\prod_{i=1}^{n} P(X = x_i; \theta)\) (discrete case).
2. Take the log-likelihood: \(\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)\).
3. Differentiate: \(\dfrac{d\ell}{d\theta}\), set equal to zero, solve for \(\theta\).
4. Verify it's a maximum: Check \(\dfrac{d^2\ell}{d\theta^2} < 0\) at the solution.
5. Check boundary cases: if the MLE from calculus falls outside the parameter space, or if the likelihood is monotone, the MLE may be at a boundary.
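Steps 1–4 can be checked by brute force: maximising the log-likelihood over a fine grid should land on (approximately) the same value as the closed-form answer. A sketch in Python/NumPy (an assumption for illustration), again using the Exponential distribution where the closed form is \(\hat{\lambda} = 1/\bar{X}\):

```python
import numpy as np

# Grid maximisation of the Exponential log-likelihood versus the
# closed-form MLE lam_hat = 1 / xbar.
rng = np.random.default_rng(1)
x = rng.exponential(scale=0.5, size=5_000)   # true rate lam = 2

def loglik(lam):
    # ell(lam) = n*ln(lam) - lam * sum(x_i)
    return len(x) * np.log(lam) - lam * x.sum()

grid = np.linspace(0.01, 10.0, 100_000)
lam_grid = grid[np.argmax(loglik(grid))]     # numeric maximiser
lam_closed = 1.0 / x.mean()                  # calculus answer

print(lam_grid, lam_closed)   # agree to within the grid resolution
```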
3.4 Properties of MLE (Large Sample Results)
Asymptotic Normality: \(\sqrt{n}(\hat{\theta}_{\text{MLE}} - \theta) \xrightarrow{d} N\!\left(0, \dfrac{1}{I(\theta)}\right)\).
Asymptotic Efficiency: MLE asymptotically achieves the Cramér-Rao lower bound.
Invariance: If \(\hat{\theta}\) is the MLE of \(\theta\), then \(g(\hat{\theta})\) is the MLE of \(g(\theta)\) for any function \(g\).
where the Fisher information \(I(\theta)\) is defined in the next subsection.
3.5 The Score and Fisher Information
The score function is the derivative of the log-likelihood: \(U(\theta) = \dfrac{d\ell}{d\theta}\).
The MLE solves \(U(\hat{\theta}) = 0\).
The expected Fisher information: \(I(\theta) = -E\!\left[\dfrac{d^2\ell}{d\theta^2}\right]\).
For an i.i.d. sample: \(I_n(\theta) = n \cdot I_1(\theta)\) where \(I_1\) is the information from one observation.
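The definition \(I(\theta) = -E[d^2\ell/d\theta^2]\) can be checked by Monte Carlo. The sketch below (Python/NumPy, an illustrative assumption) does this for a single Bernoulli(\(p\)) observation, where \(\ln f(x;p) = x\ln p + (1-x)\ln(1-p)\), the second derivative is \(-x/p^2 - (1-x)/(1-p)^2\), and the exact information is \(1/(p(1-p))\):

```python
import numpy as np

# Monte Carlo estimate of I(p) = -E[d^2 ell / dp^2] for one Bernoulli(p)
# observation, compared with the exact value 1 / (p * (1 - p)).
rng = np.random.default_rng(2)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

info_mc = np.mean(x / p**2 + (1 - x) / (1 - p) ** 2)   # -d^2 ell/dp^2, averaged
info_exact = 1.0 / (p * (1 - p))

print(info_mc, info_exact)   # both close to 4.76
```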
3.6 Comparing MOM and MLE
| Method | Advantages | Disadvantages |
|---|---|---|
| MOM | Fast, no calculus needed, closed-form even for complex distributions | Can be biased, inefficient, sometimes outside the parameter space |
| MLE | Consistent, asymptotically normal and efficient, invariant under transformation | Requires differentiation, may not have closed form, sensitive to boundary issues |
Worked Examples
Example 1: MOM and MLE for Exponential Distribution
Let \(X_1, \ldots, X_n\) be i.i.d. from the exponential distribution with PDF \(f(x; \lambda) = \lambda e^{-\lambda x}\) for \(x > 0\). Find both MOM and MLE estimators of \(\lambda\).
MOM: The population mean is \(E[X] = 1/\lambda\).
Sample mean: \(\bar{X} = \frac{1}{n}\sum X_i\).
Equate: \(1/\lambda = \bar{X}\).
Therefore: \(\hat{\lambda}_{\text{MOM}} = \dfrac{1}{\bar{X}}\).
Bias check: Since \(\bar{X} \sim \text{Gamma}(n, n\lambda)\), we have \(E[1/\bar{X}] = \frac{n}{n-1}\cdot\lambda\), so \(\hat{\lambda}_{\text{MOM}}\) is biased (overestimates), but asymptotically unbiased as \(n \to \infty\).
MLE: The likelihood is \(L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda X_i} = \lambda^n e^{-\lambda n\bar{X}}\), so the log-likelihood is \(\ell(\lambda) = n\ln\lambda - \lambda n\bar{X}\).
\(\dfrac{d\ell}{d\lambda} = \dfrac{n}{\lambda} - n\bar{X} = 0\)
Therefore: \(\hat{\lambda}_{\text{MLE}} = \dfrac{1}{\bar{X}}\).
Second derivative test: \(\dfrac{d^2\ell}{d\lambda^2} = -\dfrac{n}{\lambda^2} < 0\) for all \(\lambda > 0\). \(\checkmark\) Maximum confirmed.
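The bias factor \(n/(n-1)\) from the MOM bias check is easy to see in simulation. A sketch (Python/NumPy assumed for illustration) with a deliberately small \(n\):

```python
import numpy as np

# Bias of lam_hat = 1/xbar for Exponential(lam = 2) with n = 5:
# E[1/xbar] = n/(n-1) * lam = (5/4) * 2 = 2.5, not 2.
rng = np.random.default_rng(3)
lam, n, reps = 2.0, 5, 200_000
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
lam_hat = 1.0 / samples.mean(axis=1)

print(lam_hat.mean())   # close to 2.5, confirming the upward bias
```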
Example 2: MLE for Normal Distribution — Unknown Mean and Variance
Let \(X_1, \ldots, X_n\) be i.i.d. \(N(\mu, \sigma^2)\) where both \(\mu\) and \(\sigma^2\) are unknown. Find the MLEs of \(\mu\) and \(\sigma^2\).
The likelihood is
\(L(\mu, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(X_i - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\dfrac{1}{2\sigma^2}\sum(X_i - \mu)^2\right)\)
so the log-likelihood is \(\ell(\mu, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum(X_i - \mu)^2\).
Setting \(\dfrac{\partial\ell}{\partial\mu} = \dfrac{1}{\sigma^2}\sum(X_i - \mu) = 0\) gives \(\displaystyle \sum X_i - n\mu = 0 \quad \Rightarrow \quad \hat{\mu}_{\text{MLE}} = \bar{X}\) \(\checkmark\)
Setting \(\dfrac{\partial\ell}{\partial\sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2(\sigma^2)^2}\sum(X_i - \mu)^2 = 0\) gives \(\dfrac{n}{2\sigma^2} = \dfrac{1}{2(\sigma^2)^2}\sum(X_i - \mu)^2\), hence
\(\hat{\sigma}^2_{\text{MLE}} = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu})^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\)
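A quick numerical sanity check of this result (Python/NumPy is an assumption of the sketch): the pair \((\bar{x}, \frac{1}{n}\sum(x_i - \bar{x})^2)\) should maximise the normal log-likelihood, so nudging either coordinate away can only decrease \(\ell\).

```python
import numpy as np

# Verify that (xbar, (1/n) * sum((x - xbar)^2)) maximises the
# normal log-likelihood on a simulated sample.
rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=2.0, size=1_000)

def ell(mu, s2):
    # ell(mu, sigma^2) = -(n/2) ln(2 pi sigma^2) - sum((x - mu)^2) / (2 sigma^2)
    return -0.5 * len(x) * np.log(2 * np.pi * s2) - ((x - mu) ** 2).sum() / (2 * s2)

mu_hat = x.mean()
s2_hat = ((x - mu_hat) ** 2).mean()   # note the 1/n divisor, as derived above

best = ell(mu_hat, s2_hat)
for d in (-0.1, 0.1):
    assert ell(mu_hat + d, s2_hat) < best   # worse in the mu direction
    assert ell(mu_hat, s2_hat + d) < best   # worse in the sigma^2 direction
print(mu_hat, s2_hat)
```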
Example 3: MLE for a Uniform Distribution — The Boundary Case
Let \(X_1, \ldots, X_n\) be i.i.d. from \(U(0, \theta)\). Find the MLE of \(\theta\).
\(L(\theta) = \prod_{i=1}^{n} \dfrac{1}{\theta} \cdot \mathbb{I}(0 \leq X_i \leq \theta) = \dfrac{1}{\theta^n} \cdot \mathbb{I}(\theta \geq \max X_i)\)
where \(\mathbb{I}(\cdot)\) is the indicator function. The likelihood is zero if \(\theta\) is less than any observed value.
The likelihood \(\theta^{-n}\) is strictly decreasing in \(\theta\), so we want \(\theta\) as small as possible; the constraint \(\theta \geq \max X_i\) means the smallest admissible value is the largest observation.
Therefore: \(\hat{\theta}_{\text{MLE}} = \max\{X_1, \ldots, X_n\} = X_{(n)}\).
This is an order statistic! (connecting to N5.)
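Simulation makes the boundary behaviour vivid (Python/NumPy assumed for illustration): \(X_{(n)}\) sits below \(\theta\) with probability 1, and the order-statistic result \(E[X_{(n)}] = n\theta/(n+1)\) quantifies the downward bias.

```python
import numpy as np

# The boundary MLE X_(n) under U(0, theta) always underestimates theta;
# its mean is n * theta / (n + 1), a standard order-statistic fact.
rng = np.random.default_rng(5)
theta, n, reps = 10.0, 4, 200_000
x = rng.uniform(0.0, theta, size=(reps, n))
theta_hat = x.max(axis=1)

print(theta_hat.mean())   # close to 4 * 10 / 5 = 8.0
```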
Example 4: MOM for a Two-Parameter Distribution
Let \(X_1, \ldots, X_n\) be i.i.d. from the Gamma distribution with shape \(\alpha\) and rate \(\beta\), with \(E[X] = \alpha/\beta\) and \(\text{Var}(X) = \alpha/\beta^2\). Find the MOM estimators of \(\alpha\) and \(\beta\).
Equation 1 (first moment): \(\dfrac{\alpha}{\beta} = \bar{X}\). Equation 2 (second central moment): \(\dfrac{\alpha}{\beta^2} = S_n^2\), where \(S_n^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\).
From equation 1: \(\alpha = \beta\bar{X}\).
Substitute into equation 2: \(\dfrac{\beta\bar{X}}{\beta^2} = \dfrac{\bar{X}}{\beta} = S_n^2\).
Therefore: \(\hat{\beta}_{\text{MOM}} = \dfrac{\bar{X}}{S_n^2}\), and \(\hat{\alpha}_{\text{MOM}} = \hat{\beta}_{\text{MOM}}\bar{X} = \dfrac{\bar{X}^2}{S_n^2}\).
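These two-parameter MOM formulas are easy to exercise on simulated data. A sketch (Python/NumPy assumed; parameter values chosen arbitrarily):

```python
import numpy as np

# MOM for Gamma(shape alpha = 2, rate beta = 3):
# beta_hat = xbar / s2 and alpha_hat = xbar^2 / s2,
# using the 1/n sample variance as in the derivation above.
rng = np.random.default_rng(6)
alpha, beta = 2.0, 3.0
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=200_000)

xbar = x.mean()
s2 = ((x - xbar) ** 2).mean()

beta_hat = xbar / s2
alpha_hat = xbar ** 2 / s2
print(alpha_hat, beta_hat)   # close to (2, 3) for a large sample
```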
Pattern Recognition & Examiner Traps
- "Show that the MLE of θ is ..." — follow the standard MLE recipe. Write L, take log, differentiate, solve.
- "Find the MOM estimator of θ" — equate \(E[X]\) (and higher moments if more parameters are unknown) to the corresponding sample moments and solve.
- "Is the MLE biased?" — compute \(E[\hat{\theta}]\) and check whether \(E[\hat{\theta}] = \theta\). If not, the bias is \(E[\hat{\theta}] - \theta\).
- "Find the MLE of \(g(\theta)\)" — find the MLE of \(\theta\) first, then apply invariance: \(g(\hat{\theta})\).
- "Compare MOM and MLE" — check whether the two estimators coincide; if they differ, compare their bias and variance (usually via MSE).
Connections
- ← N6 (Sampling Distributions): The properties of estimators (bias, variance, efficiency) require knowledge of sampling distributions. The MLE for normal parameters relies on the fact that \(\bar{X} \sim N(\mu, \sigma^2/n)\) and \((n-1)S^2/\sigma^2 \sim \chi^2(n-1)\).
- → N8 (Confidence Intervals): MLEs are used to construct confidence intervals via the pivotal quantity approach. The asymptotic normality of MLE provides another CI method.
- → N10-N12 (Hypothesis Testing): The likelihood ratio test — the most powerful general purpose test — uses the MLEs under both the null and alternative hypotheses.
Summary Table
| Method | Principle | Steps | Properties | When to Use |
|---|---|---|---|---|
| MOM | Match population to sample moments | Set \(E[X^k] = m_k\), solve | Consistent, simple, can be biased/inefficient | Quick estimates, complex distributions |
| MLE | Maximise likelihood of observed data | Write \(L(\theta)\), log, differentiate, solve | Consistent, asymptotically normal & efficient, may be biased in small samples | Default method, exams, inference |
| Uniform MLE | Boundary solution | Not differentiable — use the support constraint \(\theta \geq X_{(n)}\) | \(X_{(n)}\); biased (underestimates \(\theta\)), consistent | When the support depends on the parameter |
Self-Assessment
Test your understanding before moving to N8:
- Derive the MLE for \(\lambda\) in a Poisson(\(\lambda\)) distribution. [Answer: \(\hat{\lambda} = \bar{X}\).]
- Derive the MLE for \(p\) in a Geometric(\(p\)) distribution. [Answer: \(\hat{p} = 1/\bar{X}\).]
- Find the MOM estimator for \(\theta\) in the distribution \(f(x) = (\theta+1)x^\theta\) on \([0,1]\).
- Show that for the normal distribution, \(\bar{X}\) and \(S^2\) are unbiased for \(\mu\) and \(\sigma^2\) respectively.
- Explain why the MLE of \(\sigma^2\) for a normal distribution is biased, but the MLE of \(\mu\) is unbiased.
- Given the MLE \(\hat{\lambda}\), find the MLE of \(e^{\lambda}\) using invariance.
- Let \(X_1, \ldots, X_n\) be i.i.d. from a Pareto distribution: \(f(x;\theta) = \theta/x^{\theta+1}\) for \(x > 1\). Find the MLE of \(\theta\). [Answer: \(\hat{\theta}_{\text{MLE}} = n/\sum \ln X_i\).]
- For a Binomial(\(n, p\)) sample, find the MOM estimator of \(p\). [Answer: \(\hat{p} = \bar{X}/n\).]
- True or false: If \(\hat{\theta}\) is the MLE of \(\theta\), then \(\hat{\theta}/2\) is the MLE of \(\theta/2\). [Answer: True, by invariance.]
HLQ: Exam-Style Question with Worked Solution
A random sample \(X_1, \ldots, X_n\) is drawn from the distribution with PDF:
\(f(x; \theta) = \theta x^{\theta - 1}, \quad 0 < x < 1, \quad \theta > 0.\)
(a) Find the Method of Moments estimator of \(\theta\). (3 marks)
(b) Find the Maximum Likelihood Estimator of \(\theta\). (4 marks)
(c) Is the MLE unbiased? Justify your answer. (3 marks)
(d) Find the MLE of \(\theta^2\). (2 marks)
(a) The population mean is \(E[X] = \int_0^1 x \cdot \theta x^{\theta - 1}\,dx = \dfrac{\theta}{\theta + 1}\).
Set equal to the sample mean: \(\dfrac{\theta}{\theta + 1} = \bar{X}\).
Solve for \(\theta\):
\(\theta = \bar{X}(\theta + 1) = \bar{X}\theta + \bar{X}\).
\(\theta(1 - \bar{X}) = \bar{X}\).
\(\hat{\theta}_{\text{MOM}} = \dfrac{\bar{X}}{1 - \bar{X}}\).
(b) The likelihood is \(L(\theta) = \prod_{i=1}^{n} \theta X_i^{\theta - 1} = \theta^n \left(\prod X_i\right)^{\theta - 1}\), so
\(\ell(\theta) = n\ln\theta + (\theta - 1)\sum_{i=1}^{n} \ln X_i\).
\(\dfrac{d\ell}{d\theta} = \dfrac{n}{\theta} + \sum_{i=1}^{n} \ln X_i = 0\).
\(\hat{\theta}_{\text{MLE}} = \dfrac{-n}{\sum_{i=1}^{n} \ln X_i}\).
Note: Since \(0 < X_i < 1\), we have \(\ln X_i < 0\), so \(\sum \ln X_i < 0\) and the MLE is positive. \(\checkmark\)
Check second derivative: \(\dfrac{d^2\ell}{d\theta^2} = -\dfrac{n}{\theta^2} < 0\) for all \(\theta > 0\). \(\checkmark\) Maximum confirmed.
(c) Let \(Y_i = -\ln X_i\). Since \(P(X \leq t) = t^\theta\) on \((0,1)\), we have \(P(Y > y) = P(X < e^{-y}) = e^{-\theta y}\).
So \(Y_i \sim \text{Exp}(\theta)\), and \(\sum Y_i \sim \text{Gamma}(n, \theta)\).
\(\hat{\theta}_{\text{MLE}} = \dfrac{n}{\sum Y_i}\).
\(E\!\left[\dfrac{1}{\sum Y_i}\right] = \dfrac{\theta}{n-1}\) (using the known result for the reciprocal of a Gamma variable).
So \(E[\hat{\theta}_{\text{MLE}}] = n \cdot \dfrac{\theta}{n-1} = \dfrac{n}{n-1} \cdot \theta \neq \theta\).
The MLE is biased, but asymptotically unbiased since \(\frac{n}{n-1} \to 1\) as \(n \to \infty\).
(d) By invariance, the MLE of \(\theta^2\) is \(\hat{\theta}_{\text{MLE}}^2 = \left(\dfrac{-n}{\sum \ln X_i}\right)^2 = \dfrac{n^2}{\left(\sum \ln X_i\right)^2}\).
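The bias result in part (c) can be checked by simulation. The sketch below (Python/NumPy assumed for illustration; \(\theta = 3\) and \(n = 6\) are arbitrary choices) generates \(X = U^{1/\theta}\) by the inverse-CDF method, since the CDF here is \(F(x) = x^\theta\):

```python
import numpy as np

# Simulation of the HLQ bias result: for f(x; theta) = theta * x^(theta-1)
# on (0,1), theta_hat = -n / sum(ln X_i) has mean n/(n-1) * theta.
rng = np.random.default_rng(7)
theta, n, reps = 3.0, 6, 200_000
u = rng.uniform(size=(reps, n))
x = u ** (1.0 / theta)            # inverse-CDF method: F(x) = x^theta

theta_hat = -n / np.log(x).sum(axis=1)
print(theta_hat.mean())           # close to (6/5) * 3 = 3.6, not 3
```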