Consistency
Convergence
Consistency is about the convergence of estimators. Recall what convergence means for non-random numbers. Suppose $x_1, x_2, x_3, \cdots$ are non-random numbers. What is the meaning of $\lim\limits_{n \rightarrow \infty} x_n = x$?
For example, if $x_n = \frac{1}{n}$, $\lim\limits_{n \rightarrow \infty} x_n = 0$. If $x_n = (-1)^n$, $\lim\limits_{n \rightarrow \infty} x_n$ doesn’t exist.
Def: A sequence $x_n \in \mathbb{R}$ is said to converge
to $x$, denoted $\lim\limits_{n \rightarrow \infty} x_n = x$, if for any fixed number $\epsilon > 0$, we have $|x_n - x| \leq \epsilon$ for all sufficiently large $n$. “For all sufficiently large $n$” means that there exists $N$ such that the inequality holds for all $n \geq N$.
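A minimal numerical sketch of this definition, using the sequence $x_n = \frac{1}{n}$ and $x = 0$ from the example above: for a few choices of $\epsilon$, find a cutoff $N$ beyond which $|x_n - x| \leq \epsilon$.

```python
# Minimal sketch of the epsilon-N definition with x_n = 1/n and x = 0.
def find_N(epsilon):
    """Smallest N with |1/n - 0| <= epsilon for all n >= N (1/n is decreasing)."""
    n = 1
    while abs(1 / n) > epsilon:
        n += 1
    return n

for eps in [0.1, 0.01, 0.001]:
    print(eps, find_N(eps))  # prints 10, 100, 1000
```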
Example
Suppose $Y_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$. Intuitively, $\bar{Y}_n$ should converge to $\mu$, but in what sense? Recall that $\bar{Y}_n \sim N(\mu, \frac{\sigma^2}{n})$, a standard fact about averages of i.i.d. normals.
No matter how large $n$ is, $\bar{Y}_n$ has positive probability of deviating from $\mu$ by more than any fixed threshold (think of the tails of the bell-shaped curve), so the deterministic notion of convergence does not apply directly. The good news is that
$$ \underbrace{MSE(\bar{Y}_n) = Var(\bar{Y}_n)}_{\text{unbiasedness}} = \frac{\sigma^2}{n} \rightarrow 0 \text{ as } n \rightarrow \infty. $$
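A quick simulation sketch of this fact (with assumed illustrative values $\mu = 0$ and $\sigma = 2$): the empirical variance of $\bar{Y}_n$ across repeated samples shrinks like $\sigma^2/n$.

```python
# Sketch: the sampling variance of Ybar_n shrinks like sigma^2 / n
# (mu = 0, sigma = 2 are illustrative choices only).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 0.0, 2.0, 5000
for n in [10, 100, 1000]:
    ybar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(n, ybar.var(), sigma**2 / n)  # empirical Var(Ybar_n) vs sigma^2 / n
```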
Convergence in probability
Def: A sequence of random variables $X_n$ is said to converge in probability
to a constant $x$ if for any fixed $\epsilon > 0$,
$$
P\left( |X_n - x| \leq \epsilon \right) \rightarrow 1 \text{ as } n \rightarrow \infty.
$$
This is the same as
$$ P(x - \epsilon \leq X_n \leq x + \epsilon) \rightarrow 1, $$
or
$$ P\left( |X_n - x| > \epsilon \right) \rightarrow 0 \text{ as } n \rightarrow \infty. $$
The above (“converges in probability to”) is denoted as
$$ X_n \xrightarrow{P} x. $$
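A simulation sketch of this definition (assuming, for illustration, $X_n = \bar{Y}_n$ with $Y_i \overset{i.i.d.}{\sim} Unif(0,1)$, so $x = 0.5$): the Monte Carlo estimate of $P(|X_n - x| \leq \epsilon)$ climbs toward $1$ as $n$ grows.

```python
# Sketch: estimate P(|X_n - x| <= eps) for X_n = mean of n Uniform(0,1) draws
# (x = 0.5; eps = 0.05 is an arbitrary illustrative choice).
import numpy as np

rng = np.random.default_rng(1)
eps, reps = 0.05, 5000
for n in [10, 100, 1000]:
    xn = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xn - 0.5) <= eps))  # fraction within eps of 0.5
```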
Consistency
Def: An estimator $\hat\theta_n$ is said to be consistent
if $\hat\theta_n \xrightarrow{P} \theta$ no matter what the true $\theta \in \Theta$ is. Here $n$ denotes the sample size.
Normal distribution example
Suppose $Y_i \overset{i.i.d.}{\sim}N(\mu, \sigma^2)$. We know that $\bar{Y}_n \sim N(\mu, \frac{\sigma^2}{n})$. Show the consistency of $\bar{Y}_n$.
$$ P(|\bar{Y}_n - \mu| \leq \epsilon) = P\left( \left| \frac{\bar{Y}_n - \mu}{\sigma / \sqrt{n}} \right| \leq \frac{\epsilon}{\sigma / \sqrt{n}} \right) $$
The above is the standardization
of a random variable. A fact here is that $Z = \frac{\bar{Y}_n - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$ because $E[\bar{Y}_n] = \mu$ and $s.e.(\bar{Y}_n) = \sigma / \sqrt{n}$.
$$ P(|\bar{Y}_n - \mu| \leq \epsilon) = P\left( |Z| \leq \frac{\epsilon}{\sigma/\sqrt{n}} \right) = P\left( |Z| \leq \frac{\epsilon}{\sigma}\sqrt{n} \right) $$
which is the area over $\left( -\frac{\epsilon}{\sigma}\sqrt{n}, \frac{\epsilon}{\sigma}\sqrt{n} \right)$ under the PDF of $N(0, 1)$. As $n \rightarrow \infty$, the endpoints are pushed further and further out, so the area approaches $1$.
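A numerical sketch of this argument (with assumed illustrative values $\sigma = 1$ and $\epsilon = 0.1$), using the standard normal CDF $\Phi$ to compute the area $2\Phi\!\left(\frac{\epsilon}{\sigma}\sqrt{n}\right) - 1$ exactly:

```python
# Sketch: P(|Ybar_n - mu| <= eps) = P(|Z| <= eps*sqrt(n)/sigma)
#       = 2*Phi(eps*sqrt(n)/sigma) - 1, which climbs to 1 as n grows.
# (sigma = 1 and eps = 0.1 are illustrative choices.)
import numpy as np
from scipy.stats import norm

sigma, eps = 1.0, 0.1
for n in [10, 100, 1000, 10000]:
    prob = 2 * norm.cdf(eps * np.sqrt(n) / sigma) - 1
    print(n, round(prob, 4))
```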
Uniform distribution example
$Y_i \overset{i.i.d.}{\sim} Unif(0, \theta)$. $\hat\theta_n = \max(Y_1, \cdots, Y_n)$. We want to show $\hat\theta_n$ is consistent.
We know that $\hat\theta_n \in [0, \theta]$. We can assume that $\epsilon \in (0, \theta)$ since $|\hat\theta_n - \theta| > \theta$ is impossible.
$$ \begin{aligned} P(|\hat\theta_n - \theta| > \epsilon) &= P(\theta - \hat\theta_n > \epsilon) \\ &= P(\hat\theta_n < \theta - \epsilon) \\ &= P(Y_1 < \theta - \epsilon, Y_2 < \theta - \epsilon, \cdots, Y_n < \theta - \epsilon) \\ &= P(Y_1 < \theta - \epsilon)^n \quad\cdots\text{ by independence.} \end{aligned} $$
Since $F_{Y_1}(y) = \frac{y}{\theta}$ for $0 \leq y \leq \theta$,
$$ P(|\hat\theta_n - \theta| > \epsilon) = \left( \frac{\theta - \epsilon}{\theta} \right)^n \rightarrow 0 \text{ as } n \rightarrow \infty. $$
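A simulation sketch of this calculation (with assumed illustrative values $\theta = 2$ and $\epsilon = 0.1$), comparing the Monte Carlo estimate of $P(\hat\theta_n < \theta - \epsilon)$ with the exact value $\left(\frac{\theta-\epsilon}{\theta}\right)^n$:

```python
# Sketch: tail probability of the maximum of n Uniform(0, theta) draws
# (theta = 2, eps = 0.1 are illustrative choices).
import numpy as np

rng = np.random.default_rng(2)
theta, eps, reps = 2.0, 0.1, 20000
for n in [10, 50, 200]:
    that = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
    exact = ((theta - eps) / theta) ** n
    print(n, np.mean(that < theta - eps), round(exact, 4))  # simulated vs exact
```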
Theorem
If $MSE(\hat\theta_n; \theta) \rightarrow 0$ as $n \rightarrow \infty$ $\forall \theta \in \Theta$, then $\hat\theta_n$ is consistent.
Lemma (Markov inequality): If random variable $X \geq 0$, then for any constant $k > 0$, we have $$ P(X > k) \leq \frac{1}{k}E[X]. $$
Assume $X$ is continuous with PDF $f(\cdot)$. The case for a discrete $X$ is similar.
$$ \begin{aligned} E[X] &= \int_0^\infty xf(x)dx \\ &\geq \int_k^\infty xf(x)dx \\ &\geq \int_k^\infty kf(x)dx \quad\cdots\text{ because } x \geq k, \\ &= k \int_k^\infty f(x)dx \\ &= kP(X > k) \\ \frac{1}{k}E[X] &\geq P(X > k) \end{aligned} $$
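Before applying the lemma, here is a quick empirical check of Markov’s inequality (using, as an illustrative assumption, $X \sim$ Exponential with mean $1$): the observed tail probability $P(X > k)$ stays below $\frac{1}{k}E[X]$ for each $k$.

```python
# Sketch: empirical check of Markov's inequality for X ~ Exponential(mean 1),
# an illustrative choice of nonnegative random variable.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=100_000)  # E[X] = 1
for k in [1, 2, 5, 10]:
    print(k, np.mean(x > k), 1.0 / k)  # P(X > k) vs E[X]/k = 1/k
```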
Now we move on to the proof of the theorem. Fix $\epsilon > 0$ and apply Markov’s inequality with $X = (\hat\theta_n - \theta)^2$ and $k = \epsilon^2$. Note that
$$ \begin{aligned} P(|\hat\theta_n - \theta| > \epsilon) &= P\left( (\hat\theta_n - \theta)^2 > \epsilon^2 \right) \\ &\leq \frac{1}{\epsilon^2} \underbrace{E\left[ (\hat\theta_n - \theta)^2 \right]}_{MSE \rightarrow 0} \rightarrow 0 \end{aligned} $$
Example using the MSE theorem
$Y_i \overset{i.i.d.}{\sim} Unif(0, \theta)$. $\hat\theta_n = \max(Y_1, \cdots, Y_n)$. We want to show $\hat\theta_n$ is consistent.
We have the same setup as the uniform distribution example, but this time we want to apply the theorem. Recall that
$$ MSE(\hat\theta_n) = \frac{2\theta^2}{(n+1)(n+2)} \rightarrow 0 $$
as $n \rightarrow \infty$.
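A simulation sketch checking this formula (with assumed illustrative value $\theta = 1$): the empirical MSE of $\hat\theta_n$ matches $\frac{2\theta^2}{(n+1)(n+2)}$ and shrinks toward $0$.

```python
# Sketch: empirical MSE of the maximum of n Uniform(0, theta) draws vs the
# formula 2*theta^2 / ((n+1)(n+2)) (theta = 1 is an illustrative choice).
import numpy as np

rng = np.random.default_rng(4)
theta, reps = 1.0, 20000
for n in [5, 50, 500]:
    that = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
    mse_hat = np.mean((that - theta) ** 2)
    mse_formula = 2 * theta**2 / ((n + 1) * (n + 2))
    print(n, round(mse_hat, 6), round(mse_formula, 6))
```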
Law of large numbers
The LLN states that if $Y_i$ are i.i.d. with $E[Y_i] = \mu$ and $Var(Y_i) = \sigma^2 < \infty$, then $\bar{Y}_n \xrightarrow{P} \mu$.
Proof: $MSE(\bar{Y}_n) = Var(\bar{Y}_n) = \frac{\sigma^2}{n} \rightarrow 0$ as $n \rightarrow \infty$.
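A simulation sketch of the LLN (assuming, for illustration, $Y_i \overset{i.i.d.}{\sim}$ Exponential with mean $\mu = 1$): the running mean $\bar{Y}_n$ settles near $\mu$ as $n$ grows.

```python
# Sketch: running mean of i.i.d. Exponential(mean 1) draws approaches mu = 1
# (the exponential population is an illustrative choice).
import numpy as np

rng = np.random.default_rng(5)
y = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(y) / np.arange(1, y.size + 1)
for n in [10, 100, 1000, 100_000]:
    print(n, running_mean[n - 1])
```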
Properties of convergence in probability
Suppose random variables $X_n \xrightarrow{P} x$ and $Y_n \xrightarrow{P} y$.
- $X_n + Y_n \xrightarrow{P} x + y$.
- $X_n Y_n \xrightarrow{P} xy$.
- $X_n / Y_n \xrightarrow{P} x / y$, provided $y \neq 0$.
- If $g$ is a continuous function, $g(X_n) \xrightarrow{P} g(x)$.
The proofs can be found in more advanced courses. As an example of applying them, suppose $X_n \xrightarrow{P} x$, $Y_n \xrightarrow{P} y$, and $Z_n \xrightarrow{P} z$. We then have
$$ (X_n + Y_n)e^{Z_n} \xrightarrow{P} (x+y)e^z. $$
Think of this as applying the sum rule to $X_n' = X_n + Y_n$, the continuity rule to $Y_n' = e^{Z_n} \xrightarrow{P} e^z$, and then the product rule to $X_n' Y_n'$; a simulation sketch follows below.
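A simulation sketch of this combination (assuming, for illustration, that $X_n$, $Y_n$, $Z_n$ are sample means of $Unif(0,1)$, $Unif(0,2)$, and $Unif(0,4)$ draws, so $x = 0.5$, $y = 1$, $z = 2$):

```python
# Sketch: (X_n + Y_n) * exp(Z_n) settles near (x + y) * exp(z)
# (the three uniform populations are illustrative choices).
import numpy as np

rng = np.random.default_rng(6)
target = (0.5 + 1.0) * np.exp(2.0)
for n in [10, 1000, 100_000]:
    xn = rng.uniform(0, 1, n).mean()
    yn = rng.uniform(0, 2, n).mean()
    zn = rng.uniform(0, 4, n).mean()
    print(n, (xn + yn) * np.exp(zn), target)
```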
Example
Suppose $Y_i$ are i.i.d. with $E[Y_i] = \mu$, $Var(Y_i) = \sigma^2 < \infty$ (and $E[Y_i^4] < \infty$). Show that
$$ \hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^n(Y_i - \bar{Y}_n)^2 $$
is consistent for $\sigma^2$.
Recall that $\hat\sigma_n^2$ is unbiased, so $MSE(\hat\sigma_n^2) = Var(\hat\sigma_n^2)$. There are several approaches. The straightforward one is to compute this variance directly and show it goes to $0$. This is nasty!
The second (easier) approach is to write
$$ \begin{aligned} \hat\sigma_n^2 &= \frac{1}{n-1}\left( \sum_{i=1}^n Y_i^2 - n\bar{Y}_n^2 \right) \\ &= \frac{n}{n-1}\left( \frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar{Y}_n^2 \right). \end{aligned} $$
As $n \rightarrow \infty$, $\frac{n}{n-1} \rightarrow 1$, $\bar{Y}_n^2 \xrightarrow{P} \mu^2$ (by the LLN together with the continuity property applied to $g(t) = t^2$), and, thinking of $Y_i^2$ as the $X_i$ in the LLN,
$$ \frac{1}{n}\sum_{i=1}^n Y_i^2 \xrightarrow{P} E[Y_i^2] = \mu^2 + \sigma^2 $$
by the LLN (the assumption $E[Y_i^4] < \infty$ guarantees $Var(Y_i^2) < \infty$).
So, combining these with the sum and product rules, $$ \hat\sigma_n^2 \xrightarrow{P} 1 \times (\mu^2 + \sigma^2 - \mu^2) = \sigma^2. $$
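A simulation sketch of this consistency result (assuming, for illustration, $Y_i \overset{i.i.d.}{\sim} N(0, 9)$, so $\sigma^2 = 9$):

```python
# Sketch: the sample variance (with the 1/(n-1) factor) approaches sigma^2 = 9
# (the normal population with sigma = 3 is an illustrative choice).
import numpy as np

rng = np.random.default_rng(7)
sigma = 3.0
for n in [10, 1000, 100_000]:
    y = rng.normal(0.0, sigma, size=n)
    print(n, y.var(ddof=1))  # ddof=1 gives the 1/(n-1) estimator
```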
A note of caution: consistency only tells you that convergence happens eventually; it says nothing about how fast it happens. If the convergence is very slow, the estimator might still perform poorly at realistic sample sizes.