The Method of Moments
Def: Suppose $Y_1, \cdots, Y_n$ are i.i.d. samples. The $k$-th population moment
is
$$
\mu_k = E\left[ Y_i^k\right ].
$$
The $k$-th sample moment
is defined as
$$ m_k = \frac{1}{n}\sum_{i=1}^n Y_i^k, \quad k = 1, 2, 3, \cdots $$
Remark: All sample moments $m_1, m_2, \cdots$ are statistics. If $E\left[ |Y_i|^k\right ] < \infty$, then by the law of large numbers,
$$ m_k = m_{k, n} = \frac{1}{n}\sum_{i=1}^n Y_i^k \xrightarrow{P} E\left[ Y_i^k\right ] = \mu_k $$
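As a quick numerical check of this convergence, here is a small sketch in Python (numpy assumed; the Exp(1) distribution, seed, and sample size are arbitrary illustrative choices), comparing the first three sample moments to the corresponding population moments $\mu_k = k!$:

```python
import math
import numpy as np

# Quick check of the remark: sample moments approach population moments.
# Illustration only, using Exp(1) samples, whose k-th moment is k!.
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=100_000)

for k in (1, 2, 3):
    m_k = np.mean(y**k)                          # k-th sample moment
    print(f"m_{k} = {m_k:.3f}, mu_{k} = {math.factorial(k)}")
```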
If our goal is simply to estimate the population moments, the sample moments are already good enough. For example, suppose $Y_i \overset{i.i.d.}{\sim} Unif(0, \theta)$. Recall that $2\bar{Y}_n$ was previously proposed as an estimator of $\theta$; this time we derive it using the method of moments.
$$ \begin{aligned} \mu_1 &= E[Y_i] = \frac{\theta}{2} \\ m_1 &= \frac{1}{n}\sum_{i=1}^n Y_i = \bar{Y}_n \end{aligned} $$
Letting $\mu_1 = m_1$, we get
$$ \frac{\theta}{2} = \bar{Y}_n \Rightarrow \hat\theta = 2\bar{Y}_n $$
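A minimal simulation sketch of this estimator in Python (numpy assumed; the true value $\theta = 3$, the seed, and the sample size are arbitrary choices for illustration):

```python
import numpy as np

# Method-of-moments estimator for Unif(0, theta): theta_hat = 2 * Y_bar.
# Illustration only; theta = 3.0, the seed, and n are arbitrary choices.
rng = np.random.default_rng(1)
theta = 3.0
y = rng.uniform(0.0, theta, size=10_000)

theta_hat = 2 * y.mean()      # equate mu_1 = theta/2 to m_1 = Y_bar
print(theta_hat)              # should be close to theta = 3.0
```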
Procedure
Suppose $Y_1, \cdots, Y_n$ are i.i.d. samples from a distribution with $r$ unknown parameters $\theta_1, \cdots, \theta_r$.
- Compute the first $r$ population moments as functions of the unknown parameters.
$$ \begin{cases} \mu_1 = \mu_1(\theta_1, \theta_2, \cdots, \theta_r) = E\left[ Y_1\right ] \\ \mu_2 = E\left[ Y_1^2\right ] \\ \vdots \\ \mu_r = E\left[ Y_1^r\right ] \end{cases} $$
- Equate each population moment to the corresponding sample moment to obtain the system of equations:
$$ \begin{equation} \label{eq:sample-moments} \begin{cases} \mu_1(\theta_1, \cdots, \theta_r) = m_1 \\ \mu_2(\theta_1, \cdots, \theta_r) = m_2 \\ \vdots \\ \mu_r(\theta_1, \cdots, \theta_r) = m_r \end{cases} \end{equation} $$
- Solve $\eqref{eq:sample-moments}$ to get the estimators of $\theta_1, \cdots, \theta_r$. That is, express $\theta_1, \cdots, \theta_r$ in terms of statistics $m_1, \cdots, m_r$.
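When the system $\eqref{eq:sample-moments}$ has no convenient closed-form solution, it can be solved numerically. Below is a rough sketch assuming numpy and scipy are available, using Beta$(a, b)$ as a hypothetical example; its first two moments are $\frac{a}{a+b}$ and $\frac{a(a+1)}{(a+b)(a+b+1)}$:

```python
import numpy as np
from scipy.optimize import fsolve

# Sketch of the general procedure when mu(theta) = m has no obvious closed form:
# solve the moment equations numerically. Hypothetical example: Beta(a, b).
def moment_equations(params, m1, m2):
    a, b = params
    return [a / (a + b) - m1,                              # mu_1(a, b) - m_1
            a * (a + 1) / ((a + b) * (a + b + 1)) - m2]    # mu_2(a, b) - m_2

rng = np.random.default_rng(0)
y = rng.beta(2.0, 5.0, size=50_000)        # true a = 2, b = 5 (arbitrary choices)
m1, m2 = y.mean(), np.mean(y**2)           # sample moments m_1, m_2

a_hat, b_hat = fsolve(moment_equations, x0=[1.0, 1.0], args=(m1, m2))
print(a_hat, b_hat)                        # should be near (2, 5)
```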
Remark: The method of moments often yields consistent estimators. The system of equations $\eqref{eq:sample-moments}$ can be rewritten in vector form as $$ \begin{gather*} \vec{\theta} = (\theta_1, \cdots, \theta_r), \qquad \vec{m} = (m_1, \cdots, m_r) \\ \vec{\mu}(\vec{\theta}) = \bigg( \mu_1(\theta_1, \cdots, \theta_r), \cdots, \mu_r(\theta_1, \cdots, \theta_r) \bigg) \\ \eqref{eq:sample-moments} \Leftrightarrow \vec{\mu}(\vec{\theta}) = \vec{m} \end{gather*} $$
So what does solving the system of equations actually mean? Suppose $\vec{\mu}$ has an inverse $\vec{\mu}^{-1}$ that is continuous. Recall that by the law of large numbers, $m_k \xrightarrow{P} \mu_k$ as $n \rightarrow \infty$. Then, by the continuous mapping theorem,
$$ \hat{\vec\theta} = \vec{\mu}^{-1}(\vec{m}) = \vec{\mu}^{-1}(m_1, \cdots, m_r) \xrightarrow{P} \vec{\mu}^{-1}(\mu_1, \cdots, \mu_r) = (\theta_1, \cdots, \theta_r) $$
As an example, suppose $Y_i \overset{i.i.d.}{\sim} Gamma(\alpha, \beta)$. We know that $E[Y_i] = \alpha\beta$ and $Var(Y_i) = \alpha\beta^2$. To find estimators for $\alpha$ and $\beta$, the population moments are
$$ \begin{gather*} \mu_1(\alpha, \beta) = \alpha\beta \\ \mu_2(\alpha, \beta) = Var(Y_i) + E[Y_i]^2 = \alpha\beta^2 + \alpha^2\beta^2 \end{gather*} $$
Set the system of equations as
$$ \begin{cases} m_1 = \alpha\beta \\ m_2 = \alpha^2\beta^2 + \alpha\beta^2 \end{cases} $$
Substituting the first equation into the second, we get
$$ m_2 = m_1^2 + m_1\beta \Rightarrow \beta = \frac{m_2 - m_1^2}{m_1} $$
Putting this back into the first equation
$$ \alpha = \frac{m_1}{\beta} = \frac{m_1^2}{m_2 - m_1^2} $$
The resulting estimators are consistent: $$ \begin{aligned} \hat\alpha &= \frac{m_1^2}{m_2 - m_1^2} \xrightarrow{P} \frac{\mu_1^2}{\mu_2 - \mu_1^2} = \alpha \\ \hat\beta &= \frac{m_2 - m_1^2}{m_1} \xrightarrow{P} \frac{\mu_2 - \mu_1^2}{\mu_1} = \beta \end{aligned} $$
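A short numerical check of these formulas in Python (numpy assumed; the true values $\alpha = 2$, $\beta = 1.5$, the seed, and the sample size are arbitrary choices):

```python
import numpy as np

# Closed-form method-of-moments estimates for Gamma(alpha, beta) derived above:
# alpha_hat = m1^2 / (m2 - m1^2),  beta_hat = (m2 - m1^2) / m1.
rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.5
y = rng.gamma(shape=alpha, scale=beta, size=100_000)   # mean = alpha*beta, var = alpha*beta^2

m1 = y.mean()                 # first sample moment
m2 = np.mean(y**2)            # second sample moment

alpha_hat = m1**2 / (m2 - m1**2)
beta_hat = (m2 - m1**2) / m1
print(alpha_hat, beta_hat)    # should be close to (2.0, 1.5)
```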
This way of constructing estimators is fairly simple, but is not used much nowadays because we have better methods, as introduced in the next chapter.