# The Method of Moments

**Def:** Suppose $Y_1, \cdots, Y_n$ are i.i.d. samples. The $k$-th `population moment` is
$$
\mu_k = E\left[ Y_i^k\right ].
$$

The $k$-th `sample moment` is defined as

$$ m_k = \frac{1}{n}\sum_{i=1}^n Y_i^k, \quad k = 1, 2, 3, \cdots $$

**Remark:** All sample moments $m_1, m_2, \cdots$ are statistics. If $E\left[ |Y_i|^k\right ] < \infty$, then by the law of large numbers,

$$ m_k = m_{k, n} = \frac{1}{n}\sum_{i=1}^n Y_i^k \xrightarrow{P} E\left[ Y_i^k\right ] = \mu_k $$
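To see this convergence numerically, here is a minimal sketch using only the Python standard library. It assumes $Y_i \sim Unif(0, 1)$, for which $\mu_k = \frac{1}{k+1}$; the helper name `sample_moment`, the seed, and the sample sizes are illustrative choices, not part of the notes.

```python
import random

def sample_moment(ys, k):
    """k-th sample moment m_k = (1/n) * sum(y_i ** k)."""
    return sum(y ** k for y in ys) / len(ys)

rng = random.Random(42)  # fixed seed so the run is reproducible (arbitrary choice)

# Assumption for illustration: Y_i ~ Unif(0, 1), so mu_k = 1 / (k + 1).
for n in (100, 10_000, 200_000):
    ys = [rng.random() for _ in range(n)]
    moments = [sample_moment(ys, k) for k in (1, 2, 3)]
    print(n, [round(m, 4) for m in moments])

# Population moments to compare against: 1/2, 1/3, 1/4.
targets = [1 / (k + 1) for k in (1, 2, 3)]
print("targets", [round(t, 4) for t in targets])
```

As $n$ grows, the printed sample moments should settle near the targets, which is what the law of large numbers predicts.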

If our goal is to estimate the population moments, this is good enough: the sample moments converge to them. More often, though, we want to estimate a parameter of the distribution. For example, suppose $Y_i \overset{i.i.d.}{\sim} Unif(0, \theta)$. Recall that $2\bar{Y}_n$ was proposed as an estimator of $\theta$. This time we use the method of moments to derive the estimator.

$$
\begin{aligned}
\mu_1 &= E[Y_i] = \frac{\theta}{2} \\
m_1 &= \frac{1}{n}\sum_{i=1}^n Y_i = \bar{Y}_n
\end{aligned}
$$

Letting $\mu_1 = m_1$, we get

$$ \frac{\theta}{2} = \bar{Y}_n \Rightarrow \hat\theta = 2\bar{Y}_n $$
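The derivation above can be checked with a short simulation. This is only a sketch: the function name `mom_uniform_theta`, the true value `theta_true = 5.0`, the seed, and the sample size are all assumptions made for illustration.

```python
import random
from statistics import fmean

def mom_uniform_theta(ys):
    """Method-of-moments estimate of theta for Unif(0, theta): solve theta/2 = m_1."""
    return 2.0 * fmean(ys)

rng = random.Random(0)   # fixed seed, arbitrary choice
theta_true = 5.0         # hypothetical true parameter
ys = [rng.uniform(0.0, theta_true) for _ in range(100_000)]

theta_hat = mom_uniform_theta(ys)
print(f"theta_hat = {theta_hat:.4f} (true theta = {theta_true})")
```

With a sample this large, `theta_hat` should land close to `theta_true`, though any single run carries sampling error.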

## Procedure

Suppose $Y_1, \cdots, Y_n$ are i.i.d. samples from a distribution with $r$ unknown parameters $\theta_1, \cdots, \theta_r$.

- Compute population moments.

$$
\begin{cases}
\mu_1 = \mu_1(\theta_1, \theta_2, \cdots, \theta_r) = E\left[ Y_1\right ] \\

\mu_2 = E\left[ Y_1^2\right ] \\

\vdots \\

\mu_r = E\left[ Y_1^r\right ]
\end{cases}
$$

- Write down a system of equations in the following way:

$$
\begin{equation} \label{eq:sample-moments}
\begin{cases}
\mu_1(\theta_1, \cdots, \theta_r) = m_1 \\

\mu_2(\theta_1, \cdots, \theta_r) = m_2 \\

\vdots \\

\mu_r(\theta_1, \cdots, \theta_r) = m_r
\end{cases}
\end{equation}
$$

- Solve $\eqref{eq:sample-moments}$ to get the estimators of $\theta_1, \cdots, \theta_r$. That is, express $\theta_1, \cdots, \theta_r$ in terms of statistics $m_1, \cdots, m_r$.

**Remark:** The method of moments often yields *consistent* estimators. The system of equations $\eqref{eq:sample-moments}$ can be rewritten in vector form as
$$
\begin{gather*}
\vec{\theta} = (\theta_1, \cdots, \theta_r), \qquad \vec{m} = (m_1, \cdots, m_r) \\

\vec{\mu}(\vec{\theta}) = \bigg( \mu_1(\theta_1, \cdots, \theta_r), \cdots, \mu_r(\theta_1, \cdots, \theta_r) \bigg) \\

\eqref{eq:sample-moments} \Leftrightarrow \vec{\mu}(\vec{\theta}) = \vec{m}
\end{gather*}
$$

So what does solving the system of equations actually mean? Suppose $\vec{\mu}$ has a continuous inverse, denoted $\vec{\mu}^{-1}$. Recall that by the law of large numbers, $m_k \xrightarrow{P} \mu_k$ as $n \rightarrow \infty$. Since continuous functions preserve convergence in probability (the continuous mapping theorem), we have

$$ \hat{\vec{\theta}} = \vec{\mu}^{-1}(\vec{m}) = \vec{\mu}^{-1}(m_1, \cdots, m_r) \xrightarrow{P} \vec{\mu}^{-1}(\mu_1, \cdots, \mu_r) = (\theta_1, \cdots, \theta_r) $$
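As a quick sanity check of this argument, the one-parameter uniform example from earlier fits the same pattern with $r = 1$. There the moment map and its inverse can be written explicitly, and the inverse is clearly continuous:

$$
\mu_1(\theta) = \frac{\theta}{2}, \qquad \mu_1^{-1}(m) = 2m, \qquad \hat\theta = \mu_1^{-1}(m_1) = 2\bar{Y}_n \xrightarrow{P} \mu_1^{-1}(\mu_1) = 2 \cdot \frac{\theta}{2} = \theta
$$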

As an example, suppose $Y_i \overset{i.i.d.}{\sim} Gamma(\alpha, \beta)$. We know that $E[Y_i] = \alpha\beta$ and $Var(Y_i) = \alpha\beta^2$. To find estimators for $\alpha$ and $\beta$, we first compute the population moments:

$$
\begin{gather*}
\mu_1(\alpha, \beta) = \alpha\beta \\

\mu_2(\alpha, \beta) = Var(Y_i) + E[Y_i]^2 = \alpha\beta^2 + \alpha^2\beta^2
\end{gather*}
$$

Set the system of equations as

$$
\begin{cases}
m_1 = \alpha\beta \\

m_2 = \alpha^2\beta^2 + \alpha\beta^2
\end{cases}
$$

Substitute the first equation into the second one and we get

$$ m_2 = m_1^2 + m_1\beta \Rightarrow \beta = \frac{m_2 - m_1^2}{m_1} $$

Putting this back into the first equation gives

$$ \alpha = \frac{m_1}{\beta} = \frac{m_1^2}{m_2 - m_1^2} $$

This gives our estimators, both of which are consistent:
$$
\begin{aligned}
\hat\alpha &= \frac{m_1^2}{m_2 - m_1^2} \xrightarrow{P} \frac{\mu_1^2}{\mu_2 - \mu_1^2} = \alpha \\

\hat\beta &= \frac{m_2 - m_1^2}{m_1} \xrightarrow{P} \frac{\mu_2 - \mu_1^2}{\mu_1} = \beta
\end{aligned}
$$
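The closed-form Gamma estimators above are easy to try out in code. The sketch below uses only the Python standard library; `random.gammavariate(alpha, beta)` takes a shape and a scale, which matches the parametrization used here ($E[Y_i] = \alpha\beta$). The function name `mom_gamma`, the true values, the seed, and the sample size are assumptions made for illustration.

```python
import random

def mom_gamma(ys):
    """Method-of-moments estimates (alpha_hat, beta_hat) for Gamma(alpha, beta).

    Uses alpha_hat = m1^2 / (m2 - m1^2) and beta_hat = (m2 - m1^2) / m1,
    where m1 and m2 are the first two sample moments.
    """
    n = len(ys)
    m1 = sum(ys) / n
    m2 = sum(y * y for y in ys) / n
    spread = m2 - m1 ** 2  # sample analogue of Var(Y_i) = alpha * beta^2
    return m1 ** 2 / spread, spread / m1

rng = random.Random(1)               # fixed seed, arbitrary choice
alpha_true, beta_true = 3.0, 2.0     # hypothetical true parameters
ys = [rng.gammavariate(alpha_true, beta_true) for _ in range(200_000)]

alpha_hat, beta_hat = mom_gamma(ys)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

Note that the estimators divide by $m_2 - m_1^2$ and by $m_1$, so the function fails on a constant sample or one with zero mean; a more careful version would guard against those cases.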

This way of constructing estimators is fairly simple, but it is not used much nowadays because better methods are available, as introduced in the next chapter.
