Statistical Test
The imbalanced severity of false decisions leads to the concept of hypothesis testing. In a statistical decision where the mistakes are of unequal severity, we first guard against the more severe mistake, say by controlling its probability to be below 0.05, and then do our best to minimize the probability of the other mistake.
Elements of a statistical test
Def: A statistical hypothesis test (HT) is a statistical decision process which has the following elements:
- Null hypothesis $H_0: \theta \in \Theta_0$.
- Alternative hypothesis $H_1: \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ are disjoint subsets of the parameter space $\Theta$. We assume the severity of falsely accepting $H_1$ is higher than the severity of falsely accepting $H_0$. This is also how we decide which hypothesis is labeled $H_0$.
- Test statistic $T = T(Y_1, \dots, Y_n)$.
- Rejection region $RR$.
- Decision rule: if $T \in RR$, we reject $H_0$ (equivalently, accept $H_1$); otherwise we fail to reject $H_0$.

When $T \notin RR$, it’s often preferable to say “fail to reject $H_0$” than “accept $H_0$”, since there’s a built-in bias towards guarding $H_0$. In other words, we’re being very conservative about rejecting $H_0$.
Type I and type II errors
Suppose we have an i.i.d. sample $Y_1, \dots, Y_n$ whose distribution depends on an unknown parameter $\theta$.
The parameter spaces are $\Theta_0$ under $H_0$ and $\Theta_1$ under $H_1$.
Intuitively, when $\theta \in \Theta_0$ we want to retain $H_0$, and when $\theta \in \Theta_1$ we want to reject $H_0$; either decision can be wrong, and the two kinds of mistakes are named as follows.
Def: In a hypothesis test, the type I error
is when $H_0$ is true but we reject it. Its probability is $\alpha(\theta) = P_\theta(T \in RR)$, for $\theta \in \Theta_0$.
The type II error
is when $H_1$ is true but we fail to reject $H_0$. Its probability is $\beta(\theta) = P_\theta(T \notin RR)$, for $\theta \in \Theta_1$.
The largest possible type I error probability is called the level
of the test: $\alpha = \max_{\theta \in \Theta_0} \alpha(\theta)$.
Strictly speaking, we should use the supremum instead of the maximum here.
We shall also be interested in the power
of the test. Namely, $\mathrm{power}(\theta) = P_\theta(T \in RR) = 1 - \beta(\theta)$, for $\theta \in \Theta_1$.
Remarks
- In general, both $\alpha(\cdot)$ and $\beta(\cdot)$ are functions of $\theta$, but defined on different domains $\Theta_0$ and $\Theta_1$, respectively.
- If $\Theta_0$ consists of a single element $\theta_0$, we say that we have a simple null hypothesis. Otherwise, we say we have a composite null hypothesis. Similarly we have simple/composite alternative hypotheses. In particular, when $\Theta_0 = \{\theta_0\}$, the level is directly given as $\alpha = \alpha(\theta_0)$.
- Power, the probability of correctly accepting $H_1$, is often interpreted as the ability to detect a departure from $H_0$. It typically increases when the sample size increases, or when $\theta$ moves away from the boundary between $\Theta_0$ and $\Theta_1$.
- More generally, a hypothesis test can be formulated even without introducing a test statistic. One essentially only needs to specify the rejection region on the space of the sample: there is a region $R$ such that $H_0$ is rejected whenever $(Y_1, \dots, Y_n) \in R$. Rejection using a test statistic is a special case of this, since one can identify $R$ with the preimage of $RR$ under the mapping $(Y_1, \dots, Y_n) \mapsto T$.
Example with a normal distribution
Following the example where $Y_1, \dots, Y_n \overset{iid}{\sim} N(\theta, 1)$ with unknown mean $\theta$, consider testing $H_0: \theta \leq 0$ against $H_1: \theta > 0$.
We take the test statistic $T = \bar{Y}_n$ and the rejection region $RR = [c, \infty)$, i.e. we reject $H_0$ when $\bar{Y}_n \geq c$, with the threshold $c$ to be determined from the level.
Type I error
When $\theta \leq 0$,
$$ \alpha(\theta) = P_\theta(T \in RR) = P_\theta(\bar{Y}_n \geq c) = P_\theta\left( \sqrt{n}(\bar{Y}_n - \theta) \geq \sqrt{n}(c - \theta) \right) = 1 - \Phi\left( \sqrt{n}(c - \theta) \right), $$
where $\Phi$ is the standard normal CDF and $\sqrt{n}(\bar{Y}_n - \theta) \sim N(0, 1)$.
Note that since the CDF $\Phi$ is increasing, $\alpha(\theta)$ is increasing in $\theta$, so the level is attained at the boundary: $\alpha = \max_{\theta \leq 0} \alpha(\theta) = \alpha(0) = 1 - \Phi(c\sqrt{n})$.
So to achieve level $\alpha$, we solve $1 - \Phi(c\sqrt{n}) = \alpha$ for the threshold, giving $c = z_\alpha/\sqrt{n}$,
where $z_\alpha = \Phi^{-1}(1 - \alpha)$ denotes the upper-$\alpha$ standard normal quantile.
Type II error
When $\theta > 0$,
$$ \begin{aligned} \beta(\theta) &= P_\theta(T \notin RR) = P_\theta(\bar{Y}_n < c) \\ &= P_\theta \left( \sqrt{n}(\bar{Y}_n - \theta) < \sqrt{n}(c-\theta) \right) \\ &= \Phi \left( \sqrt{n}(c-\theta) \right) \end{aligned} $$
When we plug in the previously worked out $c = z_\alpha/\sqrt{n}$, we get $\beta(\theta) = \Phi(z_\alpha - \sqrt{n}\theta)$.
The power is given by $\mathrm{power}(\theta) = 1 - \beta(\theta) = 1 - \Phi(z_\alpha - \sqrt{n}\theta)$.
Note how the power increases as $n$ increases, or as $\theta$ moves away from the boundary $0$.
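To make this concrete, here is a short numeric sketch using Python's stdlib `statistics.NormalDist`. It computes the threshold $c = z_\alpha/\sqrt{n}$ and the closed-form power $1 - \Phi(z_\alpha - \sqrt{n}\theta)$, then checks the formula against a Monte Carlo estimate; the particular $n$ and $\theta$ are made-up illustrations:

```python
import random
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}

def power_one_sided(theta, n, alpha=0.05):
    """Power of the test rejecting H0: theta <= 0 when Ybar >= z_alpha / sqrt(n)."""
    z_alpha = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_alpha - sqrt(n) * theta)

n, theta, alpha = 25, 0.5, 0.05
c = Z.inv_cdf(1 - alpha) / sqrt(n)  # threshold c = z_alpha / sqrt(n)

# Monte Carlo check: fraction of simulated N(theta, 1) samples with Ybar >= c.
random.seed(0)
reps = 20000
rejections = sum(
    sum(random.gauss(theta, 1) for _ in range(n)) / n >= c
    for _ in range(reps)
)
print(f"closed form: {power_one_sided(theta, n, alpha):.3f}")
print(f"Monte Carlo: {rejections / reps:.3f}")
```

The two printed values should agree up to Monte Carlo noise.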
Two-sided hypothesis
Suppose now we want to test the two-sided hypotheses $H_0: \theta = 0$ against $H_1: \theta \neq 0$,
where we keep the test statistic $T = \bar{Y}_n$ but use the two-sided rejection region $RR = (-\infty, -c) \cup (c, \infty)$ for some $c > 0$.
When $\theta = 0$, we have $\sqrt{n}\,\bar{Y}_n \sim N(0, 1)$.
Type I error: when $\theta = 0$ (the only value in $\Theta_0$, so the null is simple),
$$ \begin{aligned} \alpha(\theta) &= P_\theta (T \in RR) = P_\theta (\bar{Y}_n < -c) + P_\theta(\bar{Y}_n > c ) \\ &= P_\theta ( \sqrt{n}\, \bar{Y}_n < -c\sqrt{n}) + P_\theta( \sqrt{n}\, \bar{Y}_n > c\sqrt{n} ) \\ &= \Phi(-c\sqrt{n}) + 1 - \Phi(c\sqrt{n}) \\ &= 2 - 2\Phi(c\sqrt{n}) \end{aligned} $$
where we used the symmetry $\Phi(-x) = 1 - \Phi(x)$,
so setting the level $2 - 2\Phi(c\sqrt{n}) = \alpha$ gives $\Phi(c\sqrt{n}) = 1 - \frac{\alpha}{2}$, i.e. $c = z_{\frac{\alpha}{2}}/\sqrt{n}$,
where $z_{\frac{\alpha}{2}} = \Phi^{-1}(1 - \frac{\alpha}{2})$.
Type II error: when $\theta \neq 0$,
$$ \begin{aligned} \beta(\theta) &= P_\theta (T \notin RR) = P_\theta (-c < \bar{Y}_n < c) \\ &= P_\theta \left( \sqrt{n}(-c - \theta) < \sqrt{n}(\bar{Y}_n - \theta) < \sqrt{n}(c - \theta) \right) \\ &= \Phi(\sqrt{n}(c - \theta)) - \Phi(\sqrt{n}(-c - \theta)) \end{aligned} $$
Plugging in $c = z_{\frac{\alpha}{2}}/\sqrt{n}$, we get $\beta(\theta) = \Phi(z_{\frac{\alpha}{2}} - \sqrt{n}\theta) - \Phi(-z_{\frac{\alpha}{2}} - \sqrt{n}\theta)$.
Power: $\mathrm{power}(\theta) = 1 - \beta(\theta)$, which again increases with $n$ and as $\theta$ moves away from $0$.
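As a sanity check of the two-sided calculation, the following stdlib-only sketch (the particular $n$ is arbitrary) verifies that with $c = z_{\alpha/2}/\sqrt{n}$ the type I error probability $2 - 2\Phi(c\sqrt{n})$ equals $\alpha$, and evaluates the power at a few alternatives:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def beta_two_sided(theta, n, alpha=0.05):
    """Type II error of the test rejecting H0: theta = 0 when |Ybar| > z_{alpha/2}/sqrt(n)."""
    z = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(z - sqrt(n) * theta) - Z.cdf(-z - sqrt(n) * theta)

n, alpha = 25, 0.05
c = Z.inv_cdf(1 - alpha / 2) / sqrt(n)

# Level: alpha(0) = 2 - 2*Phi(c*sqrt(n)) should equal alpha exactly.
level = 2 - 2 * Z.cdf(c * sqrt(n))
print(f"level = {level:.4f}")

# Power 1 - beta(theta) grows as theta moves away from 0.
for theta in (0.2, 0.5, 1.0):
    print(f"power({theta}) = {1 - beta_two_sided(theta, n, alpha):.3f}")
```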
Large-sample Z-tests
Now that we know the concepts in a hypothesis test, the important question is how one constructs a hypothesis test. Here we introduce the Z-test, which is closely related to the Z-score.
Suppose we want to test $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$,
where the hypothesized value of interest, $\theta_0$, is given. The intuitive idea is:
- Estimate $\theta$ with an estimator $\hat\theta$;
- If $\hat\theta$ is far from $\theta_0$, we reject $H_0$.
Now the question becomes: how far is “far”? We need to take the randomness of $\hat\theta$ into account.
Proposal: suppose $\hat\theta$ is approximately normal with estimated standard error $\hat\sigma$, and define the test statistic
$$ Z = \frac{\hat\theta - \theta_0}{\hat\sigma}. $$
Note that unlike the Z-score for constructing the CI, our $Z$ is centered at the hypothesized value $\theta_0$ instead of the true $\theta$; consequently, $Z \approx N(0, 1)$ holds only when $H_0$ is true.
Our proposed rejection rule is: we reject $H_0$ when $|Z|$ is large. Under $H_0$ we have $P(|Z| > z_{\frac{\alpha}{2}}) \approx \alpha$,
so we may choose the threshold $z_{\frac{\alpha}{2}}$ to achieve level approximately $\alpha$.
To sum up, the proposed rejection rule is
$$ |Z| = \frac{|\hat\theta - \theta_0|}{\hat\sigma} > z_{\frac{\alpha}{2}} \Leftrightarrow \theta_0 > \hat\theta + \hat\sigma z_{\frac{\alpha}{2}} \text{ or } \theta_0 < \hat\theta - \hat\sigma z_{\frac{\alpha}{2}} $$
The type I error is given by $\alpha(\theta_0) = P_{\theta_0}(|Z| > z_{\frac{\alpha}{2}}) \approx \alpha$, with the approximation improving as $n$ grows.
What we just described is known as the two-sided Z-test. There are also one-sided versions, as shown in the table below.
| $H_0$ | $H_1$ | Rejection rule |
|---|---|---|
| $\theta = \theta_0$ | $\theta \neq \theta_0$ | $\lvert Z \rvert > z_{\frac{\alpha}{2}}$ |
| $\theta \leq \theta_0$ | $\theta > \theta_0$ | $Z > z_\alpha$ |
| $\theta \geq \theta_0$ | $\theta < \theta_0$ | $Z < -z_\alpha$ |
In the one-sided Z-tests, one may either have a simple null $H_0: \theta = \theta_0$ or a composite null $H_0: \theta \leq \theta_0$ (resp. $\theta \geq \theta_0$); the level is the same, since the type I error probability is maximized at the boundary $\theta = \theta_0$.
Bernoulli distribution example
Suppose $Y_1, \dots, Y_n \overset{iid}{\sim} \mathrm{Bernoulli}(p)$,
and we want to test $H_0: p = p_0$ against $H_1: p \neq p_0$. We estimate $p$ with $\hat{p} = \bar{Y}_n$, which is approximately normal for large $n$ by the CLT.
Hence the estimated standard error of $\hat{p}$ is $\hat\sigma = \sqrt{\hat{p}(1-\hat{p})/n}$.
Then the test statistic is
$$ Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}. $$
So we reject $H_0$ at level $\alpha$ when $|Z| > z_{\frac{\alpha}{2}}$.
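The Bernoulli Z-test can be sketched as follows; the data ($60$ successes in $100$ trials) and $p_0 = 0.5$ are made-up illustrations, not taken from the original example:

```python
from math import sqrt
from statistics import NormalDist

def z_test_bernoulli(successes, n, p0, alpha=0.05):
    """Two-sided large-sample Z-test of H0: p = p0. Returns (Z, reject?)."""
    p_hat = successes / n
    se_hat = sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error of p_hat
    z = (p_hat - p0) / se_hat
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, abs(z) > z_crit

# Made-up data: 60 successes out of 100 trials, testing H0: p = 0.5.
z, reject = z_test_bernoulli(60, 100, 0.5)
print(f"Z = {z:.3f}, reject H0: {reject}")
```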
Hypothesis test and confidence intervals
You’ve probably noticed the resemblance between the Z-test and the Z-score confidence interval. In fact, there is a connection between the two. Long story short: whenever you can come up with a CI, you can also perform the corresponding hypothesis test.
Recall we’ve just discussed that the rejection rule for a two-sided Z-test is
$$ |Z| = \frac{|\hat\theta - \theta_0|}{\hat\sigma} > z_{\frac{\alpha}{2}} \Leftrightarrow \theta_0 > \hat\theta + \hat\sigma z_{\frac{\alpha}{2}} \text{ or } \theta_0 < \hat\theta - \hat\sigma z_{\frac{\alpha}{2}} $$
On the other hand, a two-sided $(1-\alpha)$ Z-score CI for $\theta$ is $\left[ \hat\theta - \hat\sigma z_{\frac{\alpha}{2}},\ \hat\theta + \hat\sigma z_{\frac{\alpha}{2}} \right]$.
Note how the quantities in the Z-test are exactly the boundaries of the Z-score CI. So the rejection rule can be alternatively expressed as: we reject $H_0$ if and only if $\theta_0$ falls outside the CI.
From this, we may propose a general principle. To convert a CI to a hypothesis test, suppose $[\hat\theta_L, \hat\theta_U]$ is a $(1-\alpha)$ CI for $\theta$. Then rejecting $H_0: \theta = \theta_0$ whenever $\theta_0 \notin [\hat\theta_L, \hat\theta_U]$ yields a level-$\alpha$ test.
The level is correct because under $H_0$,
$$ P_{\theta_0}(\text{reject } H_0) = P_{\theta_0}\left( \theta_0 \notin [\hat\theta_L, \hat\theta_U] \right) = 1 - P_{\theta_0}\left( \theta_0 \in [\hat\theta_L, \hat\theta_U] \right) = 1 - (1-\alpha) = \alpha, $$
using the definition of coverage probability of a CI.
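This principle can be checked by simulation. The sketch below (stdlib only, with made-up settings) builds a $95\%$ CI for a normal mean with known variance $1$, so the CI is $\bar{Y}_n \pm z_{\alpha/2}/\sqrt{n}$, rejects when $\theta_0$ falls outside it, and confirms the rejection rate under $H_0$ is about $\alpha$:

```python
import random
from math import sqrt
from statistics import NormalDist

def ci_based_test(sample, theta0, alpha=0.05):
    """Reject H0: theta = theta0 iff theta0 lies outside the (1-alpha) CI."""
    n = len(sample)
    ybar = sum(sample) / n
    half = NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)  # z_{alpha/2} * sigma / sqrt(n), sigma = 1
    return not (ybar - half <= theta0 <= ybar + half)

# Under H0 (true theta equals theta0 = 0), the rejection rate should be about alpha.
random.seed(1)
n, reps, alpha = 20, 20000, 0.05
rate = sum(ci_based_test([random.gauss(0, 1) for _ in range(n)], 0.0, alpha)
           for _ in range(reps)) / reps
print(f"rejection rate under H0: {rate:.3f}")
```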
Uniform distribution example
For $Y_1, \dots, Y_n \overset{iid}{\sim} \mathrm{Unif}(0, \theta)$, recall the $(1-\alpha)$ CI built from the pivotal quantity $Y_{(n)}/\theta$, whose CDF is $P(Y_{(n)}/\theta \leq t) = t^n$ for $t \in [0, 1]$:
$$ \left[ Y_{(n)},\ \frac{Y_{(n)}}{\alpha^{1/n}} \right], $$
where $Y_{(n)} = \max_i Y_i$.
So to test $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$ at level $\alpha$, we reject $H_0$ whenever $\theta_0$ falls outside this interval.
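Assuming the CI $[Y_{(n)},\ Y_{(n)}/\alpha^{1/n}]$ based on the pivot $Y_{(n)}/\theta$ (an assumption about which interval the earlier notes used), its coverage, and hence the level of the induced test, can be verified by simulation:

```python
import random

def uniform_ci(sample, alpha=0.05):
    """(1-alpha) CI for theta in Unif(0, theta), from the pivot Y_(n)/theta."""
    y_max = max(sample)
    return y_max, y_max / alpha ** (1 / len(sample))

# Monte Carlo coverage check under a made-up true theta.
random.seed(2)
theta, n, reps = 2.0, 10, 20000
hits = 0
for _ in range(reps):
    lo, hi = uniform_ci([random.uniform(0, theta) for _ in range(n)])
    hits += (lo <= theta <= hi)
coverage = hits / reps
print(f"coverage: {coverage:.3f}")
```

The empirical coverage should be close to $0.95$, so the induced test has level about $0.05$.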
It also works the other way around. To convert a hypothesis test to a CI
(taking the Z-test as an example), view the test statistic as a function of the hypothesized value, $Z(\theta_0) = \frac{\hat\theta - \theta_0}{\hat\sigma}$,
and the rejection region is $RR = \{z: |z| > z_{\frac{\alpha}{2}}\}$. Collecting all values of $\theta_0$ that would not be rejected gives
$$ I = \{\theta_0 \in \mathbb{R}: Z(\theta_0) \notin RR\} = \left\{ \theta_0 \in \mathbb{R}: \left| \frac{\hat\theta - \theta_0}{\hat\sigma} \right| \leq z_{\frac{\alpha}{2}} \right\} = \left[ \hat\theta - \hat\sigma z_{\frac{\alpha}{2}},\ \hat\theta + \hat\sigma z_{\frac{\alpha}{2}} \right] $$
which is the Z-score CI. The correspondence between CI and HT is one-to-one, although we use the two for different purposes: interval estimation versus decision making.
Some small-sample tests
Recall that we have discussed small sample CIs for means and variances. Based on the correspondence between CI and HT, we can easily develop hypothesis tests based on those CIs.
One-sample mean
The assumption is $Y_1, \dots, Y_n \overset{iid}{\sim} N(\mu, \sigma^2)$, with both $\mu$ and $\sigma^2$ unknown, and we test $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$.
The test statistic, motivated from the pivotal quantity for the CI, is
$$ T = \frac{\bar{Y}_n - \mu_0}{S/\sqrt{n}}, $$
where $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y}_n)^2$ is the sample variance; under $H_0$, $T \sim t_{n-1}$.
The rejection region can be found by inverting the CIs: we reject $H_0$ when $|T| > t_{n-1, \frac{\alpha}{2}}$.
If we look at the hypotheses, $\sigma^2$ is not the parameter we are testing, yet it is unknown and enters the procedure.
Def: an unknown parameter which is not of interest in the hypothesis test is called a nuisance parameter,
e.g. the $\sigma^2$ here.
The test statistic $T$ follows a non-central t-distribution when $\mu \neq \mu_0$, which is what one would use to compute the power of the t-test.
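Python's stdlib has no t quantiles, so this sketch instead approximates the critical value $t_{n-1, \alpha/2}$ by simulating the null distribution of $T$ (the sample size $n = 10$ is arbitrary; the known value $t_{9, 0.025} \approx 2.262$ serves as a reference):

```python
import random
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """T = (Ybar - mu0) / (S / sqrt(n)), the one-sample t statistic."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Simulate |T| under H0; any N(mu0, sigma^2) works since T is pivotal under H0.
random.seed(3)
n, reps = 10, 100000
ts = sorted(abs(t_statistic([random.gauss(0, 1) for _ in range(n)], 0.0))
            for _ in range(reps))
crit = ts[int(0.95 * reps)]  # Monte Carlo estimate of t_{9, 0.025}
print(f"estimated critical value: {crit:.3f}")  # reference: t_{9, 0.025} is about 2.262
```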
Two-sample means
Our assumption is $X_1, \dots, X_m \overset{iid}{\sim} N(\mu_1, \sigma^2)$ and $Y_1, \dots, Y_n \overset{iid}{\sim} N(\mu_2, \sigma^2)$, independent, with a common unknown variance $\sigma^2$; we test $H_0: \mu_1 - \mu_2 = \delta_0$ against $H_1: \mu_1 - \mu_2 \neq \delta_0$.
The test statistic again comes from the pivotal quantity in the CI:
$$ T = \frac{\bar{X}_m - \bar{Y}_n - \delta_0}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}}, $$
where the pooled sample variance
$$ S_p^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m + n - 2} $$
is an unbiased estimator of $\sigma^2$.
When $H_0$ is true, $T \sim t_{m+n-2}$, so we reject $H_0$ when $|T| > t_{m+n-2, \frac{\alpha}{2}}$.
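A small deterministic sketch of the pooled statistic, on made-up data; the decision would compare $|T|$ with $t_{m+n-2, \alpha/2}$:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y, delta0=0.0):
    """Two-sample pooled t statistic for H0: mu1 - mu2 = delta0."""
    m, n = len(x), len(y)
    # statistics.variance is the sample variance with the (m-1) denominator.
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    return (mean(x) - mean(y) - delta0) / sqrt(sp2 * (1 / m + 1 / n))

# Made-up samples (m = 5, n = 4), so the reference distribution is t_7.
x = [5.1, 4.9, 5.4, 5.0, 5.2]
y = [4.6, 4.8, 4.5, 4.9]
print(f"T = {pooled_t(x, y):.3f}")
```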
One-sample variance
Our assumption is $Y_1, \dots, Y_n \overset{iid}{\sim} N(\mu, \sigma^2)$ with both parameters unknown, and we test $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \neq \sigma_0^2$.
The test statistic we use is
$$ \chi^2 = \frac{(n-1)S^2}{\sigma_0^2}, $$
which follows $\chi^2_{n-1}$ when $H_0$ is true,
where $S^2$ is the sample variance. We reject $H_0$ when $\chi^2 > \chi^2_{n-1, \frac{\alpha}{2}}$ or $\chi^2 < \chi^2_{n-1, 1-\frac{\alpha}{2}}$.
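A quick stdlib sketch: compute the statistic, and check by simulation that under $H_0$ its average is near $n - 1$, the mean of $\chi^2_{n-1}$ (the settings are made up):

```python
import random
from statistics import variance

def chi2_statistic(sample, sigma0_sq):
    """(n-1) * S^2 / sigma0^2, the one-sample variance test statistic."""
    return (len(sample) - 1) * variance(sample) / sigma0_sq

# Under H0 (sigma^2 = sigma0^2), the statistic is chi^2_{n-1}, which has mean n - 1.
random.seed(4)
n, sigma0_sq, reps = 8, 2.0, 20000
avg = sum(chi2_statistic([random.gauss(0, sigma0_sq ** 0.5) for _ in range(n)], sigma0_sq)
          for _ in range(reps)) / reps
print(f"average statistic under H0: {avg:.2f}")  # should be near n - 1 = 7
```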
There’s also a two-sample variance test involving the F-distribution.