Statistical Test

Apr 1, 2020
13 min read
Mar 11, 2022 15:59 UTC
Here we introduce the elements of a statistical test, namely null and alternative hypotheses, test statistic, rejection region, and type I and type II errors. We then proceed to large-sample Z-tests and some small-sample tests derived from the small-sample CIs.

The imbalanced severity of false decisions leads to the concept of hypothesis testing. In a statistical decision where the mistakes are of unequal severity, we first guard against the more severe mistake, say by controlling its probability to be smaller than 0.05, and then do our best to minimize the probability of the other mistake.

Elements of a statistical test

Def: A statistical hypothesis test (HT) is a statistical decision process with the following elements:

  • Null hypothesis $H_0: \theta \in \Theta_0$.
  • Alternative hypothesis $H_a: \theta \in \Theta_a$, where $\Theta_0$ and $\Theta_a$ are disjoint and $\Theta_0 \cup \Theta_a = \Theta$. We also assume the severity of falsely accepting $H_0$ < the severity of falsely accepting $H_a$.
    • This is also how we decide which hypothesis is $H_0$.
  • Test statistic $T$.
  • Rejection region $RR$.
  • Decision rule: if $T \in RR$, we reject $H_0$ (equivalently, accept $H_a$); otherwise we fail to reject $H_0$.

When $T \notin RR$, it’s often preferable to say “fail to reject $H_0$” rather than “accept $H_0$”, since there’s a built-in bias towards guarding $H_0$. In other words, we’re being very conservative about $H_0$.

Type I and type II errors

Suppose we have an i.i.d. sample $Y_1, \ldots, Y_n \sim N(\theta, 1)$. Our hypotheses are

$$H_0: \theta \le 0 \quad\text{vs.}\quad H_a: \theta > 0$$

The parameter spaces are $\Theta_0 = (-\infty, 0]$ and $\Theta_a = (0, \infty)$. We propose the sample mean as a test statistic, $T = \bar{Y}_n$, which is an estimator of $\theta$.

Intuitively, when $\theta \le 0$, $\bar{Y}_n$ is unlikely to be large. Because of this, we propose $RR = [c, \infty)$ where $c$ is a to-be-determined threshold, and we reject $H_0$ if and only if $T \ge c$.

Def: In a hypothesis test, the type I error is when $H_0$ is true but we choose $H_a$ (falsely reject $H_0$); its probability is denoted as

$$\alpha(\theta) = P_\theta(T \in RR), \quad \theta \in \Theta_0$$

The type II error is when $H_a$ is true but we choose $H_0$ (falsely accept $H_0$); its probability is denoted as

$$\beta(\theta) = P_\theta(T \notin RR), \quad \theta \in \Theta_a$$

The largest possible type I error probability is called the level of the test:

$$\alpha = \max\{\alpha(\theta) : \theta \in \Theta_0\}$$

Strictly speaking, we should use the supremum instead of the maximum here.

We shall choose the size of RR so that α matches a given level, say 0.05. The probability of accepting Ha when Ha is true is called the power of the test. Namely,

$$\mathrm{pw}(\theta) = P_\theta(T \in RR) = 1 - \beta(\theta), \quad \theta \in \Theta_a$$

Remarks

  1. In general, both $\alpha(\theta)$ and $\beta(\theta)$ are functions of $\theta$, but are defined on different domains, $\Theta_0$ and $\Theta_a$, respectively.

  2. If $\Theta_0$ consists of a single element $\theta_0$, we say that we have a simple null hypothesis; otherwise, we say we have a composite null hypothesis. Similarly, we have simple/composite alternative hypotheses. In particular, when $\Theta_0 = \{\theta_0\}$, the level is directly given as

$$\alpha = \alpha(\theta_0) = P_{\theta_0}(T \in RR)$$

  3. Power, the probability of correctly accepting $H_a$, is often interpreted as the ability to detect $H_a$. $\mathrm{pw}(\theta)$ typically increases when the sample size increases, or when $\theta \in \Theta_a$ moves away from the boundary between $\Theta_0$ and $\Theta_a$.
  4. More generally, a hypothesis test can be formulated even without introducing a test statistic. One essentially only needs to specify a rejection region on the space of the sample: there is a region $R \subset \mathbb{R}^n$ such that $H_0$ is rejected whenever $(Y_1, \ldots, Y_n) \in R$. Rejection based on a test statistic $T = t(Y_1, \ldots, Y_n)$ landing in $RR$ is a special case of this, since one can identify $R$ with the preimage of $RR$ under the mapping $t: \mathbb{R}^n \to \mathbb{R}$.

Example with a normal distribution

Following the example where $Y_i \overset{\text{i.i.d.}}{\sim} N(\theta, 1)$, $H_0: \theta \le 0$, $H_a: \theta > 0$, $T = \bar{Y}_n$ and $RR = [c, \infty)$. Note that here we fixed the form of $RR$, but in reality there’s some flexibility: we decide the size of $RR$ based on the desired level $\alpha$.

When θ is the true parameter, we know that

$$\bar{Y}_n \sim N\left(\theta, \frac{1}{n}\right) \quad\Leftrightarrow\quad \frac{\bar{Y}_n - \theta}{1/\sqrt{n}} \sim N(0, 1)$$

Type I error

When $\theta \in \Theta_0 = (-\infty, 0]$, the type I error is given by

$$\begin{aligned} \alpha(\theta) &= P_\theta(T \in RR) = P_\theta(\bar{Y}_n \ge c) \\ &= P_\theta\left( \frac{\bar{Y}_n - \theta}{1/\sqrt{n}} \ge \frac{c - \theta}{1/\sqrt{n}} \right) \\ &= 1 - \Phi\left(\sqrt{n}(c - \theta)\right) \end{aligned}$$

where $\Phi$ is the CDF of $N(0,1)$. The level is

$$\alpha = \max_{\theta \le 0} \left( 1 - \Phi(\sqrt{n}(c - \theta)) \right)$$

Note that since the CDF $\Phi$ is monotonically increasing, the function $1 - \Phi(\sqrt{n}(c - \theta))$ is also monotonically increasing in $\theta$, so the maximum is achieved at the boundary $\theta = 0$:

$$\alpha = 1 - \Phi(c\sqrt{n})$$

So $c\sqrt{n} = z_\alpha$, and we need to choose

$$c = \frac{z_\alpha}{\sqrt{n}}$$

where $z_\alpha$ is the $(1-\alpha)$-quantile of $N(0,1)$.

Type II error

When $\theta \in \Theta_a = (0, \infty)$,

$$\begin{aligned} \beta(\theta) &= P_\theta(T \notin RR) = P_\theta(\bar{Y}_n < c) \\ &= P_\theta\left( \sqrt{n}(\bar{Y}_n - \theta) < \sqrt{n}(c - \theta) \right) \\ &= \Phi\left( \sqrt{n}(c - \theta) \right) \end{aligned}$$

When we plug in the previously worked out $c = z_\alpha/\sqrt{n}$,

$$\beta(\theta) = \Phi(z_\alpha - \sqrt{n}\theta), \quad \theta > 0$$

The power is given by

$$\mathrm{pw}(\theta) = 1 - \Phi(z_\alpha - \sqrt{n}\theta), \quad \theta > 0$$

Note how the power increases as $\theta$ increases (moves away from the boundary 0), and increases as $n$ increases.
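The level and power formulas just derived are easy to check numerically. Below is a minimal sketch using `scipy`; the function name `one_sided_power` is mine:

```python
from scipy.stats import norm

def one_sided_power(theta, n, alpha=0.05):
    """Power of the test H0: theta <= 0 vs Ha: theta > 0 with
    T = sample mean and RR = [z_alpha / sqrt(n), inf)."""
    z_alpha = norm.ppf(1 - alpha)  # (1 - alpha)-quantile of N(0, 1)
    return 1 - norm.cdf(z_alpha - n ** 0.5 * theta)

# At the boundary theta = 0 the rejection probability equals alpha;
# it grows as theta moves away from 0 or as n grows.
print(round(one_sided_power(0.0, n=25), 3))   # 0.05
print(round(one_sided_power(0.5, n=25), 3))
print(round(one_sided_power(0.5, n=100), 3))
```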

Two-sided hypothesis

Suppose $Y_i$ are i.i.d. $N(\theta, 1)$ as above. This time we want to test $H_0: \theta = 0$ vs. $H_a: \theta \ne 0$. The test statistic $T$ is still $\bar{Y}_n$, but $RR$ now takes the form

$$(-\infty, -c] \cup [c, \infty)$$

where $c$ is a threshold chosen to make the test have level $\alpha$. The calculations of the type I and type II error probabilities and the power are given as follows.

When $\theta$ is the true parameter, we have

$$\bar{Y}_n \sim N\left(\theta, \frac{1}{n}\right) \quad\Leftrightarrow\quad \frac{\bar{Y}_n - \theta}{1/\sqrt{n}} \sim N(0, 1)$$

Type I error: when $\theta \in \Theta_0 = \{0\}$,

$$\begin{aligned} \alpha(\theta) &= P_\theta(T \in RR) = P_\theta(\bar{Y}_n < -c) + P_\theta(\bar{Y}_n > c) \\ &= P_\theta(\sqrt{n}\bar{Y}_n < -c\sqrt{n}) + P_\theta(\sqrt{n}\bar{Y}_n > c\sqrt{n}) \\ &= \Phi(-c\sqrt{n}) + 1 - \Phi(c\sqrt{n}) \\ &= 2 - 2\Phi(c\sqrt{n}) \end{aligned}$$

where $\Phi$ is the CDF of the standard normal distribution. The level is then

$$\alpha = 2 - 2\Phi(c\sqrt{n})$$

so

$$\Phi(c\sqrt{n}) = 1 - \frac{\alpha}{2} \quad\Leftrightarrow\quad c\sqrt{n} = z_{\alpha/2} \quad\Leftrightarrow\quad c = \frac{z_{\alpha/2}}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the $(1 - \frac{\alpha}{2})$-quantile of $N(0,1)$.

Type II error: when $\theta \in \Theta_a = (-\infty, 0) \cup (0, \infty)$,

$$\begin{aligned} \beta(\theta) &= P_\theta(T \notin RR) = P_\theta(-c < \bar{Y}_n < c) \\ &= P_\theta\left( \sqrt{n}(-c - \theta) < \sqrt{n}(\bar{Y}_n - \theta) < \sqrt{n}(c - \theta) \right) \\ &= \Phi(\sqrt{n}(c - \theta)) - \Phi(-\sqrt{n}(c + \theta)) \end{aligned}$$

Plugging in $c = z_{\alpha/2}/\sqrt{n}$,

$$\beta(\theta) = \Phi(z_{\alpha/2} - \sqrt{n}\theta) - \Phi(-z_{\alpha/2} - \sqrt{n}\theta), \quad \theta \ne 0$$

Power:

$$\mathrm{pw}(\theta) = 1 - \beta(\theta) = \Phi(\sqrt{n}\theta - z_{\alpha/2}) + \Phi(-\sqrt{n}\theta - z_{\alpha/2}), \quad \theta \ne 0$$

Note that the power is symmetric in $\theta$, and increases as $|\theta|$ moves away from the boundary 0 or as $n$ increases.
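The two-sided power accounts for both tails of $RR$. A numerical sketch (assuming `scipy`; `two_sided_power` is an illustrative name of mine):

```python
from scipy.stats import norm

def two_sided_power(theta, n, alpha=0.05):
    """Power of the test H0: theta = 0 vs Ha: theta != 0 with
    RR = (-inf, -c] U [c, inf), c = z_{alpha/2} / sqrt(n)."""
    z = norm.ppf(1 - alpha / 2)
    # P(sqrt(n) * Ybar >= z) + P(sqrt(n) * Ybar <= -z) under N(theta, 1/n)
    return (1 - norm.cdf(z - n ** 0.5 * theta)) + norm.cdf(-z - n ** 0.5 * theta)

# Equals alpha at the boundary theta = 0, and is symmetric in theta.
print(round(two_sided_power(0.0, n=25), 3))   # 0.05
print(round(two_sided_power(0.4, n=25), 3))
print(round(two_sided_power(-0.4, n=25), 3))  # same as +0.4 by symmetry
```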

Large-sample Z-tests

Now that we know the concepts in a hypothesis test, the important question is how one constructs a hypothesis test. Here we introduce the Z-test, which is closely related to the Z-score.

Suppose we want to test

$$H_0: \theta = \theta_0 \quad\text{vs.}\quad H_a: \theta \ne \theta_0$$

where the hypothesized value of interest, $\theta_0$, is regarded as known. Although estimation cannot resolve the decision directly, the following idea is natural:

  1. Estimate $\theta$ with an estimator $\hat\theta$;
  2. If $\hat\theta$ is far from $\theta_0$, we reject $H_0$.

Now the question becomes: how far is “far”? We need to take the randomness of $\hat\theta$ into account: if it has a large variance, it can naturally deviate far from $\theta_0$ even under $H_0$.

Proposal: suppose $\hat\sigma$ is the estimated standard error of $\hat\theta$. Let’s consider the signed distance between $\hat\theta$ and $\theta_0$ relative to $\hat\sigma$:

$$Z = \frac{\hat\theta - \theta_0}{\hat\sigma}$$

Note that unlike the Z-score used for constructing the CI, our $Z$ is a statistic, because all the quantities can be computed from the sample.

Our proposed rejection rule is: we reject $H_0$ if $|Z| > k$, namely when $\hat\theta$ is $k$ standard errors away from $\theta_0$. To choose an appropriate $k$, recall that if $H_0: \theta = \theta_0$ is true, then often when the sample size is large,

$$Z \overset{\text{approx.}}{\sim} N(0, 1)$$

so we may choose $k$ to be the normal quantile $z_{\alpha/2}$.

To sum up, the proposed rejection rule is

$$|Z| = \frac{|\hat\theta - \theta_0|}{\hat\sigma} > z_{\alpha/2} \quad\Leftrightarrow\quad \theta_0 > \hat\theta + \hat\sigma z_{\alpha/2} \;\text{ or }\; \theta_0 < \hat\theta - \hat\sigma z_{\alpha/2}$$

The type I error is given (approximately, for large samples) by

$$P_{\theta_0}(|Z| > z_{\alpha/2}) \approx \alpha$$

What we just described is known as the two-sided Z-test. There are also one-sided versions, as shown in the table below.

| $H_0$ | $H_a$ | Rejection rule |
| --- | --- | --- |
| $\theta = \theta_0$ | $\theta \ne \theta_0$ | $\lvert Z \rvert > z_{\alpha/2}$ |
| $\theta = \theta_0$ or $\theta \le \theta_0$ | $\theta > \theta_0$ | $Z > z_\alpha$ |
| $\theta = \theta_0$ or $\theta \ge \theta_0$ | $\theta < \theta_0$ | $Z < -z_\alpha$ |

In the one-sided Z-tests, one may have either a simple $H_0: \theta = \theta_0$ or a composite $H_0: \theta \le \theta_0$ (resp. $\theta \ge \theta_0$). Following the calculations in the example above, one can show that the rejection rule attains the same correct level in both the simple and the composite cases.

Bernoulli distribution example

Suppose $Y_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}(\theta)$. Our hypotheses are

$$H_0: \theta \le 0.9 \quad\text{vs.}\quad H_a: \theta > 0.9$$

and we have $n = 100$, $\hat\theta = \bar{Y}_n$ and $\alpha = 0.05$. Suppose the observed $\hat\theta = 0.93$. Recall that

$$\mathrm{Var}(\hat\theta) = \frac{\theta(1 - \theta)}{n}$$

Hence the estimated standard error of $\hat\theta$ is

$$\hat\sigma = \sqrt{\frac{\hat\theta(1 - \hat\theta)}{n}} = 0.0255$$

Then the test statistic is

$$Z = \frac{0.93 - 0.9}{0.0255} = 1.18 < z_{0.05} = 1.64$$

So $Z \notin RR$ and we fail to reject $H_0$.
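This computation takes only a few lines. A sketch using `scipy` for the normal quantile:

```python
from math import sqrt
from scipy.stats import norm

# One-sided large-sample Z-test for H0: theta <= 0.9 vs Ha: theta > 0.9
n, theta0, alpha = 100, 0.9, 0.05
theta_hat = 0.93                                  # observed sample proportion

se_hat = sqrt(theta_hat * (1 - theta_hat) / n)    # estimated standard error
z = (theta_hat - theta0) / se_hat
z_alpha = norm.ppf(1 - alpha)                     # (1 - alpha)-quantile

print(round(se_hat, 4))   # 0.0255
print(round(z, 2))        # 1.18
print(z > z_alpha)        # False -> fail to reject H0
```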

Hypothesis test and confidence intervals

You’ve probably noticed the resemblance between the Z-test and the Z-score confidence interval. In fact, there is a precise connection between the two: whenever you can construct a CI, you can also perform a hypothesis test.

Recall we’ve just discussed that the rejection rule for a two-sided Z-test is

$$|Z| = \frac{|\hat\theta - \theta_0|}{\hat\sigma} > z_{\alpha/2} \quad\Leftrightarrow\quad \theta_0 > \hat\theta + \hat\sigma z_{\alpha/2} \;\text{ or }\; \theta_0 < \hat\theta - \hat\sigma z_{\alpha/2}$$

On the other hand, a two-sided $(1-\alpha)$-CI based on the Z-score is given by

$$I = \left[ \hat\theta - \hat\sigma z_{\alpha/2},\; \hat\theta + \hat\sigma z_{\alpha/2} \right] = \hat\theta \pm \hat\sigma z_{\alpha/2}$$

Note how the quantities in the Z-test are exactly the boundaries of the Z-score CI. So the rejection rule can be alternatively expressed as: we reject $H_0$ if $\theta_0 \notin I$. In other words, if the CI fails to cover $\theta_0$, then $\theta = \theta_0$ is unlikely.

From this, we may propose a general principle for converting a CI to a hypothesis test: suppose $I$ is a $(1-\alpha)$-CI of $\theta$. To test $H_0: \theta = \theta_0$ vs. $H_a: \theta \ne \theta_0$ at level $\alpha$, we reject $H_0$ if $\theta_0 \notin I$.

The level is correct because under H0, we have

$$P_{\theta_0}(\theta_0 \notin I) = \alpha$$

using the definition of coverage probability of a CI.
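The equivalence between “reject when $|Z| > z_{\alpha/2}$” and “reject when $\theta_0 \notin I$” can be checked numerically. A small sketch with made-up values of $\hat\theta$ and $\hat\sigma$:

```python
from scipy.stats import norm

# Illustrative (made-up) estimate, standard error, and level
theta_hat, sigma_hat, alpha = 2.3, 0.5, 0.05
z = norm.ppf(1 - alpha / 2)
ci = (theta_hat - sigma_hat * z, theta_hat + sigma_hat * z)

for theta0 in [1.0, 2.0, 3.0, 4.0]:
    reject_by_z = abs((theta_hat - theta0) / sigma_hat) > z
    outside_ci = not (ci[0] <= theta0 <= ci[1])
    assert reject_by_z == outside_ci   # the two rules always agree
print("Z-test and CI rule agree")
```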

Uniform distribution example

For $Y_i \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}(0, \theta)$, we’ve derived a one-sided $(1-\alpha)$-CI of the form

$$I = \left[ \hat\theta (1-\alpha)^{-1/n},\; \infty \right)$$

where $\hat\theta = \max(Y_1, \ldots, Y_n)$. By the discussion above, we may propose a hypothesis test with the rejection rule

$$\theta_0 \notin I \;\Leftrightarrow\; \theta_0 < \hat\theta (1-\alpha)^{-1/n} \;\Leftrightarrow\; \hat\theta > \theta_0 (1-\alpha)^{1/n}$$

So $H_0: \theta = \theta_0$ (or $\theta \le \theta_0$), and $H_a: \theta > \theta_0$.
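We can check by simulation that this rejection rule indeed has level $\alpha$ under $H_0$. A sketch (the sample size, seed, and number of replications are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Test H0: theta = theta0 vs Ha: theta > theta0 for Unif(0, theta):
# reject when max(Y) > theta0 * (1 - alpha)**(1/n).
n, theta0, alpha, reps = 50, 1.0, 0.05, 20000
threshold = theta0 * (1 - alpha) ** (1 / n)

samples = rng.uniform(0, theta0, size=(reps, n))   # data generated under H0
rejection_rate = (samples.max(axis=1) > threshold).mean()
print(round(rejection_rate, 3))   # should be close to alpha = 0.05
```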

It also works the other way around. To convert a hypothesis test to a CI (taking the Z-test as an example), the test statistic is

$$Z(\theta_0) = \frac{\hat\theta - \theta_0}{\hat\sigma}$$

and the rejection region is $\{z : |z| > z_{\alpha/2}\}$. Hence,

$$I = \{\theta_0 \in \mathbb{R} : Z(\theta_0) \notin RR\} = \left\{ \theta_0 \in \mathbb{R} : \left| \frac{\hat\theta - \theta_0}{\hat\sigma} \right| \le z_{\alpha/2} \right\} = \left[ \hat\theta - \hat\sigma z_{\alpha/2},\; \hat\theta + \hat\sigma z_{\alpha/2} \right]$$

which is exactly the Z-score CI. The correspondence between CIs and HTs is one-to-one, although the CI → HT direction is used more often.

Some small-sample tests

Recall that we have discussed small sample CIs for means and variances. Based on the correspondence between CI and HT, we can easily develop hypothesis tests based on those CIs.

One-sample mean

The assumption is $Y_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$. The hypotheses are

$$H_0: \mu = \mu_0 \quad\text{vs.}\quad H_a: \begin{cases} \mu > \mu_0; \\ \mu \ne \mu_0; \\ \mu < \mu_0. \end{cases}$$

The test statistic, motivated from the pivotal quantity for CI, is

$$T = \frac{\bar{Y}_n - \mu_0}{\hat\sigma / \sqrt{n}} \sim t(n-1) \text{ under } H_0$$

where $\hat\sigma^2$ is the unbiased estimator of $\sigma^2$:

$$\hat\sigma^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y}_n)^2$$

The rejection region can be found by inverting the CIs:

$$RR = \begin{cases} \{t : t > t_\alpha(n-1)\}; \\ \{t : |t| > t_{\alpha/2}(n-1)\}; \\ \{t : t < -t_\alpha(n-1)\}. \end{cases}$$

If we look at $H_0$, is this a simple or a composite null hypothesis? One might say it is simple since only $\mu_0$ is involved, but in fact it is composite, because the unknown $\sigma^2$ is also part of our model. $H_0$ is essentially

$$(\mu, \sigma^2) \in \Theta_0 = \{(\mu_0, u) : u > 0\}$$

Def: an unknown parameter which is not of interest in the hypothesis test is called a nuisance parameter, e.g. $\sigma^2$ above. It’s often desirable to use a test statistic $T$ whose null distribution does not depend on the nuisance parameter (very similar to the pivotal quantity idea).

The test statistic $T$ above follows a non-central $t$-distribution when $\mu_0$ is not the true mean. This is of interest when analyzing the type II error/power of the tests, or the type I error under a composite null such as $H_0: \mu \le \mu_0$ or $H_0: \mu \ge \mu_0$.
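As a sketch, the statistic and rejection rule can be computed by hand and cross-checked against `scipy.stats.ttest_1samp`; the simulated data below is purely illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.normal(loc=0.5, scale=2.0, size=15)   # small sample, true mu = 0.5

mu0 = 0.0
n = len(y)
sigma_hat = y.std(ddof=1)                     # sqrt of the unbiased variance
t_stat = (y.mean() - mu0) / (sigma_hat / np.sqrt(n))

# Two-sided rejection rule at level alpha: |t| > t_{alpha/2}(n - 1)
alpha = 0.05
reject = abs(t_stat) > stats.t.ppf(1 - alpha / 2, df=n - 1)

# Cross-check against scipy's built-in one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(y, popmean=mu0)
assert np.isclose(t_stat, t_scipy)
assert reject == (p_scipy < alpha)
print("manual t statistic matches scipy:", round(t_stat, 3))
```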

Two-sample means

Our assumption is $X_1, \ldots, X_{n_1} \overset{\text{i.i.d.}}{\sim} N(\mu_1, \sigma^2)$ and $Y_1, \ldots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} N(\mu_2, \sigma^2)$, with the $X_i$’s independent of the $Y_i$’s. The hypotheses are

$$H_0: \mu_1 - \mu_2 = \delta_0 \quad\text{vs.}\quad H_a: \begin{cases} \mu_1 - \mu_2 > \delta_0; \\ \mu_1 - \mu_2 \ne \delta_0; \\ \mu_1 - \mu_2 < \delta_0. \end{cases}$$

The test statistic again comes from the pivotal quantity in CI:

$$T = \frac{\bar{X} - \bar{Y} - \delta_0}{\hat\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2) \text{ under } H_0$$

where

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n_1}(X_i - \bar{X})^2 + \sum_{i=1}^{n_2}(Y_i - \bar{Y})^2}{n_1 + n_2 - 2}$$

is an unbiased estimator of $\sigma^2$. The $n_1 + n_2 - 2$ in the denominator reflects the two degrees of freedom lost when estimating the two means. The rejection regions are

$$RR = \begin{cases} \{t : t > t_\alpha(\nu)\}; \\ \{t : |t| > t_{\alpha/2}(\nu)\}; \\ \{t : t < -t_\alpha(\nu)\}, \end{cases} \quad \text{where } \nu = n_1 + n_2 - 2$$

When $n_1$ and $n_2$ are very large, the $t$-distribution becomes very close to the normal distribution, and the normal quantiles (Z-test) can be used instead.
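A sketch of the pooled statistic, cross-checked against `scipy.stats.ttest_ind` with `equal_var=True` (which implements the same pooled-variance test); the simulated data is illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(1.0, 1.5, size=12)   # N(mu1, sigma^2)
y = rng.normal(0.2, 1.5, size=10)   # N(mu2, sigma^2), same sigma

delta0 = 0.0
n1, n2 = len(x), len(y)
# Pooled variance estimator with n1 + n2 - 2 degrees of freedom
pooled_var = (((x - x.mean()) ** 2).sum() + ((y - y.mean()) ** 2).sum()) / (n1 + n2 - 2)
t_stat = (x.mean() - y.mean() - delta0) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# scipy's equal-variance two-sample t-test computes the same statistic
t_scipy, p_scipy = stats.ttest_ind(x, y, equal_var=True)
assert np.isclose(t_stat, t_scipy)
print("pooled t statistic:", round(t_stat, 3))
```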

One-sample variance

Our assumption is $Y_1, \ldots, Y_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$. The hypotheses are

$$H_0: \sigma^2 = \sigma_0^2 \quad\text{vs.}\quad H_a: \begin{cases} \sigma^2 > \sigma_0^2; \\ \sigma^2 \ne \sigma_0^2; \\ \sigma^2 < \sigma_0^2. \end{cases}$$

The test statistic we use is

$$T = \frac{\sum_{i=1}^n (Y_i - \bar{Y}_n)^2}{\sigma_0^2}$$

which follows $\chi^2(n-1)$ under $H_0$. The rejection regions are

$$RR = \begin{cases} \{t : t > \chi^2_\alpha(n-1)\}; \\ \{t : t > \chi^2_{\alpha/2}(n-1) \text{ or } t < \chi^2_{1-\alpha/2}(n-1)\}; \\ \{t : t < \chi^2_{1-\alpha}(n-1)\}, \end{cases}$$

where $\chi^2_\alpha(\nu)$ denotes the $(1-\alpha)$-quantile of $\chi^2(\nu)$.
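A sketch of the two-sided variance test on illustrative simulated data. Note that `scipy.stats.chi2.ppf(q, df)` takes a lower-tail probability, while $\chi^2_\alpha(\nu)$ here denotes the $(1-\alpha)$-quantile:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(loc=5.0, scale=2.0, size=20)   # true sigma^2 = 4

sigma0_sq = 4.0
n = len(y)
t_stat = ((y - y.mean()) ** 2).sum() / sigma0_sq   # ~ chi2(n - 1) under H0

alpha = 0.05
# Two-sided rejection: t > chi2_{alpha/2}(n-1) or t < chi2_{1-alpha/2}(n-1),
# i.e. compare against the upper and lower alpha/2 quantiles of chi2(n-1)
upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
lower = stats.chi2.ppf(alpha / 2, df=n - 1)
reject = (t_stat > upper) or (t_stat < lower)
print("chi-square statistic:", round(t_stat, 2), "reject H0:", reject)
```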

There’s also a two-sample variance test involving the F-distribution.
