We now proceed to an obvious extension of the one-sample procedures: paired samples. This chapter is going to be much shorter as the procedures are very similar to the previously discussed one-sample tests.

Paired observations are pretty common in applications as each unit acts as its own control. For instance, before treatment vs. after, left side vs. right side, identical twin A vs. identical twin B, etc. Another way this comes about is via `matching`

(a common experimental design technique): the researcher matches each unit with another unit that shares relevant (or maybe irrelevant) characteristics - age, education, height, etc. and the two units differ on the treatment. Then, use (some of) our one-sample methods on the **differences** between the paired observation values. By analogy, consider the paired t-test, which looks very much like a one-sample t-test.

## Wilcoxon Signed Rank Test

Suppose $12$ sets of identical twins were given a psychological test to measure the amount of aggressiveness in each person's personality. We're interested in whether the firstborn tends to be more aggressive than the other. A higher score is indicative of higher aggressiveness. We hypothesize

\[\begin{aligned} &H_0: \text{the firstborn does not tend to be more aggressive (no difference is also okay)} \\ \text{vs. } &H_1: \text{the firstborn twin tends to be more aggressive} \end{aligned}\]

Here's the data we need for the signed rank test on the differences $d_i$:

Firstborn ($x_i)$ | Secondborn ($y_i$) | Difference ($d_i$) | Rank | Signed rank |
---|---|---|---|---|

86 | 88 | 2 | 4 | 4 |

71 | 77 | 6 | 8 | 8 |

77 | 76 | -1 | 2.5 | -2.5 |

68 | 64 | -4 | 5 | -5 |

91 | 96 | 5 | 6.5 | 6.5 |

72 | 72 | 0 | (1) | $-$ |

77 | 65 | -12 | 11 | -11 |

91 | 90 | -1 | 2.5 | -2.5 |

70 | 65 | -5 | 6.5 | -6.5 |

71 | 80 | 9 | 10 | 10 |

88 | 81 | -7 | 9 | -9 |

87 | 72 | -15 | 12 | -12 |

We use mid-ranks for the ties, then use Sprent and Smeeton "device" for handling the deviation of $0$. Our test statistic:

\[S_+ = 4 + 8 + 6.5 + 10 = 28.5\]

Now we can get the normal approximation and score representation of the test statistic:

\[\text{Normal approximation} = \frac{S_+ - \frac{1}{4}n(n+1)}{\sqrt{\frac{1}{24}n(n+1)(2n+1)}} = \frac{28.5 - \frac{1}{4} \times 12 \times 13}{\sqrt{\frac{1}{24} \times 12 \times 13 \times 25}} = -0.824\]

\[\text{Score representation} = \frac{S_+ - \frac{1}{2} \sum\limits_{i=1}^n {|S_i|}}{\frac{1}{2}\sqrt{\sum\limits_{i=1}^n {S_i^2}}} = \frac{28.5 - \frac{1}{2} \times 77}{\frac{1}{2}\sqrt{648}} = -0.786\]

For the score representation, we see a slight difference when we compare to: $\sum\limits_{i=1}^{12} i = 78$, and $\sum\limits_{i=1}^{12} i^2 = 650$. The difference is caused by the $-$ rank in the case where $d_i = 0$. In either case, there's really no evidence that the firstborn twin is more aggressive.

### Notes

- Within a pair, observations may not be independent (obviously, as almost by construction sometimes they won't be), but the
**pairs**themselves should be. - The typical $H_0$ here would be that the
**median of the differences**is $0$. If the differences are assumed to have a**symmetric distribution**(about $0$), then the Wilcoxon approach is suitable. Note then that the individual unit measurements do**not**need to be assumed symmetric. You need to think about whether this is realistic for your given situation. - The alternative therefore refers to a
**shift in centrality**of the differences. You need to think about whether this is an interesting, relevant, important question.

## McNemar's Test

The test is a less obvious use of, or a modification to the sign test. Here's example 5.4 in the book. We have records on all attempts of two rock climbs, successful or not. For the $108$ who tried both:

First climb success | First climb failure | |
---|---|---|

Second climb success | 73 | 14* |

Second climb failure | 9* | 12 |

Is there evidence that one climb is harder? The only ones that had information **for this question** are the ones that succeeded in one but failed in another. The people who succeeded / failed in both are essentially "ties".

We can frame this as a Binomial setup. Think of success $\rightarrow$ failure as a "+" for the first climb, and failure $\rightarrow$ success as a "-" for the first climb. This puts us under a sign test situation. Under $H_0:$ climbs are equally difficult, and the probability of a "+" is the same as the probability of a "-" $\Rightarrow Bin(23, 0.5)$ with a "+". The p-value is 0.405, so there seems to be no difference in the difficulty of the climbs.

This version of the sign test is called `McNemar's test`

. In general, we have pairs of data $(x_i, y_i)$ $i=1 \cdots m$ where $X_i$ and $Y_i$ take values $0$ and $1$ only. There are $4$ patterns of outcome: $(0, 0), (0, 1), (1, 0), (1, 1)$. The paired data are summarized in a $2 \times 2$ `contingency table`

showing the classifications of $Y_i$ and $X_i$:

$Y_i = 0$ | $Y_i = 1$ | |
---|---|---|

$X_i = 0$ | $a\text{ (\# of (0, 0) pairs)}$ | $b\text{ (\# of (0, 1) pairs)}$ |

$X_i = 1$ | $c\text{ (\# of (1, 0) pairs)}$ | $d\text{ (\# of (1, 1) pairs)}$ |

### Assumptions

- Pairs are mutually independent.
- Two categories for each outcome ($X_i = 0, 1,/,Y_i = 0, 1$)
- The difference $P(X_i = 0, Y_i = 1) - P(X_i = 1, Y_i = 0)$ is negative for all $i$, 0 for all $i$, or positive for all $i$.

The null hypothesis can be formulated in various equivalent ways:

\[\begin{aligned} &H_0: P(X_i = 0, Y_i = 1) = P(X_i = 1, Y_i = 0) && \text{for all } i \\ \text{vs. } &H_1: P(X_i = 0, Y_i = 1) \neq P(X_i = 1, Y_i = 0) && \text{for all } i \end{aligned}\]

or

\[\begin{aligned} &H_0: P(X_i = 0) = P(Y_i = 0) && \text{for all } i \\ \text{vs. } &H_1: P(X_i = 0) \neq P(Y_i = 0) && \text{for all } i \end{aligned}\]

or

\[\begin{aligned} &H_0: P(X_i = 1) = P(Y_i = 1) && \text{for all } i \\ \text{vs. } &H_1: P(X_i = 1) \neq P(Y_i = 1) && \text{for all } i \end{aligned}\]

The first one, in terms of our example, is saying $P(\text{success} \rightarrow \text{failure}) = P(\text{failure} \rightarrow \text{success})$. The second one is saying $P(\text{failure on first climb}) = P(\text{failure on second climb})$.

Our test statistic - if $b$ and $c$ are reasonably small, take $T_2 = b$. Under $H_0$, $b \sim Bin(b+c, 0.5)$, which is our usual sign test. Or for larger $b, c$:

\[T_1 = \frac{(b-c)^2}{b+c}\]

Note that it does **not** depend on $a$ and $d$. Under $H_0$, $T_1$ is approximately $\chi^2_{(1)}$. To derive $T_1$, let's call $b + c = n$. If $n$ is large enough, we can use a normal approximation to the Binomial:

\[\frac{T_2 - \frac{1}{2}n}{\sqrt{\frac{1}{2} \cdot \frac{1}{2}n}} = \frac{b - \frac{1}{2}(b+c)}{\frac{1}{4}(b+c)} = \frac{b-c}{\sqrt{b+c}} = \mathbf{Z} \approx N(0, 1)\]

\[T_1 = \mathbf{Z}^2 \Rightarrow T_1 \approx \chi^2_{(1)}\]

And that's it! Paired-sample tests really aren't that different from one-sample tests. Now we're ready for some real problems: two independent samples for example.