These are the notes I took for a Master's course, **Nonparametric Statistics**. The recommended textbook is Sprent, P. and Smeeton, N.C. (2007), *Applied Nonparametric Statistical Methods*, Fourth Edition.

## Outline of the Course

We will start with an overview of some fundamentals of nonparametric statistics.

Then we will consider in turn methods for a single sample (location inference and others), for two samples (paired and independent), and for multiple samples. This will be followed by a discussion of correlation and concordance, as well as association and other related methods for categorical data. Finally, we will look at a variety of more "modern" nonparametric methods, such as the bootstrap, kernel density estimation and regression.

## Some Basic Concepts

We want to move away from "standard" or "typical" approaches to statistical inference, where we assume that our data are drawn from some distributional family, e.g. the standard setup in which

\[X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)\]

Here, $N(\mu, \sigma^2)$ denotes the Normal distributional family. Similarly, we could have $\text{Pois}(\lambda)$ for a Poisson distribution. In these cases, we're making assumptions about the underlying distribution. These assumptions may (or may not) be realistic or valid. In any case, they are restrictive.

`Nonparametric` (sometimes called "distribution-free") statistical methods aim to **relax** these assumptions about distributional forms. They are more **general** and more **robust** (the methods perform well in a wider range of applications), but we may sacrifice **power** if the data truly come from a particular family, such as the Normal, for which optimal tests (such as the `z-test` or `t-test`) exist.
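
The trade-off between robustness and power can be explored with a small simulation. The following is a minimal sketch, not a definitive study: the sample size, shift, number of replications, and the choice of a $t$ distribution with 2 degrees of freedom as the "heavy-tailed" case are all assumptions made for illustration. It estimates how often the one-sample `t.test` and `wilcox.test` reject a false null hypothesis when the data are Normal and when they are not.

```r
# Illustrative simulation; every setting below is an assumption for this sketch.
set.seed(1)

# Estimate the rejection rate of a one-sample test of H0: location = 0.
# rdist: function(n) drawing n observations centred at 0.
# test:  a function such as t.test or wilcox.test accepting (x, mu = ...).
estimate_power <- function(rdist, test, shift = 0.5, n = 30,
                           n_sim = 2000, alpha = 0.05) {
  rejections <- replicate(n_sim, {
    x <- rdist(n) + shift              # true location is `shift`, so H0 is false
    test(x, mu = 0)$p.value < alpha
  })
  mean(rejections)                     # proportion of rejections
}

normal_draws <- function(n) rnorm(n)       # the parametric ideal
heavy_draws  <- function(n) rt(n, df = 2)  # symmetric but heavy-tailed

power <- c(
  t_normal        = estimate_power(normal_draws, t.test),
  wilcoxon_normal = estimate_power(normal_draws, wilcox.test),
  t_heavy         = estimate_power(heavy_draws, t.test),
  wilcoxon_heavy  = estimate_power(heavy_draws, wilcox.test)
)
print(round(power, 3))
```

Comparing the first two entries shows what is given up under normality, and comparing the last two shows how each test copes when the Normal assumption fails. The exact numbers depend on the assumed settings.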

The term `nonparametric method` is also used in a variety of ways, which we want to examine:

- Classical approaches, e.g. based on `ranks`
- Computational approaches, e.g. the `bootstrap`
- Modern regression (and other) approaches, e.g. `smoothing`
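
To make the three flavours above concrete, here is a rough base-R sketch of each applied to the same tiny sample. The data values are made up, and using the median as the bootstrapped statistic and a kernel density estimate as the smoother are illustrative choices, not the only options.

```r
set.seed(42)
x <- c(3.1, 0.4, 12.7, 5.5, 2.2)   # made-up observations

# 1. Classical: replace the observations by their ranks.
r <- rank(x)
print(r)   # 3 1 5 4 2

# 2. Computational: bootstrap the sample median by resampling with replacement.
boot_medians <- replicate(1000, median(sample(x, replace = TRUE)))
boot_se <- sd(boot_medians)        # bootstrap estimate of the standard error
print(boot_se)

# 3. Smoothing: kernel density estimate with the default Gaussian kernel.
kde <- density(x)
print(kde$bw)                      # bandwidth chosen by the default rule
```

None of the three steps requires naming a distributional family for `x`.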

### Big Question

If we don't assume a distributional family, how can we proceed to do inference? What sorts of inferential questions can we ask and answer?

We do still need to make *some* assumptions (of course), but they can be weaker than what we're used to.
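
As one illustration of inference under weak assumptions, consider a sign test for a median. The sketch below assumes only that the observations are independent draws from a continuous distribution; the data and the hypothesised median `m0` are made up for the example. Under the null hypothesis each observation falls above `m0` with probability 1/2, so the count of observations above `m0` is Binomial and `binom.test` gives an exact p-value.

```r
# Made-up sample and hypothesised median (both are assumptions for this sketch).
x  <- c(4.2, 5.1, 6.3, 3.8, 7.9, 5.6, 6.8, 4.9, 7.2, 6.1)
m0 <- 5

n_above <- sum(x > m0)    # observations above the hypothesised median
n_used  <- sum(x != m0)   # observations equal to m0 carry no sign information

# Under H0: median = m0, n_above ~ Binomial(n_used, 1/2).
sign_test <- binom.test(n_above, n_used, p = 0.5)
print(sign_test$p.value)
```

No distributional family is named anywhere: the Binomial reference distribution follows from the definition of the median alone.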

### Examples

- Instead of normality, which is a strong assumption, we might assume that the true data distribution is merely symmetric.
- For comparing two samples, rather than assuming that both come from normally distributed populations with possibly different means, we might assume that their distributions are the same (without specifying what that distribution is) but with a **shift** in location:

```
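library(tidyverse)

set.seed(1)  # seed added here only so the figure is reproducible
N <- 1e+6

# Pick a mixture component (1, 2 or 3) for each observation
components <- sample(1:3, size = N, replace = TRUE,
                     prob = c(0.3, 0.5, 0.2))
mus <- c(0, 10, 3)           # component means
sds <- sqrt(c(0.2, 1, 3))    # component standard deviations (from variances)

# Draw from a three-component Normal mixture: a decidedly non-Normal shape
samples <- rnorm(n = N, mean = mus[components], sd = sds[components])

# Same distribution, shifted by 2 in location; overlay the two densities
tibble(
  weird_1 = samples,
  weird_2 = samples + 2
) %>%
  pivot_longer(everything(), names_to = "dist", values_to = "value") %>%
  ggpubr::ggdensity(x = "value", color = "dist", fill = "dist",
                    palette = "npg", alpha = 0.25)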
```
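
Under this shift model we can still ask inferential questions about the size of the shift. The following is a hedged sketch using base R's `wilcox.test` with `conf.int = TRUE`, which reports a rank-based estimate of the location difference together with a confidence interval. The helper `draw_weird()` is a hypothetical name introduced here; it redraws from the same mixture as in the plot above, and the sample size of 200 per group is an assumption.

```r
set.seed(2024)

# Hypothetical helper: n draws from the three-component mixture plotted above.
draw_weird <- function(n) {
  components <- sample(1:3, size = n, replace = TRUE,
                       prob = c(0.3, 0.5, 0.2))
  rnorm(n,
        mean = c(0, 10, 3)[components],
        sd   = sqrt(c(0.2, 1, 3))[components])
}

x <- draw_weird(200)       # first sample
y <- draw_weird(200) + 2   # independent second sample, shifted by 2

# Rank-based two-sample test; conf.int = TRUE also returns an estimate of
# the shift (location of y minus location of x) and a confidence interval.
res <- wilcox.test(y, x, conf.int = TRUE)
print(res$p.value)
print(res$estimate)
print(res$conf.int)
```

Nothing here requires knowing that the data come from a Normal mixture; the only structural assumption is that the two distributions differ by a location shift.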

First, we'll need to discuss some of the basic tools in nonparametric statistics.