In the previous chapter we mainly focused on discrete random variables whose set of possible values is either finite or countably infinite. In this chapter, we study random variables whose set of possible values is uncountable. We'll see later on that a lot of the cases we've discussed have analogs in the continuous case.

## 3.1 Definitions

A continuous random variable is a random variable whose set of possible values is uncountable, such as an interval of the real line. Formally, we say $X$ is a continuous random variable if there exists a nonnegative function $f$ defined for all $x \in (-\infty, \infty)$, having the property that for any set $B$ of real numbers,

$P\{X \in B\} = \int_B f(x)dx$

Here the function $f$ is called the probability density function (PDF). It plays the role that the probability mass function plays in the discrete case. The PDF has the following properties:

1. $\int_{-\infty}^\infty f(x)dx = P\{X \in (-\infty, \infty)\} = 1$
2. $P\{a \leq X \leq b\} = \int_a^b f(x)dx$
3. $P\{X = a\} = \int_a^a f(x)dx = 0$

We can also define the cumulative distribution function for a continuous random variable.

\begin{aligned} F(a) &= P\{X \leq a\}, \qquad B = (-\infty, a] \\ &= \int_{-\infty}^a f(x)dx \end{aligned}

### Example 3.1.1

Suppose $X$ is a continuous random variable with probability density function

$f(x) = \begin{cases} C\left( 4x - 2x^2 \right), & 0 < x < 2 \\ 0, & \text{otherwise} \end{cases}$

We'd like to know the value of $C$, and the probability $P\{X > 1\}$.

#### Solution

\begin{aligned} 1 &= \int_{-\infty}^\infty f(x)dx \\ &= \int_{-\infty}^0 f(x)dx + \int_0^2 f(x)dx + \int_2^\infty f(x)dx \\ &= C\int_0^2 (4x - 2x^2)dx \\ &= C\left(\int_0^2 4xdx - \int_0^2 2x^2dx \right) \\ &= C \left( 2x^2\bigg|_0^2 - \frac{2}{3}x^3 \bigg|_0^2 \right) \\ &= C\left( 8 - \frac{16}{3} \right) \\ &\Rightarrow C = \frac{3}{8} \end{aligned}

\begin{aligned} P\{X > 1\} &= \int_1^\infty f(x)dx \\ &= \int_1^2 \frac{3}{8}\left(4x - 2x^2\right)dx \\ &= \frac{3}{8}\left( 2x^2\bigg|_1^2 - \frac{2}{3}x^3 \bigg|_1^2 \right) \\ &= \frac{3}{8}\left( 8-2 - \left(\frac{16}{3} - \frac{2}{3}\right) \right) = \frac{1}{2} \end{aligned}
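As a quick numerical sanity check of this example, the following sketch approximates the two integrals with a midpoint rule. The helper name `integrate` and the number of subintervals are choices made for this illustration, not part of the original solution.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def unnormalized(x):
    # The density from Example 3.1.1 without the constant C.
    return 4 * x - 2 * x**2 if 0 < x < 2 else 0.0

C = 1 / integrate(unnormalized, 0, 2)        # should be close to 3/8
p_gt_1 = C * integrate(unnormalized, 1, 2)   # should be close to 1/2
print(C, p_gt_1)
```

Both printed values should agree with $C = \frac{3}{8}$ and $P\{X > 1\} = \frac{1}{2}$ up to a small numerical error.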

### Example 3.1.2

$X$ is the lifetime of an item, a continuous random variable with density function

$f(x) = \begin{cases} \lambda e^{-x/100}, & x \geq 0 \\ 0, & x < 0 \end{cases}$

What is the probability that the item functions between $50$ and $150$ days?

#### Solution

In mathematical terms, we want to calculate $P\{50 \leq X \leq 150\}$. We first need to find the value of $\lambda$.

\begin{aligned} 1 &= \int_{-\infty}^\infty f(x)dx \\ &= \int_0^\infty \lambda e^{-\frac{x}{100}}dx \\ &= \lambda \int_0^\infty e^{-\frac{x}{100}}dx \\ &= -100\lambda \int_0^\infty e^{-\frac{x}{100}} d\left(\frac{-x}{100}\right) \\ &= -100\lambda e^{-\frac{x}{100}} \bigg|_0^\infty \\ &= -100\lambda(0 - 1) = 100\lambda \\ &\Rightarrow \lambda = \frac{1}{100}\end{aligned}

Recall that $\int e^xdx = e^x + C$, and $d(ax) = a \cdot dx$ because differentiation is linear.

With $\lambda = \frac{1}{100}$, we can calculate

\begin{aligned} P\{50 \leq X \leq 150\} &= \int_{50}^{150} \frac{1}{100}e^{-\frac{x}{100}}dx \\ &= \frac{-100}{100}\int_{50}^{150}e^{-\frac{x}{100}} d\frac{-x}{100} \\ &= -\left( e^{-\frac{x}{100}}\bigg|_{50}^{150} \right) \\ &= e^{-\frac{1}{2}} - e^{-\frac{3}{2}} \approx 0.383\end{aligned}
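The same probability can be evaluated in a few lines of code. This is a minimal sketch that uses the antiderivative from the solution; the function name `lifetime_cdf` is made up for this illustration.

```python
import math

def lifetime_cdf(x, rate=1 / 100):
    """P{X <= x} for the density (1/100) * exp(-x/100) on x >= 0."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

p = lifetime_cdf(150) - lifetime_cdf(50)   # P{50 <= X <= 150}
print(round(p, 3))  # 0.383
```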

## 3.2 Expectation and variance

In Chapter 2 we defined the expectation for discrete random variables. If $X$ is a continuous random variable with probability density function $f(x)$, we have

$f(x)dx \approx P\{x \leq X \leq x + dx\}$

so it's easy to find the analog for the expectation of $X$ to be

$E[X] = \int_{-\infty}^\infty xf(x)dx$

Similarly, the expected value of a real-valued function of $X$ is

$E[g(X)] = \int_{-\infty}^\infty g(x)f(x)dx$

which can be used to derive the variance of $X$

$Var(X) = E[X^2] - E[X]^2$

### Example 3.2

$X$ is a continuous random variable with density function

$f(x) = \begin{cases} 2x, & 0 \leq x \leq 1 \\ 0, & \text{otherwise}\end{cases}$

Find $E[X]$ and $Var(X)$.

#### Solution

\begin{aligned} E[X] &= \int_{-\infty}^\infty xf(x)dx \\ &= \int_0^1 x \cdot 2x dx \\ &= \frac{2}{3}x^3 \bigg|_0^1 = \frac{2}{3} \\ E[X^2] &= \int_{-\infty}^\infty x^2 f(x)dx \\ &= \int_0^1 2x^3 dx \\ &= \frac{1}{2}x^4 \bigg|_0^1 = \frac{1}{2} \\ Var(X) &= \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}\end{aligned}
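The arithmetic in this example can be reproduced exactly with rational numbers. The sketch below relies on the fact that, for this particular density, $E[X^k] = \int_0^1 2x^{k+1}dx = \frac{2}{k+2}$; the function name `moment` is an illustrative choice.

```python
from fractions import Fraction

def moment(k):
    """E[X^k] for f(x) = 2x on [0, 1], using the closed form 2 / (k + 2)."""
    return Fraction(2, k + 2)

mean = moment(1)
variance = moment(2) - mean**2
print(mean, variance)  # 2/3 1/18
```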

## 3.3 The Uniform distribution

In the discrete case, if the outcomes of a random experiment are equally likely, then calculating the probability of the events will be very easy. This "equally likely" idea may also be applied to continuous random variables.

For a continuous random variable $X$, we say $X$ is uniformly distributed over the interval $(\alpha, \beta)$ if the density function of $X$ is

$f(x) = \begin{cases} \frac{1}{\beta - \alpha}, & \alpha \leq x \leq \beta \\ 0, & \text{otherwise}\end{cases}$

In other words, the density function of $X$ is a constant $C$ over the given interval and zero otherwise. The constant must equal $\frac{1}{\beta - \alpha}$ because the density integrates to $1$:

$1 = \int_{-\infty}^\infty f(x)dx = \int_\alpha^\beta Cdx = Cx \bigg|_\alpha^\beta = C(\beta - \alpha)$

The plot for this function is a horizontal line at $\frac{1}{\beta - \alpha}$ over the interval $(\alpha, \beta)$, and $0$ otherwise.

From $f(x)$, we can figure out the cumulative distribution function

$F(a) = \begin{cases} 0, & a < \alpha \\ \frac{a - \alpha}{\beta - \alpha}, & \alpha \leq a \leq \beta \\ 1, & a > \beta\end{cases}$

Plotting $F(a)$ against $a$ gives a curve that stays at $0$ up to $\alpha$, rises along the straight line connecting $(\alpha, 0)$ and $(\beta, 1)$, and stays at $1$ from $\beta$ onwards.

### Example 3.3.1

Assume $X \sim Uniform(0, 10)$. We want to calculate

1. $P(X < 3)$,
2. $P(X > 6)$, and
3. $P(3 < X < 8)$.

#### Solution

$F(a) = \begin{cases} 0, & a < 0 \\ \frac{a}{10}, & 0 \leq a \leq 10 \\ 1, & a > 10\end{cases}$

\begin{aligned} P(X < 3) &= F(3) = \frac{3}{10} \\ P(X > 6) &= 1 - F(6) = 1 - \frac{6}{10} = \frac{2}{5} \\ P(3 < X < 8) &= F(8) - F(3) = \frac{1}{2}\end{aligned}
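The piecewise CDF translates directly into code. The following is a small sketch; the function name `uniform_cdf` and its argument order are assumptions made for illustration.

```python
def uniform_cdf(a, alpha, beta):
    """CDF of Uniform(alpha, beta) evaluated at a."""
    if a < alpha:
        return 0.0
    if a > beta:
        return 1.0
    return (a - alpha) / (beta - alpha)

p1 = uniform_cdf(3, 0, 10)                           # P(X < 3)
p2 = 1 - uniform_cdf(6, 0, 10)                       # P(X > 6)
p3 = uniform_cdf(8, 0, 10) - uniform_cdf(3, 0, 10)   # P(3 < X < 8)
print(p1, p2, p3)
```

The three values should match $\frac{3}{10}$, $\frac{2}{5}$ and $\frac{1}{2}$ up to floating-point rounding.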

### Example 3.3.2

Buses arrive at a bus stop at 15-minute intervals starting at 7:00 a.m. Suppose that a passenger arrives at this stop at a time that is uniformly distributed between 7:00 and 7:30. Find the probability that he needs to wait less than 5 minutes until the next bus arrives.

#### Solution

We know that buses arrive at 7:00, 7:15 and 7:30 in this time interval. There are then two scenarios in which the passenger waits less than 5 minutes: arriving between 7:10 and 7:15, or between 7:25 and 7:30.

Let $X$ be the passenger's arrival time in minutes after 7:00, and define events $E_1 = \{10 \leq X \leq 15\}$ and $E_2 = \{25 \leq X \leq 30\}$.

\begin{aligned} E &= \{\text{wait less than 5 minutes}\} \\ &= E_1 \cup E_2 \\ P(E) &= F(15) - F(10) + F(30) - F(25)\end{aligned}

Since $X \sim Uniform(0, 30)$, we have

$F(a) = \begin{cases} 0, & a < 0 \\ \frac{a}{30}, & 0 \leq a \leq 30 \\ 1, & a > 30\end{cases}$

So $P(E) = \frac{1}{2} - \frac{1}{3} + 1 - \frac{5}{6} = \frac{1}{3}$.
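The answer can also be checked by simulation. This is a rough Monte Carlo sketch, not part of the original solution; the seed, the number of trials, and the variable names are arbitrary choices.

```python
import random

rng = random.Random(0)      # fixed seed so the run is reproducible
n_trials = 200_000
short_waits = 0
for _ in range(n_trials):
    arrival = rng.uniform(0, 30)        # minutes after 7:00
    wait = (15 - arrival % 15) % 15      # minutes until the next bus
    if wait < 5:
        short_waits += 1

estimate = short_waits / n_trials
print(estimate)
```

With this many trials the estimate should land close to $\frac{1}{3}$, although the exact digits depend on the random number generator.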

### Expectation, variance and MGF

$X \sim Uniform(\alpha, \beta)$. Find $E[X]$ and $Var(X)$.

#### Solution

\begin{aligned} E[X] &= \int_{-\infty}^\infty xf(x)dx \\ &= \int_\alpha^\beta x \frac{1}{\beta - \alpha}dx \\ &= \frac{1}{\beta - \alpha} \frac{x^2}{2} \bigg|_\alpha^\beta \\ &= \frac{(\beta - \alpha)(\beta + \alpha)}{2(\beta - \alpha)} \\ &= \frac{\beta + \alpha}{2}\end{aligned}

To get $Var(X)$, we just need to calculate $E[X^2]$:

\begin{aligned} E[X^2] &= \int_\alpha^\beta \frac{x^2}{\beta - \alpha}dx \\ &= \frac{1}{\beta - \alpha} \frac{x^3}{3}\bigg|_\alpha^\beta \\ &= \frac{\beta^3 - \alpha^3}{3(\beta - \alpha)} \\ &= \frac{(\beta - \alpha)(\beta^2 + \alpha\beta + \alpha^2)}{3(\beta - \alpha)} \\ &= \frac{\beta^2 + \alpha\beta + \alpha^2}{3} \\ Var(X) &= \frac{\beta^2 + \alpha\beta + \alpha^2}{3} - \left( \frac{\beta + \alpha}{2} \right)^2 \\ &= \frac{\beta^2 - 2\beta\alpha + \alpha^2}{12} \\ &= \frac{(\beta - \alpha)^2}{12}\end{aligned}

So we've found the expected value for the uniform distribution is $\frac{\beta + \alpha}{2}$, and the variance is $\frac{(\beta - \alpha)^2}{12}$. We also want to find the moment generating function for the uniform distribution.

\begin{aligned} M(t) &= E\left[e^{tX}\right] = \int_{-\infty}^\infty e^{tx}f(x)dx \\ &= \int_\alpha^\beta \frac{1}{\beta - \alpha} e^{tx}dx \\ &= \frac{1}{\beta - \alpha} \frac{1}{t} \int_\alpha^\beta e^{tx}d(tx) \\ &= \frac{1}{t(\beta - \alpha)} e^{tx}\bigg|_\alpha^\beta \\ &= \frac{e^{t\beta} - e^{t\alpha}}{t(\beta - \alpha)}, \qquad t \neq 0\end{aligned}
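One way to sanity-check this MGF is to use the fact that the first derivative of an MGF at $t = 0$ equals the mean. The sketch below approximates that derivative with a central difference; the endpoints and the step size `h` are hypothetical values picked for illustration.

```python
import math

def uniform_mgf(t, alpha, beta):
    """MGF of Uniform(alpha, beta) for t != 0."""
    return (math.exp(t * beta) - math.exp(t * alpha)) / (t * (beta - alpha))

alpha, beta = 2.0, 5.0   # hypothetical endpoints
h = 1e-4
# Central difference approximating M'(0); the formula is never evaluated at t = 0.
mean_estimate = (uniform_mgf(h, alpha, beta) - uniform_mgf(-h, alpha, beta)) / (2 * h)
print(mean_estimate, (alpha + beta) / 2)
```

The two printed numbers should be nearly identical, which is consistent with $E[X] = \frac{\beta + \alpha}{2}$.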

## 3.4 The Normal probability distribution

Another very important type of continuous random variable is the Normal random variable. The probability distribution of a Normal random variable is called the normal distribution.

The normal distribution is the most important probability distribution in both theory and practice. In the real world, many random phenomena follow, or at least approximately follow, a normal distribution. Examples include the velocity of the molecules in a gas and the number of birds in flocks.

Another reason for the popularity of the normal distribution is its nice mathematical properties. We first study the simplest case of the normal distribution, the standard normal distribution.

### Standard normal distribution

A continuous random variable $Z$ is said to follow a standard normal distribution if the density function of $Z$ is given by

$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}} \quad -\infty < z < \infty$

Conventionally, we use $Z$ to denote a standard normal random variable, and $\Phi(z)$ to denote its cumulative distribution function.

The density function of a normal random variable is a bell-shaped curve which is symmetric about its mean (in this case, $0$). The symmetry follows from $f(z) = f(-z)$. We can also find $f(0) = \frac{1}{\sqrt{2\pi}} \approx 0.4$.

Next we prove this density function satisfies the properties of PDFs. The first property we want to prove is

$\int_{-\infty}^\infty f(z)dz = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}} dz = 1$

which is equivalent to proving

$I = \int_{-\infty}^\infty e^{-\frac{x^2}{2}} dx = \sqrt{2\pi}$

It's easier to prove $I^2 = 2\pi$, as shown below:

\begin{aligned} I^2 &= \int_{-\infty}^\infty e^{-\frac{x^2}{2}}dx \int_{-\infty}^\infty e^{-\frac{y^2}{2}}dy \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-\frac{x^2 + y^2}{2}}dxdy \tag{3.4.1}\end{aligned}

Intuitively, we can think of this as changing variables from Cartesian coordinates to polar coordinates, $dxdy = Jdrd\theta$.

In the Cartesian coordinate system the increase $dxdy$ can be thought of as the area of a small rectangle. In the polar coordinate system, the increase in area can be approximated by a rectangle with one side being $dr$ and the other being the length of the arc, $rd\theta$, swept when the angle increases by $d\theta$ at radius $r$. Thus, $dxdy = r drd\theta$. Now we have

$\begin{cases} x^2 + y^2 = r^2 \\ \tan \theta = \frac{y}{x}\end{cases} \Rightarrow \begin{cases} x = r\cos \theta \\ y = r\sin \theta\end{cases}$

So we can rewrite Eq. (3.4.1) as

\begin{aligned} I^2 &= \int_0^{2\pi} \int_0^\infty e^{-\frac{r^2}{2}}r\,dr\,d\theta \\ &= -\int_0^{2\pi} \int_0^\infty e^{-\frac{r^2}{2}}d\left(-\frac{r^2}{2}\right) d\theta \\ &= -\int_0^{2\pi} \left( e^{-\frac{r^2}{2}} \bigg|_0^\infty \right) d\theta \\ &= -\int_0^{2\pi} (0 - 1)\, d\theta \\ &= \theta\bigg|_0^{2\pi} = 2\pi\end{aligned}

Formally speaking, the $J$ in $dxdy = Jdrd\theta$ is the Jacobian determinant, and can be found with

$J = \det \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \frac{\partial{x}}{\partial{r}} & \frac{\partial{x}}{\partial{\theta}} \\ \frac{\partial{y}}{\partial{r}} & \frac{\partial{y}}{\partial{\theta}}\end{vmatrix}$

Knowing that $x = r \cos \theta$ and $y = r \sin\theta$, we can find the determinant of the Jacobian matrix

\begin{aligned} J &= \begin{vmatrix} \cos\theta & r(-\sin\theta) \\ \sin\theta & r\cos\theta \end{vmatrix} \\ &= r\cos^2\theta + r\sin^2\theta = r\end{aligned}
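The value $I = \sqrt{2\pi}$ obtained from the polar-coordinate argument can be checked numerically. This is a minimal sketch using a midpoint rule; truncating the integral to $[-10, 10]$ is an assumption of the sketch, justified by how quickly $e^{-x^2/2}$ decays.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The tails outside [-10, 10] contribute a negligible amount.
I = integrate(lambda x: math.exp(-x**2 / 2), -10, 10)
print(I, math.sqrt(2 * math.pi))
```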

### Expectation and variance

We first find the expected value of a standard normal random variable.

\begin{aligned} E[X] &= \int_{-\infty}^\infty xf(x)dx = \int_{-\infty}^\infty x \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}dx \\ &= -\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} d\left(-\frac{x^2}{2}\right) \\ &= -\frac{1}{\sqrt{2\pi}}\left( e^{-\frac{x^2}{2}} \right)\bigg|_{-\infty}^\infty = 0\end{aligned}

We can also get $E[X] = 0$ due to symmetry. To find the variance of $X$, we need to find $E[X^2]$:

\begin{aligned} E[X^2] &= \int_{-\infty}^\infty x^2f(x)dx = \int_{-\infty}^\infty x^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}dx \\ &= -\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} x e^{-\frac{x^2}{2}}d\left(-\frac{x^2}{2}\right) \\ &= -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty xd\left( e^{-\frac{x^2}{2}} \right)\end{aligned}

Recall integration by parts: $\int u\,dv = \int d(uv) - \int v\,du$. In our case $u = x$ and $v = e^{-\frac{x^2}{2}}$, so

\begin{aligned} E[X^2] &= -\frac{1}{\sqrt{2\pi}} \left( \int_{-\infty}^\infty d\left( xe^{-\frac{x^2}{2}} \right) - \int_{-\infty}^\infty e^{-\frac{x^2}{2}}dx \right) \\ &= -\frac{1}{\sqrt{2\pi}} \left( xe^{-\frac{x^2}{2}}\bigg|_{-\infty}^\infty - \sqrt{2\pi} \right)\end{aligned}

When $x\rightarrow \infty$, $x$ goes to $\infty$ linearly, but $e^{-\frac{x^2}{2}}$ goes to $0$ exponentially, so the product goes to $0$. Similarly the term goes to $0$ when $x \rightarrow -\infty$. So we have

$E[X^2] = -\frac{1}{\sqrt{2\pi}} (-\sqrt{2\pi}) = 1 \Rightarrow Var(X) = 1- 0 = 1$

### Cumulative distribution function

The cumulative distribution function of a standard normal random variable is also very useful.

\begin{aligned} \Phi(a) = F_Z(a) &= \int_{-\infty}^a f(z)dz \\ &= \int_{-\infty}^a \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz\end{aligned}

Unfortunately, the integral has no closed-form solution in terms of elementary functions. Instead, people developed the standard normal table, which lists the value of $\Phi(a)$ for a given $a$. The table is referred to as the $\Phi$ table or the $Z$ table. In R, we can use the `pnorm()` function.

For example, to find $F_Z(0.32)$ we first look for row $0.3$ and then find the column for $0.02$. Since the standard normal distribution is symmetric around $0$, $\Phi(-a) = 1 - \Phi(a)$. There are also tables that report the right tail area $1 - \Phi(a)$ instead.
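For readers working in Python rather than R, a rough counterpart of the table lookup is sketched below. It writes $\Phi$ in terms of the error function, $\Phi(a) = \frac{1}{2}\left(1 + \operatorname{erf}\left(a / \sqrt{2}\right)\right)$, and compares it with `statistics.NormalDist` from the standard library (available in Python 3.8 and later). The function name `phi` is an illustrative choice.

```python
import math
from statistics import NormalDist

def phi(a):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

table_value = phi(0.32)                  # row 0.3, column 0.02 of the table
library_value = NormalDist().cdf(0.32)   # same quantity from the standard library
print(table_value, library_value)
print(phi(-1.0), 1 - phi(1.0))           # symmetry: both should agree
```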

### General normal distribution

So far we've been discussing the case $X \sim N(0, 1)$, the standard normal distribution. Now we extend our findings to a general case: $X \sim N(\mu, \sigma^2)$.

$f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

The reason we focused on the standard normal distribution is that it's easy to analyze. In addition, the properties of the general normal random variable can be derived from those of the standard normal random variable.

If we consider a linear function of the standard normal random variable $Z$,

$X = \mu + \sigma Z,$

we can easily find

$E[X] = E[\mu + \sigma Z] = \mu + \sigma E[Z] = \mu$

$Var(X) = Var(\mu + \sigma Z) = \sigma^2 Var(Z) = \sigma^2$

In that sense,

$X = \mu + \sigma Z \sim N(\mu, \sigma^2)$

We can use this linear function to express $Z$ with $X$ and find the cumulative distribution function and probability density function of $X$:

$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$

\begin{aligned} F_X(a) &= P\{X \leq a\} = P\left\{ Z \leq \frac{a - \mu}{\sigma} \right\} = F_Z\left( \frac{a - \mu}{\sigma} \right) \\ f_X(a) &= \frac{d F_X(a)}{d a} = \frac{d F_Z\left( \frac{a - \mu}{\sigma} \right)}{d a} \end{aligned}

Here we apply the chain rule to calculate the derivative. Let $y = \frac{a - \mu}{\sigma}$,

\begin{aligned} f_X(a) &= \frac{d F_Z(y)}{d y} \cdot \frac{d y}{d a} \\ &= \frac{1}{\sigma}f_Z \left( \frac{a - \mu}{\sigma} \right) \\ &= \frac{1}{\sigma} \cdot \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}\left( \frac{a-\mu}{\sigma} \right)^2} \\ &= \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(a-\mu)^2}{2\sigma^2}}\end{aligned}

and the cumulative distribution function

$F_X(a) = F_Z\left( \frac{a - \mu}{\sigma} \right) = \Phi\left( \frac{a - \mu}{\sigma} \right)$

where the exact value can be found in the $\Phi$ table.

### Example 3.4.1

If $X \sim N(3, 9)$, find $P\{2 < X < 5\}$ and $P\{|X - 3| > 6\}$.

#### Solution

We have $\mu = 3, \sigma = 3$, and

$\frac{X - 3}{3} = Z \sim N(0, 1)$

\begin{aligned} P\{X \leq a\} &= F(a) = \Phi\left( \frac{a-3}{3} \right) \\ P\{2 < X < 5\} &= P\{X < 5\} - P\{X < 2\} \\ &= F_X(5) - F_X(2) \\ &= \Phi\left( \frac{2}{3} \right) - \Phi\left( -\frac{1}{3} \right) \\ &= \Phi\left( \frac{2}{3} \right) - 1 + \Phi\left( \frac{1}{3} \right) \approx 0.378 \\ P\{|X-3| >6\} &= P\{X-3 > 6\} + P\{ X - 3 < -6\} \\ &= P\{X > 9\} + P\{X < -3\} \\ &= 1 - F_X(9) + F_X(-3) \\ &= 1 - \Phi(2) + \Phi(-2) \\ &= 2(1 - \Phi(2)) \approx 0.046\end{aligned}
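The two probabilities can be cross-checked without a table. The sketch below uses `statistics.NormalDist` from the Python standard library; note that its second argument is the standard deviation $\sigma = 3$, not the variance $9$.

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=3)           # N(3, 9): sigma is the standard deviation
p_between = X.cdf(5) - X.cdf(2)         # P{2 < X < 5}
p_far = (1 - X.cdf(9)) + X.cdf(-3)      # P{|X - 3| > 6}
print(p_between, p_far)
```

The results should be close to the table-based values $0.378$ and $0.046$; small differences in the last digit can come from table rounding.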

### Example 3.4.2

Find the moment generating function of $Z \sim N(0, 1)$.

#### Solution

\begin{aligned} M_Z(t) &= E[e^{tZ}] \\ &= \int_{-\infty}^\infty e^{tz}f(z)dz \\ &= \int_{-\infty}^\infty e^{tz} \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}dz \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{tz - \frac{z^2}{2}} dz \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{\frac{t^2 - (z-t)^2}{2}} dz \\ &= \frac{1}{\sqrt{2\pi}}e^{\frac{t^2}{2}} \int_{-\infty}^\infty e^{-\frac{(z-t)^2}{2}} dz \\ &= \frac{1}{\sqrt{2\pi}}e^{\frac{t^2}{2}} \int_{-\infty}^\infty e^{-\frac{(z-t)^2}{2}} d(z-t) \\ &= \frac{1}{\sqrt{2\pi}}e^{\frac{t^2}{2}} \sqrt{2\pi} \\ &= e^{\frac{t^2}{2}} \end{aligned}

We can validate this by calculating $E[Z]$ and $Var(Z)$ with the MGF.

\begin{aligned} E[Z] &= M_Z^{(1)}(0) = \frac{de^{\frac{t^2}{2}}}{dt}\bigg|_{t=0} \\ &= \frac{de^{\frac{t^2}{2}}}{d\frac{t^2}{2}} \cdot \frac{d\frac{t^2}{2}}{dt} \bigg|_{t=0} \\ &= e^{\frac{t^2}{2}} t \bigg|_{t=0} = 0\end{aligned}

\begin{aligned} Var(Z) &= E[Z^2] - (E[Z])^2 = E[Z^2] = M_Z^{(2)}(0) \\ &= \frac{dM_Z'(t)}{dt} \bigg|_{t=0} \\ &= \frac{d\left[ te^{\frac{t^2}{2}} \right]}{dt} \bigg|_{t=0} \\ &= \left( e^{\frac{t^2}{2}} \frac{dt}{dt} + t \frac{de^{\frac{t^2}{2}}}{dt} \right) \bigg|_{t=0} \\ &= \left( e^{\frac{t^2}{2}} + t \cdot te^{\frac{t^2}{2}} \right) \bigg|_{t=0} \\ &= (t^2 + 1)e^{\frac{t^2}{2}} \bigg|_{t=0} = 1\end{aligned}

### Theorem for the moment generating function

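Let $X$ be a random variable with moment generating function $M_X(t)$, for example $X \sim N(\mu, \sigma^2)$. If $Y = a + bX$ for constants $a$ and $b$, we have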

$M_Y(t) = e^{at}M_X(bt)$

Proof:

\begin{aligned} M_Y(t) &= M_{a+bX}(t) \\ &= E\left[ e^{t(a+bX)} \right] \\ &= E\left[ e^{at} e^{tbX} \right] \\ &= e^{at}E\left[ e^{tbX} \right] \\ &= e^{at}M_X(bt)\end{aligned}

To apply this to the result of Example 3.4.2, write $X = \mu + \sigma Z \sim N(\mu, \sigma^2)$, i.e. take $a = \mu$ and $b = \sigma$ with $Z$ playing the role of $X$ in the theorem, so

$M_X(t) = e^{\mu t}M_Z(\sigma t) = e^{\mu t} e^{\frac{\sigma^2t^2}{2}} = e^{\frac{\sigma^2t^2}{2} + \mu t}$

### Theorem for a binomial distribution

Another very useful property of the normal distribution is that it can be used to approximate a binomial distribution when $n$ is very large.

Let $X$ be a Binomial random variable with parameters $n$ and $p$. Denote by $\mu$ and $\sigma$ the mean and standard deviation of $X$. Then we have

$P\left\{\frac{X - \mu}{\sigma} \leq a\right\} \rightarrow \Phi(a) \quad \text{as } n \rightarrow \infty$

where $\Phi(\cdot)$ is the distribution function of $Z \sim N(0, 1)$, i.e. $\Phi(a) = P\{Z \leq a\}$. In other words, as $n \rightarrow \infty$, the distribution of $\frac{X - \mu}{\sigma}$ converges to that of $Z$. We also know that $E[X] = np = \mu$ and $Var(X) = np(1-p) = \sigma^2$, so we can rewrite the statement as

$P\left\{ \frac{X - np}{\sqrt{np(1-p)}} \leq a \right\} \rightarrow \Phi(a) \quad \text{as } n \rightarrow \infty$

This theorem is a special case of the central limit theorem, which we'll discuss in a later chapter.

There's another approximation to the binomial distribution, the Poisson approximation, and its conditions complement those of the normal approximation.

The Poisson approximation requires $n$ to be large ($n \geq 20$) and $p$ to be small ($p \leq 0.05$), with $np = \lambda$ held at a fixed value.

The normal approximation is reasonable when $n$ is very large and neither $p$ nor $1-p$ is too small, i.e. $p$ is not close to $0$ or $1$.

### Example 3.4.3

Suppose that we flip a fair coin $40$ times, and let $X$ denote the number of heads observed. Calculate $P\{X = 20\}$.

#### Solution

We have $X \sim Binomial(40, 0.5)$. We can calculate the probability directly:

$P\{X = 20\} = \binom{40}{20} 0.5^{20}(1 - 0.5)^{20} \approx 0.1254$

We can also apply the normal approximation. We have $\mu = np = 20$ and $\sigma = \sqrt{Var(X)} = \sqrt{np(1-p)} = \sqrt{10}$.

\begin{aligned} &P\left\{ \frac{X-\mu}{\sigma} \leq a \right\} \rightarrow \Phi(a) \\ \Rightarrow &P\{X = 20\} = P\{19.5 \leq X \leq 20.5\} \qquad \text{because X is discrete} \\ &= P\left\{ \frac{19.5 - 20}{\sqrt{10}} \leq \frac{X - 20}{\sqrt{10}} \leq \frac{20.5 - 20}{\sqrt{10}} \right\} \\ &\approx P\left\{ -0.16 \leq \frac{X - 20}{\sqrt{10}} \leq 0.16 \right\} \\ &= \Phi(0.16) - \Phi(-0.16) \\ &= 2\Phi(0.16) - 1 \approx 0.1272\end{aligned}
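Both numbers in this example can be reproduced in code. The following sketch computes the exact binomial probability with `math.comb` and the continuity-corrected normal approximation with `statistics.NormalDist`; variable names are illustrative. Because the code does not round the standardized bounds to $\pm 0.16$ as a table lookup does, its approximation comes out slightly below the $0.1272$ shown above.

```python
import math
from statistics import NormalDist

n, p, k = 40, 0.5, 20
exact = math.comb(n, k) * p**k * (1 - p)**(n - k)

mu = n * p
sigma = math.sqrt(n * p * (1 - p))
Z = NormalDist()
# Continuity correction: P{X = k} is approximated by P{k - 0.5 <= X <= k + 0.5}.
approx = Z.cdf((k + 0.5 - mu) / sigma) - Z.cdf((k - 0.5 - mu) / sigma)
print(exact, approx)
```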

### Example 3.4.4

The ideal class size of a college is 150 students. The college knows from past experience that, on average, only 30% of students will accept their offers. If the college gives out 450 offers, what is the probability that more than 150 students will actually attend?

#### Solution

For simplicity, we assume the students' decisions are independent of each other. Let $X$ be the number of students who accept the offer. Then $X \sim Binomial(450, 0.3)$.

\begin{aligned} P\{X > 150\} &= 1 - P\{X \leq 150\} \\ &= 1 - \sum_{i=0}^{150}P_X(i)\end{aligned}

We can see that it's hard to calculate this probability by hand. Again, we can apply the normal approximation. We have $\mu = np = 450 \times 0.3 = 135$ and $\sigma = \sqrt{np(1-p)} = \sqrt{94.5}$.

\begin{aligned} P\{X > 150\} &= P\{ X \geq 150.5 \} \\ &= P\left\{ \frac{X - \mu}{\sigma} \geq \frac{150.5 - \mu}{\sigma} \right\} \\ &\approx 1 - \Phi\left( \frac{150.5 - 135}{\sqrt{94.5}} \right) \\ &\approx 1 - \Phi(1.59) \approx 0.0559\end{aligned}
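Although the exact sum is tedious by hand, a computer handles it easily, which gives a way to judge the approximation. The sketch below sums the binomial probabilities for $i = 151, \dots, 450$ and compares the result with the continuity-corrected normal tail; variable names are illustrative. The normal value here is computed without rounding the standardized bound to $1.59$, so it is slightly smaller than the table-based $0.0559$.

```python
import math
from statistics import NormalDist

n, p = 450, 0.3
# Exact tail probability P{X > 150} = sum of P{X = i} for i = 151..450.
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(151, n + 1))

mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(150.5)   # continuity-corrected normal tail
print(exact, approx)
```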

## 3.5 Exponential random variable

A continuous random variable $X$ is an exponential random variable with parameter $\lambda$ if the density function is given by

$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0\end{cases} \quad \text{where } \lambda > 0$

In practice, the exponential distribution often arises as the distribution of the amount of time until some specific event occurs. For example, the lifetime of a new mobile phone; the amount of time from now until an earthquake occurs; the survival time of a patient.

### CDF, expectation, variance and MGF

If $X \sim Exp(\lambda)$,

\begin{aligned} F_X(a) &= P\{X \leq a\} \\ &= \int_{-\infty}^a f(x)dx \\ &= \int_0^a f(x)dx \quad \text{if } a > 0 \\ &= \int_0^a \lambda e^{-\lambda x}dx \\ &= -\int_0^a e^{-\lambda x} d(-\lambda x) \\ &= -e^{-\lambda x} \bigg|_0^a \\ &= 1 - e^{-\lambda a}\end{aligned}

So we've found the cumulative distribution function of $X$ as

$F_X(a) = \begin{cases} 1 - e^{-\lambda a}, & a \geq 0 \\ 0, & a < 0\end{cases}$

To find the expectation of $X$, we first prove that

$E[X^n] = \frac{n}{\lambda} E[X^{n-1}]$

\begin{aligned} E[X^n] &= \int_0^\infty x^n\lambda e^{-\lambda x} dx \\ &= -\int_0^\infty x^n d(e^{-\lambda x}) \\ &= -\left( x^n e^{-\lambda x} \bigg|_0^\infty - \int_0^\infty e^{-\lambda x}d(x^n) \right) \\ &= -\left(\lim_{x \rightarrow \infty} \frac{x^n}{e^{\lambda x}} - 0 - \int_0^\infty nx^{n-1}e^{-\lambda x}dx \right) \\ &= \frac{n}{\lambda} \int_0^\infty x^{n-1}\lambda e^{-\lambda x} dx \\ &= \frac{n}{\lambda} E[X^{n-1}]\end{aligned}

Now we can easily find $E[X]$ by setting $n=1$:

$E[X] = \frac{1}{\lambda} E[X^{1-1}] = \frac{1}{\lambda}$

We also have

$E[X^2] = \frac{2}{\lambda}E[X] = \frac{2}{\lambda^2}$

so the variance of $X$ is

$Var(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$
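These moments can be checked numerically for a specific rate. The sketch below is only an illustration: the rate `lam = 2.0`, the truncation of the integral at $x = 40$, and the helper names are all assumptions made here.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 2.0  # hypothetical rate chosen for illustration

def moment(k):
    """Numerical E[X^k] for X ~ Exp(lam); the tail beyond x = 40 is negligible."""
    return integrate(lambda x: x**k * lam * math.exp(-lam * x), 0, 40)

m1, m2 = moment(1), moment(2)
variance = m2 - m1**2
print(m1, m2, variance)
```

For $\lambda = 2$ the formulas give $E[X] = \frac{1}{2}$, $E[X^2] = \frac{2}{4} = \frac{1}{2}$ and $Var(X) = \frac{1}{4}$, which the printed values should approximate.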

Finally, the moment generating function of $X$ is

\begin{aligned} M_X(t) &= E\left[ e^{tX} \right] \\ &= \int_{-\infty}^\infty e^{tx}f(x)dx \\ &= \int_0^\infty e^{tx} \lambda e^{-\lambda x}dx \\ &= \lambda \int_0^\infty e^{(t-\lambda)x}dx \\ &= \frac{\lambda}{t - \lambda} \int_0^\infty e^{(t-\lambda)x}d\left((t - \lambda)x\right) \\ &= \frac{\lambda}{t - \lambda} e^{(t - \lambda)x} \bigg|_0^\infty \qquad \text{(assuming } t < \lambda \text{ so that the integral converges)} \\ &= \frac{\lambda}{t - \lambda}(0 - 1) = \frac{\lambda}{\lambda - t}\end{aligned}

So we can conclude the MGF of this random variable as

$M_X(t) = \begin{cases} \frac{\lambda}{\lambda - t}, & t < \lambda \\ \text{not defined}, & t \geq \lambda\end{cases}$

### Memoryless

An exponential random variable is usually used to model a waiting time counted from now onwards. So why don't we care about what happened in the past?

For example, when we consider the time from now to the next earthquake, why don't we care about how long it has been since the last earthquake? The answer lies in a nice property of the exponential distribution.

We say a non-negative random variable $X$ is memoryless if

$P\{X > s + t \mid X > t\} = P\{X > s\} \qquad \forall s, t \geq 0$

Going back to the earthquake example, the memoryless property means that the probability we will not observe an earthquake for $t+s$ days given that it has already been $t$ days without an earthquake is the same as the probability of not observing an earthquake in the first $s$ days.

\begin{aligned} P\{X > s + t \mid X > t\} &= \frac{P\{ X > s + t, X > t \}}{P\{ X > t \}} \\ &= \frac{P\{X > s + t\}}{P\{ X > t \}} \\ \Rightarrow P\{X > s + t\} &= P\{X > t\}P\{X > s\} \quad \text{if } X \text{ is memoryless.}\end{aligned}

For the exponential distribution,

\begin{aligned} P\{X > s + t\} &= 1 - F(s + t) \\ &= 1 - \left[ 1 - e^{-\lambda(s + t)} \right] \\ &= e^{-\lambda(s + t)} \\ &= e^{-\lambda t} e^{-\lambda s} \\ &= \left[ 1 - \left( 1 - e^{-\lambda t} \right) \right]\left[ 1 - \left( 1 - e^{-\lambda s} \right) \right] \\ &= (1 - F(t))(1 - F(s)) \\ &= P\{X > t\}P\{X > s\}\end{aligned}

So the exponential distribution is memoryless.
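A small numerical illustration of the memoryless property is sketched below. The rate and the values of $s$ and $t$ are hypothetical, and `survival` is a name introduced here for $P\{X > x\} = 1 - F(x)$.

```python
import math

def survival(x, rate):
    """P{X > x} = exp(-rate * x) for X ~ Exp(rate) and x >= 0."""
    return math.exp(-rate * x)

rate, s, t = 0.1, 3.0, 7.0   # hypothetical values
conditional = survival(s + t, rate) / survival(t, rate)   # P{X > s + t | X > t}
unconditional = survival(s, rate)                         # P{X > s}
print(conditional, unconditional)
```

The two printed probabilities should coincide up to floating-point rounding, whatever non-negative values are chosen for `s` and `t`.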