In this chapter, we restrict ourselves to studying a special type of random variable: the discrete random variable.

2.1 Discrete random variable

When an experiment is performed, we are often interested in some functions of the outcomes, rather than the outcomes themselves. For example, when we toss two six-sided dice, we may care about the sum of the two dice. These real-valued functions defined on the sample space are known as random variables.


A random variable is said to be discrete if it can take a finite or countably infinite number of distinct values. In practice, we usually use an uppercase letter to denote a random variable. For example,

\[X: S \rightarrow \mathbb{R}\]

and a lowercase letter, say $x$, to denote a possible value of this random variable. When we say $X = x$, we're referring to the set of outcomes in the sample space for which this equation holds, $\{X = x\}$. As an example,

\[\begin{aligned}    X&: \text{sum of two rolls} \\    \{X = 7\} &= \{ (1, 6), (2, 5), \cdots, (6, 1) \}\end{aligned}\]

Then we can assign a probability to each of the events. The probability mass function of a discrete random variable $X$ at a given value $x$ is denoted

\[P\{X = x\} = p_X(x)\]

The probability distribution of a discrete random variable $X$ is the collection of its probability mass function values over all its possible values. In other words, the collection of $p_X(x)$ for all $x$.

Example 2.1.1


Consider an experiment of tossing two fair coins. Denote $X$ as the number of heads. Find the probability distribution of $X$.


$S = \{ (H, H), (H, T), (T, H), (T, T) \}$. For $X: \text{the number of heads}$, if we view $X$ as a function of the sample space $S$,

\[\begin{cases}    X((H, H)) = 2 \\    X((H, T)) = 1 \\    X((T, H)) = 1 \\    X((T, T)) = 0\end{cases}\]

$X$ can take three possible values: $0, 1, 2$. The probability mass functions are

\[\begin{aligned}    p_X(0) &= P\{ X = 0 \} = P\{(T, T)\} = \frac{1}{4} \\    p_X(1) &= P\{ X = 1 \} = P\{(T, H), (H, T)\} = \frac{1}{2} \\    p_X(2) &= P\{ X = 2 \} = P\{(H, H)\} = \frac{1}{4} \\\end{aligned}\]

So we can write out the probability distribution of $X$

\[p_X(x) = \begin{cases}    0.25 & x = 0, 2 \\    0.5 & x = 1\end{cases}\]

We can also use a bar plot to show the probability distribution of $X$, with the possible values of $X$ on the x-axis and their probabilities on the y-axis.
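As a quick check of Example 2.1.1, the PMF can be recovered by enumerating the sample space and counting the outcomes that map to each value of $X$. The following is a minimal sketch; the variable names (`sample_space`, `pmf`) are illustrative and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses; all four outcomes are equally likely.
sample_space = list(product("HT", repeat=2))

# X maps each outcome (a tuple such as ('H', 'T')) to its number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)

# p_X(x) = (number of outcomes with X = x) / |S|
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(f"p_X({x}) = {p}")
```

Exact fractions are used so that the probabilities can be compared with the hand calculation without rounding.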

Properties of the probability distribution

The probability mass function must satisfy some properties. Suppose $X$ is a discrete random variable with possible values $x_1, x_2, \cdots$, then

\[\begin{aligned}    &p(x_i) > 0 && \forall x_i, \\    &p(x) = 0 && \forall \text{ other } x, \\    &\sum_{i=1}^\infty p(x_i) = 1.\end{aligned}\]

For any two distinct possible values $x_i \neq x_j$, the events are disjoint: $\{X = x_i\} \cap \{ X = x_j\} = \emptyset$. Taken over all possible values, $\bigcup_{i}\{ X = x_i \} = S$. We can therefore think of the events $\{X = x_i\}$ as a partition of the sample space $S$.

Cumulative distribution function

Besides the PMF, which gives the probability of a single possible value of a random variable, we may want to calculate the probability that the random variable takes any of several values. For example, in Example 2.1.1,

\[\begin{aligned}    P\{ X < 2 \} &= P\{ X = 1 \text{ or } 0 \} \\    &= P(\{X = 1\} \cup \{X = 0\}) \\    &= \sum_{i=0}^1P(i) = \frac{1}{4}+ \frac{1}{2} = \frac{3}{4}\end{aligned}\]

The cumulative distribution function is defined as

\[F(a) = P\{X \leq a\} = \sum_{x \leq a} p(x)\]

in the discrete case. In the example above,

\[P\{X < 2\} = P\{X \leq 1\} = F(1) = p(1) + p(0)\]

In addition, we can write out the whole cumulative distribution function for all possible values of $a$

\[F_X(a) = \begin{cases}    0 & a < 0 \\    \frac{1}{4} & 0 \leq a < 1 \\    \frac{3}{4} & 1 \leq a < 2 \\    1 & a \geq 2\end{cases}\]

The figure for the CDF of a discrete random variable is a non-decreasing step function of $a$.
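The step-function behaviour can be seen by evaluating $F(a) = \sum_{x \leq a} p(x)$ at a few points, including values of $a$ that are not possible values of $X$. This is a small sketch using the PMF of Example 2.1.1; the helper name `cdf` is an assumption made for illustration.

```python
from fractions import Fraction

# PMF from Example 2.1.1 (number of heads in two fair coin tosses).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(a, pmf):
    """Return F(a) = P{X <= a} by summing p(x) over all x <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))

# F stays flat between possible values and jumps by p(x) at each x.
for a in (-1, 0, 0.5, 1, 1.9, 2, 3):
    print(f"F({a}) = {cdf(a, pmf)}")
```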

2.2 Expectation

Having covered the PMF and the CDF, we can move on to one of the most important concepts in probability theory: the expectation of a random variable.


If $X$ is a discrete random variable with probability mass function $p(x)$, its expectation is

\[E[X] = \sum_x{x\,p(x)} = \sum_{x: p(x) > 0}{x\,p(x)}\]

Intuitively, it is the long-run average value of repetitions of the same experiment it represents. In our case, the expected value of a discrete random variable is the probability-weighted average of all possible values.

Recall that random variables are some real-valued functions that map the outcomes in a sample space to some real numbers. So a real-valued function of a random variable is also a real-valued function on the same sample space, and hence is also a random variable.

Define $Y = g(X)$ with $g(\cdot)$ being a real-valued function. This means that everything we can do with $X$ can be done in a similar fashion with $Y$. For example, the expected value of $Y$ is

\[E[Y] = E[g(X)] = \sum_{x}g(x)p(x) = \sum_{x: p(x) > 0}g(x)p(x)\]

The only difference between $E[X]$ and $E[Y]$ is that we replace $x$ by $g(x)$ in the summation. Note that the weights $p(x)$ are unchanged!

Example 2.2.1


Let $X$ be a discrete random variable with three possible values $-1, 0, 1$, and probability mass function

\[P(-1) = 0.2, P(0) = 0.5, P(1) = 0.3.\]

Find $E(X)$ and $E(X^2)$.


\[\begin{aligned}    E(X) &= \sum_X{XP(X)} \\    &= -1P(-1) + 0P(0) + 1P(1) \\     &= -0.2 + 0.3 = 0.1\end{aligned}\]

Let $Y = g(X) = X^2$, we have

\[\begin{aligned}    E(X^2) &= \sum_{X}X^2P(X) \\    &= (-1)^2P(-1) + 0^2P(0) + 1^2P(1) \\    &= 0.2 + 0.3 = 0.5\end{aligned}\]
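The two sums in Example 2.2.1 follow the same pattern: weight $g(x)$ by $p(x)$ and add. A minimal sketch of that pattern is given below; the function name `expectation` and its default argument are illustrative choices, not notation from the notes.

```python
# PMF from Example 2.2.1.
pmf = {-1: 0.2, 0: 0.5, 1: 0.3}

def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expectation(pmf)                     # E[X]
e_x2 = expectation(pmf, lambda x: x ** 2)  # E[X^2]
print(e_x, e_x2)
```

Because the probabilities are stored as floating-point numbers, the printed values may differ from $0.1$ and $0.5$ in the last few digits.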

Since expectation is a linear operator, for a random variable $X$, two real-valued functions $g(\cdot)$ and $h(\cdot)$, and two constants $\alpha$ and $\beta$,

\[E[\alpha g(X) + \beta h(X)] = \alpha E[g(X)] + \beta E[h(X)]\]


The variance of a random variable $X$ can also be written as the expectation of a function of $X$

\[\begin{aligned}    Var(X) &= \sigma^2 = E\left[ (X - E[X] )^2 \right] \\    &= E\left[ X^2 - 2XE[X] + E[X]^2 \right] \\    &= E[X^2] - 2E[X]^2 + E[X]^2 \\    &= E[X^2] - (E[X])^2\end{aligned}\]
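To see that the definition $E[(X - E[X])^2]$ and the shortcut $E[X^2] - (E[X])^2$ agree, both can be evaluated on the PMF of Example 2.2.1. The sketch below uses exact fractions; the variable names are assumptions for illustration.

```python
from fractions import Fraction

# PMF from Example 2.2.1, written as exact fractions (0.2, 0.5, 0.3).
pmf = {-1: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 10)}

mean = sum(x * p for x, p in pmf.items())

# Variance from the definition E[(X - E[X])^2].
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut E[X^2] - (E[X])^2.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(mean, var_definition, var_shortcut)
```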

2.3 Bernoulli and Binomial random variables


Suppose a random experiment has a binary outcome, say $1$ or $0$. Let $X$ be a random variable to indicate the result of this random experiment, then the probability mass function of $X$ can be written as

\[P(i) = \begin{cases} p & i=1 \\ 1-p & i=0 \end{cases}\]

where $0 \leq p \leq 1$. Then we say $X \sim Bernoulli(p)$, meaning $X$ is a Bernoulli random variable with parameter $p$, or $X$ is drawn from a Bernoulli distribution with parameter $p$.

A Binomial experiment is a random experiment that contains $n$ independent and identical Bernoulli experiments. For example, tossing a coin $n$ times.

Denote $X$ as the number of successes observed in the $n$ trials. Then $X \sim Binomial(n, p)$ where $p$ is the probability of success in each trial. $X$ can take any integer value $k = 0, 1, \cdots, n$. The PMF is

\[P(k) = \binom{n}{k} p^k(1-p)^{n-k} \quad k = 0, 1, \cdots, n\]


We want to check/know three things: $\sum_i P(i) = 1$, $E[X]$, and $Var(X)$.

Bernoulli distribution

Finding the expectation and variance for the Bernoulli distribution is straightforward:

\[\begin{aligned}    \sum_i P(i) &= P(0) + P(1) = 1 - p + p = 1 \\    E(X) &= \sum_{i=0}^1iP(i) = 0(1-p) + p = p \\    Var(X) &= E(X^2) - E(X)^2 \\ &= \sum_{i=0}^1 i^2P(i) - p^2 = p - p^2 = p(1-p)\end{aligned}\]

Binomial distribution

\[\sum_{i=0}^np(i) = \sum_{i=0}^n\binom{n}{i}p^i(1-p)^{n-i} \tag{binom.1}\]

Recall the binomial theorem, shown first for $n = 2$ and then in general:

\[(a + b)^2 = a^2 + 2ab + b^2 = a^2b^0 + 2a^1b^1 + a^0b^2\]

\[(a + b)^n = \sum_{k=0}^n C_k a^{n-k}b^k = \sum_{k=0}^n \binom{n}{k} a^{n-k}b^k\]

Now going back to Eq.(binom.1), we have

\[\sum_{i=0}^n\binom{n}{i}p^i(1-p)^{n-i} = (p + (1-p))^n = 1\]

The expected value of a Binomial random variable is

\[E[X] = np \tag{binom expectation}\]

Lemma: Suppose we have $X \sim Binomial(n, p)$ and $Y \sim Binomial(n-1, p)$. We can show that, for any integer $k \geq 1$,

\[E[X^k] = npE\left[(Y+1)^{k-1}\right] \tag{binom.2}\]

If this equation holds, we can get $E[X]$ by setting $k=1$, since then $E[X] = npE\left[(Y+1)^0\right] = np$.


\[\begin{aligned}    E[X^k] &= \sum_x x^kp(x) \\    &= \sum_{i=0}^n {i^k \binom{n}{i}p^i(1-p)^{n-i}} \\    &= \sum_{i=1}^n {i^k \binom{n}{i}p^i(1-p)^{n-i}} \quad \cdots\text{first term is 0} \\    &= \sum_{i=1}^n {i^{k-1} n\binom{n-1}{i-1}p^i(1-p)^{n-i}} \quad \cdots i\binom{n}{i} = n\binom{n-1}{i-1} \\    &= np \sum_{i=1}^n {i^{k-1} \binom{n-1}{i-1}p^{i-1}(1-p)^{n-i}}\end{aligned}\]

Let $i = j + 1$, we have

\[\begin{aligned}    E[X^k] &= np \sum_{j=0}^{n-1}{(j+1)^{k-1}\binom{n-1}{j}p^j(1-p)^{n-j-1}} \\    &= np E\left[(Y+1)^{k-1}\right]\end{aligned}\]

For $Var(X)$, we just need to figure out $E[X^2]$. Setting $k=2$ in Eq.(binom.2), we have

\[E[X^2] = npE[Y+1] = np(E[Y] + 1) = np((n-1)p + 1) = n^2p^2 + (1-p)np\]

because $Y \sim Binomial(n-1, p)$. So the variance of $X$ is

\[Var(X) = n^2p^2 + (1-p)np - n^2p^2 = np(1-p) \tag{binom variance}\]
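The three Binomial results (total probability $1$, mean $np$, variance $np(1-p)$) can be checked numerically by summing over the PMF directly. This is a sketch under assumed values $n = 10$ and $p = 0.3$, chosen only for illustration; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative values, not from the text
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                                           # should be close to 1
mean = sum(k * q for k, q in enumerate(probs))               # should be close to n*p
second_moment = sum(k ** 2 * q for k, q in enumerate(probs))
variance = second_moment - mean ** 2                         # close to n*p*(1-p)

print(total, mean, variance)
```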

2.4 Geometric random variable

Suppose we perform a sequence of independent and identical Bernoulli experiments, each with probability of success $p$. We can define $Y$ as a random variable describing the number of trials until the first success. For example, if we toss a coin repeatedly, $Y$ is the number of the toss on which a head first appears.

\[\begin{aligned}    &\{Y=1\} && p \\    &\{Y=2\} && (1-p)p \\    &\vdots && \vdots \\    &\{Y=n\} && (1-p)^{n-1}p\end{aligned}\]


A random variable $X$ is said to have a geometric probability distribution if and only if

\[P\{X = x\} = q^{x-1}p \qquad q = 1-p, \; x = 1, 2, 3, \cdots \]

where $0 < p \leq 1$.


As before, we first check that the probabilities sum to one:

\[\sum_{i=1}^\infty p(i) = p\sum_{i=1}^\infty q^{i-1}\]

By the geometric series, for $0 \leq q < 1$,

\[\begin{aligned}    \sum_{i=1}^\infty q^{i-1} &= 1 + q + q^2 + q^3 + \cdots \\    &= \frac{1}{1-q} = \frac{1}{p}\end{aligned}\]

so $\sum_{i=1}^\infty p(i) = p \cdot \frac{1}{p} = 1$.

For the expectation of $X$, we have

\[\begin{aligned}    E[X] &= \sum_{i=1}^\infty iq^{i-1}p \\    &= \sum_{i=1}^\infty (i+1-1)q^{i-1}p \\    &= \sum_{i=1}^\infty (i-1)q^{i-1}p + \sum_{i=1}^\infty q^{i-1}p \\    &= \sum_{i=1}^\infty{(i-1)q^{i-1}p} + 1 \\    &= \sum_{j=0}^\infty{jq^j p} + 1, \qquad j = i-1\\    &= q \sum_{j=1}^\infty{jq^{j-1}p} + 1 \\    &= qE(X) + 1 \\    &\Rightarrow E(X) = \frac{1}{1-q} = \frac{1}{p} \tag{Geometric expectation} \end{aligned}\]

With the expectation of $X$ known, it's easy to get the variance of $X$ once we find $E[X^2]$

\[\begin{aligned} E(X^2) &= \sum_{i=1}^\infty i^2pq^{i-1} \\    &= \sum_{i=1}^\infty (i-1+1)^2pq^{i-1} \\    &= \sum_{i=1}^\infty \left((i-1)^2 + 2(i-1) + 1^2\right)pq^{i-1} \\    &= \sum_{i=1}^\infty (i-1)^2pq^{i-1} + 2\sum_{i=1}^\infty (i-1)pq^{i-1} + \sum_{i=1}^\infty pq^{i-1} \\    &= \sum_{j=0}^\infty j^2pq^j + 2\sum_{j=0}^\infty jpq^j + p\sum_{i=1}^\infty q^{i-1}, \qquad j = i-1 \\    &= q\sum_{j=1}^\infty j^2pq^{j-1} + 2q\sum_{j=1}^\infty jpq^{j-1} + 1 \\    &= qE[X^2] + 2qE[X] + 1 \\   &\Rightarrow (1-q)E[X^2] = \frac{2q}{p} + 1 \\    &E[X^2] = \frac{2-p}{p^2}\end{aligned}\]


\[\begin{aligned}    Var(X) &= E[X^2] - E[X]^2 \\    &= \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2} \tag{Geometric variance} \end{aligned}\]
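The closed forms $E[X] = 1/p$ and $Var(X) = (1-p)/p^2$ can be compared against truncated versions of the infinite sums. The sketch below assumes $p = 0.25$ and truncates after $2000$ terms, at which point the remaining tail is negligible; both choices are illustrative.

```python
p = 0.25   # illustrative success probability
q = 1 - p
N = 2000   # truncation point; q**N is negligibly small here

# p(i) = q^(i-1) * p for i = 1, ..., N
pmf = [q ** (i - 1) * p for i in range(1, N + 1)]

total = sum(pmf)
mean = sum(i * pi for i, pi in enumerate(pmf, start=1))
second_moment = sum(i ** 2 * pi for i, pi in enumerate(pmf, start=1))
variance = second_moment - mean ** 2

print(total, mean, variance)
print(1 / p, (1 - p) / p ** 2)  # closed-form mean and variance
```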

2.5 Poisson random variable

The Poisson random variable has a range of infinite size. It takes values in the set $\{0, 1, 2, \cdots\}$. We'll learn about this random variable through an example.

Suppose we're performing quality control for a mobile phone company. Each phone made has a small chance of being defective. The average number of defective phones produced per day is $\lambda$. Find the probability of producing $k$ defective phones on a usual day.

First, we may assume that the production of each phone is a random variable with two outcomes, $0$ or $1$: $Y_i \sim Bernoulli(p)$, where $Y_i$ indicates whether the $i$-th phone is defective, which happens with probability $p$. In addition, we assume that $n$ phones are produced per day, and that the phones are produced independently and share the same defect probability $p$. Define

\[X = \text{number of defective phones produced} \sim Binomial(n, p)\]

Then the probability of producing $k$ defective phones can be calculated using the probability mass function of $X$ at $k$:

\[P\{X = k\} = \binom{n}{k}p^k(1-p)^{n-k} \qquad k \in \{0, 1, \cdots, n\} \tag{poisson.1}\]

Our problem now is that we don't know $n$ and $p$ explicitly. What we do know is that on average $\lambda$ defective phones are produced per day, which gives

\[E[X] = np = \lambda \Rightarrow p = \frac{\lambda}{n}\]

Now we can replace the $p$ variable in Eq.(poisson.1) with $\frac{\lambda}{n}$, and write out the probability mass function as

\[\begin{aligned}    P(X = k) &= \binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{n-k} \\    &= \frac{n!}{k!(n-k)!} \cdot \frac{\lambda^k}{n^k} \cdot \frac{\left(1 - \frac{\lambda}{n}\right)^n}{\left(1 - \frac{\lambda}{n}\right)^k} \\    &= \underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^k}}_{I(n)} \cdot \frac{\lambda^k}{k!} \cdot \underbrace{\frac{\left(1 - \frac{\lambda}{n}\right)^n}{\left(1 - \frac{\lambda}{n}\right)^k}}_{II(n)}\end{aligned}\]

Though we don't know $n$ exactly, it's reasonable to assume that it's a very large number, so we can let $n \rightarrow \infty$ to study it in an asymptotic way.

\[\begin{aligned}    \lim_{n \rightarrow \infty} I(n) &= \lim_{n \rightarrow \infty} \frac{n(n-1)\cdots (n-k+1)}{n^k} \\    &= \lim_{n \rightarrow \infty} 1 \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-k+1}{n} \\    &= 1 \\    \lim_{n \rightarrow \infty} II(n) &= \lim_{n \rightarrow \infty}\frac{\left(1 - \frac{\lambda}{n}\right)^n}{\left(1 - \frac{\lambda}{n}\right)^k} \\    &= e^{-\lambda}\end{aligned}\]

where the limit for $II(n)$ is found using the following limits

\[\lim_{n \rightarrow \infty}\left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}, \lim_{n \rightarrow \infty} \left(1 - \frac{\lambda}{n}\right)^k = 1\]

So as $n \rightarrow \infty$,

\[P\{X = k\} = \frac{\lambda^k}{k!}e^{-\lambda}\qquad k = 0, 1, 2, \cdots\]
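The limit can also be observed numerically: with $p = \lambda/n$, the Binomial probability approaches the Poisson value as $n$ grows. The following sketch assumes $\lambda = 2$ and $k = 3$ purely for illustration.

```python
from math import comb, exp, factorial

lam, k = 2.0, 3  # illustrative values, not from the text

# Limiting value: lambda^k / k! * e^(-lambda)
poisson = lam ** k / factorial(k) * exp(-lam)

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n  # keeps the mean n*p fixed at lambda
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    errors.append(abs(binom - poisson))
    print(n, binom, errors[-1])

print("Poisson value:", poisson)
```

The absolute error shrinks roughly in proportion to $1/n$ for these values.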


For a discrete random variable whose probability mass function satisfies

\[P(X = k) = \frac{\lambda^k}{k!}e^{-\lambda}\qquad k = 0, 1, 2, \cdots,\]

we say $X$ is a Poisson random variable with parameter $\lambda$, or $X \sim Poisson(\lambda)$. The Poisson distribution provides a good model for the probability distribution of rare events that occur in space, time or any other dimension where $\lambda$ is the average value.


As always, we first check if the total probability is $1$.

\[\begin{aligned}    \sum_{i=0}^\infty P(i) &= \sum_{i=0}^\infty e^{-\lambda}\frac{\lambda^i}{i!} \\    &= e^{-\lambda} \sum_{i=0}^\infty \frac{\lambda^i}{i!} \\    &= e^{-\lambda} \cdot e^\lambda = 1\end{aligned}\]

where we've used the Taylor series

\[\begin{aligned}    e^x &= \lim_{n \rightarrow \infty} \left( 1 + \frac{x}{n} \right)^n \\    &= \sum_{i=0}^\infty \frac{x^i}{i!} = 1 + x + \frac{x^2}{2!} + \cdots\end{aligned}\]

Next, we prove that the expectation of $X$ is $\lambda$.

\[\begin{aligned}    E(X) &= \sum_{i=0}^\infty ip(i) \\    &= \sum_{i=1}^\infty i \cdot e^{-\lambda} \cdot \frac{\lambda^i}{i!} \\    &= \sum_{i=1}^\infty e^{-\lambda} \frac{\lambda^{i-1}}{(i-1)!}\lambda \\    &= \lambda e^{-\lambda} \sum_{i=1}^\infty \frac{\lambda^{i-1}}{(i-1)!} \\    &= \lambda e^{-\lambda} \sum_{j=0}^\infty \frac{\lambda^j}{j!} \qquad j = i-1 \\    &= \lambda e^{-\lambda} e^\lambda = \lambda\end{aligned}\]

The variance is also $\lambda$. Having the mean equal the variance is unusual, since in general the two have different units; it is not a problem for the Poisson distribution because it models counts, which are unitless.

\[\begin{aligned}   E[X^2] &= \sum_{i=0}^\infty i^2 e^{-\lambda} \cdot \frac{\lambda^i}{i!} \\    &= \lambda \sum_{i=1}^\infty i e^{-\lambda} \cdot \frac{\lambda^{i-1}}{(i-1)!} \\    &= \lambda \sum_{j=0}^\infty (j+1) e^{-\lambda} \cdot \frac{\lambda^j}{j!} \qquad j = i-1 \\    &= \lambda \left[ \sum_{j=0}^\infty je^{-\lambda}\frac{\lambda^j}{j!} + \sum_{j=0}^\infty e^{-\lambda} \frac{\lambda^j}{j!} \right] \\    &= \lambda(\lambda + 1) \\    Var(X) &= \lambda^2 + \lambda - \lambda^2 \\    &= \lambda\end{aligned}\]
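The fact that both the mean and the variance equal $\lambda$ can be checked by truncating the infinite sums. The sketch below assumes $\lambda = 3.5$ and keeps the first $100$ terms, beyond which the terms are negligible; both values are illustrative.

```python
from math import exp, factorial

lam = 3.5  # illustrative rate parameter
N = 100    # truncation point; later terms are negligibly small

# p(i) = e^(-lambda) * lambda^i / i! for i = 0, ..., N-1
pmf = [exp(-lam) * lam ** i / factorial(i) for i in range(N)]

total = sum(pmf)
mean = sum(i * pi for i, pi in enumerate(pmf))
second_moment = sum(i ** 2 * pi for i, pi in enumerate(pmf))
variance = second_moment - mean ** 2

print(total, mean, variance)
```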

Example 2.5.1


Suppose a customer service center receives $\lambda$ phone calls per day on average. Denote by $X$ the number of phone calls received on a random day. Show that $X$ is a Poisson random variable with parameter $\lambda$.


In this example, what we observe is the number of successes over a period of time rather than a Binomial experiment. However, we can analyze the problem by discretizing time.

Suppose this customer service center works $n$ hours per day. For each hour, the probability that someone will call is $p$. Then we can guess $X \sim Binomial(n, p)$ such that $np = \lambda$. It will be a very crude guess as there can be more than one call for each hour.

To make a better approximation, we can reduce the time intervals, or in other words increase $n$. At some point, we'll have intervals that are so small that at most one person will call in it with probability $p$.

\[P\{X = k\} = \lim_{n \rightarrow \infty} \binom{n}{k}p^k(1-p)^{n-k} = e^{-\lambda}\frac{\lambda^k}{k!}\]

Thus we have $X \sim Poisson(\lambda)$.

2.6 Moments and moment generating functions

In the previous sections, we've shown the expected values and variances for random variables. In the calculations, we often have to calculate the expected values of some power functions of the random variable, such as $E[X^2]$ for the variances.

In general, it would be of interest to calculate $E[X^k]$ for some positive integer $k$. This expectation is called the $k$-th moment of $X$.

Moment generating function

The moment generating function can be used to systematically calculate the moments of a random variable.

For a random variable $X$, its moment generating function is defined as

\[M(t) = E\left[e^{tX}\right] = \sum_x{e^{tx}p(x)} \]

where $t$ is a parameter. Note that $M(t)$ is not random! We call $M(t)$ the moment generating function of $X$ because all the moments of $X$ can be obtained by successively differentiating $M(t)$ and then evaluating the result at $t=0$. For example, consider the first order derivative of $M(t)$.

\[\begin{aligned}    M'(t) &= \frac{d}{dt}E[e^{tX}] \\    &= E\left[ \frac{d}{dt}e^{tX} \right] \\    &= E\left[ \frac{d e^{tX}}{d(tX)} \cdot \frac{d(tX)}{dt} \right] \\    &= E[Xe^{tX}]\end{aligned}\]

and at $t=0$, we have

\[M'(0) = E[X]\]

Similarly, if we take the second order derivative of $M(t)$,

\[\begin{aligned}    M''(t) &= \frac{d}{dt}M'(t) = \frac{d}{dt}E[Xe^{tX}] \\    &= E[\frac{d}{dt}Xe^{tX}] = E[X^2e^{tX}]\end{aligned}\]

and at $t = 0$, we have

\[M''(0) = E[X^2e^0] = E[X^2]\]

In general, we can summarize the $k$-th derivative of $M(t)$ as

\[M^{(k)}(t) = E[X^ke^{tX}] \qquad k \geq 1\]

Then we evaluate this derivative at $t=0$, which yields

\[M^{(k)}(0) = E[X^ke^0] = E[X^k] \qquad k \geq 1\]

So for a given random variable, if we know its moment generating function, we can use this property to calculate all of its moments. Moreover, when the MGF exists in a neighborhood of $t = 0$, the correspondence between a distribution and its MGF is one-to-one, so the MGF acts as an identifier for the distribution.
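The relation between derivatives of $M(t)$ at $t = 0$ and the moments can be illustrated numerically. The sketch below reuses the PMF of Example 2.2.1 and approximates $M'(0)$ and $M''(0)$ with central finite differences; the step size `h` is an assumed value, and finite differences are used here only as a numerical stand-in for the exact derivatives.

```python
from math import exp

# PMF from Example 2.2.1, for which E[X] = 0.1 and E[X^2] = 0.5.
pmf = {-1: 0.2, 0: 0.5, 1: 0.3}

def mgf(t):
    """M(t) = E[e^{tX}] = sum over x of e^{t x} * p(x)."""
    return sum(exp(t * x) * p for x, p in pmf.items())

h = 1e-4  # assumed step size for the finite differences

# Central differences approximating M'(0) and M''(0).
first_derivative = (mgf(h) - mgf(-h)) / (2 * h)
second_derivative = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2

print(first_derivative, second_derivative)
```

The two approximations should land close to $E[X]$ and $E[X^2]$, up to discretization and rounding error.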

Example 2.6.1


$X \sim Binomial(n, p)$, find $M(t)$ for $X$.


\[\begin{aligned}    M(t) &= E[e^{tX}] = \sum_X e^{tX}p(X) \\    &= \sum_{k=0}^n e^{tk}\binom{n}{k}p^k(1-p)^{n-k} \\    &= \sum_{k=0}^n \binom{n}{k}(pe^t)^k(1-p)^{n-k} \\    &= (pe^t + 1 - p)^n\end{aligned}\]

\[\begin{aligned}    M'(t) &= \frac{d}{dt}(pe^t + 1 - p)^n \\    &= \frac{d(pe^t + 1 - p)}{dt} \cdot \frac{d(pe^t + 1 - p)^n}{d(pe^t + 1 - p)} \\    &= pe^tn(pe^t + 1 - p)^{n-1} \\    E[X] &= M'(0) = pn(p + 1 - p)^{n-1} \\    &= np\end{aligned}\]

\[\begin{aligned}    M''(t) &= \frac{d}{dt}M'(t) = np\frac{d}{dt}[e^t(pe^t + 1 - p)^{n-1}] \\    &= np\left[ (pe^t + 1 - p)^{n-1}\frac{d}{dt}e^t + e^t\frac{d}{dt}(pe^t + 1 - p)^{n-1} \right] \\    &= np\left[ (pe^t + 1 - p)^{n-1}e^t + e^t(n-1)p(pe^t + 1 - p)^{n-2} \right] \\    M''(0) &= np[1 + (n-1)p] \\    &= np + n(n-1)p^2 = E[X^2]\end{aligned}\]
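The closed form $M(t) = (pe^t + 1 - p)^n$ can be compared with the defining sum at a few values of $t$, and the second moment $np + n(n-1)p^2$ combined with $E[X] = np$ gives back the variance $np(1-p)$. This sketch assumes $n = 8$ and $p = 0.4$ for illustration; the function names are hypothetical.

```python
from math import comb, exp

n, p = 8, 0.4  # illustrative values, not from the text

def mgf_direct(t):
    """M(t) from the definition: sum over k of e^{t k} * P{X = k}."""
    return sum(exp(t * k) * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

def mgf_closed(t):
    """Closed form (p e^t + 1 - p)^n."""
    return (p * exp(t) + 1 - p) ** n

for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, mgf_direct(t), mgf_closed(t))

# Variance from the moments obtained through the MGF.
second_moment = n * p + n * (n - 1) * p ** 2
variance = second_moment - (n * p) ** 2
print(variance, n * p * (1 - p))
```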

Example 2.6.2


Find $M(t)$ and $Var(X)$ for $X \sim Poisson(\lambda)$.


\[\begin{aligned}    M(t) &= E[e^{tX}] = \sum_{k=0}^\infty e^{tk} e^{-\lambda}\frac{\lambda^k}{k!} \\    &= e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!},\end{aligned}\]

using $\sum_{k=0}^\infty \frac{a^k}{k!} = e^a$, we have

\[M(t) = e^{-\lambda} \cdot e^{\lambda e^t} = e^{\lambda(e^t-1)}\]

\[\begin{aligned}    M'(t) &= \frac{d}{dt}\left(e^{-\lambda} \cdot e^{\lambda e^t}\right) \\    &= e^{-\lambda} \frac{d(\lambda e^t)}{dt} \cdot \frac{d e^{\lambda e^t}}{d(\lambda e^t)} \\    &= e^{-\lambda}\lambda e^t e^{\lambda e^t}\end{aligned}\]

\[\begin{aligned}    M''(t) &= \frac{d}{dt} M'(t) = \lambda e^{-\lambda}\frac{d}{dt}e^t e^{\lambda e^t} \\    &=\lambda e^{-\lambda} \left[ e^t e^{\lambda e^t} + \lambda e^t e^t e^{\lambda e^t} \right] \\    &= \lambda e^{-\lambda}e^t e^{\lambda e^t} + \lambda^2 e^{-\lambda} e^{2t} e^{\lambda e^t}\end{aligned}\]

\[\begin{aligned}    E[X] &= M'(0) = e^{-\lambda} \lambda e^0 e^{\lambda e^0} = \lambda \\    E[X^2] &= M''(0) = \lambda e^{-\lambda} e^0 e^{\lambda e^0} + \lambda^2 e^{-\lambda} e^0 e^{\lambda e^0} = \lambda + \lambda^2 \\    Var(X) &= E[X^2] - E[X]^2 = \lambda\end{aligned}\]