The textbook recommended for this course is Mathematical Statistics with Applications (7th edition) by Wackerly, Mendenhall and Scheaffer.

In this chapter, we introduce the concept of the probability of an event. Then we show how probabilities can be computed in certain situations. As a preliminary, however, we need to discuss the concepts of the sample space and the events of an experiment.

1.1 Experiment, sample space and events

An experiment is the process by which an observation is made. In particular, we are interested in a random experiment whose outcome is not predictable with certainty. The set of all possible outcomes of an experiment is known as the sample space of the experiment, and is often denoted $S$. An event (denoted $E$) is a set that contains some possible outcomes of the random experiment.

By definition, any event is a subset of the sample space. For a given random experiment, its sample space is unique. Let's see some examples.

  1. Example 1.1.1
    • Experiment: test of a certain disease on a patient.
    • Possible outcomes: positive or negative.
    • Sample space: $S = \{p, n\}$
    • Event: a patient tested negative: $E = \{n\}$
  2. Example 1.1.2
    • Experiment: rolling a six-sided die
    • Outcomes: $1, \cdots, 6$
    • Sample space: $S = \{1, 2, 3, 4, 5, 6\}$
    • Event: the outcome is greater than $3$: $E = \{4, 5, 6\}$
  3. Example 1.1.3
    • Experiment: tossing two coins
    • Outcomes: each coin is either head or tail
    • Sample space: $S = \{ (H,H), (H,T), (T,H), (T,T) \}$
    • Event: the first toss is a head: $E = \{ (H, H), (H, T) \}$
  4. Example 1.1.4
    • Experiment: measuring the lifetime of a computer
    • Outcomes: all non-negative real numbers.
    • Sample space: $S = \{X: 0 \leq X < \infty \}$
    • Event: the computer survives for more than $10$ hours: $E = \{ X: 10 < X < \infty \}$

For each experiment, we may define more than one event. Taking Example 1.1.2, we can define events like

\[E_3 = \{3\} \qquad E_4 = \{4\} \qquad E_5 = \{5\} \qquad E_6 = \{6\}\]

If we observe the event $E = \{4, 5, 6\}$, it means we observed one of the three events $E_4$, $E_5$ or $E_6$. We say $E$ can be decomposed into $E_4$, $E_5$, and $E_6$. An event that can be further decomposed is called a compound event; otherwise, it is called a simple event. Each simple event contains one and only one outcome.

Finally, an event containing no outcomes is called the null event, denoted $\emptyset$. An example is the event that the outcome is greater than $7$ in Example 1.1.2.

1.2 Set operations

Suppose we have a sample space $S$ and two events $E$ and $F$.


If all of the outcomes in $E$ are also in $F$, then we say that $E$ is contained in $F$, or $E$ is a subset of $F$. We write it as $E \subset F$. Subsets have several properties:

  1. Any event is a subset of the sample space: $E \subset S$.
  2. Any event is a subset of itself: $E \subset E$.
  3. If $E \subset F$ and $F \subset E$, then $E = F$.
  4. $\emptyset \subset E$, $\emptyset \subset S$.
  5. $E \subset F, F \subset G \Rightarrow E \subset G$.


We denote by $E \cup F$ the union of the two events: a new event consisting of all outcomes that are in $E$, in $F$, or in both. In other words, $E \cup F = \{\text{outcomes in either } E \text{ or } F \}$.

  1. $E \cup F \subset S$.
  2. $E \cup E = E$.
  3. $E \cup S = S, E \cup \emptyset = E$.


We denote by $E \cap F$, or $EF$ for short, the intersection of $E$ and $F$: the event consisting of all outcomes that are in both $E$ and $F$.

  1. $E \cap S = E$.
  2. $E \cap E = E$.
  3. $E \cap \emptyset = \emptyset$.

We say $E$ and $F$ are disjoint or mutually exclusive if $E \cap F = \emptyset$. Any event is disjoint from the null event.


For any event $E$, we define a new event $E^C$, referred to as the complement of $E$. $E^C$ consists of all outcomes in the sample space $S$ that are not in $E$.

  1. $S^C = \emptyset$ and $\emptyset^C = S$.
  2. $E \cup E^C = S$.
  3. $E \cap E^C = \emptyset$. The two sets should always be disjoint.
  4. $\left( E^C \right)^C = E$.
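These event operations map directly onto Python's built-in `set` type. A minimal sketch, using the die-roll sample space from Example 1.1.2 (the even-outcome event `F` is an illustrative addition, not from the text):

```python
# Events as Python sets: union, intersection and complement.
S = {1, 2, 3, 4, 5, 6}            # sample space of a die roll
E = {4, 5, 6}                     # event: outcome greater than 3
F = {2, 4, 6}                     # illustrative event: outcome is even

union = E | F                     # E ∪ F
intersection = E & F              # E ∩ F
complement_E = S - E              # E^C

# Several of the listed properties hold mechanically:
assert E | E == E and E & E == E      # idempotence
assert E | S == S and E & S == E      # union/intersection with S
assert E | complement_E == S          # E ∪ E^C = S
assert E & complement_E == set()      # E ∩ E^C = ∅
```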

Example 1.2


Consider rolling two six-sided dice. Let

\[\begin{aligned} E_1 &= \{\text{first roll is } 3 \} \\  E_2 &= \{\text{sum of two rolls is } 7 \} \\  E_3 &= \{\text{second roll} - \text{first roll} < 4 \} \end{aligned}\]

and we want to find (1) $E_1 \cup E_2$, (2) $E_1 \cap E_2$, and (3) $E_3^C$.


The sample space is

\[S = \{\underbrace{ (1, 1), \cdots, (6, 6)}_\text{36 outcomes} \}\]

and the events are

\[\begin{aligned} E_1 &= \{ (3, 1), (3, 2), \cdots, (3, 6) \} \\ E_2 &= \{ (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1) \}  \end{aligned}\]

We skip $E_3$ for now as it's more complicated. We can find that

  1. $E_1$ and $E_2$ have only one element in common, $(3, 4)$, so $E_1 \cup E_2 = \{ (3, 1), (3, 2), (3, 3), (3, 5), (3, 6) \} \cup E_2$, which contains $11$ outcomes.
  2. $E_1 \cap E_2 = \{ (3, 4) \}$.
  3. $E_3^C = \{ \text{second roll} - \text{first roll} \geq 4 \} = \{ (1, 5), (1, 6), (2, 6) \}$.
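The enumeration in Example 1.2 can be checked mechanically by representing events as Python sets of outcome pairs:

```python
from itertools import product

# Enumerate the 36 outcomes of rolling two dice and form the
# events of Example 1.2 by filtering the sample space.
S = set(product(range(1, 7), repeat=2))
E1 = {(a, b) for a, b in S if a == 3}          # first roll is 3
E2 = {(a, b) for a, b in S if a + b == 7}      # sum of two rolls is 7
E3 = {(a, b) for a, b in S if b - a < 4}       # second - first < 4

union = E1 | E2
intersection = E1 & E2
E3_complement = S - E3
```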

A graphical representation that is useful for illustrating logical relations among events is the Venn diagram.

Laws of set operations

The operations of sets can be applied to more than two events, and they follow certain rules similar to the rules of algebra. All the following rules can be verified by Venn diagrams.

  • Commutative laws: $E \cup F = F \cup E, \quad E \cap F = F \cap E$.
  • Associative laws: $(E \cup F) \cup G = E \cup (F \cup G), \quad (E \cap F) \cap G = E \cap (F \cap G)$.
  • Distributive laws: $(E \cup F) \cap G = (E \cap G) \cup (F \cap G), \quad (E \cap F) \cup G = (E \cup G) \cap (F \cup G)$.

In addition, there is a law that connects all three operations (union, intersection and complement) together, DeMorgan's law:

\[ \begin{aligned} (E \cup F)^C &= E^C \cap F^C \\ (E \cap F)^C &= E^C \cup F^C \end{aligned} \]

DeMorgan's law can be extended to more than two events. Let $\bigcup_{i=1}^n E_i$ denote the union of events $E_1$ to $E_n$, and $\bigcap_{i=1}^n E_i$ their intersection,

\[ \begin{aligned} \left( \bigcup_{i=1}^n{E_i} \right)^C &= \bigcap_{i=1}^n{E_i^C} \\ \left( \bigcap_{i=1}^n{E_i} \right)^C &= \bigcup_{i=1}^n{E_i^C} \end{aligned} \]

1.3 Probability of events

One way of defining the probability of an event is in terms of its relative frequency. Suppose we have a random experiment with sample space $S$, and we want to assign some number $P(E)$ to represent the probability of event $E$. We may repeat this random experiment many times. Let $n(E)$ be the number of times in the first $n$ repetitions of the experiment that the event $E$ occurs. The probability of the event is defined as

\[ P(E) = \lim_{n \rightarrow \infty}\frac{n(E)}{n} \]
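A quick Monte Carlo sketch of this definition, estimating $P(E)$ for the event "outcome greater than $3$" on a fair die; the seed and number of repetitions are arbitrary choices:

```python
import random

# Relative-frequency estimate of P(E): repeat the experiment n
# times and count how often E occurs.
random.seed(0)          # arbitrary seed, for reproducibility
n = 100_000
n_E = sum(1 for _ in range(n) if random.randint(1, 6) > 3)
estimate = n_E / n      # should approach 3/6 = 0.5 for large n
```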

There are a few drawbacks to this method:

  1. It requires $S$ to be countable.
  2. We need to assume that the limit exists.
  3. Sometimes our experiments are limited and can't be repeated many times, or the experiment may not even be observable.

To overcome these drawbacks, modern mathematics uses an axiomatic system to define the probability of an event.

Axioms of probability

For sample space $S$ and event $E$, we define three axioms

\[ \begin{aligned} \text{Axiom 1.} &\qquad 0 \leq P(E) \leq 1 \\  \text{Axiom 2.} &\qquad P(S) = 1 \\ \text{Axiom 3.} &\qquad \text{For any sequence of mutually exclusive events } E_1, E_2, \cdots \end{aligned} \]

\[P\left(\bigcup_{i=1}^\infty E_i \right) = \sum_{i=1}^\infty{P(E_i)}\]

where $E_i \cap E_j = \emptyset$ for all $i \neq j$ (mutually exclusive). More formally, we say $P$ is $\sigma$-additive.

The definition through axioms is mathematically rigorous, flexible, and can be developed into an axiomatic system.

Example 1.3.1 (flexibility)

Suppose our experiment is tossing a coin. If we believe it is a fair coin, we have

\[ S = \{ H, T \}, \quad P(\{H\}) = P(\{T\}), \]

then, letting $E_1 = \{H\}$ and $E_2 = \{T\}$, axioms $2$ and $3$ above give

\[ 1 = P(S) = P\left(\bigcup_{i=1}^2 E_i \right) = \sum_{i=1}^2{P(E_i)} = P(\{H\}) + P(\{T\}) = 2P(\{H\}), \]

so $P(\{H\}) = P(\{T\}) = 0.5$.

If we believe the coin is biased and $P(\{H\}) = 2P(\{T\})$, then

\[ P(\{H\}) + P(\{T\}) = P(\{H\} \cup \{T\}) = P(S) = 1 \]

By combining the two equations, $3P(\{T\}) = 1 \Rightarrow P(\{T\}) = \frac{1}{3},\, P(\{H\}) = \frac{2}{3}$.

In this example, we didn't use any information from observations or frequencies. We assign probabilities according to our belief, so long as the assignment satisfies the three axioms. Based on the axioms, we can prove some simple propositions about probability.

Proposition 1

\[ P(E^C) = 1 - P(E) \]


Proof: since $E^C \cap E = \emptyset$ and $E^C \cup E = S$,

\[ \begin{aligned} P(E^C \cup E) &= P(E^C) + P(E) = P(S) = 1 \\ \Rightarrow P(E^C) &= 1 - P(E) \end{aligned}  \]

Proposition 2

\[\text{If } E \subset F \text{, then } P(E) \leq P(F) \]

Proof: note that $E$ and $E^C \cap F$ are mutually exclusive.

\[\begin{aligned} E \cap (F \cap E^C) &= E \cap (E^C \cap F) = (E \cap E^C) \cap F = \emptyset \cap F = \emptyset \\ F &= E \cup (F \cap E^C) \\ P(F) &= P(E) + \underbrace{P(F \cap E^C)}_{\geq 0} \geq P(E)  \end{aligned}\]

Proposition 3

\[ P(E \cup F) = P(E) + P(F) - P(EF) \]

Proof: this proposition can be easily proved using a Venn diagram. Let $I$, $II$ and $III$ denote $E \cap (F^C)$, $E \cap F$ and $F \cap E^C$, respectively.

\[\begin{aligned} P(E) &= P(I) + P(II) \\ P(F) &= P(II) + P(III) \\ P(E \cup F) &= P(I) + P(II) + P(III) \\ &= P(E) + P(F) - P(II) \\ &= P(E) + P(F) - P(EF) \end{aligned}\]
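Proposition 3 can also be verified by exact enumeration on a small sample space, e.g. two dice rolls with equally likely outcomes:

```python
from itertools import product

# Check P(E ∪ F) = P(E) + P(F) - P(EF) by counting, since each
# of the 36 outcomes is equally likely: P(A) = |A| / |S|.
S = set(product(range(1, 7), repeat=2))
E = {(a, b) for a, b in S if a == 3}        # first roll is 3
F = {(a, b) for a, b in S if a + b == 7}    # sum is 7

def P(A):
    return len(A) / len(S)

lhs = P(E | F)
rhs = P(E) + P(F) - P(E & F)
```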

Example 1.3.2


A student is applying for two jobs. Suppose she'll get an offer from company A with probability $0.5$, an offer from company B with probability $0.4$, and both offers with probability $0.3$. What is the probability that she gets neither offer?


\[\begin{aligned} S &= \{(S, S), (S, F), (F, S), (F, F)\} \\ E &= \{\text{get offer from A}\} = \{(S, S), (S, F)\} \\ F &= \{\text{get offer from B}\} = \{(S, S), (F, S)\} \\ G &= \{\text{get two offers}\} = \{(S, S)\} \\ K &= \{\text{get no offers}\} = \{(F, F)\} \end{aligned}\]

We have $G = E \cap F$ and $K = E^C \cap F^C = (E \cup F)^C$. Knowing that $P(E) = 0.5, P(F) = 0.4$ and $P(G) = P(E \cap F) = 0.3$,

\[P(K) = P((E \cup F)^C) =  1 - P(E \cup F) = 1 - (P(E) + P(F) - P(EF)) = 0.4\]

1.4 The sample-point method

As suggested by Example 1.3.2, for an experiment with a finite or countable number of outcomes, we can calculate the probability of an event through the so-called sample-point method. The procedure is

  1. Define the experiment, sample space and simple events (outcomes).
  2. Assign reasonable probabilities to each simple event.
  3. Define the event of interest as a collection of simple events.
  4. Calculate the probability of the event by summing the probabilities of the simple events in the event.

The main idea of the sample-point method is based on Axiom $3$.

Example 1.4.1


A fair coin is tossed three times. Find the probability that exactly two of the three tosses are heads.


We'll follow the procedure in the sample-point method.

The experiment is "tossing the coin three times". The sample space is

\[S = \{\underbrace{(H, H, H), (T, H, H), \cdots, (T, T, T)}_8\}\]

where each of the $8$ outcomes can be considered as a simple event $E_1, \cdots, E_8$. Since we consider it as a fair coin,

\[P(E_1) = P(E_2) = \cdots = P(E_8) = \frac{1}{8}.\]

Our event of interest, $E$, is defined as

\[E = \{\text{2 heads and 1 tail}\} = \{(T, H, H), (H, T, H), (H, H, T)\} = F_1 \cup F_2 \cup F_3,\]

where $F_1 = \{(T, H, H)\}$, $F_2 = \{(H, T, H)\}$ and $F_3 = \{(H, H, T)\}$ are simple events.

The final step is calculating the probability of $E$

\[P(E) = P(F_1 \cup F_2 \cup F_3) = \sum_{i=1}^3P(F_i) = \frac{3}{8}\]

Note that in this case (and in many other experiments), all the outcomes in the sample space are equally likely to occur. For such experiments, we can simplify the sample-point method as

\[P(E) = \frac{\text{number of outcomes in }E}{\text{number of outcomes in }S}\]
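The sample-point method for Example 1.4.1 can be carried out by enumerating the sample space, assuming equally likely outcomes:

```python
from itertools import product

# Sample-point method: list the 8 outcomes of three coin tosses,
# collect the outcomes with exactly two heads, and divide.
S = list(product("HT", repeat=3))            # 8 equally likely outcomes
E = [o for o in S if o.count("H") == 2]      # exactly two heads
prob = len(E) / len(S)                       # 3/8
```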

Example 1.4.2


The Powerball is one of the largest lottery games in the US. The system works like this:

  1. $5$ numbered white balls are drawn out of $69$ balls without replacement.
  2. $1$ numbered red ball is drawn out of $26$ balls.

You win the Powerball if you have chosen exactly those $5+1$ balls, where the order of the white balls doesn't matter. What is the probability of winning the Powerball?


It's reasonable to assume each outcome will be equally likely to occur. Each outcome is a set of $6$ numbers satisfying the above rules.

\[\begin{aligned} E &= \{ \text{You win the PB} \} = \{\text{You choose exactly the lucky numbers}\}, \quad |E| = 1 \\ |S| &= (\text{\# ways to draw 1 red ball out of 26}) \times (\text{\# ways to draw 5 white balls out of 69}) \\ P(E) &= \frac{|E|}{|S|} = \frac{1}{|S|} \end{aligned}\]

If we draw the white balls one by one, we have $69 \times 68 \times 67 \times 66 \times 65$ ordered outcomes. Each set of $5$ numbers can be arranged in $5! = 5 \times 4 \times 3 \times 2 \times 1$ ways, so the number of ways to draw $5$ white balls out of $69$ is

\[ \frac{69 \times 68 \times 67 \times 66 \times 65}{5!} \]

Formally, this is "choose $k$ from $n$", which can be written as $\binom{n}{k}$ and

\[\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{n!}{(n-k)!\,k!}\]

Now we have

\[\begin{aligned}|S| &= 26 \times \binom{69}{5} = 26 \times 11,238,513 \approx 292M \\ P(E) &= \frac{1}{|S|} \approx 3.42 \times 10^{-9} \end{aligned}\]
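The same counts can be reproduced with Python's `math.comb`:

```python
from math import comb

# |S| = 26 red-ball choices times C(69, 5) unordered white-ball choices.
n_white = comb(69, 5)        # number of 5-ball white combinations
n_outcomes = 26 * n_white    # about 292 million total outcomes
p_win = 1 / n_outcomes       # about 3.42e-9
```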

1.5 Conditional probability and independence of events

The probability of an event will sometimes depend upon whether we know that other events have occurred. This is easier to explain with an example.

Conditional probability

Suppose we roll two fair six-sided dice. What is the probability that the sum of the two dice is $8$? Using the procedure above, we can easily get

\[P(E) = \frac{|E|}{|S|} = \frac{|\{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)\}|}{6 \times 6} = \frac{5}{36}\]

What if we know the first roll is $3$? That would be another event:

\[\begin{aligned} F &= \{\text{first roll is } 3\} \\    E' &= \{ \text{sum is } 8 \text{ given } F \} \end{aligned}\]

Given the first die is $3$, requiring the sum to be $8$ is equivalent to requiring the second roll to be $5$. So the probability is

\[P(E') = \frac{|E'|}{|S'|} = \frac{|\{(3, 5)\}|}{|\{(3, 1), \cdots, (3, 6)\}|} = \frac{1}{6} \neq \frac{5}{36}\]

Formally speaking, let $E$ be the event that sum is $8$, and $F$ be the event that the first roll is $3$. The conditional probability of $E$ given $F$ is denoted $P(E \mid F)$, and $P(E)$ is the unconditional probability of $E$. If $P(F) > 0$, then

\[P(E \mid F) = \frac{P(EF)}{P(F)} \tag{1.5.1}\]

To understand this, keep in mind that any event $E$ can be decomposed into $(EF) \cup (EF^C)$. From $Eq.(1.5.1)$, we can derive

\[P(EF) = P(E \mid F)P(F) \tag{1.5.2}\]

Now we can revisit the example above.

\[\begin{aligned}\text{Let } E = \{\text{sum} = 8\}&, \quad F = \{ \text{first roll is } 3 \} \\    P(EF) &= \frac{|\{(3, 5)\}|}{|S|} = \frac{1}{36} \\    P(F) &= \frac{|\{ (3, 1), \cdots, (3, 6) \}|}{|S|} = \frac{6}{36} \\    P(E \mid F) &= \frac{P(EF)}{P(F)} = \frac{1/36}{6/36} = \frac{1}{6} \end{aligned}\]
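This conditional probability can also be computed by direct enumeration:

```python
from itertools import product

# P(E | F) = P(EF) / P(F) for the two-dice example:
# E = {sum is 8}, F = {first roll is 3}.
S = set(product(range(1, 7), repeat=2))
E = {(a, b) for a, b in S if a + b == 8}
F = {(a, b) for a, b in S if a == 3}

p_EF = len(E & F) / len(S)      # 1/36
p_F = len(F) / len(S)           # 6/36
p_E_given_F = p_EF / p_F        # 1/6
```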

The conditional probability can also be generalized to more than two events using the multiplication rule:

\[\begin{aligned} &\quad P(E_1)P(E_2 \mid E_1)P(E_3 \mid E_1 E_2) \cdots P(E_n \mid E_1 E_2 \cdots E_{n-1}) \\    &= P(E_1) \frac{P(E_1E_2)}{P(E_1)} \frac{P(E_3E_1E_2)}{P(E_1E_2)} \cdots \frac{P(E_nE_1E_2 \cdots E_{n-1})}{P(E_1E_2 \cdots E_{n-1})} \\    &= P(E_1 E_2 \cdots E_n) \tag{1.5.3} \end{aligned}\]

Example 1.5.1


Suppose we have a deck of $52$ cards, and we randomly divided them into $4$ piles of $13$ cards. Compute the probability that each pile has exactly $1$ ace.


There are four suits in a deck of cards: Hearts, Diamonds, Clubs and Spades. We can define events $E_i$, $i = 1, 2, 3, 4$ as follows

\[\begin{aligned} E_1 &= \{ \text{Ace of Hearts is in any pile} \} \\    E_2 &= \{ \text{Ace of Hearts and Diamonds are in different piles} \} \\    E_3 &= \{ \text{Ace of Hearts, Diamonds and Clubs are in different piles} \} \\    E_4 &= E = \{ \text{All four aces are in different piles} \} \end{aligned}\]

The desired probability is $P(E_4)$.

\[\begin{aligned}    P(E_4) &= P(E_1 E_2 E_3 E_4) \\    &= P(E_1) P(E_2 \mid E_1) P(E_3 \mid E_1E_2) P(E_4 \mid E_1E_2E_3) \\    &= P(E_1) P(E_2 \mid E_1) P(E_3 \mid E_2) P(E_4 \mid E_3) \\\end{aligned}\]

where the last line uses $E_4 \subset E_3 \subset E_2 \subset E_1$, so that $E_1E_2 = E_2$ and $E_1E_2E_3 = E_3$.

$P(E_1) = 1$ because the ace of Hearts always lands in some pile. For $P(E_2 \mid E_1)$, consider the complement: the ace of Diamonds falls in the same pile as the ace of Hearts with probability $\frac{12}{51}$, since that pile's remaining $12$ slots are equally likely to hold any of the other $51$ cards.

\[P(E_2 \mid E_1) = 1 - \frac{12}{51} = \frac{39}{51}\]

Given $E_1E_2$, the ace of Clubs can't be in any of the $24$ remaining slots of the two piles containing the first two aces:

\[P(E_3 \mid E_2) = 1 - \frac{12 + 12}{50} = \frac{26}{50}\]

and finally we have

\[P(E_4 \mid E_3) = 1 - \frac{12 \times 3}{49} = \frac{13}{49}\]

So $P(E_4) = 1 \cdot \frac{39}{51} \cdot \frac{26}{50} \cdot \frac{13}{49} \approx 0.105$.
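Multiplying the conditional probabilities exactly (with `fractions.Fraction`, to avoid rounding) confirms the result:

```python
from fractions import Fraction

# Exact product of the conditional probabilities from Example 1.5.1.
p = Fraction(1) * Fraction(39, 51) * Fraction(26, 50) * Fraction(13, 49)
approx = float(p)    # about 0.105
```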

Independence of events

In general, the conditional probability $P(E \mid F) \neq P(E)$. We say $E$ and $F$ are independent when $P(E \mid F) = P(E)$. When $E$ is independent of $F$, we also have

\[P(EF) = P(E)P(F)\]

Definition: two events $E$ and $F$ are independent if any of the following holds:

\[\begin{cases} \tag{1.5.4}    P(E \mid F) = P(E) \\    P(F \mid E) = P(F) \\    P(EF) = P(E)P(F)\end{cases}\]

Otherwise we say $E$ and $F$ are dependent. Independence is denoted by $E \perp F$.

Proposition: if $E$ and $F$ are independent, then so are $E$ and $F^C$.

Proof: we need to show that $P(EF^C) = P(E)P(F^C)$.

\[\begin{aligned}    E = E \cap S &= E \cap (F \cup F^C) \\    &= (E \cap F) \cup (E \cap F^C) \\    P(E) &= P(EF) + P(EF^C) \\    &= P(E)P(F) + P(EF^C) \\    P(E) - P(E)P(F) &= P(EF^C) \\    P(E)P(F^C) &= P(EF^C)\end{aligned}\]

From this we have

\[E \perp F \Rightarrow E \perp F^C \Rightarrow E^C \perp F \Rightarrow E^C \perp F^C\]

Example 1.5.2


Suppose we have a circuit with $n$ switches in parallel; the system functions if at least one switch works. The probability that the $i$-th component works is $P_i$, $i = 1, \cdots, n$, independently of the others. What is the probability that the system functions?


Denote $E_i = \{ \text{the } i\text{-th component functions} \}$, with $P(E_i) = P_i$ and $E_i \perp E_j$ for all $i \neq j$.

\[\begin{aligned}    E &= \{\text{system functions}\} = \{\text{observe at least one } E_i\} \\    E^C &= \{\text{none of the components can work}\} = \{\text{didn't observe any } E_i\} = \bigcap_{i=1}^n{E_i^C} \end{aligned}\]

\[\begin{aligned} P\left(E^C\right) &= P\left(\bigcap_{i=1}^n{E_i^C}\right) = \prod_{i=1}^n{P\left(E_i^C\right)} \\    P(E) &= 1 - P(E^C) = 1 - \prod_{i=1}^n{\left(1 - P_i\right)} \end{aligned}\]
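A small numeric sketch of this formula; the component probabilities below are illustrative, not from the text:

```python
from math import prod

# Parallel system of independent components: the system fails only
# if every component fails, so P(system works) = 1 - prod(1 - P_i).
p = [0.9, 0.8, 0.7]                        # illustrative P_i values
p_system = 1 - prod(1 - pi for pi in p)    # 1 - 0.1*0.2*0.3 = 0.994
```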

1.6 The law of total probability and Bayes' rule

In one of our previous proofs, we showed a trick to represent the probability of an event

\[P(E) = P(EF) + P(EF^C)\]

because $F \cup F^C = S$ and $FF^C = \emptyset$. Now let's consider a generalization of this. For some positive integer $k$, let the sets $E_1, \cdots, E_k$ be such that

  1. $\bigcup_{i=1}^k{E_i} = S$
  2. $E_i \cap E_j = \emptyset \quad \forall i \neq j$

Then the collection of sets $E_1, \cdots, E_k$ is called a partition of $S$. For example, $F$ and $F^C$ form a partition of $S$ with $k=2$.

Law of total probability


Given $F_1, \cdots, F_k$ as a partition of $S$, such that $P(F_i) > 0$ for $i = 1, \cdots, k$, then for any event $E$, we have

\[P(E) = \sum_{i=1}^k{P(E \mid F_i)P(F_i)}\]


We can rewrite $E$ as its intersection with the sample space

\[\begin{aligned}    E &= E \cap S = E \cap \left( \bigcup_{i=1}^k{F_i} \right) \\    &= (E \cap F_1) \cup (E \cap F_2) \cup\cdots\cup (E \cap F_k) \\    P(E) &= P\left( \bigcup_{i=1}^k{EF_i} \right) \\    &= \sum_{i=1}^k{P(EF_i)} \quad \text{because } EF_i \text{ are pairwise disjoint} \\    &= \sum_{i=1}^k{P(E \mid F_i)P(F_i)}\end{aligned}\]

Example 1.6.1


In a driving behavior survey, $60\%$ of respondents are sedan drivers, $30\%$ are SUV drivers, and $10\%$ drive other cars. Among sedan, SUV and other drivers, $40\%$, $65\%$ and $55\%$ respectively have received a citation within the past $3$ years. Supposing each driver owns only one type of car, what is the probability that a randomly selected driver received a citation within the past $3$ years?


Let $D_1$, $D_2$ and $D_3$ be the events that a randomly selected driver drives a sedan, an SUV or another type of car, respectively, and let $Y$ be the event that the driver received a citation within the past $3$ years.

\[P(Y) = \sum_{i=1}^3{P(Y \mid D_i)P(D_i)}\]

because $D_1$, $D_2$ and $D_3$ form a partition of $S$.

\[\begin{aligned}    P(D_1) = 0.6 && P(D_2) = 0.3 && P(D_3) = 0.1 \\    P(Y \mid D_1) = 0.4&& P(Y \mid D_2) = 0.65 && P(Y \mid D_3) = 0.55\end{aligned}\]

Therefore $P(Y) = 0.49$.
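The computation is a single weighted sum over the partition:

```python
# Law of total probability for Example 1.6.1: weight each
# conditional citation probability by the partition probabilities.
p_D = [0.6, 0.3, 0.1]             # P(D_1), P(D_2), P(D_3)
p_Y_given_D = [0.4, 0.65, 0.55]   # P(Y | D_i)
p_Y = sum(c * d for c, d in zip(p_Y_given_D, p_D))   # 0.49
```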

Bayes' rule

Using the law of total probability, we can derive a simple but very useful result known as Bayes' rule.


Assume that $F_1, \cdots, F_k$ is a partition of $S$ such that $P(F_i) > 0$ for $i = 1, \cdots, k$, then

\[P(F_i \mid E) = \frac{P(E \mid F_i)P(F_i)}{\sum_{j=1}^k{P(E \mid F_j)P(F_j)}}\]


By the definition of conditional probability, $P(E \mid F_i)P(F_i) = P(EF_i)$. By the law of total probability, $\sum_{j=1}^k{P(E \mid F_j)P(F_j)} = P(E)$. So

\[\frac{P(E \mid F_i) P(F_i)}{\sum_{j=1}^k{P(E \mid F_j) P(F_j)}} = \frac{P(EF_i)}{P(E)} = P(F_i \mid E)\]

If we only have two events, another form of Bayes' rule is

\[P(E \mid F) = \frac{P(F \mid E)P(E)}{P(F)}\]

if $P(F) > 0$ and $P(E) > 0$.

Example 1.6.2


A biomarker was developed to detect a certain kind of gene defect. When this test is applied to a person with this gene defect, it has a probability of $0.9$ to give a positive result. If this test is applied to a person without the defect, there's a probability of $0.05$ for the biomarker to give a false positive result. We know $1\%$ of the total population have this defect. When we apply this to a random person, what are the probabilities of

  1. the test result is negative,
  2. the person has the defect given the test result is positive, and
  3. the person doesn't have this defect given the test result is negative.


Let $P$ and $N$ be the events of positive and negative test results, and $W$ and $O$ the events of being with and without the gene defect. We want to find $P(N)$, $P(W \mid P)$ and $P(O \mid N)$, knowing that

\[\begin{cases}    P(P \mid W) = 0.9 \\    P(P \mid O) = 0.05 \\    P(W) = 0.01\end{cases}\]

\[\begin{aligned}    P(N) &= 1 - P(P) \\    &= 1 - (P(P \mid W)P(W) + P(P \mid O)P(O)) \\    &= 1 - (0.9 \times 0.01 + 0.05 \times (1 - 0.01)) \\    &= 1 - 0.0585 = 0.9415 \\    P(W \mid P) &= \frac{P(P \mid W)P(W)}{P(P)} \\    &= \frac{0.9 \times 0.01}{0.0585} = 0.1538 \\    P(O \mid N) &= \frac{P(N \mid O)P(O)}{P(N)} \\    &= \frac{(1 - 0.05) \times (1 - 0.01)}{0.9415} = 0.9989\end{aligned}\]
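The whole example can be checked numerically with Bayes' rule and the law of total probability:

```python
# Example 1.6.2: screening test for a gene defect.
p_W = 0.01                  # P(defect) in the population
p_pos_given_W = 0.9         # P(positive | defect)
p_pos_given_O = 0.05        # P(positive | no defect), false positive

# Total probability over the partition {W, O}:
p_pos = p_pos_given_W * p_W + p_pos_given_O * (1 - p_W)   # 0.0585
p_neg = 1 - p_pos                                         # 0.9415

# Bayes' rule for the two posterior probabilities:
p_W_given_pos = p_pos_given_W * p_W / p_pos               # about 0.1538
p_O_given_neg = (1 - p_pos_given_O) * (1 - p_W) / p_neg   # about 0.9989
```

Note how small the posterior $P(W \mid P)$ is: even a fairly accurate test yields mostly false positives when the condition is rare.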