Causal Inference The Mixtape-CH02

Back to: Input MOC

Source: Causal Inference The Mixtape (scunning.com)

2024-04-25 Book Notes

Chapter 2 is a review of probability and regression. It is not a harsh one; anyone who has already studied statistics should be comfortable with the contents. I would like to take some notes on the relatively new perspectives it offers on understanding probability.

Notes

On probability theory

Terminology check

Venn diagrams and sets

Contingency table

| Event labels | ¬B  | B   | Total |
| ------------ | --- | --- | ----- |
| A            | 0.1 | 0.5 | 0.6   |
| ¬A           | 0.1 | 0.3 | 0.4   |
| Total        | 0.2 | 0.8 | 1.0   |

Given two events, A and B:

$$
\Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)} \qquad \Pr(A, B) = \Pr(A \mid B)\Pr(B)
$$
$$
\Pr(B \mid A) = \frac{\Pr(B, A)}{\Pr(A)} \qquad \Pr(B, A) = \Pr(B \mid A)\Pr(A)
$$

Because Pr(A,B) = Pr(B,A), we get:

$$
\Pr(A \mid B)\Pr(B) = \Pr(B \mid A)\Pr(A)
\quad\Longrightarrow\quad
\Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B)}
$$

The last piece of the decomposition is Pr(B) = Pr(A,B) + Pr(¬A,B); substituting it in, we easily get:

$$
\Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B \mid A)\Pr(A) + \Pr(B \mid \neg A)\Pr(\neg A)}
$$
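As a sanity check, here is a small sketch (my own, not from the book) that plugs the joint probabilities from the contingency table above into Bayes' rule; the variable names are my own choices:

```python
# Joint and marginal probabilities read off the contingency table above.
p_a_b = 0.5        # Pr(A, B)
p_not_a_b = 0.3    # Pr(¬A, B)
p_a = 0.6          # Pr(A)
p_not_a = 0.4      # Pr(¬A)

p_b = p_a_b + p_not_a_b            # decomposition: Pr(B) = Pr(A,B) + Pr(¬A,B)
p_a_given_b = p_a_b / p_b          # definition of conditional probability
p_b_given_a = p_a_b / p_a
p_b_given_not_a = p_not_a_b / p_not_a

# Bayes' rule should reproduce the same Pr(A|B).
bayes = (p_b_given_a * p_a) / (p_b_given_a * p_a + p_b_given_not_a * p_not_a)
print(round(p_a_given_b, 6), round(bayes, 6))  # 0.625 0.625
```

Both routes agree: conditioning directly on the table and applying Bayes' rule give the same answer.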

Monty Hall example

Actually, like most people, I find the Monty Hall problem counterintuitive. Carefully noting its process and formalizing the problem might help.

Assume the participant chose door 1, and let A1 be the event that the million dollars is behind door 1; thus Pr(A1) = 1/3 and Pr(¬A1) = 2/3. Then assume Monty Hall opened door 2 and revealed a goat; let opening door 2 be event B.

Here, the marginal probability without any additional information is 1/3. This is the so-called prior probability, or prior belief. Pr(B|Ai) is then the conditional probability: given that event Ai happened, the probability that Monty Hall would open door 2.

Let us write down the probability that the million dollars is behind door 1, given that Monty Hall opened door 2 and revealed a goat:

$$
\Pr(A_1 \mid B) = \frac{\Pr(B \mid A_1)\Pr(A_1)}{\Pr(B \mid A_1)\Pr(A_1) + \Pr(B \mid A_2)\Pr(A_2) + \Pr(B \mid A_3)\Pr(A_3)}
$$

Now comes the most interesting part: figuring out each of these probabilities. Without any doubt, each prior Pr(Ai) equals 1/3. What is often ignored is that Monty Hall knows which door hides the million dollars. So if the prize is behind door 1, Monty can reveal a goat behind either of the remaining doors, giving Pr(B|A1) = 1/2: the probability that he opens door 2 when the million is behind door 1. What if the prize is behind door 2? The answer is simple: Monty would never open door 2 (event B), because he knows where the million is, so Pr(B|A2) = 0. Finally, if the prize is behind door 3, then since the participant has already chosen door 1, Monty has only one option: open door 2. If he opened door 3 and the million appeared, the game would be over for everyone! Thus Pr(B|A3) = 1. Now let us calculate:

$$
\Pr(A_1 \mid B) = \frac{\frac{1}{2}\cdot\frac{1}{3}}{\frac{1}{2}\cdot\frac{1}{3} + 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3}} = \frac{1/6}{1/6 + 2/6} = \frac{1}{3}
$$

Okay, nothing changed: you hold the same probability as at the moment you first picked door 1. Even after witnessing event B, if you keep your choice, the probability does not change. However, what if you change your mind after witnessing event B and want door 3 now? Then Pr(A3|B) = 2/3, because the numerator changes from (1/2)·(1/3) to 1·(1/3) while the denominator stays the same.

Where does this counterintuitive result come from? It can be explained by the asymmetric probability inference required to account for Monty Hall's behavior: it is difficult to take Monty Hall's perspective within a limited time, and abstracting the question sometimes leads to incorrect assumptions. I would like to write another note reviewing my own inference process, including an incorrect path. Do remember: after witnessing Monty Hall's trick, changing your choice might lead to the million-dollar award.
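The calculation above can also be checked by brute force. Below is a Monte Carlo sketch (my own illustration, not from the book): staying keeps the win probability at about 1/3, while switching raises it to about 2/3.

```python
import random

def play(switch, rng):
    """Play one round of Monty Hall; return True if the player wins."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    pick = 1                                   # participant always picks door 1
    # Monty opens a goat door that is neither the pick nor the prize.
    openable = [d for d in doors if d != pick and d != prize]
    opened = rng.choice(openable)
    if switch:
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == prize

rng = random.Random(42)
n = 100_000
stay = sum(play(False, rng) for _ in range(n)) / n
swap = sum(play(True, rng) for _ in range(n)) / n
print(f"stay ≈ {stay:.3f}, switch ≈ {swap:.3f}")  # roughly 0.333 vs 0.667
```

The simulation encodes exactly the asymmetry discussed above: Monty's choice of door is constrained by his knowledge of where the prize is.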

Summation operator

One useful result is $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$. Let us prove it step by step.

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})^2 &= \sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2\cdot\frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i + n\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(n\cdot\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{aligned}
$$

Another, more general one: $\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$.
A simple proof is below:

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) &= \sum_{i=1}^{n}\left(x_i y_i - x_i\bar{y} - \bar{x}y_i + \bar{x}\bar{y}\right) \\
&= \sum_{i=1}^{n} x_i y_i + n\bar{x}\bar{y} - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i \\
&= \sum_{i=1}^{n} x_i y_i + n\bar{x}\bar{y} - \bar{y}\,n\bar{x} - \bar{x}\,n\bar{y} \\
&= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}
\end{aligned}
$$
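Both identities are easy to verify numerically. A quick sketch (my own, with arbitrary random data and variable names of my choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)
n = len(x)
xbar, ybar = x.mean(), y.mean()

# First identity: sum of squared deviations.
lhs1 = np.sum((x - xbar) ** 2)
rhs1 = np.sum(x ** 2) - n * xbar ** 2

# Second, more general identity: sum of cross deviations.
lhs2 = np.sum((x - xbar) * (y - ybar))
rhs2 = np.sum(x * y) - n * xbar * ybar

print(np.isclose(lhs1, rhs1), np.isclose(lhs2, rhs2))  # True True
```

Note that the first identity is just the second with y = x.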

Expected value

Simply regarding it as a weighted average is fine.

$$
E(X) = \sum_{j=1}^{k} x_j f(x_j)
$$
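As a toy example of the weighted-average view (my own, not from the book), the expected value of a fair six-sided die:

```python
# Each outcome x_j weighted by its probability f(x_j) = 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expected = sum(x * p for x, p in zip(values, probs))
print(round(expected, 9))  # 3.5
```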

Variance

I think I previously missed one or two steps in conceptualizing variance; in particular, I completely missed the property E(W − E(W)) = 0. The full setup is the following. Take a random variable W; any linear transformation of it can be written as aW + b, for constants a, b. The expected value of W is E(W), and the expected deviation of W from its expected value equals zero: E(W − E(W)) = E(W) − E(W) = 0, by linearity of expectation.

Then we define $V(W) = \sigma^2 = E\left[(W - E(W))^2\right]$. In fact, my previous understanding did capture the geometry of variance, but unfortunately not the algebra. Expanding the square, we get $V(W) = E(W^2) - E(W)^2$.

Also, remember that the variance of any linear transformation of W can be expressed as $V(aW + b) = a^2 V(W)$.
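Both variance facts can be checked empirically on a large sample. A sketch of my own (population variance via `ndarray.var`, sample size and constants chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(loc=2.0, scale=3.0, size=200_000)
a, b = 2.0, 5.0

v1 = w.var()                         # E[(W - E(W))^2], population variance
v2 = (w ** 2).mean() - w.mean() ** 2  # E(W^2) - E(W)^2
v3 = (a * w + b).var()               # variance of the linear transformation

print(np.isclose(v1, v2), np.isclose(v3, a ** 2 * v1))  # True True
```

The shift b drops out entirely, while the scale a enters squared, matching $V(aW+b) = a^2V(W)$.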

Covariance

Just remember the conceptualization of covariance (of X and Y, say): it is the difference between the expected product of X and Y and the product of their expected values. If X and Y are independent, then C(X,Y) = 0 (though the converse does not hold).

$$
C(X, Y) = E(XY) - E(X)E(Y) = E\left[(X - E(X))(Y - E(Y))\right]
$$

Correlation should also be mentioned here; it can be captured as the ratio of the covariance of X and Y to the square root of the product of their variances:

$$
\operatorname{Corr}(X, Y) = \frac{C(X, Y)}{\sqrt{V(X)V(Y)}}
$$
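A quick empirical check of both formulas (my own sketch; the data-generating process and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)   # correlated with x by construction

# Two equivalent forms of the (population) covariance.
cov1 = ((x - x.mean()) * (y - y.mean())).mean()
cov2 = (x * y).mean() - x.mean() * y.mean()

# Correlation: covariance scaled by the standard deviations.
corr = cov1 / np.sqrt(x.var() * y.var())

print(np.isclose(cov1, cov2))  # True
print(round(corr, 2))          # near the theoretical 0.5/sqrt(1.25) ≈ 0.45
```

With this construction the theoretical covariance is 0.5 and the theoretical correlation is $0.5/\sqrt{1.25} \approx 0.447$, which the sample estimates recover closely.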

Population model

This section should be well reviewed.

References

Causal Inference The Mixtape - 2  Probability and Regression Review (scunning.com)