Causal Inference The Mixtape-CH02

Back to: Input MOC

Source: Causal Inference The Mixtape (scunning.com)

2024-04-25 Book Notes

Chapter 2 is a review of probability and regression. It is not a harsh one; anyone who has already studied statistics should be comfortable with the contents. I would like to take some notes on the relatively new perspectives it offers on understanding probability.

Notes

On probability theory

Terminology check

Venn diagrams and sets

Contingency table

| Event labels | ¬B  | B   | Total |
| ------------ | --- | --- | ----- |
| A            | 0.1 | 0.5 | 0.6   |
| ¬A           | 0.1 | 0.3 | 0.4   |
| Total        | 0.2 | 0.8 | 1.0   |

Given two events, A and B:

$$
\Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)} \qquad \Pr(A, B) = \Pr(A \mid B)\Pr(B)
$$
$$
\Pr(B \mid A) = \frac{\Pr(B, A)}{\Pr(A)} \qquad \Pr(B, A) = \Pr(B \mid A)\Pr(A)
$$

Because Pr(A,B) = Pr(B,A), we get:

$$
\Pr(A \mid B)\Pr(B) = \Pr(B \mid A)\Pr(A)
\quad\Longrightarrow\quad
\Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B)}
$$

The last piece of the decomposition is Pr(B) = Pr(A,B) + Pr(¬A,B); substituting it in, we easily get:

$$
\Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B \mid A)\Pr(A) + \Pr(B \mid \neg A)\Pr(\neg A)}
$$
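As a sanity check, here is a small sketch (my own, not from the book) that plugs the joint probabilities from the contingency table above into Bayes' rule; the variable names are my own choices:

```python
# Joint and marginal probabilities read off the contingency table above.
p_a_b = 0.5        # Pr(A, B)
p_not_a_b = 0.3    # Pr(¬A, B)
p_a = 0.6          # Pr(A)
p_not_a = 0.4      # Pr(¬A)

p_b = p_a_b + p_not_a_b            # decomposition: Pr(B) = Pr(A,B) + Pr(¬A,B)
p_a_given_b = p_a_b / p_b          # definition of conditional probability
p_b_given_a = p_a_b / p_a
p_b_given_not_a = p_not_a_b / p_not_a

# Bayes' rule should reproduce the same Pr(A|B).
bayes = (p_b_given_a * p_a) / (p_b_given_a * p_a + p_b_given_not_a * p_not_a)
print(round(p_a_given_b, 6), round(bayes, 6))  # 0.625 0.625
```

Both routes agree: conditioning directly on the table and applying Bayes' rule give the same answer.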

Monty Hall example

Actually, like most people, I find the Monty Hall problem counterintuitive. Carefully noting its process and formalizing the problem might help.

Assume the participant chose door 1, and let A1 be the event that the million dollars is behind door 1; thus Pr(A1) = 1/3 and Pr(¬A1) = 2/3. Then assume Monty Hall opened door 2 and revealed a goat; let opening door 2 be event B.

Here, the marginal probability without any additional information is 1/3. This is the so-called prior probability, or prior belief. Pr(B|Ai) is then the conditional probability: given that event Ai happened, the probability that Monty Hall would open door 2.

Let us write down the probability that the million dollars is behind door 1, given that Monty Hall opened door 2 and revealed a goat:

$$
\Pr(A_1 \mid B) = \frac{\Pr(B \mid A_1)\Pr(A_1)}{\Pr(B \mid A_1)\Pr(A_1) + \Pr(B \mid A_2)\Pr(A_2) + \Pr(B \mid A_3)\Pr(A_3)}
$$

Now comes the most interesting part: figuring out each of these probabilities. Without any doubt, each prior Pr(Ai) equals 1/3. What is often ignored is that Monty Hall knows which door hides the million dollars. So if the prize is behind door 1, Monty can reveal a goat behind either of the remaining doors, giving Pr(B|A1) = 1/2: the probability that he opens door 2 when the million is behind door 1. What if the prize is behind door 2? The answer is simple: Monty would never open door 2 (event B), because he knows where the million is, so Pr(B|A2) = 0. Finally, if the prize is behind door 3, then since the participant has already chosen door 1, Monty has only one option: open door 2. If he opened door 3 and the million appeared, the game would be over for everyone! Thus Pr(B|A3) = 1. Now let us calculate:

$$
\Pr(A_1 \mid B) = \frac{\frac{1}{2}\cdot\frac{1}{3}}{\frac{1}{2}\cdot\frac{1}{3} + 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3}} = \frac{1/6}{1/6 + 2/6} = \frac{1}{3}
$$

Okay, nothing changed: you hold the same probability as at the moment you first picked door 1. Even after witnessing event B, if you keep your choice, the probability does not change. However, what if you change your mind after witnessing event B and want door 3 now? Then Pr(A3|B) = 2/3, because the numerator changes from (1/2)·(1/3) to 1·(1/3) while the denominator stays the same.

Where does this counterintuitive result come from? It can be explained by the asymmetric probability inference required to account for Monty Hall's behavior: it is difficult to take Monty Hall's perspective within a limited time, and abstracting the question sometimes leads to incorrect assumptions. I would like to write another note reviewing my own inference process, including an incorrect path. Do remember: after witnessing Monty Hall's trick, changing your choice might lead to the million-dollar award.
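The calculation above can also be checked by brute force. Below is a Monte Carlo sketch (my own illustration, not from the book): staying keeps the win probability at about 1/3, while switching raises it to about 2/3.

```python
import random

def play(switch, rng):
    """Play one round of Monty Hall; return True if the player wins."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    pick = 1                                   # participant always picks door 1
    # Monty opens a goat door that is neither the pick nor the prize.
    openable = [d for d in doors if d != pick and d != prize]
    opened = rng.choice(openable)
    if switch:
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == prize

rng = random.Random(42)
n = 100_000
stay = sum(play(False, rng) for _ in range(n)) / n
swap = sum(play(True, rng) for _ in range(n)) / n
print(f"stay ≈ {stay:.3f}, switch ≈ {swap:.3f}")  # roughly 0.333 vs 0.667
```

The simulation encodes exactly the asymmetry discussed above: Monty's choice of door is constrained by his knowledge of where the prize is.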

Summation operator

One useful result is $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$. Let us prove it step by step.

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})^2 &= \sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2\cdot\frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i + n\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(n\cdot\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{aligned}
$$

Another, more general one: $\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$.
A simple proof is below:

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) &= \sum_{i=1}^{n}\left(x_i y_i - x_i\bar{y} - \bar{x}y_i + \bar{x}\bar{y}\right) \\
&= \sum_{i=1}^{n} x_i y_i + n\bar{x}\bar{y} - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i \\
&= \sum_{i=1}^{n} x_i y_i + n\bar{x}\bar{y} - \bar{y}\,n\bar{x} - \bar{x}\,n\bar{y} \\
&= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}
\end{aligned}
$$
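Both identities are easy to verify numerically. A quick sketch (my own, with arbitrary random data and variable names of my choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)
n = len(x)
xbar, ybar = x.mean(), y.mean()

# First identity: sum of squared deviations.
lhs1 = np.sum((x - xbar) ** 2)
rhs1 = np.sum(x ** 2) - n * xbar ** 2

# Second, more general identity: sum of cross deviations.
lhs2 = np.sum((x - xbar) * (y - ybar))
rhs2 = np.sum(x * y) - n * xbar * ybar

print(np.isclose(lhs1, rhs1), np.isclose(lhs2, rhs2))  # True True
```

Note that the first identity is just the second with y = x.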

Expected value

Simply regarding it as a weighted average is fine.

$$
E(X) = \sum_{j=1}^{k} x_j f(x_j)
$$
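As a toy example of the weighted-average view (my own, not from the book), the expected value of a fair six-sided die:

```python
# Each outcome x_j weighted by its probability f(x_j) = 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expected = sum(x * p for x, p in zip(values, probs))
print(round(expected, 9))  # 3.5
```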

Variance

I think I previously missed one or two steps in conceptualizing variance; in particular, I completely missed the property E(W − E(W)) = 0. The full setup is the following. Take a random variable W; any linear transformation of it can be written as aW + b, for constants a, b. The expected value of W is E(W), and the expected deviation of W from its expected value equals zero: E(W − E(W)) = E(W) − E(W) = 0, by linearity of expectation.

Then we define $V(W) = \sigma^2 = E\left[(W - E(W))^2\right]$. In fact, my previous understanding did capture the geometry of variance, but unfortunately not the algebra. Expanding the square, we get $V(W) = E(W^2) - E(W)^2$.

Also, remember that the variance of any linear transformation of W can be expressed as $V(aW + b) = a^2 V(W)$.
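Both variance facts can be checked empirically on a large sample. A sketch of my own (population variance via `ndarray.var`, sample size and constants chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(loc=2.0, scale=3.0, size=200_000)
a, b = 2.0, 5.0

v1 = w.var()                         # E[(W - E(W))^2], population variance
v2 = (w ** 2).mean() - w.mean() ** 2  # E(W^2) - E(W)^2
v3 = (a * w + b).var()               # variance of the linear transformation

print(np.isclose(v1, v2), np.isclose(v3, a ** 2 * v1))  # True True
```

The shift b drops out entirely, while the scale a enters squared, matching $V(aW+b) = a^2V(W)$.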

Covariance

Just remember the conceptualization of covariance (of X and Y, say): it is the difference between the expected product of X and Y and the product of their expected values. If X and Y are independent, then C(X,Y) = 0 (though the converse does not hold).

$$
C(X, Y) = E(XY) - E(X)E(Y) = E\left[(X - E(X))(Y - E(Y))\right]
$$

Correlation should also be mentioned here; it can be captured as the ratio of the covariance of X and Y to the square root of the product of their variances:

$$
\operatorname{Corr}(X, Y) = \frac{C(X, Y)}{\sqrt{V(X)V(Y)}}
$$
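A quick empirical check of both formulas (my own sketch; the data-generating process and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)   # correlated with x by construction

# Two equivalent forms of the (population) covariance.
cov1 = ((x - x.mean()) * (y - y.mean())).mean()
cov2 = (x * y).mean() - x.mean() * y.mean()

# Correlation: covariance scaled by the standard deviations.
corr = cov1 / np.sqrt(x.var() * y.var())

print(np.isclose(cov1, cov2))  # True
print(round(corr, 2))          # near the theoretical 0.5/sqrt(1.25) ≈ 0.45
```

With this construction the theoretical covariance is 0.5 and the theoretical correlation is $0.5/\sqrt{1.25} \approx 0.447$, which the sample estimates recover closely.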

Population model

This section should be well reviewed.

References

Causal Inference The Mixtape - 2  Probability and Regression Review (scunning.com)