Statistics of Optimal Transport and Generative Models
Back to: Input MOC
Source: 2024-06-04 Course Notes
This note records the essentials of a data science summer school (?).
Notes
Introduction
ML and AI techniques
- build smarter/better algorithms
- gain better control over probability distributions
- general neural networks
How to control probability
- e.g., in simple regression, a …
- newly released methods: OT / generative models (GM)
What is Statistics
In this course: analysis of the features of a probability distribution from a limited number of observations, and the inferences drawn from them.
- Observations $X_1, \dots, X_n \sim P$, where $P$ is the distribution of the observed events and is the object of interest
- From this perspective, inference is based on the observations
- What interests us most is the estimation error $\hat{\theta}_n - \theta$, in other words, the distribution of the error
Introduction to Optimal Transport
Consider OT at the level of distributions
- transport one distribution $\mu$ into another distribution $\nu$
- e.g., discrete distributions: transport part of the probability mass from one point to another
- e.g., point clouds: points in $\mathbb{R}^d$, systematically transported to other coordinates
Definitions
- Metric space: $(\mathcal{X}, d)$
- Probability measure: $\mu \in \mathcal{P}(\mathcal{X})$
- Transport map: a map $T : \mathcal{X} \to \mathcal{X}$
- Transport: the pushforward $T_\#\mu = \nu$, i.e., $T$ carries the mass of $\mu$ onto $\nu$
Monge's Problem
- How do we minimize the cost of transport? $\min_{T : T_\#\mu = \nu} \int d(x, T(x)) \, \mathrm{d}\mu(x)$
- An optimal transport map can be found most of the time, though not always (e.g., a point mass cannot be mapped onto a spread-out distribution)
Kantorovich's Problem
- After coupling $\mu$ and $\nu$, i.e., over all joint distributions $\pi \in \Pi(\mu, \nu)$ with marginals $\mu$ and $\nu$, find the coupling that costs the least to transport
- This is done by using the Wasserstein distance ($p \ge 1$): $W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, \mathrm{d}\pi(x, y) \right)^{1/p}$ (a small numerical sketch follows)
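To make the discrete Kantorovich problem concrete, here is a minimal sketch (not from the course) that solves it as a linear program with SciPy. The point locations, the weights, and the choice $p = 1$ are illustrative assumptions.

```python
# Discrete Kantorovich problem as a linear program (illustrative sketch).
import numpy as np
from scipy.optimize import linprog

# Source measure mu: weights a at points x; target measure nu: weights b at points y.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.5], [2.5]])
a = np.array([0.4, 0.4, 0.2])   # must sum to 1
b = np.array([0.7, 0.3])        # must sum to 1

# Cost matrix C[i, j] = d(x_i, y_j) (Euclidean distance, p = 1).
C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

n, m = C.shape
# Variables: the coupling pi flattened row-major.
# Constraints: row sums equal a, column sums equal b.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j pi[i, j] = a[i]
for j in range(m):
    A_eq[n + j, j::m] = 1.0            # sum_i pi[i, j] = b[j]
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("W_1(mu, nu) =", res.fun)        # optimal transport cost
print("optimal coupling pi:\n", res.x.reshape(n, m).round(3))
```

The optimal coupling here is a matrix rather than a map, which is exactly how Kantorovich's relaxation sidesteps the cases where Monge's problem has no solution.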
Advantages of the Wasserstein distance
- Noisy information may be concentrated on a narrow support, i.e., the distribution lives in a small part of the space
- Divergences that compare densities break down when supports do not overlap; by using the W distance, this support problem can be overcome
Statistics of OT
Core: estimate the W distance from observations $X_1, \dots, X_n \sim \mu$
Definitions
- Empirical measure: $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$; the plug-in estimator replaces $\mu$ by $\hat{\mu}_n$
Curse of Dimensionality
- the error of the empirical W distance becomes larger as the dimensionality increases: $\mathbb{E}[W_p(\hat{\mu}_n, \mu)] \asymp n^{-1/d}$ in dimension $d$ (Weed & Bach, 2019)
- thus, the plain W distance is not used for high-dimensional data (a rough numerical illustration follows)
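A rough Monte Carlo sketch of this effect, under assumed sample sizes and dimensions: the $W_1$ distance between two independent empirical samples of the same distribution should shrink toward 0, but it stays large in high dimension. For uniform weights and equal sample sizes the optimal coupling is a permutation, so an assignment solver suffices.

```python
# Curse of dimensionality for the empirical W_1 distance (rough illustration).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def empirical_w1(X, Y):
    """W_1 between uniform empirical measures on the rows of X and Y."""
    C = cdist(X, Y)                        # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(C)  # optimal matching (a permutation)
    return C[rows, cols].mean()

n = 200
for d in (1, 2, 5, 20):
    X = rng.uniform(size=(n, d))           # both samples ~ Uniform([0, 1]^d)
    Y = rng.uniform(size=(n, d))
    print(f"d = {d:2d}: W_1 between empirical samples ~ {empirical_w1(X, Y):.3f}")
```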
Approaches for avoiding the curse of dimensionality
Method 1: slicing, which projects the data onto a straight line (the sliced W distance averages 1-D distances over projection directions; see the sketch after this list).
- Alternative to Method 1: find the direction of slicing that maximizes the projected distance (the max-sliced variant)
- Disadvantages:
    - projections can misrepresent some traits of the original distribution
    - part of the information may be missed
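A minimal sketch of the sliced W distance under assumed sample data: project both samples onto random unit directions and average the 1-D $W_1$ distances, which are cheap to compute. The number of projections is an arbitrary choice.

```python
# Sliced Wasserstein distance via random 1-D projections (illustrative sketch).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def sliced_w1(X, Y, n_proj=100):
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)      # random unit direction
        # 1-D W_1 between the projected samples (fast: only needs sorting)
        total += wasserstein_distance(X @ theta, Y @ theta)
    return total / n_proj

X = rng.normal(size=(500, 10))
Y = rng.normal(loc=0.5, size=(500, 10))
print("sliced W_1 ~", sliced_w1(X, Y))
```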
Method 2: entropic regularization (see the sketch below)
- add a relative-entropy term to the Kantorovich formulation
- Entropic W distance: $W_\varepsilon(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, \mathrm{d}\pi + \varepsilon \, \mathrm{KL}(\pi \,\|\, \mu \otimes \nu)$
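A minimal sketch of the entropic approach via Sinkhorn's algorithm, reusing the small discrete example from above; $\varepsilon$ and the iteration count are illustrative assumptions. The entropy penalty makes the optimal coupling take the form $\mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K = e^{-C/\varepsilon}$, which can be found by alternating scaling.

```python
# Entropic OT via Sinkhorn's alternating-scaling iterations (illustrative sketch).
import numpy as np

def sinkhorn(a, b, C, epsilon=0.1, n_iter=500):
    K = np.exp(-C / epsilon)     # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)        # scale to match column marginals
        u = a / (K @ v)          # scale to match row marginals
    pi = u[:, None] * K * v[None, :]
    return (pi * C).sum(), pi    # transport cost of the entropic coupling

a = np.array([0.4, 0.4, 0.2])
b = np.array([0.7, 0.3])
C = np.abs(np.array([0.0, 1.0, 2.0])[:, None] - np.array([0.5, 2.5])[None, :])
cost, pi = sinkhorn(a, b, C)
print("entropic transport cost ~", cost)
```

As $\varepsilon \to 0$ the entropic coupling approaches the unregularized optimum; larger $\varepsilon$ gives faster, more stable iterations at the price of a blurrier coupling.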
Why we need to know the error distribution
- because we want to know the reliability of the computed value
- once the error distribution is known, we can have more confidence in the estimated value, e.g., by attaching confidence intervals (one way to approximate it is sketched below)
- in all, it is important to know how far the computed value is from the true value
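The notes do not record how the error distribution was obtained in the course; as one common, hedged illustration, a bootstrap can approximate it by resampling the data. The sample data below are assumptions.

```python
# Bootstrap approximation of the error distribution of an estimated 1-D W_1.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
X = rng.normal(size=300)
Y = rng.normal(loc=0.3, size=300)
w_hat = wasserstein_distance(X, Y)      # point estimate

boot = np.array([
    wasserstein_distance(rng.choice(X, size=X.size, replace=True),
                         rng.choice(Y, size=Y.size, replace=True))
    for _ in range(1000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])   # 95% percentile interval
print(f"W_1 estimate {w_hat:.3f}, bootstrap 95% CI [{lo:.3f}, {hi:.3f}]")
```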
One variable, or a discrete variable
- the distribution of the inferred error can be approximated in these cases
It is difficult to make inferences for multiple variables
- suggestion: bound the supremum, over a function class, of the deviation between the empirical average and the expectation
- covering numbers (why introduce them? they count how many small balls are needed to cover the function class, i.e., they measure its complexity)
- find a finite representation of the original function class by using an $\varepsilon$-net (a standard bound is sketched below)
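A standard form of such a bound, as a hedged sketch (for a function class $\mathcal{F}$ uniformly bounded by 1, with covering number $N(\varepsilon, \mathcal{F}, \|\cdot\|_\infty)$; the notation is assumed, not taken from the notes):

$$
\mathbb{E}\left[\, \sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb{E}[f(X)] \right| \,\right] \;\lesssim\; \inf_{\varepsilon > 0} \left( \varepsilon + \sqrt{\frac{\log N(\varepsilon, \mathcal{F}, \|\cdot\|_\infty)}{n}} \right)
$$

The idea: replace $\mathcal{F}$ by a finite $\varepsilon$-net, pay $\varepsilon$ for the approximation, and pay $\sqrt{\log N / n}$ for a union bound over the net.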
Inferred error of OT projection
- ...
Applying it to neural networks
- it can only be applied under certain specific conditions
- ...
On Generative Models
- the information in this section does not need to be recorded
References
Weed, J., & Bach, F. (2019). Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A), 2620-2648.