Statistics of Optimal Transport and Generative Models
Back to: Input MOC
Source: 2024-06-04 Course Notes
This note records the essentials of a data science summer school (?).
Notes
Introduction
ML and AI techniques
- build smarter/better algorithms
- gain better control over probability distributions
- general neural networks
How to control probability
- e.g., in simple regression, a …
- newly released methods: OT / generative models (GM)
What is Statistics
In this course: analysis of the features of a probability distribution from a limited number of observations, and the inferences drawn from them.
- Observations $X_1, \dots, X_n \sim P$, where $P$ is the distribution of the observed events and is the object of interest
- From this perspective, inference is based on the observations
- What interests us most is the estimation error $\hat{\theta}_n - \theta$, in other words, the distribution of the error
Introduction to Optimal Transport
Consider OT at the level of distributions
- transport one distribution $\mu$ into another distribution $\nu$
- e.g., discrete distributions: transport part of the probability mass from one point to another
- e.g., point clouds: points in $\mathbb{R}^d$, systematically transported to other coordinates
Definitions
- Metric space: $(\mathcal{X}, d)$
- Probability measure: $\mu \in \mathcal{P}(\mathcal{X})$
- Transport map: a map $T : \mathcal{X} \to \mathcal{X}$
- Transport: the pushforward $T_\#\mu = \nu$, i.e., $T$ carries the mass of $\mu$ onto $\nu$
Monge's Problem
- How do we minimize the cost of transport? $\min_{T : T_\#\mu = \nu} \int d(x, T(x)) \, \mathrm{d}\mu(x)$
- An optimal transport map can be found most of the time, though not always (e.g., a point mass cannot be mapped onto a spread-out distribution)
Kantorovich's Problem
- After coupling $\mu$ and $\nu$, i.e., over all joint distributions $\pi \in \Pi(\mu, \nu)$ with marginals $\mu$ and $\nu$, find the coupling that costs the least to transport
- This is done by using the Wasserstein distance ($p \ge 1$): $W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, \mathrm{d}\pi(x, y) \right)^{1/p}$ (a small numerical sketch follows)
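To make the discrete Kantorovich problem concrete, here is a minimal sketch (not from the course) that solves it as a linear program with SciPy. The point locations, the weights, and the choice $p = 1$ are illustrative assumptions.

```python
# Discrete Kantorovich problem as a linear program (illustrative sketch).
import numpy as np
from scipy.optimize import linprog

# Source measure mu: weights a at points x; target measure nu: weights b at points y.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.5], [2.5]])
a = np.array([0.4, 0.4, 0.2])   # must sum to 1
b = np.array([0.7, 0.3])        # must sum to 1

# Cost matrix C[i, j] = d(x_i, y_j) (Euclidean distance, p = 1).
C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

n, m = C.shape
# Variables: the coupling pi flattened row-major.
# Constraints: row sums equal a, column sums equal b.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j pi[i, j] = a[i]
for j in range(m):
    A_eq[n + j, j::m] = 1.0            # sum_i pi[i, j] = b[j]
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("W_1(mu, nu) =", res.fun)        # optimal transport cost
print("optimal coupling pi:\n", res.x.reshape(n, m).round(3))
```

The optimal coupling here is a matrix rather than a map, which is exactly how Kantorovich's relaxation sidesteps the cases where Monge's problem has no solution.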
Advantages of the Wasserstein distance
- Noisy information may be concentrated on a narrow support, i.e., the distribution lives in a small part of the space
- Divergences that compare densities break down when supports do not overlap; by using the W distance, this support problem can be overcome
Statistics of OT
Core: estimate the W distance from observations $X_1, \dots, X_n \sim \mu$
Definitions
- Empirical measure: $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$; the plug-in estimator replaces $\mu$ by $\hat{\mu}_n$
Curse of Dimensionality
- the error of the empirical W distance becomes larger as the dimensionality increases: $\mathbb{E}[W_p(\hat{\mu}_n, \mu)] \asymp n^{-1/d}$ in dimension $d$ (Weed & Bach, 2019)
- thus, the plain W distance is not used for high-dimensional data (a rough numerical illustration follows)
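A rough Monte Carlo sketch of this effect, under assumed sample sizes and dimensions: the $W_1$ distance between two independent empirical samples of the same distribution should shrink toward 0, but it stays large in high dimension. For uniform weights and equal sample sizes the optimal coupling is a permutation, so an assignment solver suffices.

```python
# Curse of dimensionality for the empirical W_1 distance (rough illustration).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def empirical_w1(X, Y):
    """W_1 between uniform empirical measures on the rows of X and Y."""
    C = cdist(X, Y)                        # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(C)  # optimal matching (a permutation)
    return C[rows, cols].mean()

n = 200
for d in (1, 2, 5, 20):
    X = rng.uniform(size=(n, d))           # both samples ~ Uniform([0, 1]^d)
    Y = rng.uniform(size=(n, d))
    print(f"d = {d:2d}: W_1 between empirical samples ~ {empirical_w1(X, Y):.3f}")
```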
Approaches for avoiding the curse of dimensionality
Method 1: slicing, which projects the data onto a straight line (the sliced W distance averages 1-D distances over projection directions; see the sketch after this list).
- Alternative to Method 1: find the direction of slicing that maximizes the projected distance (the max-sliced variant)
- Disadvantages:
    - projections can misrepresent some traits of the original distribution
    - part of the information may be missed
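A minimal sketch of the sliced W distance under assumed sample data: project both samples onto random unit directions and average the 1-D $W_1$ distances, which are cheap to compute. The number of projections is an arbitrary choice.

```python
# Sliced Wasserstein distance via random 1-D projections (illustrative sketch).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def sliced_w1(X, Y, n_proj=100):
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)      # random unit direction
        # 1-D W_1 between the projected samples (fast: only needs sorting)
        total += wasserstein_distance(X @ theta, Y @ theta)
    return total / n_proj

X = rng.normal(size=(500, 10))
Y = rng.normal(loc=0.5, size=(500, 10))
print("sliced W_1 ~", sliced_w1(X, Y))
```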
Method 2: entropic regularization (see the sketch below)
- add a relative-entropy term to the Kantorovich formulation
- Entropic W distance: $W_\varepsilon(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, \mathrm{d}\pi + \varepsilon \, \mathrm{KL}(\pi \,\|\, \mu \otimes \nu)$
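A minimal sketch of the entropic approach via Sinkhorn's algorithm, reusing the small discrete example from above; $\varepsilon$ and the iteration count are illustrative assumptions. The entropy penalty makes the optimal coupling take the form $\mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K = e^{-C/\varepsilon}$, which can be found by alternating scaling.

```python
# Entropic OT via Sinkhorn's alternating-scaling iterations (illustrative sketch).
import numpy as np

def sinkhorn(a, b, C, epsilon=0.1, n_iter=500):
    K = np.exp(-C / epsilon)     # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)        # scale to match column marginals
        u = a / (K @ v)          # scale to match row marginals
    pi = u[:, None] * K * v[None, :]
    return (pi * C).sum(), pi    # transport cost of the entropic coupling

a = np.array([0.4, 0.4, 0.2])
b = np.array([0.7, 0.3])
C = np.abs(np.array([0.0, 1.0, 2.0])[:, None] - np.array([0.5, 2.5])[None, :])
cost, pi = sinkhorn(a, b, C)
print("entropic transport cost ~", cost)
```

As $\varepsilon \to 0$ the entropic coupling approaches the unregularized optimum; larger $\varepsilon$ gives faster, more stable iterations at the price of a blurrier coupling.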
Why we need to know the error distribution
- because we want to know the reliability of the computed value
- once the error distribution is known, we can have more confidence in the estimated value, e.g., by attaching confidence intervals (one way to approximate it is sketched below)
- in all, it is important to know how far the computed value is from the true value
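The notes do not record how the error distribution was obtained in the course; as one common, hedged illustration, a bootstrap can approximate it by resampling the data. The sample data below are assumptions.

```python
# Bootstrap approximation of the error distribution of an estimated 1-D W_1.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
X = rng.normal(size=300)
Y = rng.normal(loc=0.3, size=300)
w_hat = wasserstein_distance(X, Y)      # point estimate

boot = np.array([
    wasserstein_distance(rng.choice(X, size=X.size, replace=True),
                         rng.choice(Y, size=Y.size, replace=True))
    for _ in range(1000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])   # 95% percentile interval
print(f"W_1 estimate {w_hat:.3f}, bootstrap 95% CI [{lo:.3f}, {hi:.3f}]")
```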
One variable, or a discrete variable
- the distribution of the inferred error can be approximated in these cases
It is difficult to make inferences for multiple variables
- suggestion: bound the supremum, over a function class, of the deviation between the empirical average and the expectation
- covering numbers (why introduce them? they count how many small balls are needed to cover the function class, i.e., they measure its complexity)
- find a finite representation of the original function class by using an $\varepsilon$-net (a standard bound is sketched below)
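A standard form of such a bound, as a hedged sketch (for a function class $\mathcal{F}$ uniformly bounded by 1, with covering number $N(\varepsilon, \mathcal{F}, \|\cdot\|_\infty)$; the notation is assumed, not taken from the notes):

$$
\mathbb{E}\left[\, \sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb{E}[f(X)] \right| \,\right] \;\lesssim\; \inf_{\varepsilon > 0} \left( \varepsilon + \sqrt{\frac{\log N(\varepsilon, \mathcal{F}, \|\cdot\|_\infty)}{n}} \right)
$$

The idea: replace $\mathcal{F}$ by a finite $\varepsilon$-net, pay $\varepsilon$ for the approximation, and pay $\sqrt{\log N / n}$ for a union bound over the net.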
Inferred error of OT projection
- ...
Applying it to neural networks
- it can only be applied under certain specific conditions
- ...
On Generative Models
- the information in this section does not need to be recorded
References
Weed, J., & Bach, F. (2019). Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A), 2620-2648.