Optimal Transport Notes 1
Wed Jan 8, 2020 · 716 words · 4 min

Recall $$V={ 1,2,n },V^1={1^1,2^1,n^1}$$ be a complete bipartite graph. $V(a,b)$ be a polytope of $m\times n$ matrices $P$\ s.t. $P1_n = 1$, $P^T 1_m = b$. Let our cost be $L_c(P) = <P,C>$

Theorem: Let P be an extreme point in $U(a,b)$. Consider the subgraph $(V\bigcup V^1, F(P))$ where $F(P) = {{i,j}:P_{i,j} > 0}$. Then this subgraph has no cycles.

In particular, at most $m+n-1$ many edges can have positive mass under $P$.

Special case when $m=n$ and $a=b=\frac{1}{n}1$, $U(a,b)$ = polytope of double stochastic matrices, sometimes called Birkoff-Von-Neumann polytope. We know that the extreme points are $\frac{1}{n} Permutation \ Matrices$.

Proof: Suppose $P$ is in $U(a,b)$ such that $G(P) = (V\bigcup V^1, F(P))$ has a cycle. In other words our objective is to find $Q,P \ne P$ where $P = \frac{1}{2}(Q+R)$. There exists edges ${i_1,j_1^l}$,..., ${i_k,j_k^l}$ in $F(p)$. Consider the directed edges $\bar{H}={(i_1,j_1^l)...}$. Choose $\epsilon \le min_{(i,j)\in F(p)} P_{i,j}$

Let $E$ be the matrix such that $E_{i,j} = \epsilon$ if $i\rightarrow j \in \bar{H}$ is equal to $-\epsilon$ if $j\rightarrow i \in \bar{H} = 0$ otherwise. Then $E1 = 0, E^T1 = 0$. Let $Q = P + E, Q \ge 0$ and $R = P - E, R \ge 0$. Thus $Q,R \in U(a,b)$. $P = \frac{1}{2} (Q+R)$

From combinatorics: If a graph has $k$ vertices and no cycles it can have at most $k-1$ edges.

$G(P)$ has $m+n$ vertices so $\le m + n - 1$ cycles.

OT is a linear programming problem. Theoretically can be solved by Simplex/Interior point methods. Unfortuantly doesnt really work well. Why? $m=n$ computational complexity of simplex algo $\ge O(n^3)$

## Entropic Reguralization

Idea is to reguralize the problem to get approximate solutions.

### Entropy

Suppose $p$ is a probability vector $p=(P-1,...,p_n)$, probabilities are greater than 0 while summing up to one. Define the entropy: $$H(p) = \sum_{i=1}^m p_i \log(p_i)$$ Note that this is strictly convex. $\nabla H(p) = \log(p)+1, \nabla^2 H(p) = Diag(\frac{1}{p}) > I$. And therefore the modified problem/reguraliized problem has a unique solution.

For any $\epsilon>0$ consider the following $$L_c^{\epsilon}(a,b) = \min_{U(a,b)} \left[<C,P> + \epsilon H(p)\right]$$ It has a unique solution in the interior of the $U(a,b)$

Theorem: Let $P_{\epsilon}^l$ be the solution of $L_c^{\epsilon}(a,b)$. Then $lim_{\epsilon \rightarrow 0^+} P_{\epsilon}^l = P^l$ while $lim_{\epsilon \rightarrow \infty} P_{\epsilon}^l = ab^t$.

Proof (needs fixing, will be updated in notes): take any $P$ in the interior of $U(a,b). H(p) < \infty$. Then $<c,P_{\epsilon}^l> + \epsilon H(P_{\epsilon}^l) \le <c,p>+\epsilon H(p)$.

Similar argument can be made as $\epsilon \rightarrow \infty$. $P_{\epsilon}^l = argmin \left[\frac{1}{e} <C,P> + H(P)\right]$ then $P^l = argmin \left[H(p)\right]$.

The next question to as is how do the solutions look like for: $L_c^{\epsilon}(a,b) = \min_{U(a,b)} \left[<C,P> + \epsilon H(p)\right]$ Theorem: Let $k_{\epsilon}$ be the matrix $(k_{\epsilon}){i,j} = e^{-C_{i,j}/\epsilon}$. Then $P_{\epsilon}^l$ must be of the form $Diag(u)k_e Diag(v)$ for some $u,v$ non negative vectors.

### Relative Entropy

Can we represent this through the KL divergence (aka relative entropy)? Suppose $p$ is a probability vector $p=(P-1,...,p_n)$, probabilities are greater than 0 while summing up to one, and $q$ is a probability vector $q=(q_1,...,p_q)$. Define the relative entropy: $$H(P|q) = \sum p_i \log(\frac{p_i}{q_i})$$

If q is just the vectors of ones, then relative entropy is just the entropy. Considering the matrix from above $$H(P|k_{\epsilon}) = \sum \sum P_{(i,j)} \log(\frac{P_{(i,j)}}{k_{\epsilon}(i,j)})$$

So then $P_{\epsilon}^l$ must be the argmin of the relative entropy noted above.