Convex optimization over a probability simplex

ArXiv ID: 2305.09046

Authors: Unknown

Abstract

We propose a new iteration scheme, the Cauchy-Simplex, to optimize convex problems over the probability simplex $\{w\in\mathbb{R}^n \mid \sum_i w_i=1 \ \textrm{and}\ w_i\geq0\}$. Specifically, we map the simplex to the positive quadrant of a unit sphere, envisage gradient descent in latent variables, and map the result back in a way that depends only on the simplex variable. Moreover, proving rigorous convergence results in this formulation leads inherently to tools from information theory (e.g., cross-entropy and KL divergence). Each iteration of the Cauchy-Simplex consists of simple operations, making it well-suited for high-dimensional problems. In continuous time, we prove that $f(w_T)-f(w^*) = O(1/T)$ for differentiable real-valued convex functions, where $T$ is the number of time steps and $w^*$ is the optimal solution. Numerical experiments on projection onto convex hulls show faster convergence than similar algorithms. Finally, we apply our algorithm to online learning problems and prove the convergence of the average regret for (1) Prediction with expert advice and (2) Universal Portfolios.
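The abstract names the scheme but not its update rule. One concrete reading: writing $w_i = \phi_i^2$ with $\|\phi\|_2 = 1$ places the simplex on the positive quadrant of the unit sphere, and gradient flow in the latent $\phi$, mapped back, yields replicator-style dynamics expressible in $w$ alone, e.g. $w \leftarrow w - \eta\, w \odot (\nabla f(w) - \langle w, \nabla f(w)\rangle)$. The Python sketch below implements that assumed update; the step-size cap is an illustrative choice, not the paper's.

```python
import numpy as np

def cauchy_simplex_step(w, grad, eta=0.1):
    """One iteration of the sketched simplex update (an assumption,
    not the paper's verbatim pseudocode).

    The update w <- w * (1 - eta * (grad - <w, grad>)) preserves
    sum(w) = 1 exactly, since sum_i w_i * (grad_i - <w, grad>) = 0
    whenever sum(w) = 1.
    """
    g = grad - np.dot(w, grad)     # gradient centered under w
    gmax = np.max(g)
    if gmax > 0:                   # cap eta so 1 - eta*g_i stays positive
        eta = min(eta, 0.9 / gmax)
    return w * (1.0 - eta * g)
```

Note that each step costs only a dot product and elementwise arithmetic, consistent with the abstract's claim that iterations are cheap in high dimensions.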

Keywords: Cauchy-Simplex, Probability Simplex, Online Learning, Universal Portfolios, Convex Optimization, General / Multi-Asset

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 5.0/10
  • Quadrant: Holy Grail
  • Why: The paper presents a novel optimization algorithm with substantial mathematical machinery, including convergence-rate proofs and connections to information theory, indicating high mathematical complexity. While it includes numerical experiments and applications to online learning, the empirical evidence is limited to those experiments and lacks the extensive backtesting and data processing typical of studies with high empirical rigor.

```mermaid
flowchart TD
  A["Research Goal: Optimize convex functions<br>over the probability simplex"] --> B["Methodology: Cauchy-Simplex Scheme"]

  B --> C["Mapping:<br>Simplex → Positive unit sphere quadrant"]
  C --> D["Computation:<br>Gradient descent in latent variables"]
  D --> E["Mapping:<br>Back to probability simplex"]

  E --> F{"Computational Process"}
  F --> G["Iterative optimization<br>constrained to simplex"]
  F --> H["Continuous time analysis<br>O(1/T) convergence rate"]

  G --> I["Key Findings"]
  H --> I

  I --> J["Theoretical: Convergence proof using<br>information theory metrics"]
  I --> K["Experimental: Faster convergence<br>in convex hull projection"]
  I --> L["Application: Online learning<br>Universal Portfolios & Expert Advice"]
```