false

Dynamic Inverse Optimization under Drift and Shocks: Theory, Regret Bounds, and Applications

Dynamic Inverse Optimization under Drift and Shocks: Theory, Regret Bounds, and Applications ArXiv ID: 2509.14080 “View on arXiv” Authors: JINHO CHA Abstract The growing prevalence of drift and shocks in modern decision environments exposes a gap between classical optimization theory and real-world practice. Standard models assume fixed objectives, yet organizations from hospitals to power grids routinely adapt to shifting priorities, noisy data, and abrupt disruptions. To address this gap, this study develops a dynamic inverse optimization framework that recovers hidden, time-varying preferences from observed allocation trajectories. The framework unifies identifiability analysis with regret guarantees conditions are established for existence and uniqueness of recovered parameters, and sharp static and dynamic regret bounds are derived to characterize responsiveness to gradual drift and sudden shocks. Methodologically, a drift-aware estimator grounded in convex analysis and online learning theory is introduced, with finite-sample guarantees on recovery accuracy. Computational experiments in healthcare, energy, logistics, and finance reveal heterogeneous recovery patterns, ranging from rapid resilience to persistent vulnerability. Overall, dynamic inverse optimization emerges as both a theoretical contribution and a broadly applicable diagnostic tool for benchmarking resilience, uncovering hidden behavioral shifts, and guiding policy interventions in complex stochastic systems. ...

September 17, 2025 · 2 min · Research Team

Mean-Variance Portfolio Selection in Long-Term Investments with Unknown Distribution: Online Estimation, Risk Aversion under Ambiguity, and Universality of Algorithms

Mean-Variance Portfolio Selection in Long-Term Investments with Unknown Distribution: Online Estimation, Risk Aversion under Ambiguity, and Universality of Algorithms ArXiv ID: 2406.13486 “View on arXiv” Authors: Unknown Abstract The standard approach for constructing a Mean-Variance portfolio involves estimating parameters for the model using collected samples. However, since the distribution of future data may not resemble that of the training set, the out-of-sample performance of the estimated portfolio is worse than one derived with true parameters, which has prompted several innovations for better estimation. Instead of treating the data without a timing aspect as in the common training-backtest approach, this paper adopts a perspective where data gradually and continuously reveal over time. The original model is recast into an online learning framework, which is free from any statistical assumptions, to propose a dynamic strategy of sequential portfolios such that its empirical utility, Sharpe ratio, and growth rate asymptotically achieve those of the true portfolio, derived with perfect knowledge of the future data. When the distribution of future data follows a normal shape, the growth rate of wealth is shown to increase by lifting the portfolio along the efficient frontier through the calibration of risk aversion. Since risk aversion cannot be appropriately predetermined, another proposed algorithm updates this coefficient over time, forming a dynamic strategy that approaches the optimal empirical Sharpe ratio or growth rate associated with the true coefficient. The performance of these proposed strategies can be universally guaranteed under stationary stochastic markets. Furthermore, in certain time-reversible stochastic markets, the so-called Bayesian strategy utilizing true conditional distributions, based on past market information during investment, does not perform better than the proposed strategies in terms of empirical utility, Sharpe ratio, or growth rate, which, in contrast, do not rely on conditional distributions. ...

June 19, 2024 · 2 min · Research Team

A Parametric Contextual Online Learning Theory of Brokerage

A Parametric Contextual Online Learning Theory of Brokerage ArXiv ID: 2407.01566 “View on arXiv” Authors: Unknown Abstract We study the role of contextual information in the online learning problem of brokerage between traders. In this sequential problem, at each time step, two traders arrive with secret valuations about an asset they wish to trade. The learner (a broker) suggests a trading (or brokerage) price based on contextual data about the asset and the market conditions. Then, the traders reveal their willingness to buy or sell based on whether their valuations are higher or lower than the brokerage price. A trade occurs if one of the two traders decides to buy and the other to sell, i.e., if the broker’s proposed price falls between the smallest and the largest of their two valuations. We design algorithms for this problem and prove optimal theoretical regret guarantees under various standard assumptions. ...

May 22, 2024 · 2 min · Research Team

Trading Volume Maximization with Online Learning

Trading Volume Maximization with Online Learning ArXiv ID: 2405.13102 “View on arXiv” Authors: Unknown Abstract We explore brokerage between traders in an online learning framework. At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuations. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming at enhancing traders’ total earnings by maximizing the gain from trade, defined as the sum of the traders’ net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders’ valuations as an i.i.d. process with an unknown distribution. If the traders’ valuations are revealed after each interaction (full-feedback), and the traders’ valuations cumulative distribution function (cdf) is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors. If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the traders’ valuations cdf is Lipschitz and show that this rate is near-optimal. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown traders’ valuations cdf. If we drop the continuous cdf assumption, the regret rate degrades to $Θ(\sqrt{“T”})$ in the full-feedback case, where $T$ is the time horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible in the $2$-bit feedback case. ...

May 21, 2024 · 3 min · Research Team

$ε$-Policy Gradient for Online Pricing

$ε$-Policy Gradient for Online Pricing ArXiv ID: 2405.03624 “View on arXiv” Authors: Unknown Abstract Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $ε$-policy gradient algorithm for the online pricing learning task. The algorithm extends $ε$-greedy algorithm by replacing greedy exploitation with gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the exploration cost in terms of the exploration probability $ε$ and the exploitation cost in terms of the gradient descent optimization and gradient estimation errors. The algorithm achieves an expected regret of order $\mathcal{“O”}(\sqrt{“T”})$ (up to a logarithmic factor) over $T$ trials. ...

May 6, 2024 · 2 min · Research Team

Detecting Toxic Flow

Detecting Toxic Flow ArXiv ID: 2312.05827 “View on arXiv” Authors: Unknown Abstract This paper develops a framework to predict toxic trades that a broker receives from her clients. Toxic trades are predicted with a novel online learning Bayesian method which we call the projection-based unification of last-layer and subspace estimation (PULSE). PULSE is a fast and statistically-efficient Bayesian procedure for online training of neural networks. We employ a proprietary dataset of foreign exchange transactions to test our methodology. Neural networks trained with PULSE outperform standard machine learning and statistical methods when predicting if a trade will be toxic; the benchmark methods are logistic regression, random forests, and a recursively-updated maximum-likelihood estimator. We devise a strategy for the broker who uses toxicity predictions to internalise or to externalise each trade received from her clients. Our methodology can be implemented in real-time because it takes less than one millisecond to update parameters and make a prediction. Compared with the benchmarks, online learning of a neural network with PULSE attains the highest PnL and avoids the most losses by externalising toxic trades. ...

December 10, 2023 · 2 min · Research Team

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE ArXiv ID: 2306.02848 “View on arXiv” Authors: Unknown Abstract Factor model is a fundamental investment tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in practical complicated investing situations. However, it is still an open question to build a factor model that can conduct stock prediction in an online and adaptive setting, where the model can adapt itself to match the current market regime identified based on only point-in-time market information. To tackle this problem, we propose the first deep learning based online and adaptive factor model, HireVAE, at the core of which is a hierarchical latent space that embeds the underlying relationship between the market situation and stock-wise latent factors, so that HireVAE can effectively estimate useful latent factors given only historical market information and subsequently predict accurate stock returns. Across four commonly used real stock market benchmarks, the proposed HireVAE demonstrate superior performance in terms of active returns over previous methods, verifying the potential of such online and adaptive factor model. ...

June 5, 2023 · 2 min · Research Team

Convex optimization over a probability simplex

Convex optimization over a probability simplex ArXiv ID: 2305.09046 “View on arXiv” Authors: Unknown Abstract We propose a new iteration scheme, the Cauchy-Simplex, to optimize convex problems over the probability simplex ${“w\in\mathbb{R”}^n\ |\ \sum_i w_i=1\ \textrm{“and”}\ w_i\geq0}$. Specifically, we map the simplex to the positive quadrant of a unit sphere, envisage gradient descent in latent variables, and map the result back in a way that only depends on the simplex variable. Moreover, proving rigorous convergence results in this formulation leads inherently to tools from information theory (e.g., cross-entropy and KL divergence). Each iteration of the Cauchy-Simplex consists of simple operations, making it well-suited for high-dimensional problems. In continuous time, we prove that $f(x_T)-f(x^) = {“O”}(1/T)$ for differentiable real-valued convex functions, where $T$ is the number of time steps and $w^$ is the optimal solution. Numerical experiments of projection onto convex hulls show faster convergence than similar algorithms. Finally, we apply our algorithm to online learning problems and prove the convergence of the average regret for (1) Prediction with expert advice and (2) Universal Portfolios. ...

May 15, 2023 · 2 min · Research Team