
Myopic Optimality: why reinforcement learning portfolio management strategies lose money

ArXiv ID: 2509.12764

Authors: Yuming Ma

Abstract: Myopic optimization (MO) outperforms reinforcement learning (RL) in portfolio management: RL yields lower or negative returns, higher variance, larger transaction costs, heavier CVaR tail losses, lower profitability, and greater model risk. We model execution and liquidation frictions with mark-to-market accounting. Using Malliavin calculus (Clark-Ocone/BEL), we derive policy gradients and a risk shadow price, unifying the HJB and KKT conditions. This yields dual-gap and convergence results: geometric convergence for MO versus error floors for RL. We quantify phantom profit in RL via a Malliavin policy-gradient contamination analysis and define a control-affects-dynamics (CAD) premium for RL that is plausibly positive. ...
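To make the MO baseline concrete: a minimal sketch of a myopic (one-period) mean-variance rule with a proportional transaction-cost charge under mark-to-market accounting. This is a toy stand-in, not the paper's model — the function names, the cost parameter, and the example numbers are all illustrative assumptions; the paper's MO strategy additionally handles execution/liquidation frictions.

```python
import numpy as np

def myopic_weights(mu, Sigma, gamma=5.0):
    """One-period mean-variance optimum: w* = (1/gamma) * Sigma^{-1} mu.

    Classic myopic rule (hypothetical stand-in for the paper's MO strategy):
    it re-solves a single-period problem each step instead of a multi-period
    RL objective.
    """
    return np.linalg.solve(Sigma, mu) / gamma

def net_return(w, r, w_prev, cost=0.001):
    """Mark-to-market P&L minus proportional transaction costs (illustrative)."""
    return float(w @ r - cost * np.abs(w - w_prev).sum())

# Illustrative two-asset inputs (not from the paper).
mu = np.array([0.05, 0.03])            # expected excess returns
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])       # return covariance
w = myopic_weights(mu, Sigma)          # -> array([0.2, 0.2])
```

With these inputs the rule allocates 20% to each asset; `net_return` then books the realized P&L less the cost of rebalancing from the previous weights, which is the friction channel the paper argues RL policy gradients mis-handle.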

September 16, 2025 · 2 min · Research Team