Exploratory Mean-Variance with Jumps: An Equilibrium Approach

ArXiv ID: 2512.09224

Authors: Yuling Max Chen, Bin Li, David Saunders

Abstract

Revisiting the continuous-time Mean-Variance (MV) Portfolio Optimization problem, we model the market dynamics with a jump-diffusion process and apply Reinforcement Learning (RL) techniques to facilitate informed exploration within the control space. We recognize the time-inconsistency of the MV problem and adopt the time-inconsistent control (TIC) approach to analytically solve for an exploratory equilibrium investment policy, which is a Gaussian distribution centered on the equilibrium control of the classical MV problem. Our approach accounts for time-inconsistent preferences and actions, and our equilibrium policy is the best option an investor can take at any given time during the investment period. Moreover, we leverage the martingale properties of the equilibrium policy, design an RL model, and propose an Actor-Critic RL algorithm. All of our RL model parameters converge to the corresponding true values in a simulation study. Our numerical study on 24 years of real market data shows that the proposed RL model is profitable in 13 out of 14 tests, demonstrating its practical applicability in real-world investment.
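
For orientation, the setting described in the abstract can be written in a common textbook form. The notation below is an assumption for illustration and is not taken from the paper: $S_t$ is the risky asset price, $W_t$ a Brownian motion, $N_t$ a Poisson process with multiplicative jump sizes $Y_i$, $X_T$ the terminal wealth, $\gamma$ a risk-aversion parameter, $u^*$ the equilibrium control of the classical MV problem, and $v$ a placeholder for the exploration variance, whose actual form is derived in the paper and is not reproduced here.

$$
\begin{aligned}
\frac{dS_t}{S_{t^-}} &= \mu\,dt + \sigma\,dW_t + d\Big(\sum_{i=1}^{N_t} (Y_i - 1)\Big), \\
J(t, x; \pi) &= \mathbb{E}_{t,x}\big[X_T\big] - \frac{\gamma}{2}\,\mathrm{Var}_{t,x}\big[X_T\big], \\
\pi^*(\cdot \mid t, x) &= \mathcal{N}\big(u^*(t, x),\; v(t, x)\big).
\end{aligned}
$$

The second line is the usual source of time-inconsistency: the variance term is not linear in the conditional expectation, so a policy that is optimal as seen from time $t$ is generally not optimal as seen from a later time, which is why an equilibrium rather than a pre-committed optimum is sought.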

Keywords: Mean-Variance Optimization, Time-Inconsistency, Actor-Critic, Jump-Diffusion Process, Portfolio Optimization, Equities

Complexity vs Empirical Score

  • Math Complexity: 9.0/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper features dense advanced mathematics, including stochastic control theory, jump-diffusion processes, and equilibrium solutions, while also demonstrating strong empirical backing through a simulation study and backtesting on 24 years of real market data.

  flowchart TD
    A["Research Goal:<br>Optimize MV Portfolio with Jumps & RL"] --> B
    subgraph B ["Methodology"]
        B1["Model: Jump-Diffusion<br>Market Dynamics"] --> B2["Solve: Time-Inconsistent<br>Equilibrium Policy"]
        B2 --> B3["Design: Actor-Critic<br>RL Algorithm"]
    end
    B --> C{"Data & Computation"}
    C --> D["Simulation Study:<br>Convergence of RL Parameters"]
    C --> E["Real Market Data<br>24 Years Equity Data"]
    E --> F["Performance:<br>13/14 Tests Profitable"]
    D --> F
    F --> G["Outcome:<br>Verified Equilibrium Policy &<br>Practical Investment Tool"]
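
The first stage of the pipeline above (jump-diffusion market dynamics driven by a Gaussian exploratory policy) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `simulate_wealth`, all parameter names and default values, the Merton-style lognormal jump sizes, the constant policy mean and standard deviation, and the Euler time stepping are hypothetical choices made here. In the paper the policy mean is the classical equilibrium control and the RL parameters are learned by the Actor-Critic algorithm, neither of which is reproduced below.

```python
import numpy as np


def simulate_wealth(x0=1.0, T=1.0, n_steps=252, r=0.02, mu=0.08, sigma=0.2,
                    jump_rate=0.5, jump_mean=-0.05, jump_std=0.1,
                    policy_mean=0.5, policy_std=0.2, seed=0):
    """Euler simulation of wealth under a Gaussian exploratory policy.

    All parameters are illustrative assumptions, not values from the paper.
    The control u is the dollar amount held in the risky asset.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Exploration: sample the control from a Gaussian policy
        # (constant mean and std here, purely for illustration).
        u = rng.normal(policy_mean, policy_std)

        # Diffusion part of the risky asset's return over one step.
        diffusion = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()

        # Jump part: Poisson number of jumps with lognormal sizes.
        n_jumps = rng.poisson(jump_rate * dt)
        log_jump = rng.normal(jump_mean, jump_std, size=n_jumps).sum()
        jump = np.expm1(log_jump)  # relative price change caused by jumps

        # Wealth update: risk-free growth on (x - u), risky return on u.
        x[k + 1] = x[k] + r * (x[k] - u) * dt + u * (diffusion + jump)
    return x


path = simulate_wealth()
print(f"terminal wealth: {path[-1]:.4f}")
```

Setting `policy_std=0` removes exploration and reduces the policy to a deterministic control, which is a convenient sanity check before any learning component is added.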