Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning

arXiv ID: 2312.15385

Authors: Unknown

Abstract

This paper studies a discrete-time mean-variance model based on reinforcement learning. Compared with its continuous-time counterpart in \cite{zhou2020mv}, the discrete-time model makes more general assumptions about the asset's return distribution. Using entropy to measure the cost of exploration, we derive the optimal investment strategy, whose density function is also of Gaussian type, and we design the corresponding reinforcement learning algorithm. Both simulation experiments and empirical analysis indicate that our discrete-time model is better suited to analyzing real-world data than the continuous-time model.
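
As a rough schematic of the objective the abstract describes, an entropy-regularized discrete-time mean-variance problem can be written as below. The notation (wealth x_t, target w, horizon T, temperature lambda, and per-period policy pi_t) is assumed for illustration and is not taken verbatim from the paper.

```latex
% Schematic only; symbols are illustrative assumptions, not the paper's notation.
\max_{\pi}\;\mathbb{E}\Big[-(x_T - w)^2
    + \lambda \sum_{t=0}^{T-1} \mathcal{H}\big(\pi_t(\cdot \mid x_t)\big)\Big],
\qquad
\mathcal{H}(\pi) = -\int_{\mathbb{R}} \pi(u)\,\ln \pi(u)\,\mathrm{d}u .
```

Here the entropy term prices the cost of exploration; the "Gaussian type" result then means that each optimal per-period policy is a normal density whose mean and variance depend on the current state and on the temperature lambda.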

Keywords: mean-variance model, reinforcement learning, entropy regularization, optimal investment strategy, general financial markets

Complexity vs Empirical Score

  • Math Complexity: 8.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Holy Grail
  • Why: The paper employs dense stochastic control and reinforcement learning theory, including derivations of Gaussian optimal policies and convergence proofs (high math). It validates the model with both simulation experiments and an empirical backtest on S&P 500 data, demonstrating a focus on real-world applicability (moderate-to-high rigor).

Paper Workflow (Mermaid flowchart)

flowchart TD
  A["Research Goal:<br>Discrete-Time Mean-Variance Model<br>Based on Reinforcement Learning"] --> B["Methodology: Derive Optimal Strategy"]
  B --> C["Methodology: Design RL Algorithm<br>with Entropy Regularization"]
  C --> D["Data: Simulation &<br>Empirical Market Data"]
  D --> E["Computation: Apply Model & Algorithm"]
  E --> F["Outcomes: Gaussian-Type<br>Optimal Investment Density"]
  E --> G["Outcomes: Better Applicability<br>vs. Continuous-Time Models"]
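
To make the workflow above concrete, here is a minimal, self-contained simulation sketch of a Gaussian exploratory policy trained against a mean-variance-style terminal criterion on synthetic returns. It is an illustration under stated assumptions (all parameter values, the REINFORCE-style update, and the fixed exploration width are choices made here), not the algorithm derived in the paper.

```python
# Minimal sketch, not the paper's algorithm: a Gaussian exploratory policy for a
# discrete-time mean-variance objective, trained on synthetic i.i.d. excess returns.
# All parameter values and the REINFORCE-style update rule are assumptions.
import numpy as np

rng = np.random.default_rng(0)

T = 50              # trading periods per episode (assumed)
n_episodes = 2000   # training episodes (assumed)
mu_r, sigma_r = 0.01, 0.05   # synthetic excess-return mean / std (assumed)
target_w = 1.2      # terminal-wealth target in the mean-variance criterion (assumed)
lam = 0.1           # entropy-regularization temperature (assumed)
lr = 0.05           # learning rate for the policy mean (assumed)

theta = 0.0                  # policy mean: dollar amount invested each period
sigma_pi = np.sqrt(lam)      # exploration width; kept fixed here as a simplification
                             # of the entropy term rather than derived from it

for _ in range(n_episodes):
    x = 1.0                      # initial wealth
    actions = []
    for _ in range(T):
        u = rng.normal(theta, sigma_pi)   # sample allocation from the Gaussian policy
        r = rng.normal(mu_r, sigma_r)     # synthetic one-period excess return
        x += u * r                        # self-financing wealth update
        actions.append(u)
    score = -(x - target_w) ** 2          # terminal mean-variance-style criterion
    # Crude REINFORCE update of the policy mean (no baseline, illustrative only):
    # d/d(theta) log N(u; theta, sigma) = (u - theta) / sigma^2.
    grad = score * sum((u - theta) / sigma_pi ** 2 for u in actions) / T
    theta += lr * grad                    # gradient ascent on the episode score

print(f"learned mean allocation per period: {theta:.4f}")
```

The fixed exploration width stands in for the entropy term only in spirit; in the paper's setting the exploration variance is part of the derived Gaussian-type optimal density rather than a hand-picked constant.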