Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

ArXiv ID: 2405.05449

Authors: Unknown

Abstract

Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: a supervised learning stage and a reinforcement learning stage. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and a Sharpe ratio of 2.03, delivering top profitability with the lowest risk among comparable-return scenarios.
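Since the abstract reports the headline result as a Sharpe ratio, a minimal sketch of how that metric is typically computed may help; the annualization factor of 252 trading days and the synthetic return series are assumptions, not values from the paper.

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods=252):
    """Annualized Sharpe ratio: mean excess return over its volatility,
    scaled by sqrt(periods) to annualize daily figures."""
    excess = np.asarray(daily_returns) - risk_free_daily
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

# Illustration on synthetic returns (not the paper's data).
rng = np.random.default_rng(42)
r = rng.normal(0.001, 0.01, size=252)
print(round(sharpe_ratio(r), 2))
```

A series with zero mean excess return yields a Sharpe ratio of exactly zero, regardless of volatility.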

Keywords: portfolio optimization, reinforcement learning, knowledge distillation, DDPG, risk management

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Holy Grail
  • Why: The paper presents substantial mathematical formalisms including Markowitz optimization, MDPs, and continuous RL algorithms (DDPG), while demonstrating empirical rigor through comparative analysis using financial metrics like Sharpe ratio, though it lacks detailed implementation code or raw dataset disclosures.
```mermaid
flowchart TD
  A["Research Goal<br>Optimize portfolio return & risk"]
  B["Data Inputs<br>Historical market data & asset prices"]
  C["Methodology<br>Two-stage KDD training"]
  D["Stage 1<br>Supervised learning (Markowitz reference)"]
  E["Stage 2<br>Reinforcement learning (DDPG agent)"]
  F["Computational Process<br>Knowledge distillation & policy optimization"]
  G["Key Findings<br>Superior performance metrics"]
  H["Outcome<br>Sharpe Ratio: 2.03<br>Highest yield & lowest risk"]

  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> G
  G --> H
```
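The two-stage pipeline in the flowchart can be sketched in miniature. Everything below is an assumption-laden toy: the asset returns are synthetic, the "student" is a linear policy rather than a neural actor, and stage 2 uses simple hill climbing on the Sharpe ratio as a deliberately crude stand-in for the paper's DDPG updates. Only the structure (Markowitz teacher, supervised distillation, then reward-driven refinement) follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns for 4 hypothetical assets (the paper's real
# market data is not disclosed, so this is an illustrative stand-in).
T, n = 500, 4
drift = rng.uniform(0.0002, 0.001, size=n)           # per-asset expected return
returns = rng.normal(0.0, 0.01, size=(T, n)) + drift

mu = returns.mean(axis=0)
cov = np.cov(returns, rowvar=False)

def markowitz_weights(mu, cov, risk_aversion=5.0):
    """Unconstrained mean-variance solution w ∝ Σ⁻¹μ, normalized to sum to 1."""
    w = np.linalg.solve(risk_aversion * cov, mu)
    return w / w.sum()

def sharpe(w, returns):
    """In-sample (non-annualized) Sharpe ratio of a fixed-weight portfolio."""
    port = returns @ w
    return port.mean() / port.std()

# --- Stage 1: supervised distillation ---------------------------------
# The Markowitz solution acts as the teacher; a linear policy (with bias)
# is regressed onto the teacher's weights over observed states.
teacher_w = markowitz_weights(mu, cov)
X = np.hstack([np.ones((T - 1, 1)), returns[:-1]])   # state = prior-day returns
Y = np.tile(teacher_w, (T - 1, 1))
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

student_w = np.hstack([1.0, returns[-1]]) @ W        # policy output at last state
student_w = student_w / student_w.sum()

# --- Stage 2: reward-driven refinement --------------------------------
# Hill climbing on the Sharpe reward, standing in for DDPG's
# actor-critic gradient steps.
w, best = student_w, sharpe(student_w, returns)
for _ in range(200):
    cand = w + rng.normal(0.0, 0.05, size=n)
    cand = cand / cand.sum()
    s = sharpe(cand, returns)
    if s > best:
        w, best = cand, s

print("teacher Sharpe:", round(sharpe(teacher_w, returns), 3))
print("refined Sharpe:", round(best, 3))
```

The refinement stage can only keep candidates that improve the reward, so the final portfolio's in-sample Sharpe ratio is at least that of the distilled starting point; a real DDPG agent would instead learn a state-dependent policy from replayed transitions.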