Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution

ArXiv ID: 2410.14927

Authors: Unknown

Abstract

Leveraging Deep Reinforcement Learning (DRL) in automated stock trading has shown promising results, yet its application faces significant challenges, including the curse of dimensionality, inertia in trading actions, and insufficient portfolio diversification. Addressing these challenges, we introduce the Hierarchical Reinforced Trader (HRT), a novel trading strategy employing a bi-level Hierarchical Reinforcement Learning framework. The HRT integrates a Proximal Policy Optimization (PPO)-based High-Level Controller (HLC) for strategic stock selection with a Deep Deterministic Policy Gradient (DDPG)-based Low-Level Controller (LLC) tasked with optimizing trade executions to enhance portfolio value. In our empirical analysis, which compares the HRT agent with standalone DRL models and the S&P 500 benchmark during both bullish and bearish market conditions, the HRT agent achieves a positive and higher Sharpe ratio. This advancement not only underscores the efficacy of incorporating hierarchical structures into DRL strategies but also mitigates the aforementioned challenges, paving the way for designing more profitable and robust trading algorithms in complex markets.

Keywords: Deep Reinforcement Learning (DRL), Hierarchical Reinforcement Learning, Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Automated Stock Trading, Equities

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 6.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematics from deep reinforcement learning (PPO, DDPG) and hierarchical MDP formulations, indicating high complexity. It presents a backtest-ready empirical analysis with Sharpe ratios, comparisons to benchmarks, and visualizations on S&P 500 data, though real-world implementation details like slippage and latency are not deeply explored.
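
Since the Sharpe ratio is the headline metric here, a minimal sketch of how it is commonly computed from a backtest's return series may help. The function name, the zero risk-free rate, and the 252-trading-day annualization are assumptions for illustration; the paper's exact convention is not stated in this summary.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumptions (not taken from the paper): risk_free_rate is an annual
    rate spread evenly across periods, and 252 periods make up one year.
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    sigma = stdev(excess)  # sample standard deviation; needs >= 2 returns
    if sigma == 0:
        raise ValueError("zero variance: Sharpe ratio is undefined")
    return mean(excess) / sigma * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003]
print(round(sharpe_ratio(daily_returns), 2))
```

A positive value means the strategy earned more than the risk-free rate on average over the sample; comparing two strategies by this ratio is only meaningful when both are measured over the same period and at the same return frequency.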
  flowchart TD
    A["Research Goal<br>Develop robust automated trading system<br>addressing dimensionality, inertia, and diversification"] --> B["Methodology: Bi-Level HRT Architecture"]
    B --> C["Data Inputs<br>Historical Stock Market Data"]
    C --> D["High-Level Controller<br>PPO Agent: Strategic Stock Selection"]
    D --> E["Low-Level Controller<br>DDPG Agent: Execution Optimization"]
    E --> F["Computational Process<br>Bi-Level HRL Integration"]
    F --> G["Key Outcomes"]
    G --> H["Superior Sharpe Ratio vs. Benchmarks"]
    G --> I["Effective Mitigation of DRL Challenges"]
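
The bi-level control flow in the diagram can be sketched as a single decision step in which the high-level controller picks a direction per stock and the low-level controller sizes each trade. This is only an illustrative skeleton: the hand-written rules below stand in for the trained PPO and DDPG policies, and the action encoding (+1 buy, -1 sell, 0 hold), the function names, the `max_shares` cap, and the absence of transaction costs are all assumptions rather than details from the paper.

```python
from typing import List, Sequence, Tuple


def high_level_policy(prices: Sequence[float],
                      prev_prices: Sequence[float]) -> List[int]:
    """Placeholder for the PPO-based HLC (hypothetical momentum rule).

    Returns one direction per stock: +1 buy, -1 sell, 0 hold.
    """
    return [(p > q) - (p < q) for p, q in zip(prices, prev_prices)]


def low_level_policy(directions: Sequence[int], prices: Sequence[float],
                     cash: float, holdings: Sequence[int],
                     max_shares: int = 5) -> List[int]:
    """Placeholder for the DDPG-based LLC (hypothetical sizing rule).

    Returns signed share quantities, limited by available cash and holdings.
    """
    trades = []
    for direction, price, held in zip(directions, prices, holdings):
        if direction > 0:
            qty = min(max_shares, int(cash // price))
            cash -= qty * price  # reserve cash for this purchase
        elif direction < 0:
            qty = -min(max_shares, held)  # cannot sell more than is held
        else:
            qty = 0
        trades.append(qty)
    return trades


def step(prices: Sequence[float], prev_prices: Sequence[float],
         cash: float, holdings: Sequence[int]) -> Tuple[float, List[int]]:
    """One bi-level decision: the HLC selects, then the LLC executes."""
    directions = high_level_policy(prices, prev_prices)
    trades = low_level_policy(directions, prices, cash, holdings)
    new_cash = cash - sum(q * p for q, p in zip(trades, prices))
    new_holdings = [h + q for h, q in zip(holdings, trades)]
    return new_cash, new_holdings


# Hypothetical three-stock example with no transaction costs.
cash, holdings = step(prices=[10.0, 20.0, 30.0],
                      prev_prices=[9.0, 21.0, 30.0],
                      cash=100.0, holdings=[0, 3, 2])
print(cash, holdings)
```

Splitting the decision this way keeps the high-level action space discrete and small per stock, while the quantity decision is handled separately, which is the kind of decomposition the abstract credits with easing the curse of dimensionality.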