Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy

ArXiv ID: 2511.12120 ([View on arXiv](https://arxiv.org/abs/2511.12120))

Authors: Hongyang Yang, Xiao-Yang Liu, Shan Zhong, Anwar Walid

Abstract

Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement learning schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market situations. To avoid the large memory consumption of training networks with a continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones stocks that have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy is shown to outperform the three individual algorithms and the two baselines in terms of risk-adjusted return as measured by the Sharpe ratio. This work is fully open-sourced on GitHub: https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020.
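
The abstract does not spell out how the ensemble "inherits and integrates the best features" of the three agents; in the paper and the linked repository, the ensemble periodically selects whichever agent earned the highest Sharpe ratio on a recent validation window and lets it trade the next period. Below is a minimal, self-contained sketch of that selection step; `sharpe_ratio`, `pick_agent`, and the synthetic return series are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period simple returns."""
    returns = np.asarray(returns, dtype=float)
    sd = returns.std(ddof=1)
    if sd == 0.0:
        return 0.0
    return np.sqrt(periods_per_year) * returns.mean() / sd

def pick_agent(validation_returns):
    """Return the agent whose validation-window returns have the highest
    Sharpe ratio, along with the score of every agent.

    validation_returns maps agent name -> array of daily returns produced
    by that agent on the validation window.
    """
    scores = {name: sharpe_ratio(r) for name, r in validation_returns.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical example: synthetic daily returns for one ~3-month
# validation window (about 63 trading days) per agent.
rng = np.random.default_rng(0)
validation = {
    "PPO":  rng.normal(5e-4, 1e-2, 63),
    "A2C":  rng.normal(3e-4, 1e-2, 63),
    "DDPG": rng.normal(4e-4, 1e-2, 63),
}
best, scores = pick_agent(validation)
print(f"trade next quarter with: {best}  (validation Sharpe ratios: {scores})")
```

Selecting on a rolling validation window, rather than on training performance, is what lets the ensemble adapt as market regimes shift: the agent that handled the most recent conditions best is the one deployed next.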

Keywords: Deep Reinforcement Learning ensemble, Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), Deep Deterministic Policy Gradient (DDPG), Load-on-demand data processing, Equities (Dow Jones)

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper formulates trading as an MDP and uses advanced deep reinforcement learning algorithms (PPO, A2C, DDPG), indicating high mathematical complexity; it is also backtest-ready, with specific data (the 30 Dow Jones stocks), open-source code, and risk-adjusted metrics such as the Sharpe ratio (defined below).
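
For reference, the risk-adjusted metric used throughout is the Sharpe ratio. The exact convention is not printed here, so the following is a standard annualized formulation over T daily portfolio returns, assuming 252 trading days per year and a risk-free rate r_f:

```latex
% Annualized Sharpe ratio over T daily portfolio returns r_t,
% assuming 252 trading days per year and risk-free rate r_f.
S = \sqrt{252}\,\frac{\bar{r} - r_f}{\sigma_p},
\qquad
\bar{r} = \frac{1}{T}\sum_{t=1}^{T} r_t,
\qquad
\sigma_p = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(r_t - \bar{r}\right)^2}
```
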
```mermaid
flowchart TD
  A["Research Goal: Create Profitable<br>Stock Trading Strategy"] --> B["Data: 30 Dow Jones Stocks<br>Load-on-Demand Processing"]
  B --> C["Method: Deep Reinforcement Learning<br>Ensemble of 3 Actor-Critic Algorithms"]
  C --> D{"Computational Process: Train Agents"}
  D --> E["PPO"]
  D --> F["A2C"]
  D --> G["DDPG"]
  E & F & G --> H["Key Outcome: Ensemble Strategy<br>Best Features Integrated"]
  H --> I["Result: Outperformed<br>Baselines & Individual Agents<br>(Highest Sharpe Ratio)"]
```
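
The "load-on-demand" processing in node B is only named in the abstract; its details live in the paper and repository. As a rough illustration of the idea (stream only the slice of price history a given training or validation window needs, rather than holding the full dataset in memory), a chunked pandas reader could look like the sketch below. The file name, column names, and `DOW_30_TICKERS` are assumptions for illustration.

```python
import pandas as pd

def load_window(path, start, end, tickers):
    """Load only the rows needed for one training/validation window.

    Streams the CSV in chunks so the full multi-year price history for
    all tickers never has to sit in memory at once.
    """
    pieces = []
    for chunk in pd.read_csv(path, parse_dates=["date"], chunksize=100_000):
        mask = chunk["date"].between(start, end) & chunk["ticker"].isin(tickers)
        if mask.any():
            pieces.append(chunk.loc[mask])
    return pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame()

# Hypothetical usage: one quarter of data for the 30 Dow constituents.
# window = load_window("dow30_daily.csv", "2016-01-01", "2016-03-31", DOW_30_TICKERS)
```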