Deep Reinforcement Learning Strategies in Finance: Insights into Asset Holding, Trading Behavior, and Purchase Diversity

ArXiv ID: 2407.09557

Authors: Unknown

Abstract

Recent deep reinforcement learning (DRL) methods in finance show promising outcomes. However, there is limited research examining the behavior of these DRL algorithms. This paper aims to investigate their tendencies towards holding or trading financial assets as well as purchase diversity. By analyzing their trading behaviors, we provide insights into the decision-making processes of DRL models in finance applications. Our findings reveal that each DRL algorithm exhibits unique trading patterns and strategies, with A2C emerging as the top performer in terms of cumulative rewards. While PPO and SAC engage in significant trades with a limited number of stocks, DDPG and TD3 adopt a more balanced approach. Furthermore, SAC and PPO tend to hold positions for shorter durations, whereas DDPG, A2C, and TD3 display a propensity to remain stationary for extended periods.

Keywords: Deep Reinforcement Learning (DRL), Trading Behavior, A2C, PPO, Financial Assets

Complexity vs Empirical Score

  • Math Complexity: 5.0/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper applies DRL algorithms (A2C, PPO, SAC, DDPG, and TD3) that rely on stochastic optimization and neural network architectures, which accounts for its moderate mathematical complexity. Its empirical rigor is high because it uses a specific dataset (the Dow Jones 30 companies, sourced from Yahoo Finance), defined train/test splits, and an analysis of trading behaviors.
  flowchart TD
    A["Research Goal<br>Investigate DRL Trading Behavior<br>& Purchase Diversity"] --> B{"Methodology"}
    B --> C["Data: Financial Asset Time Series"]
    B --> D["Compute: Train DRL Models<br>DDPG, TD3, A2C, PPO, SAC"]
    C --> D
    D --> E{"Analyze Trading Behavior"}
    E --> F["Findings & Outcomes"]
    F --> G["Performance<br>A2C: Highest Cumulative Reward"]
    F --> H["Trade Volume<br>PPO/SAC: High Trades, Few Stocks<br>DDPG/TD3: Balanced Approach"]
    F --> I["Holding Time<br>SAC/PPO: Short Duration<br>DDPG/A2C/TD3: Long Duration"]
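
The behavior analysis summarized above (holding versus trading, and purchase diversity) could be approximated from a log of agent actions. The following is a minimal sketch, not the paper's method: the function name `behavior_metrics`, the action layout (one row per time step, one signed share count per stock, with zero meaning "hold"), and the choice of Shannon entropy as a diversity measure are all assumptions made for illustration.

```python
import math


def behavior_metrics(actions, tol=1e-8):
    """Summarize trading behavior from a hypothetical action log.

    actions[t][i] is the signed number of shares traded for stock i at
    step t: positive = buy, negative = sell, (near) zero = hold.
    This layout is an assumption, not taken from the paper.
    """
    n_steps = len(actions)
    n_stocks = len(actions[0])

    # Fraction of (step, stock) decisions where the agent did nothing.
    holds = sum(1 for row in actions for a in row if abs(a) <= tol)
    hold_ratio = holds / (n_steps * n_stocks)

    # Total shares bought per stock over the whole episode.
    buy_volume = [0.0] * n_stocks
    for row in actions:
        for i, a in enumerate(row):
            if a > tol:
                buy_volume[i] += a

    # Diversity measure 1: how many distinct stocks were ever bought.
    stocks_bought = sum(1 for v in buy_volume if v > 0)

    # Diversity measure 2: Shannon entropy (in nats) of the buy-volume
    # distribution. Low entropy = buying concentrated in a few stocks.
    total_buy = sum(buy_volume)
    entropy = 0.0
    if total_buy > 0:
        for v in buy_volume:
            if v > 0:
                p = v / total_buy
                entropy -= p * math.log(p)

    return {
        "hold_ratio": hold_ratio,
        "stocks_bought": stocks_bought,
        "buy_entropy": entropy,
    }


# Toy example: 3 time steps, 3 stocks (values are made up).
example_actions = [
    [10, 0, 0],
    [0, 0, 0],
    [-10, 0, 5],
]
metrics = behavior_metrics(example_actions)
print(metrics)
```

Under these assumptions, an agent that trades heavily in only a few stocks would show a low `hold_ratio` for those stocks and a low `buy_entropy`, while an agent that remains stationary for long stretches would show a high `hold_ratio`.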