CAD: Clustering And Deep Reinforcement Learning Based Multi-Period Portfolio Management Strategy
ArXiv ID: 2310.01319
Authors: Unknown
Abstract
In this paper, we present a novel trading strategy that integrates reinforcement learning methods with clustering techniques for portfolio management in multi-period trading. Specifically, we use a clustering method to group stocks into clusters based on their financial indices. We then use the Asynchronous Advantage Actor-Critic (A3C) algorithm to determine the trading actions for the stocks within each cluster. Finally, we employ the Deep Deterministic Policy Gradient (DDPG) algorithm to generate the portfolio weight vector, which decides how much of each stock to buy, sell, or hold according to the trading actions of the different clusters. To the best of our knowledge, our approach is the first to combine clustering methods and reinforcement learning methods for portfolio management in the context of multi-period trading. Our proposed strategy is evaluated using a series of back-tests on four datasets, comprising a total of 800 stocks, obtained from the Shanghai Stock Exchange (SSE) and the National Association of Securities Dealers Automated Quotations (NASDAQ). Our results demonstrate that our approach outperforms conventional portfolio management techniques, such as the Robust Median Reversion (RMR) strategy, the Passive Aggressive Mean Reversion (PAMR) strategy, and several machine learning methods, across various metrics. In our back-test experiments, our proposed strategy yields an average return of 151% over 360 trading periods with 800 stocks, compared to the highest return of 124% achieved by other techniques over identical trading periods and stocks.
Keywords: Reinforcement Learning, Clustering, Asynchronous Advantage Actor-Critic (A3C), Deep Deterministic Policy Gradient (DDPG), Multi-Period Trading, Equities
Complexity vs Empirical Score
- Math Complexity: 7.5/10
- Empirical Rigor: 8.5/10
- Quadrant: Holy Grail
- Why: The paper integrates advanced reinforcement learning algorithms (A3C, DDPG) with clustering, involving complex state/action spaces and optimization. It demonstrates high empirical rigor through backtesting on real datasets of 800 stocks across multiple periods with detailed metrics and comparisons against established baselines.
flowchart TD
A["Research Goal<br>Develop a Novel Multi-Period<br>Portfolio Management Strategy"] --> B["Data Preparation<br>800 Stocks (SSE/NASDAQ)"]
B --> C["Clustering Stage<br>Categorize stocks by<br>financial indices using K-Means"]
C --> D["Trading Decision Stage<br>Use A3C Algorithm to<br>determine Buy/Sell/Hold actions"]
D --> E["Weight Allocation Stage<br>Use DDPG Algorithm to<br>generate Portfolio Weight Vector"]
E --> F["Back-Testing Evaluation<br>Run strategy over 360 trading periods"]
F --> G["Key Findings<br>Achieved 151% Average Return<br>Outperformed Baseline Strategies<br>(e.g., RMR, PAMR)"]
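The stages in the flowchart above can be illustrated with a minimal Python sketch of the data flow. It rests on several assumptions: the A3C and DDPG agents are not implemented, so `cluster_actions` and `scores` are hypothetical placeholders for their outputs, and the softmax allocation in `allocate` is a simple stand-in rather than the paper's method. The function names, the toy feature values, and the wealth update based on price relatives (closing price divided by the previous closing price) are likewise illustrative choices, not details taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def allocate(labels, cluster_actions, scores):
    """Hypothetical stand-in for the weight-allocation stage.

    Stocks in clusters marked "sell" get zero weight; the remaining
    stocks share the budget via a softmax over their scores.
    """
    eligible = [i for i, lab in enumerate(labels)
                if cluster_actions[lab] != "sell"]
    weights = [0.0] * len(labels)
    if not eligible:
        return weights  # nothing eligible: stay fully in cash
    for i, w in zip(eligible, softmax([scores[i] for i in eligible])):
        weights[i] = w
    return weights


def cumulative_return(weights_per_period, relatives_per_period):
    """Compound portfolio value over periods; unallocated weight is cash."""
    value = 1.0
    for w, x in zip(weights_per_period, relatives_per_period):
        cash = 1.0 - sum(w)
        value *= cash + sum(wi * xi for wi, xi in zip(w, x))
    return value - 1.0


if __name__ == "__main__":
    # Toy financial indices for four stocks (made-up numbers).
    features = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.95, 0.85)]
    labels, _ = kmeans(features, k=2)
    actions = {labels[0]: "buy", labels[2]: "sell"}  # placeholder for A3C output
    weights = allocate(labels, actions, scores=[0.3, 0.1, 0.5, 0.2])
    print(labels, weights)
    print(cumulative_return([weights], [[1.02, 0.99, 1.05, 0.97]]))
```

Separating the per-cluster action from the per-stock weight mirrors the two-stage structure in the flowchart: the first stage narrows what may be traded, and the second decides how much. In a back-test, `allocate` would be called once per trading period and the resulting weight vectors passed to `cumulative_return`.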