
Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor–Critic and Deep Deterministic Policy Gradient Algorithms

Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor–Critic and Deep Deterministic Policy Gradient Algorithms ArXiv ID: 2511.20678 “View on arXiv” Authors: Kamal Paykan Abstract This paper proposes a reinforcement learning–based framework for cryptocurrency portfolio management using the Soft Actor–Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms. Traditional portfolio optimization methods often struggle to adapt to the highly volatile and nonlinear dynamics of cryptocurrency markets. To address this, we design an agent that learns continuous trading actions directly from historical market data through interaction with a simulated trading environment. The agent optimizes portfolio weights to maximize cumulative returns while minimizing downside risk and transaction costs. Experimental evaluations on multiple cryptocurrencies demonstrate that the SAC and DDPG agents outperform baseline strategies such as equal-weighted and mean–variance portfolios. The SAC algorithm, with its entropy-regularized objective, shows greater stability and robustness in noisy market conditions compared to DDPG. These results highlight the potential of deep reinforcement learning for adaptive and data-driven portfolio management in cryptocurrency markets. ...
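
The trading environment the agents interact with can be illustrated with a short sketch. The code below is not from the paper; it is a minimal, hypothetical portfolio environment in which an action is mapped to portfolio weights, proportional transaction costs are charged on turnover, and the log portfolio return is used as the reward, i.e. the kind of interface an SAC or DDPG agent would optimize against. Names such as `PortfolioEnv` and the cost rate are assumptions.

```python
import numpy as np

class PortfolioEnv:
    """Toy portfolio environment (illustrative only, not the paper's code).

    prices:    (T, n_assets) array of asset prices.
    cost_rate: assumed proportional transaction cost per unit of turnover.
    """
    def __init__(self, prices, cost_rate=0.001):
        self.prices = np.asarray(prices, dtype=float)
        self.cost_rate = cost_rate
        self.t = 0
        self.weights = np.ones(self.prices.shape[1]) / self.prices.shape[1]

    def step(self, action):
        action = np.asarray(action, dtype=float)
        # Map the agent's unconstrained action to valid portfolio weights.
        target = np.exp(action - action.max())
        target /= target.sum()

        # Turnover-based transaction cost for rebalancing to the target.
        cost = self.cost_rate * np.abs(target - self.weights).sum()

        # Asset returns over the next period.
        rets = self.prices[self.t + 1] / self.prices[self.t] - 1.0
        portfolio_ret = float(target @ rets) - cost

        self.weights = target
        self.t += 1
        done = self.t >= len(self.prices) - 1
        # Log return as reward encourages maximizing cumulative growth.
        reward = np.log1p(portfolio_ret)
        return rets, reward, done
```

A standard SAC implementation adds an entropy bonus to this objective, which is the stability advantage over DDPG that the abstract refers to.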

November 16, 2025 · 2 min · Research Team

Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy

Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy ArXiv ID: 2511.12120 “View on arXiv” Authors: Hongyang Yang, Xiao-Yang Liu, Shan Zhong, Anwar Walid Abstract Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement learning schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market situations. In order to avoid the large memory consumption in training networks with continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones stocks that have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy is shown to outperform the three individual algorithms and two baselines in terms of the risk-adjusted return measured by the Sharpe ratio. This work is fully open-sourced at https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020. ...
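
The ensemble mechanism is essentially a periodic model-selection step: each algorithm is evaluated on a recent validation window and the one with the best risk-adjusted return trades the next period. The sketch below assumes a hypothetical `agent.backtest(data)` interface that returns daily returns; it illustrates the selection rule, not the authors' released code.

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free=0.0, periods=252):
    """Annualized Sharpe ratio of a series of daily returns."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free / periods
    return np.sqrt(periods) * excess.mean() / (excess.std() + 1e-12)

def pick_agent_for_next_quarter(agents, validation_data):
    """Select the agent (e.g. PPO / A2C / DDPG) with the highest
    validation Sharpe ratio; that agent trades the next quarter.
    `agent.backtest` is an assumed interface for this sketch."""
    scores = {name: sharpe_ratio(agent.backtest(validation_data))
              for name, agent in agents.items()}
    best = max(scores, key=scores.get)
    return best, scores
```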

November 15, 2025 · 2 min · Research Team

Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution

Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution ArXiv ID: 2410.14927 “View on arXiv” Authors: Unknown Abstract Leveraging Deep Reinforcement Learning (DRL) in automated stock trading has shown promising results, yet its application faces significant challenges, including the curse of dimensionality, inertia in trading actions, and insufficient portfolio diversification. Addressing these challenges, we introduce the Hierarchical Reinforced Trader (HRT), a novel trading strategy employing a bi-level Hierarchical Reinforcement Learning framework. The HRT integrates a Proximal Policy Optimization (PPO)-based High-Level Controller (HLC) for strategic stock selection with a Deep Deterministic Policy Gradient (DDPG)-based Low-Level Controller (LLC) tasked with optimizing trade executions to enhance portfolio value. In our empirical analysis, comparing the HRT agent with standalone DRL models and the S&P 500 benchmark during both bullish and bearish market conditions, we achieve a positive and higher Sharpe ratio. This advancement not only underscores the efficacy of incorporating hierarchical structures into DRL strategies but also mitigates the aforementioned challenges, paving the way for designing more profitable and robust trading algorithms in complex markets. ...
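
A minimal sketch of the bi-level decision step, under assumed interfaces: the PPO-based HLC emits a buy/sell/hold direction per stock, and the DDPG-based LLC converts each direction into a trade size. `hlc.direction`, `llc.size`, and the share cap are hypothetical names for this illustration, not the paper's API.

```python
import numpy as np

def hrt_step(hlc, llc, market_state, portfolio, max_shares=100):
    """One bi-level decision step (illustrative; interfaces are assumed).

    hlc.direction(state) -> array in {-1, 0, +1} per stock (sell/hold/buy).
    llc.size(state, direction) -> array in [0, 1] per stock (trade intensity).
    """
    direction = hlc.direction(market_state)          # strategic stock selection
    intensity = llc.size(market_state, direction)    # execution sizing
    # Signed trade quantity, scaled to an assumed per-stock share cap.
    trades = np.round(direction * intensity * max_shares).astype(int)
    portfolio = portfolio + trades
    return portfolio, trades
```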

October 19, 2024 · 2 min · Research Team

Deeper Hedging: A New Agent-based Model for Effective Deep Hedging

Deeper Hedging: A New Agent-based Model for Effective Deep Hedging ArXiv ID: 2310.18755 “View on arXiv” Authors: Unknown Abstract We propose the Chiarella-Heston model, a new agent-based model for improving the effectiveness of deep hedging strategies. This model includes momentum traders, fundamental traders, and volatility traders. The volatility traders participate in the market by innovatively following a Heston-style volatility signal. The proposed model generalises both the extended Chiarella model and the Heston stochastic volatility model, and is calibrated to reproduce as many empirical stylized facts as possible. According to the stylised facts distance metric, the proposed model is able to reproduce more realistic financial time series than three baseline models: the extended Chiarella model, the Heston model, and the Geometric Brownian Motion. The proposed model is further validated by the Generalized Subtracted L-divergence metric. With the proposed Chiarella-Heston model, we generate a training dataset to train a deep hedging agent for optimal hedging strategies under various transaction cost levels. The deep hedging agent employs the Deep Deterministic Policy Gradient algorithm and is trained to maximize profits and minimize risks. Our testing results reveal that the deep hedging agent, trained with data generated by our proposed model, outperforms the baseline in most transaction cost levels. Furthermore, the testing process, which is conducted using empirical data, demonstrates the effective performance of the trained deep hedging agent in a realistic trading environment. ...
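
The Heston-style volatility signal that the volatility traders follow can be sketched with a standard Euler (full-truncation) discretization of the Heston variance process. The parameter values below are placeholders, not calibrated values from the paper.

```python
import numpy as np

def heston_volatility_signal(n_steps, dt=1/252, v0=0.04,
                             kappa=2.0, theta=0.04, xi=0.3, seed=0):
    """Simulate a Heston-style variance path (Euler, full truncation).

    dv_t = kappa * (theta - v_t) dt + xi * sqrt(v_t) dW_t
    Returns the volatility path sqrt(v_t), the kind of signal the
    volatility traders could follow. Parameters are placeholders.
    """
    rng = np.random.default_rng(seed)
    v = np.empty(n_steps + 1)
    v[0] = v0
    for t in range(n_steps):
        v_pos = max(v[t], 0.0)
        dw = rng.standard_normal() * np.sqrt(dt)
        v[t + 1] = v[t] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos) * dw
    return np.sqrt(np.maximum(v, 0.0))
```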

October 28, 2023 · 2 min · Research Team

CAD: Clustering And Deep Reinforcement Learning Based Multi-Period Portfolio Management Strategy

CAD: Clustering And Deep Reinforcement Learning Based Multi-Period Portfolio Management Strategy ArXiv ID: 2310.01319 “View on arXiv” Authors: Unknown Abstract In this paper, we present a novel trading strategy that integrates reinforcement learning methods with clustering techniques for portfolio management in multi-period trading. Specifically, we leverage the clustering method to categorize stocks into various clusters based on their financial indices. Subsequently, we utilize the algorithm Asynchronous Advantage Actor-Critic to determine the trading actions for stocks within each cluster. Finally, we employ the algorithm DDPG to generate the portfolio weight vector, which decides the amount of stocks to buy, sell, or hold according to the trading actions of different clusters. To the best of our knowledge, our approach is the first to combine clustering methods and reinforcement learning methods for portfolio management in the context of multi-period trading. Our proposed strategy is evaluated using a series of back-tests on four datasets, comprising a total of 800 stocks, obtained from the Shanghai Stock Exchange and the National Association of Securities Dealers Automated Quotations (NASDAQ). Our results demonstrate that our approach outperforms conventional portfolio management techniques, such as the Robust Median Reversion strategy, Passive Aggressive Median Reversion Strategy, and several machine learning methods, across various metrics. In our back-test experiments, our proposed strategy yields an average return of 151% over 360 trading periods with 800 stocks, compared to the highest return of 124% achieved by other techniques over identical trading periods and stocks. ...
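
The pipeline can be sketched in two stages: cluster stocks by their financial indices, then turn per-cluster trading scores into a portfolio weight vector. In the sketch below, KMeans stands in for the clustering step, a plain score array stands in for the A3C per-cluster actions, and a softmax normalization stands in for the DDPG weight generator; all names and choices are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_then_weight(financial_indices, cluster_actions, n_clusters=4, seed=0):
    """Illustrative two-stage pipeline (not the paper's code).

    financial_indices: (n_stocks, n_features) array of indicators.
    cluster_actions:   length-n_clusters array of trading scores, standing
                       in for the per-cluster A3C outputs.
    """
    cluster_actions = np.asarray(cluster_actions, dtype=float)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(financial_indices)
    # Each stock inherits the trading score of its cluster.
    scores = cluster_actions[labels]
    # Softmax normalization as a stand-in for the DDPG weight generator.
    w = np.exp(scores - scores.max())
    return w / w.sum(), labels
```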

October 2, 2023 · 2 min · Research Team