Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor–Critic and Deep Deterministic Policy Gradient Algorithms
ArXiv ID: 2511.20678
Authors: Kamal Paykan
Abstract
This paper proposes a reinforcement learning–based framework for cryptocurrency portfolio management using the Soft Actor–Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms. Traditional portfolio optimization methods often struggle to adapt to the highly volatile and nonlinear dynamics of cryptocurrency markets. To address this, we design an agent that learns continuous trading actions directly from historical market data through interaction with a simulated trading environment. The agent optimizes portfolio weights to maximize cumulative returns while minimizing downside risk and transaction costs. Experimental evaluations on multiple cryptocurrencies demonstrate that the SAC and DDPG agents outperform baseline strategies such as equal-weighted and mean–variance portfolios. The SAC algorithm, with its entropy-regularized objective, shows greater stability and robustness in noisy market conditions compared to DDPG. These results highlight the potential of deep reinforcement learning for adaptive and data-driven portfolio management in cryptocurrency markets.
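The abstract describes an agent that chooses continuous portfolio weights and is rewarded for cumulative return net of downside risk and transaction costs, but it does not spell out the trading environment. A minimal sketch, assuming a gymnasium-style environment with a softmax mapping from actions to weights and a log-return reward penalized by proportional transaction costs (the class name `PortfolioEnv`, the cost rate, and the observation window are illustrative, not taken from the paper):

```python
# Minimal sketch of a simulated trading environment. The reward form (log return
# minus proportional transaction costs) is an assumption, not the paper's exact spec.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PortfolioEnv(gym.Env):
    """Toy continuous-action portfolio environment (illustrative only)."""

    def __init__(self, prices: np.ndarray, cost: float = 0.001, window: int = 30):
        super().__init__()
        self.prices = prices          # shape (T, n_assets), historical closing prices
        self.cost = cost              # proportional transaction-cost rate (assumed)
        self.window = window
        n_assets = prices.shape[1]
        # Action: unnormalized scores, mapped to portfolio weights via softmax.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_assets,), dtype=np.float32)
        # Observation: recent relative price changes for each asset.
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(window, n_assets), dtype=np.float32
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.window
        self.weights = np.ones(self.prices.shape[1]) / self.prices.shape[1]
        return self._obs(), {}

    def step(self, action):
        new_weights = np.exp(action) / np.exp(action).sum()  # softmax onto the simplex
        turnover = np.abs(new_weights - self.weights).sum()  # size of the rebalance
        rel = self.prices[self.t] / self.prices[self.t - 1]  # gross per-asset returns
        gross = float(new_weights @ rel)                     # gross portfolio return
        reward = np.log(gross) - self.cost * turnover        # assumed reward form
        self.weights = new_weights * rel / gross             # weights drift with prices
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        hist = self.prices[self.t - self.window : self.t]
        return (hist / hist[0] - 1.0).astype(np.float32)
```

A downside-risk term (for example, a drawdown or volatility penalty) would enter the same reward expression; the paper does not specify its exact form, so it is omitted here.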
Keywords: Reinforcement Learning, Soft Actor–Critic (SAC), Deep Deterministic Policy Gradient (DDPG), Cryptocurrency portfolio management, Portfolio optimization, Cryptocurrency
Complexity vs Empirical Score
- Math Complexity: 7.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced mathematical concepts such as actor–critic architectures, entropy regularization, and stochastic control, while also demonstrating strong empirical rigor through backtesting on real cryptocurrency data from 2016–2024 with multiple risk-adjusted metrics.
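Both the abstract and the summary above attribute SAC's robustness to its entropy-regularized objective. For reference, the standard maximum-entropy objective that SAC optimizes (the general form, not a formula specific to this paper) augments expected reward with the entropy of the policy, weighted by a temperature parameter:

```latex
% Standard SAC maximum-entropy objective (general form, not paper-specific):
% the temperature \alpha trades off expected reward against policy entropy \mathcal{H}.
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right]
```

Keeping the entropy term high sustains exploration and smooths the learned policy, which is the mechanism the paper points to for SAC's greater stability in noisy markets relative to DDPG's deterministic policy.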
```mermaid
flowchart TD
    A["Research Goal: Apply RL to Cryptocurrency Portfolio Management"] --> B["Methodology: Data Simulation & Environment"]
    B --> C["Input: Historical Market Data<br>(BTC, ETH, XRP, etc.)"]
    C --> D["Training Agents: SAC & DDPG Algorithms"]
    D --> E["Process: Learning Continuous Trading Actions<br>to Maximize Returns & Minimize Risk"]
    E --> F["Key Findings & Outcomes"]
    F --> G["SAC & DDPG Outperform Baselines<br>(Equal-weight, Mean-Variance)"]
    F --> H["SAC shows superior stability<br>via Entropy Regularization in volatile markets"]
```
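The paper does not state which implementation it uses for the two agents. As an illustrative sketch only, training and evaluating SAC and DDPG on the hypothetical `PortfolioEnv` above might look like the following with stable-baselines3; the synthetic prices stand in for the BTC/ETH/XRP history used in the paper, and the timestep budget is arbitrary:

```python
# Illustrative training loop, assuming stable-baselines3 implementations of SAC and
# DDPG; the paper does not specify its implementation or hyperparameters.
import numpy as np
from stable_baselines3 import SAC, DDPG

# Synthetic geometric-random-walk prices stand in for real BTC/ETH/XRP data.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.02, size=(2000, 3)), axis=0))

env = PortfolioEnv(prices)  # hypothetical environment from the sketch above

for Algo in (SAC, DDPG):
    model = Algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=50_000)  # illustrative budget, not the paper's

    # Greedy rollout to inspect the cumulative log return of the learned policy.
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, _ = env.step(action)
        total += reward
    print(Algo.__name__, "cumulative log return:", round(total, 4))
```

In a faithful reproduction, the rollout would instead report the risk-adjusted metrics mentioned in the summary (e.g. Sharpe ratio, maximum drawdown) on a held-out test period, alongside equal-weighted and mean–variance baselines.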