Deep Reinforcement Learning for Optimal Asset Allocation Using DDPG with TiDE

ArXiv ID: 2508.20103

Authors: Rongwei Liu, Jin Zheng, John Cartlidge

Abstract

The optimal asset allocation between risky and risk-free assets is a persistent challenge due to the inherent volatility in financial markets. Conventional methods rely on strict distributional assumptions or non-additive reward ratios, which limit their robustness and applicability to investment goals. To overcome these constraints, this study formulates the optimal two-asset allocation problem as a sequential decision-making task within a Markov Decision Process (MDP). This framework enables the application of reinforcement learning (RL) mechanisms to develop dynamic policies based on simulated financial scenarios, without requiring such prerequisites. We use the Kelly criterion to balance immediate reward signals against long-term investment objectives, and we take the novel step of integrating the Time-series Dense Encoder (TiDE) into the Deep Deterministic Policy Gradient (DDPG) RL framework for continuous decision-making. We compare DDPG-TiDE with a simple discrete-action Q-learning RL framework and a passive buy-and-hold investment strategy. Empirical results show that DDPG-TiDE outperforms Q-learning and generates higher risk-adjusted returns than buy-and-hold. These findings suggest that tackling the optimal asset allocation problem by integrating TiDE within a DDPG reinforcement learning framework is a fruitful avenue for further exploration.
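To make the MDP formulation and the Kelly-style reward concrete, the sketch below shows one possible per-step transition for the two-asset problem. It is an illustrative assumption, not the paper's exact environment: the risk-free rate, the wealth update, and the function names are invented for this example. The key idea it demonstrates is that a log-growth reward is additive over time, so the cumulative reward equals the log of terminal wealth, aligning immediate reward signals with the long-term growth objective of the Kelly criterion.

```python
import numpy as np

# Minimal sketch (assumed environment, not the paper's implementation):
# action `a` in [0, 1] is the fraction of wealth allocated to the risky
# asset; the remainder earns an assumed per-step risk-free rate.
RISK_FREE_RATE = 0.0001

def step(wealth: float, action: float, risky_return: float):
    """One MDP transition: apply the allocation, update wealth, emit reward."""
    action = float(np.clip(action, 0.0, 1.0))
    portfolio_return = action * risky_return + (1.0 - action) * RISK_FREE_RATE
    new_wealth = wealth * (1.0 + portfolio_return)
    reward = np.log(1.0 + portfolio_return)  # Kelly-style log-growth reward
    return new_wealth, reward

# Example: a 60% risky allocation during a +2% move in the risky asset.
w, r = step(wealth=1.0, action=0.6, risky_return=0.02)
print(f"wealth={w:.4f}, reward={r:.5f}")
```

Because the per-step rewards sum to the log of final wealth, maximizing expected cumulative reward is equivalent to maximizing expected log-growth, which is exactly the quantity the Kelly criterion targets.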

Keywords: asset allocation, reinforcement learning, Markov Decision Process, Kelly criterion, DDPG, multi-asset

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 3.0/10
  • Quadrant: Lab Rats
  • Why: The paper employs advanced mathematical concepts such as Markov Decision Processes, the Kelly criterion, and the Deep Deterministic Policy Gradient (DDPG) algorithm, requiring a solid understanding of reinforcement learning and stochastic optimization. However, its empirical rigor is limited: it relies on simulated financial scenarios without clear evidence of real-world data, backtesting protocols, or implementation details, making it more theoretical than immediately deployable.

```mermaid
flowchart TD
  A["Research Goal: Optimal Asset Allocation<br>between Risky & Risk-Free Assets"] --> B["Formulate as MDP & Kelly Criterion"]
  B --> C["Modeling DDPG-TiDE Framework<br>Continuous Action RL with Time-series Encoder"]
  B --> D["Baselines: Q-Learning &<br>Passive Buy-and-Hold"]
  C & D --> E["Simulation & Training on<br>Financial Market Data"]
  E --> F["Performance Evaluation<br>Risk-Adjusted Returns"]
  F --> G["Outcome: DDPG-TiDE Outperforms<br>Baselines (Higher Risk-Adjusted Returns)"]
```
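To make the DDPG-TiDE step of this pipeline more tangible, the following PyTorch sketch shows a DDPG actor whose state encoder is a stack of TiDE-style residual dense blocks (MLP blocks with skip connections and layer normalization, as in the Time-series Dense Encoder). The summary does not specify the paper's architecture, so the class names, layer sizes, lookback window, and feature count here are illustrative assumptions rather than the authors' design.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """TiDE-style dense block: MLP with a skip connection and layer norm."""
    def __init__(self, dim: int, hidden: int, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Dropout(dropout),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(x + self.net(x))  # residual connection, then norm

class TiDEActor(nn.Module):
    """Maps a window of past market features to a continuous allocation in [0, 1]."""
    def __init__(self, lookback: int = 30, n_features: int = 4,
                 dim: int = 128, n_blocks: int = 2):
        super().__init__()
        self.input_proj = nn.Linear(lookback * n_features, dim)
        self.encoder = nn.Sequential(
            *[ResidualDenseBlock(dim, dim) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(dim, 1)

    def forward(self, state):
        # state: (batch, lookback, n_features) window of recent observations
        z = self.input_proj(state.flatten(start_dim=1))
        z = self.encoder(z)
        return torch.sigmoid(self.head(z))  # fraction of wealth in the risky asset

actor = TiDEActor()
dummy_state = torch.randn(8, 30, 4)
print(actor(dummy_state).shape)  # torch.Size([8, 1])
```

This covers only the actor; a full DDPG agent would additionally need a critic network, target networks, a replay buffer, and exploration noise on the continuous action, none of which are detailed in the summary above.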