Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization
ArXiv ID: 2511.17963
Authors: Jun Kevin, Pujianto Yugopuspito
Abstract
This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning strategy. The system uses deep recurrent networks to capture temporal dependencies, while the PPO agent adaptively refines portfolio allocations in a continuous action space, allowing the system to anticipate trends while adjusting dynamically to market shifts. Using multi-asset datasets covering U.S. and Indonesian equities, U.S. Treasuries, and major cryptocurrencies from January 2018 to December 2024, the framework is benchmarked against equal-weighted, index-based, and single-model (LSTM-only and PPO-only) baselines using annualized return, volatility, Sharpe ratio, and maximum drawdown, each adjusted for transaction costs. The results indicate that the hybrid architecture delivers higher returns and stronger resilience under non-stationary market regimes, suggesting its promise as a robust, AI-driven framework for dynamic portfolio optimization.
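The abstract evaluates every strategy by annualized return, volatility, Sharpe ratio, and maximum drawdown, each net of transaction costs. The sketch below shows one common way to compute these from a sequence of target weights and per-period asset returns. It is not the paper's evaluation code: the function name `backtest_metrics`, the proportional cost on turnover (`cost_rate`, 10 basis points by default), the 252-period annualization, and the modelling of weight drift between rebalances are all illustrative assumptions.

```python
import math
from statistics import mean, stdev


def backtest_metrics(weights, asset_returns, cost_rate=0.001,
                     periods_per_year=252, risk_free=0.0):
    """Net-of-cost metrics for a periodically rebalanced portfolio (hypothetical helper).

    weights: target weight vectors, one per period, applied at the start of the period
    asset_returns: simple asset return vectors, one per period (needs >= 2 periods)
    cost_rate: assumed proportional cost charged on turnover (sum of |weight changes|)
    risk_free: annual risk-free rate, converted to a per-period rate for the Sharpe ratio
    """
    prev = [0.0] * len(weights[0])  # start in cash, so the first allocation pays costs
    net = []
    for w, r in zip(weights, asset_returns):
        turnover = sum(abs(a - b) for a, b in zip(w, prev))
        gross = sum(wi * ri for wi, ri in zip(w, r))
        net.append(gross - cost_rate * turnover)
        # Holdings drift with prices before the next rebalance (any cash earns zero)
        growth = 1.0 + gross
        prev = [wi * (1.0 + ri) / growth for wi, ri in zip(w, r)]

    # Equity curve and maximum peak-to-trough loss, reported as a positive fraction
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for x in net:
        equity *= 1.0 + x
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)

    n = len(net)
    ann_return = equity ** (periods_per_year / n) - 1.0  # geometric annualization
    vol = stdev(net) * math.sqrt(periods_per_year)
    excess = mean(net) - risk_free / periods_per_year
    sharpe = excess * periods_per_year / vol if vol > 0 else float("nan")
    return {
        "net_returns": net,
        "annualized_return": ann_return,
        "volatility": vol,
        "sharpe": sharpe,
        "max_drawdown": max_dd,
    }
```

Passing the same `[1/N] * N` vector at every period gives the equal-weight baseline; its costs then come from the initial allocation out of cash and from rebalancing drifted weights back to `1/N` each period.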
Keywords: Long Short-Term Memory (LSTM), Proximal Policy Optimization (PPO), Reinforcement Learning, Deep Recurrent Networks, Portfolio Optimization
Complexity vs Empirical Score
- Math Complexity: 8.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Holy Grail
- Why: The paper presents a sophisticated hybrid model combining LSTM and PPO with multiple derivations and complex loss functions, while also detailing a multi-asset dataset, specific backtesting metrics, and transaction cost adjustments, indicating strong empirical implementation.
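The rating cites the paper's loss functions. For orientation, the core of any PPO agent is the clipped surrogate objective: the ratio between the new and old policy probabilities of an action is clipped to [1 - eps, 1 + eps], so a single update cannot move the policy too far. The sketch below implements that standard objective together with the diagonal-Gaussian log-density commonly used for continuous actions such as portfolio allocations. This summary does not say whether the paper adds value-function or entropy terms or uses a different action distribution, and `clip_eps=0.2` is a common default rather than a value taken from the paper.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian policy over a continuous action vector."""
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        total += -0.5 * ((a - m) / math.exp(ls)) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return total


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective averaged over a batch (minimize this)."""
    terms = []
    for lp_new, lp_old, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The pessimistic (smaller) of the two terms removes the incentive to over-update
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

With a positive advantage, raising the ratio above `1 + clip_eps` earns nothing extra; with a negative advantage, the unclipped term still applies, so a bad action that became more likely is penalized in full.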
```mermaid
flowchart TD
A["Research Goal: Dynamic Portfolio Optimization"] --> B["Hybrid LSTM-PPO Framework"]
B --> C["Data Input<br>Multi-Asset (Jan 2018 - Dec 2024)<br>US/Indonesia Equities, Treasuries, Crypto"]
C --> D["Process: LSTM Forecasting<br>Deep Recurrent Networks<br>Temporal Trend Prediction"]
C --> E["Process: PPO Agent<br>Proximal Policy Optimization<br>Continuous Action Space Allocation"]
D --> F["Fusion & Adaptation<br>Dynamic Rebalancing + Transaction Cost Adjustment"]
E --> F
F --> G["Benchmarking<br>vs. Equal-weight, Index, LSTM-only, PPO-only"]
G --> H["Key Outcomes:<br>↑ Higher Annualized Returns<br>↑ Sharpe Ratio<br>↓ Max Drawdown<br>↑ Market Regime Resilience"]
```
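The flowchart draws the LSTM and PPO branches as parallel inputs to a single fusion and rebalancing stage. One plausible reading, sketched below under stated assumptions, is that the LSTM's return forecasts are appended to the PPO agent's observation, the agent's continuous action is mapped to long-only weights with a softmax, and each period's reward is the log growth of portfolio value net of turnover costs. The summary does not confirm any of these choices: the function names, the softmax mapping (which rules out short positions), the flat state layout, and the log-return reward are all hypothetical.

```python
import math


def build_state(recent_returns, lstm_forecasts, current_weights):
    """Hypothetical observation: flattened recent returns + LSTM forecasts + current holdings."""
    flat = [x for row in recent_returns for x in row]
    return flat + list(lstm_forecasts) + list(current_weights)


def action_to_weights(action):
    """Softmax maps an unconstrained continuous action to long-only weights summing to 1."""
    m = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


def step_reward(new_weights, old_weights, realized_returns, cost_rate=0.001):
    """Assumed reward: one-period log growth of portfolio value after proportional costs."""
    turnover = sum(abs(n - o) for n, o in zip(new_weights, old_weights))
    gross = sum(w * r for w, r in zip(new_weights, realized_returns))
    # Assumes gross return minus costs stays above -100%, so the log is defined
    return math.log(1.0 + gross - cost_rate * turnover)
```

In a training loop, `build_state` would produce the observation passed to the policy, `action_to_weights` would turn the sampled action into an allocation, and `step_reward` would score that allocation once the next period's returns are known. The softmax keeps the action space continuous and unconstrained while guaranteeing valid weights, which suits a Gaussian PPO policy.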