
Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization

Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization ArXiv ID: 2511.17963 “View on arXiv” Authors: Jun Kevin, Pujianto Yugopuspito Abstract This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning strategy. The proposed system leverages the predictive power of deep recurrent networks to capture temporal dependencies, while the PPO agent adaptively refines portfolio allocations in continuous action spaces, allowing the system to anticipate trends while adjusting dynamically to market shifts. Using multi-asset datasets covering U.S. and Indonesian equities, U.S. Treasuries, and major cryptocurrencies from January 2018 to December 2024, the framework is benchmarked against equal-weighted, index-based, and single-model baselines (LSTM-only and PPO-only) on annualized return, volatility, Sharpe ratio, and maximum drawdown, each adjusted for transaction costs. The results indicate that the hybrid architecture delivers higher returns and stronger resilience under non-stationary market regimes, suggesting its promise as a robust, AI-driven framework for dynamic portfolio optimization. ...
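The paper does not publish its implementation, but the PPO machinery that this and several other entries below rely on can be made concrete. A minimal sketch of PPO's clipped surrogate objective for a single transition (function name and scalar form are illustrative, not the authors' code):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps], bounding how far one update can move the policy.
    This conservatism is what lets a PPO agent refine allocations
    incrementally instead of overreacting to a single market shift.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # PPO maximizes the pessimistic (minimum) of the two terms.
    return min(ratio * advantage, clipped * advantage)

# A large positive ratio with a positive advantage is capped at (1+eps)*A:
print(ppo_clip_objective(logp_new=0.5, logp_old=0.0, advantage=1.0))  # 1.2
```

In practice this scalar is averaged over a minibatch of transitions, with the advantage estimated by the critic.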

November 22, 2025 · 2 min · Research Team

Law-Strength Frontiers and a No-Free-Lunch Result for Law-Seeking Reinforcement Learning on Volatility Law Manifolds

Law-Strength Frontiers and a No-Free-Lunch Result for Law-Seeking Reinforcement Learning on Volatility Law Manifolds ArXiv ID: 2511.17304 “View on arXiv” Authors: Jian’an Zhang Abstract We study reinforcement learning (RL) on volatility surfaces through the lens of Scientific AI. We ask whether axiomatic no-arbitrage laws, imposed as soft penalties on a learned world model, can reliably align high-capacity RL agents, or mainly create Goodhart-style incentives to exploit model errors. From classical static no-arbitrage conditions we build a finite-dimensional convex volatility law manifold of admissible total-variance surfaces, together with a metric law-penalty functional and a Graceful Failure Index (GFI) that normalizes law degradation under shocks. A synthetic generator produces law-consistent trajectories, while a recurrent neural world model trained without law regularization exhibits structured off-manifold errors. On this testbed we define a Goodhart decomposition (r = r^{\mathcal{M}} + r^\perp), where (r^\perp) is ghost arbitrage from off-manifold prediction error. We prove a ghost-arbitrage incentive theorem for PPO-type agents, a law-strength trade-off theorem showing that stronger penalties eventually worsen P&L, and a no-free-lunch theorem: under a law-consistent world model and law-aligned strategy class, unconstrained law-seeking RL cannot Pareto-dominate structural baselines on P&L, penalties, and GFI. In experiments on an SPX/VIX-like world model, simple structural strategies form the empirical law-strength frontier, while all law-seeking RL variants underperform and move into high-penalty, high-GFI regions. Volatility thus provides a concrete case where reward shaping with verifiable penalties is insufficient for robust law alignment. ...

November 21, 2025 · 2 min · Research Team

Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy

Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy ArXiv ID: 2511.12120 “View on arXiv” Authors: Hongyang Yang, Xiao-Yang Liu, Shan Zhong, Anwar Walid Abstract Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement learning schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market situations. In order to avoid the large memory consumption in training networks with a continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones stocks that have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy is shown to outperform the three individual algorithms and two baselines in terms of the risk-adjusted return measured by the Sharpe ratio. This work is fully open-sourced at https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020. ...
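The ensemble's core switching rule is Sharpe-based selection among the three agents on each rolling validation window. A minimal sketch of that rule (the helper names and toy return series are illustrative; the paper's open-sourced code is the authoritative implementation):

```python
def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns
    (risk-free rate assumed zero for brevity)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    if var == 0:
        return 0.0
    return (mean / var ** 0.5) * periods_per_year ** 0.5

def select_agent(validation_returns):
    """Pick the agent whose validation-window returns have the highest
    Sharpe ratio -- the per-window switching rule among PPO, A2C, DDPG."""
    return max(validation_returns,
               key=lambda name: sharpe_ratio(validation_returns[name]))

window = {
    "PPO":  [0.01, 0.02, -0.01, 0.015],
    "A2C":  [0.03, -0.04, 0.05, -0.02],
    "DDPG": [0.002, 0.001, 0.002, 0.001],
}
print(select_agent(window))  # DDPG: steadiest returns per unit of risk
```

The selected agent then trades the next window, so the ensemble tracks whichever algorithm currently suits the regime.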

November 15, 2025 · 2 min · Research Team

Risk-Aware Deep Reinforcement Learning for Dynamic Portfolio Optimization

Risk-Aware Deep Reinforcement Learning for Dynamic Portfolio Optimization ArXiv ID: 2511.11481 “View on arXiv” Authors: Emmanuel Lwele, Sabuni Emmanuel, Sitali Gabriel Sitali Abstract This paper presents a deep reinforcement learning (DRL) framework for dynamic portfolio optimization under market uncertainty and risk. The proposed model integrates a Sharpe ratio-based reward function with direct risk control mechanisms, including maximum drawdown and volatility constraints. Proximal Policy Optimization (PPO) is employed to learn adaptive asset allocation strategies over historical financial time series. Model performance is benchmarked against mean-variance and equal-weight portfolio strategies using backtesting on high-performing equities. Results indicate that the DRL agent stabilizes volatility successfully but suffers from degraded risk-adjusted returns due to over-conservative policy convergence, highlighting the challenge of balancing exploration, return maximization, and risk mitigation. The study underscores the need for improved reward shaping and hybrid risk-aware strategies to enhance the practical deployment of DRL-based portfolio allocation models. ...
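The paper's exact reward shaping is not reproduced in the abstract; a plausible sketch of a Sharpe-flavored step reward with direct drawdown and volatility constraints, penalizing only the excess beyond each limit (all names and default thresholds here are assumptions):

```python
def risk_aware_reward(step_return, running_max, value, vol_estimate,
                      dd_limit=0.20, vol_limit=0.25,
                      dd_weight=1.0, vol_weight=1.0):
    """One-step reward = realized return minus penalties for breaching
    a maximum-drawdown limit and a volatility limit. Inside the limits
    the agent is rewarded purely on return; outside, the penalty grows
    linearly with the breach, steering the policy back toward the
    constraint set."""
    drawdown = 1.0 - value / running_max
    penalty = (dd_weight * max(0.0, drawdown - dd_limit)
               + vol_weight * max(0.0, vol_estimate - vol_limit))
    return step_return - penalty

# Within both limits the penalty is zero and reward equals the return:
print(risk_aware_reward(0.01, running_max=100.0, value=90.0,
                        vol_estimate=0.10))  # 0.01
```

One-sided penalties like this also illustrate the failure mode the paper reports: if the weights are large, the cheapest policy is an over-conservative one that never approaches the limits, sacrificing return.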

November 14, 2025 · 2 min · Research Team

Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics

Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics ArXiv ID: 2509.12456 “View on arXiv” Authors: Rafael Zimmer, Oswaldo Luiz do Valle Costa Abstract Reinforcement Learning has emerged as a promising framework for developing adaptive and data-driven strategies, enabling market makers to optimize decision-making policies based on interactions with the limit order book environment. This paper explores the integration of a reinforcement learning agent in a market-making context, where the underlying market dynamics have been explicitly modeled to capture observed stylized facts of real markets, including clustered order arrival times, non-stationary spreads and return drifts, stochastic order quantities and price volatility. These mechanisms aim to enhance stability of the resulting control agent, and serve to incorporate domain-specific knowledge into the agent policy learning process. Our contributions include a practical implementation of a market making agent based on the Proximal-Policy Optimization (PPO) algorithm, alongside a comparative evaluation of the agent’s performance under varying market conditions via a simulator-based environment. As evidenced by our analysis of the financial return and risk metrics when compared to a closed-form optimal solution, our results suggest that the reinforcement learning agent can effectively be used under non-stationary market conditions, and that the proposed simulator-based environment can serve as a valuable tool for training and pre-training reinforcement learning agents in market-making scenarios. ...

September 15, 2025 · 2 min · Research Team

DeepAries: Adaptive Rebalancing Interval Selection for Enhanced Portfolio Selection

DeepAries: Adaptive Rebalancing Interval Selection for Enhanced Portfolio Selection ArXiv ID: 2510.14985 “View on arXiv” Authors: Jinkyu Kim, Hyunjung Yi, Mogan Gim, Donghee Choi, Jaewoo Kang Abstract We propose DeepAries, a novel deep reinforcement learning framework for dynamic portfolio management that jointly optimizes the timing and allocation of rebalancing decisions. Unlike prior reinforcement learning methods that employ fixed rebalancing intervals regardless of market conditions, DeepAries adaptively selects optimal rebalancing intervals along with portfolio weights to reduce unnecessary transaction costs and maximize risk-adjusted returns. Our framework integrates a Transformer-based state encoder, which effectively captures complex long-term market dependencies, with Proximal Policy Optimization (PPO) to generate simultaneous discrete (rebalancing intervals) and continuous (asset allocations) actions. Extensive experiments on multiple real-world financial markets demonstrate that DeepAries significantly outperforms traditional fixed-frequency and full-rebalancing strategies in terms of risk-adjusted returns, transaction costs, and drawdowns. Additionally, we provide a live demo of DeepAries at https://deep-aries.github.io/, along with the source code and dataset at https://github.com/dmis-lab/DeepAries, illustrating DeepAries’ capability to produce interpretable rebalancing and allocation decisions aligned with shifting market regimes. Overall, DeepAries introduces an innovative paradigm for adaptive and practical portfolio management by integrating both timing and allocation into a unified decision-making process. ...
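The hybrid action space is the distinctive piece here: one policy output is split into a categorical rebalancing interval and continuous simplex weights. A minimal sketch of that decoding step (head names, interval choices, and shapes are illustrative, not taken from the DeepAries code):

```python
import math

def decode_action(interval_logits, weight_logits, intervals=(1, 5, 20)):
    """Map one policy output into a hybrid action: a discrete
    rebalancing interval (argmax over a categorical head) plus
    continuous portfolio weights (softmax onto the simplex, so weights
    are non-negative and sum to one)."""
    k = max(range(len(interval_logits)), key=interval_logits.__getitem__)
    m = max(weight_logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in weight_logits]
    total = sum(exps)
    return intervals[k], [e / total for e in exps]

interval, weights = decode_action([0.1, 2.0, -1.0], [0.0, 0.0])
# interval == 5 (rebalance every 5 days), weights == [0.5, 0.5]
```

During training the discrete head would be sampled from the categorical distribution rather than argmax'd, with both heads contributing to the PPO log-probability.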

September 11, 2025 · 2 min · Research Team

Adaptive Alpha Weighting with PPO: Enhancing Prompt-Based LLM-Generated Alphas in Quant Trading

Adaptive Alpha Weighting with PPO: Enhancing Prompt-Based LLM-Generated Alphas in Quant Trading ArXiv ID: 2509.01393 “View on arXiv” Authors: Qizhao Chen, Hiroaki Kawashima Abstract This paper proposes a reinforcement learning framework that employs Proximal Policy Optimization (PPO) to dynamically optimize the weights of multiple large language model (LLM)-generated formulaic alphas for stock trading strategies. Formulaic alphas are mathematically defined trading signals derived from price, volume, sentiment, and other data. Although recent studies have shown that LLMs can generate diverse and effective alphas, a critical challenge lies in how to adaptively integrate them under varying market conditions. To address this gap, we leverage the deepseek-r1-distill-llama-70b model to generate fifty alphas for five major stocks: Apple, HSBC, Pepsi, Toyota, and Tencent, and then use PPO to adjust their weights in real time. Experimental results demonstrate that the PPO-optimized strategy achieves strong returns and high Sharpe ratios across most stocks, outperforming both an equal-weighted alpha portfolio and traditional benchmarks such as the Nikkei 225, S&P 500, and Hang Seng Index. The findings highlight the importance of reinforcement learning in the allocation of alpha weights and show the potential of combining LLM-generated signals with adaptive optimization for robust financial forecasting and trading. ...
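The abstract does not specify how PPO's unconstrained outputs become alpha weights; one plausible parameterization maps them onto the simplex via softmax and combines the alphas' current values into a single trading signal (both helper names are hypothetical):

```python
import math

def alpha_weights(raw_scores):
    """Softmax the PPO action vector onto the probability simplex so
    every alpha weight is non-negative and the weights sum to one."""
    m = max(raw_scores)
    exps = [math.exp(s - m) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

def combined_signal(alpha_values, raw_scores):
    """Trading signal = weighted sum of the LLM-generated alphas'
    current values under the PPO-chosen weights."""
    return sum(w * a
               for w, a in zip(alpha_weights(raw_scores), alpha_values))

# Equal raw scores recover the equal-weighted alpha portfolio baseline:
print(combined_signal([1.0, -1.0, 0.5], [0.0, 0.0, 0.0]))
```

This also makes the paper's baseline comparison concrete: the equal-weighted portfolio is the special case where PPO outputs identical scores, so any outperformance comes from learning to tilt the weights.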

September 1, 2025 · 2 min · Research Team

Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market

Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market ArXiv ID: 2506.20930 “View on arXiv” Authors: Chi-Sheng Chen, Xinyu Zhang, Ya-Chuan Chen Abstract We propose a hybrid quantum-classical reinforcement learning framework for sector rotation in the Taiwan stock market. Our system employs Proximal Policy Optimization (PPO) as the backbone algorithm and integrates both classical architectures (LSTM, Transformer) and quantum-enhanced models (QNN, QRWKV, QASA) as policy and value networks. An automated feature engineering pipeline extracts financial indicators from capital share data to ensure consistent model input across all configurations. Empirical backtesting reveals a key finding: although quantum-enhanced models consistently achieve higher training rewards, they underperform classical models in real-world investment metrics such as cumulative return and Sharpe ratio. This discrepancy highlights a core challenge in applying reinforcement learning to financial domains – namely, the mismatch between proxy reward signals and true investment objectives. Our analysis suggests that current reward designs may incentivize overfitting to short-term volatility rather than optimizing risk-adjusted returns. This issue is compounded by the inherent expressiveness and optimization instability of quantum circuits under Noisy Intermediate-Scale Quantum (NISQ) constraints. We discuss the implications of this reward-performance gap and propose directions for future improvement, including reward shaping, model regularization, and validation-based early stopping. Our work offers a reproducible benchmark and critical insights into the practical challenges of deploying quantum reinforcement learning in real-world finance. ...

June 26, 2025 · 2 min · Research Team

Can Artificial Intelligence Trade the Stock Market?

Can Artificial Intelligence Trade the Stock Market? ArXiv ID: 2506.04658 “View on arXiv” Authors: Jędrzej Maskiewicz, Paweł Sakowski Abstract The paper explores the use of Deep Reinforcement Learning (DRL) in stock market trading, focusing on two algorithms, Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO), and compares them with a Buy-and-Hold benchmark. It evaluates these algorithms across three currency pairs, the S&P 500 index, and Bitcoin, on daily data over the period 2019-2023. The results demonstrate DRL’s effectiveness in trading and its ability to manage risk by strategically avoiding trades in unfavorable conditions, providing a substantial edge, in terms of risk-adjusted returns, over classical approaches based on supervised learning. ...

June 5, 2025 · 2 min · Research Team

Deep Reinforcement Learning Algorithms for Option Hedging

Deep Reinforcement Learning Algorithms for Option Hedging ArXiv ID: 2504.05521 “View on arXiv” Authors: Unknown Abstract Dynamic hedging is a financial strategy that consists of periodically transacting one or multiple financial assets to offset the risk associated with a correlated liability. Deep Reinforcement Learning (DRL) algorithms have been used to find optimal solutions to dynamic hedging problems by framing them as sequential decision-making problems. However, most previous work assesses the performance of only one or two DRL algorithms, making an objective comparison across algorithms difficult. In this paper, we compare the performance of eight DRL algorithms in the context of dynamic hedging: Monte Carlo Policy Gradient (MCPG) and Proximal Policy Optimization (PPO), along with four variants of Deep Q-Learning (DQL) and two variants of Deep Deterministic Policy Gradient (DDPG). Two of these variants represent a novel application to the task of dynamic hedging. In our experiments, we use the Black-Scholes delta hedge as a baseline and simulate the dataset using a GJR-GARCH(1,1) model. Results show that MCPG, followed by PPO, obtains the best performance in terms of the root semi-quadratic penalty. Moreover, MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline within the allotted computational budget, possibly due to the sparsity of rewards in our environment. ...
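Unlike the learned policies, the Black-Scholes delta hedge baseline used in the paper has a closed form. A minimal sketch for a European call with zero dividends (stdlib only, via `math.erf` for the normal CDF):

```python
import math

def bs_call_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call: the number of shares the
    baseline strategy holds against a short call position.

    S: spot price, K: strike, T: years to expiry,
    r: risk-free rate, sigma: annualized volatility.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))  # standard normal CDF

# At the money with a year to expiry, the hedge holds just over half a share:
delta = bs_call_delta(S=100.0, K=100.0, T=1.0, r=0.0, sigma=0.2)
# delta ≈ 0.5398
```

Rehedging to this delta at each step is the classical benchmark; the paper's finding is that only MCPG beat it under the stated compute budget.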

April 7, 2025 · 2 min · Research Team