
Deep reinforcement learning for optimal trading with partial information

ArXiv ID: 2511.00190 · Authors: Andrea Macrì, Sebastian Jaimungal, Fabrizio Lillo. Abstract: Reinforcement Learning (RL) applied to financial problems has been a lively area of research. The use of RL for optimal trading strategies that exploit latent information in the market has, to the best of our knowledge, not been widely tackled. In this paper we study an optimal trading problem where a trading signal follows an Ornstein-Uhlenbeck process with regime-switching dynamics. We employ a blend of RL and Recurrent Neural Networks (RNNs) to extract as much of the underlying information as possible from the trading signal with latent parameters. The latent parameters governing the mean-reversion level, speed, and volatility are filtered from observations of the signal, and trading strategies are derived via RL. To address this problem, we propose three Deep Deterministic Policy Gradient (DDPG)-based algorithms that integrate Gated Recurrent Unit (GRU) networks to capture temporal dependencies in the signal. The first, a one-step approach (hid-DDPG), directly encodes hidden states from the GRU into the RL trader. The second and third are two-step methods: one (prob-DDPG) makes use of posterior regime probability estimates, while the other (reg-DDPG) relies on forecasts of the next signal value. Through extensive simulations with increasingly complex Markovian regime dynamics for the trading signal's parameters, as well as an empirical application to equity pair trading, we find that prob-DDPG achieves superior cumulative rewards and exhibits more interpretable strategies. By contrast, reg-DDPG provides limited benefits, while hid-DDPG offers intermediate performance with less interpretable strategies. Our results show that the quality and structure of the information supplied to the agent are crucial: embedding probabilistic insights into latent regimes substantially improves both profitability and robustness of reinforcement learning-based trading strategies. ...
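
As a rough illustration of the signal model in this abstract, the sketch below (not the authors' code; all parameter values are assumptions) simulates an Ornstein-Uhlenbeck trading signal whose mean-reversion level, speed, and volatility switch with a hidden two-state Markov chain. prob-DDPG would additionally be fed filtered posterior probabilities of the hidden regime.

```python
# Minimal sketch: regime-switching Ornstein-Uhlenbeck trading signal.
# All parameter values are illustrative assumptions, not the paper's calibration.
import numpy as np

rng = np.random.default_rng(0)

# Per-regime OU parameters: (theta = long-run mean, kappa = reversion speed, sigma = volatility)
regimes = [(0.0, 2.0, 0.3), (0.5, 8.0, 0.8)]
P = np.array([[0.99, 0.01],   # Markov transition matrix between the two regimes
              [0.02, 0.98]])

dt, n_steps = 1.0 / 252, 2000
x, z = 0.0, 0
signal, regime_path = [], []
for _ in range(n_steps):
    z = rng.choice(2, p=P[z])                 # latent regime, hidden from the trader
    theta, kappa, sigma = regimes[z]
    x += kappa * (theta - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    signal.append(x)
    regime_path.append(z)

# An RL trader only observes `signal`; prob-DDPG would additionally receive
# posterior probabilities of `z`, e.g. from a GRU-based or HMM-style filter.
```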

October 31, 2025 · 3 min · Research Team

Hierarchical AI Multi-Agent Fundamental Investing: Evidence from China's A-Share Market

ArXiv ID: 2510.21147 · Authors: Chujun He, Zhonghao Huang, Xiangguo Li, Ye Luo, Kewei Ma, Yuxuan Xiong, Xiaowei Zhang, Mingyang Zhao. Abstract: We present a multi-agent, AI-driven framework for fundamental investing that integrates macro indicators with industry-level and firm-specific information to construct optimized equity portfolios. The architecture comprises: (i) a Macro agent that dynamically screens and weights sectors based on evolving economic indicators and industry performance; (ii) four firm-level agents – Fundamental, Technical, Report, and News – that conduct in-depth analyses of individual firms to ensure both breadth and depth of coverage; (iii) a Portfolio agent that uses reinforcement learning to combine the agent outputs into a unified policy that generates the trading strategy; and (iv) a Risk Control agent that adjusts portfolio positions in response to market volatility. We evaluate the system on the constituents of the CSI 300 Index of China's A-share market and find that it consistently outperforms standard benchmarks and a state-of-the-art multi-agent trading system on risk-adjusted returns and drawdown control. Our core contribution is a hierarchical multi-agent design that links top-down macro screening with bottom-up fundamental analysis, offering a robust and extensible approach to factor-based portfolio construction. ...
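
A minimal structural sketch of the hierarchy described above, with placeholder scoring and weighting rules standing in for the paper's trained agents, might look as follows.

```python
# Structural sketch (assumed, not the paper's code) of the hierarchical flow:
# a Macro agent screens sectors, firm-level agents score stocks, a Portfolio agent
# turns scores into weights, and a Risk Control agent scales exposure with volatility.
import numpy as np

def macro_agent(sector_indicators: dict, top_k: int = 3) -> list:
    """Keep the top-k sectors by a composite macro/industry score."""
    return sorted(sector_indicators, key=sector_indicators.get, reverse=True)[:top_k]

def firm_agents(agent_scores: dict) -> np.ndarray:
    """Average the Fundamental/Technical/Report/News scores per firm (placeholder for the RL fusion)."""
    return np.mean(np.stack(list(agent_scores.values())), axis=0)

def portfolio_agent(scores: np.ndarray) -> np.ndarray:
    """Softmax over firm scores as a stand-in for the learned portfolio policy."""
    w = np.exp(scores - scores.max())
    return w / w.sum()

def risk_control(weights: np.ndarray, realized_vol: float, target_vol: float = 0.15) -> np.ndarray:
    """Scale positions down when realized volatility exceeds the target."""
    return weights * min(1.0, target_vol / max(realized_vol, 1e-8))
```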

October 24, 2025 · 2 min · Research Team

News-Aware Direct Reinforcement Trading for Financial Markets

ArXiv ID: 2510.19173 · Authors: Qing-Yu Lan, Zhan-He Wang, Jun-Qian Jiang, Yu-Tong Wang, Yun-Song Piao. Abstract: The financial market is known to be highly sensitive to news. Therefore, effectively incorporating news data into quantitative trading remains an important challenge. Existing approaches typically rely on manually designed rules and/or handcrafted features. In this work, we directly use the news sentiment scores derived from large language models, together with raw price and volume data, as observable inputs for reinforcement learning. These inputs are processed by sequence models such as recurrent neural networks or Transformers to make end-to-end trading decisions. We conduct experiments using the cryptocurrency market as an example and evaluate two representative reinforcement learning algorithms, namely Double Deep Q-Network (DDQN) and Group Relative Policy Optimization (GRPO). The results demonstrate that our news-aware approach, which does not depend on handcrafted features or manually designed rules, can achieve performance superior to market benchmarks. We further highlight the critical role of time-series information in this process. ...
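
The observation-encoding idea can be sketched as follows (an assumed illustration, not the authors' implementation): an LLM-derived sentiment score is concatenated with raw price and volume features, and a GRU encodes the window before a DDQN-style action-value head.

```python
# Minimal sketch: sequence encoder over (return, volume, news sentiment) observations.
# Shapes, sizes, and the three-action space are illustrative assumptions.
import torch
import torch.nn as nn

class NewsAwareEncoder(nn.Module):
    def __init__(self, n_features: int = 3, hidden: int = 32, n_actions: int = 3):
        super().__init__()
        self.gru = nn.GRU(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)  # e.g. long / flat / short for DDQN

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, time, [log_return, volume, news_sentiment])
        _, h = self.gru(window)
        return self.q_head(h[-1])                   # action values from the last hidden state

obs = torch.randn(8, 64, 3)          # 8 samples, 64 time steps, 3 observable features
q_values = NewsAwareEncoder()(obs)   # (8, 3) action values
```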

October 22, 2025 · 2 min · Research Team

The Invisible Handshake: Tacit Collusion between Adaptive Market Agents

ArXiv ID: 2510.15995 · Authors: Luigi Foscari, Emanuele Guidotti, Nicolò Cesa-Bianchi, Tatjana Chavdarova, Alfio Ferrara. Abstract: We study the emergence of tacit collusion between adaptive trading agents in a stochastic market with endogenous price formation. Using a two-player repeated game between a market maker and a market taker, we characterize feasible and collusive strategy profiles that raise prices beyond competitive levels. We show that, when agents follow simple learning algorithms (e.g., gradient ascent) to maximize their own wealth, the resulting dynamics converge to collusive strategy profiles, even in highly liquid markets with small trade sizes. By highlighting how simple learning strategies naturally lead to tacit collusion, our results offer new insights into the dynamics of AI-driven markets. ...
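
The learning dynamics referenced above can be sketched with two agents each running gradient ascent on their own per-round payoff. The quadratic payoffs below are illustrative placeholders, not the paper's market-maker/market-taker model, so the sketch shows the update rule rather than reproducing the collusion result.

```python
# Minimal sketch of independent gradient-ascent learning in a two-player repeated game.
# Payoff functions are placeholders; only the adaptation mechanism mirrors the abstract.
import numpy as np

def payoff_maker(spread: float, size: float) -> float:
    return spread * size - 0.1 * spread ** 2          # placeholder: revenue minus quoting cost

def payoff_taker(spread: float, size: float) -> float:
    return (1.0 - spread) * size - 0.5 * size ** 2    # placeholder: alpha capture minus impact

def num_grad(f, x, i, eps=1e-5):
    """Numerical partial derivative of f with respect to its i-th argument."""
    hi, lo = list(x), list(x)
    hi[i] += eps
    lo[i] -= eps
    return (f(*hi) - f(*lo)) / (2 * eps)

spread, size, lr = 0.5, 0.5, 0.05
for _ in range(1000):
    g_m = num_grad(payoff_maker, (spread, size), 0)   # maker adjusts its quoted spread
    g_t = num_grad(payoff_taker, (spread, size), 1)   # taker adjusts its trade size
    spread, size = spread + lr * g_m, size + lr * g_t

print(f"converged spread={spread:.3f}, size={size:.3f}")
```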

October 14, 2025 · 2 min · Research Team

Integrating Large Language Models and Reinforcement Learning for Sentiment-Driven Quantitative Trading

ArXiv ID: 2510.10526 · Authors: Wo Long, Wenxin Zeng, Xiaoyu Zhang, Ziyao Zhou. Abstract: This research develops a sentiment-driven quantitative trading system that leverages a large language model, FinGPT, for sentiment analysis, and explores a novel method for signal integration using a reinforcement learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3). We compare the performance of strategies that integrate sentiment and technical signals using both a conventional rule-based approach and a reinforcement learning framework. The results suggest that sentiment signals generated by FinGPT offer value when combined with traditional technical indicators, and that the reinforcement learning algorithm presents a promising approach for effectively integrating heterogeneous signals in dynamic trading environments. ...
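
One way the rule-based baseline could combine the two signal types is sketched below; the thresholds and indicator choices are assumptions rather than the paper's rules, and in the TD3 variant the combination is learned from a state containing the same inputs.

```python
# Illustrative rule-based combination of an LLM sentiment score with a technical signal.
import numpy as np

def technical_signal(prices: np.ndarray, fast: int = 10, slow: int = 50) -> int:
    """+1 / -1 moving-average crossover signal."""
    return 1 if prices[-fast:].mean() > prices[-slow:].mean() else -1

def rule_based_position(prices: np.ndarray, sentiment: float, thresh: float = 0.2) -> int:
    """Trade only when the LLM sentiment agrees with the technical signal."""
    tech = technical_signal(prices)
    if sentiment > thresh and tech > 0:
        return 1     # long
    if sentiment < -thresh and tech < 0:
        return -1    # short
    return 0         # flat when the signals disagree

prices = 100 + np.cumsum(np.random.default_rng(2).standard_normal(200))
print(rule_based_position(prices, sentiment=0.4))
# In the RL variant, (sentiment, technical features) instead form the TD3 state,
# and the combination rule is learned rather than hand-coded.
```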

October 12, 2025 · 2 min · Research Team

Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios

ArXiv ID: 2510.07099 · Authors: Himanshu Choudhary, Arishi Orra, Manoj Thakur. Abstract: In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market behaviour and align with diverse investor preferences. To address this, we propose an innovative framework, termed Diffusion-Augmented Reinforcement Learning (DARL), which synergistically integrates Denoising Diffusion Probabilistic Models (DDPMs) with Deep Reinforcement Learning (DRL) for portfolio management. By leveraging DDPMs to generate synthetic market crash scenarios conditioned on varying stress intensities, our approach significantly enhances the robustness of training data. Empirical evaluations demonstrate that DARL outperforms traditional baselines, delivering superior risk-adjusted returns and resilience against unforeseen crises, such as the 2025 Tariff Crisis. This work offers a robust and practical methodology to bolster stress resilience in DRL-driven financial applications. ...
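
The augmentation step can be sketched as follows, with a crude placeholder standing in for the trained, stress-conditioned DDPM described in the abstract.

```python
# Minimal sketch (assumed) of the data-augmentation idea: synthetic crash paths,
# conditioned on a stress intensity, are mixed into the DRL training episodes.
# `synthetic_crash_path` is a placeholder for sampling from the trained DDPM.
import numpy as np

rng = np.random.default_rng(1)

def synthetic_crash_path(stress: float, n_days: int = 60) -> np.ndarray:
    """Placeholder DDPM sample: daily returns with losses scaled by `stress` in [0, 1]."""
    drift = -0.002 * stress                       # deeper average losses under higher stress
    vol = 0.01 * (1.0 + 2.0 * stress)             # and larger daily swings
    return drift + vol * rng.standard_normal(n_days)

historical = [rng.standard_normal(60) * 0.01 for _ in range(500)]   # stand-in real episodes
synthetic = [synthetic_crash_path(s) for s in rng.uniform(0.3, 1.0, 100)]
training_episodes = historical + synthetic        # augmented set fed to the DRL agent
```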

October 8, 2025 · 2 min · Research Team

From Classical Rationality to Contextual Reasoning: Quantum Logic as a New Frontier for Human-Centric AI in Finance

ArXiv ID: 2510.05475 · Authors: Fabio Bagarello, Francesco Gargano, Polina Khrennikova. Abstract: We consider state-of-the-art applications of artificial intelligence (AI) in modelling human financial expectations and explore the potential of quantum logic to drive future advancements in this field. This analysis highlights the application of machine learning techniques, including reinforcement learning and deep neural networks, in financial statement analysis, algorithmic trading, portfolio management, and robo-advisory services. We further discuss the emergence and progress of quantum machine learning (QML) and advocate for broader exploration of the advantages provided by quantum-inspired neural networks. ...

October 7, 2025 · 2 min · Research Team

FR-LUX: Friction-Aware, Regime-Conditioned Policy Optimization for Implementable Portfolio Management

ArXiv ID: 2510.02986 · Authors: Jian’an Zhang. Abstract: Transaction costs and regime shifts are major reasons why paper portfolios fail in live trading. We introduce FR-LUX (Friction-aware, Regime-conditioned Learning under eXecution costs), a reinforcement learning framework that learns after-cost trading policies and remains robust across volatility-liquidity regimes. FR-LUX integrates three ingredients: (i) a microstructure-consistent execution model combining proportional and impact costs, directly embedded in the reward; (ii) a trade-space trust region that constrains changes in inventory flow rather than logits, yielding stable low-turnover updates; and (iii) explicit regime conditioning so the policy specializes to LL/LH/HL/HH states without fragmenting the data. On a 4 × 5 grid of regimes and cost levels with multiple random seeds, FR-LUX achieves the top average Sharpe ratio with narrow bootstrap confidence intervals, maintains a flatter cost-performance slope than strong baselines, and attains superior risk-return efficiency for a given turnover budget. Pairwise scenario-level improvements are strictly positive and remain statistically significant after multiple-testing corrections. We provide formal guarantees on optimality under convex frictions, monotonic improvement under a KL trust region, long-run turnover bounds and induced inaction bands due to proportional costs, positive value advantage for regime-conditioned policies, and robustness to cost misspecification. The methodology is implementable: costs are calibrated from standard liquidity proxies, scenario-level inference avoids pseudo-replication, and all figures and tables are reproducible from released artifacts. ...
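
The first ingredient, an after-cost reward with proportional and impact costs applied to changes in inventory, can be sketched as below; the cost coefficients are illustrative assumptions, not the calibrated values from the paper.

```python
# Minimal sketch (assumed, not the released artifacts) of a friction-aware reward:
# PnL from the held position, net of proportional and impact costs on the trade.
def after_cost_reward(prev_pos: float, new_pos: float, ret: float,
                      prop_cost: float = 5e-4, impact_cost: float = 1e-3) -> float:
    trade = new_pos - prev_pos
    gross = new_pos * ret                                        # PnL from holding the new position
    costs = prop_cost * abs(trade) + impact_cost * trade ** 2    # proportional + impact frictions
    return gross - costs

# Proportional costs induce an inaction band: small position changes are not worth trading.
print(after_cost_reward(0.0, 0.01, ret=0.001))   # tiny trade: costs dominate the gain
print(after_cost_reward(0.0, 0.50, ret=0.001))
```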

October 3, 2025 · 2 min · Research Team

Signature-Informed Transformer for Asset Allocation

ArXiv ID: 2510.03129 · Authors: Yoontae Hwang, Stefan Zohren. Abstract: Robust asset allocation is a key challenge in quantitative finance, where deep-learning forecasters often fail due to objective mismatch and error amplification. We introduce the Signature-Informed Transformer (SIT), a novel framework that learns end-to-end allocation policies by directly optimizing a risk-aware financial objective. SIT’s core innovations include path signatures for a rich geometric representation of asset dynamics and a signature-augmented attention mechanism embedding financial inductive biases, like lead-lag effects, into the model. Evaluated on daily S&P 100 equity data, SIT decisively outperforms traditional and deep-learning baselines, especially when compared to predict-then-optimize models. These results indicate that portfolio-aware objectives and geometry-aware inductive biases are essential for risk-aware capital allocation in machine-learning systems. The code is available at: https://github.com/Yoontae6719/Signature-Informed-Transformer-For-Asset-Allocation ...
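
The path-signature features mentioned above can be illustrated with a hand-rolled depth-2 signature of a two-asset price path (in practice libraries such as iisignature or signatory are used; this version only shows what the terms are).

```python
# Minimal sketch: depth-2 path signature of a d-dimensional piecewise-linear path.
# Off-diagonal level-2 terms encode ordering (lead-lag) information between channels.
import numpy as np

def signature_depth2(path: np.ndarray) -> np.ndarray:
    """path: (T, d) array. Returns level-1 increments plus the d*d level-2 iterated integrals."""
    inc = np.diff(path, axis=0)                      # segment increments, shape (T-1, d)
    level1 = inc.sum(axis=0)                         # total increment per channel
    d = path.shape[1]
    level2 = np.zeros((d, d))
    cum = np.zeros(d)                                # running level-1 value
    for dx in inc:
        # Chen's relation for a linear segment: S2 += S1_old (x) dx + 0.5 dx (x) dx
        level2 += np.outer(cum, dx) + 0.5 * np.outer(dx, dx)
        cum += dx
    return np.concatenate([level1, level2.ravel()])

prices = np.cumsum(np.random.default_rng(0).standard_normal((100, 2)) * 0.01, axis=0)
print(signature_depth2(prices))                      # 2 + 4 = 6 signature features
```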

October 3, 2025 · 2 min · Research Team

Adaptive and Regime-Aware RL for Portfolio Optimization

ArXiv ID: 2509.14385 · Authors: Gabriel Nixon Raj. Abstract: This study proposes a regime-aware reinforcement learning framework for long-horizon portfolio optimization. Moving beyond traditional feedforward and GARCH-based models, we design realistic environments where agents dynamically reallocate capital in response to latent macroeconomic regime shifts. Agents receive hybrid observations and are trained using constrained reward functions that incorporate volatility penalties, capital resets, and tail-risk shocks. We benchmark multiple architectures, including PPO, LSTM-based PPO, and Transformer PPO, against classical baselines such as equal-weight and Sharpe-optimized portfolios. Our agents demonstrate robust performance under financial stress. While Transformer PPO achieves the highest risk-adjusted returns, LSTM variants offer a favorable trade-off between interpretability and training cost. The framework promotes regime-adaptive, explainable reinforcement learning for dynamic asset allocation. ...
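
A reward of the constrained form described above, combining a return term with volatility and tail-loss penalties, might look like the following sketch; the functional form and coefficients are assumptions, not the paper's exact specification.

```python
# Minimal sketch (assumed form) of a constrained reward with volatility and tail-risk penalties.
import numpy as np

def regime_aware_reward(port_return: float, recent_returns: np.ndarray,
                        vol_penalty: float = 0.5, tail_penalty: float = 2.0,
                        tail_thresh: float = -0.03) -> float:
    vol = recent_returns.std()                       # realized volatility over a rolling window
    tail = min(0.0, port_return - tail_thresh)       # negative only when losses breach the threshold
    return port_return - vol_penalty * vol + tail_penalty * tail

window = np.array([0.01, -0.02, 0.005, -0.04, 0.015])
print(regime_aware_reward(-0.05, window))            # tail-risk shock: heavily penalized
print(regime_aware_reward(0.01, window))             # ordinary step: mild volatility penalty only
```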

September 17, 2025 · 2 min · Research Team