Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution

ArXiv ID: 2511.15262 · View on arXiv

Authors: Tomas Espana, Yadh Hafsi, Fabrizio Lillo, Edoardo Vittori

Abstract: We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to incrementally execute large orders while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that capture transient price impact as well as nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
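
The abstract names the state variables (time, inventory, price, depth) and the learner (a Double Deep Q-Network) but gives no implementation details. Below is a minimal sketch of the defining DDQN update in that setting; the action set of child-order sizes, the network widths, and the undiscounted shortfall-style objective are illustrative assumptions, not details from the paper.

```python
# Minimal Double-DQN update sketch for the execution setting described above.
# State layout (time, inventory, price, depth) follows the abstract; everything
# else (actions, sizes, hyperparameters) is an assumption for illustration.
import torch
import torch.nn as nn

STATE_DIM = 4   # (time remaining, inventory left, price, queue depth)
N_ACTIONS = 5   # hypothetical child-order sizes, e.g. {0, 25, 50, 75, 100}% of a slice
GAMMA = 1.0     # undiscounted episodic objective (implementation shortfall)

def make_qnet() -> nn.Module:
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_ACTIONS),
    )

online, target = make_qnet(), make_qnet()
target.load_state_dict(online.state_dict())
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

def ddqn_update(s, a, r, s_next, done):
    """One Double-DQN step on a batch of transitions.

    s, s_next: (B, STATE_DIM) float tensors; a: (B,) long; r, done: (B,) float.
    """
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double-DQN decoupling: the online net selects the next action,
        # the target net evaluates it.
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_star).squeeze(1)
        y = r + GAMMA * (1.0 - done) * q_next
    loss = nn.functional.smooth_l1_loss(q, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The decoupled target is what distinguishes Double DQN from vanilla DQN and mitigates overestimation bias; in a full training loop the target network would periodically be re-synced to the online weights via `target.load_state_dict(online.state_dict())`.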

November 19, 2025 · 2 min · Research Team

News-Aware Direct Reinforcement Trading for Financial Markets

ArXiv ID: 2510.19173 · View on arXiv

Authors: Qing-Yu Lan, Zhan-He Wang, Jun-Qian Jiang, Yu-Tong Wang, Yun-Song Piao

Abstract: The financial market is known to be highly sensitive to news. Therefore, effectively incorporating news data into quantitative trading remains an important challenge. Existing approaches typically rely on manually designed rules and/or handcrafted features. In this work, we directly use the news sentiment scores derived from large language models, together with raw price and volume data, as observable inputs for reinforcement learning. These inputs are processed by sequence models such as recurrent neural networks or Transformers to make end-to-end trading decisions. We conduct experiments using the cryptocurrency market as an example and evaluate two representative reinforcement learning algorithms, namely Double Deep Q-Network (DDQN) and Group Relative Policy Optimization (GRPO). The results demonstrate that our news-aware approach, which does not depend on handcrafted features or manually designed rules, can achieve performance superior to market benchmarks. We further highlight the critical role of time-series information in this process.
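
As one concrete reading of "sentiment scores plus raw price and volume data, processed by a sequence model," here is a minimal GRU-based Q-network sketch. The feature layout, look-back window, and three-action set are assumptions for illustration; the paper may use different architectures or observation encodings.

```python
# Sketch of the sequence-model value network implied by the abstract: a GRU
# consumes a rolling window of (return, volume, LLM sentiment) observations
# and emits Q-values over trading actions. All sizes are assumed, not sourced.
import torch
import torch.nn as nn

FEAT_DIM = 3    # (log return, volume, news sentiment score in [-1, 1])
WINDOW = 64     # assumed look-back length in bars
N_ACTIONS = 3   # e.g. short / flat / long

class NewsAwareQNet(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(FEAT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_ACTIONS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, WINDOW, FEAT_DIM) -> Q-values: (batch, N_ACTIONS)
        _, h = self.gru(x)              # h: (1, batch, hidden), final hidden state
        return self.head(h.squeeze(0))  # decide from the last time step only

qnet = NewsAwareQNet()
obs = torch.randn(8, WINDOW, FEAT_DIM)  # dummy batch of observation windows
print(qnet(obs).shape)                  # torch.Size([8, 3])
```

The same network could serve as the Q-function in the DDQN variant the paper evaluates, or as the policy backbone for GRPO; using the final hidden state keeps the decision conditioned on the entire observation window, which matches the paper's emphasis on time-series information.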

October 22, 2025 · 2 min · Research Team