Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution
arXiv ID: 2511.15262
Authors: Tomas Espana, Yadh Hafsi, Fabrizio Lillo, Edoardo Vittori
Abstract
We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to incrementally execute large orders while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that capture transient price impact as well as nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
Keywords: Reinforcement Learning, Optimal execution, Double Deep Q-Network, Queue-Reactive Model, Limit order book simulation, Equity
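As a rough sketch of the setup the abstract describes, the snippet below implements a minimal Double DQN update for an execution agent whose state holds time, inventory, price, and depth features. The network sizes, the action grid of child-order sizes, the discount factor, and the reward convention are illustrative assumptions, not the paper's specification.

```python
# Minimal Double DQN update for an execution agent.
# State: (time remaining, inventory remaining, price signal, queue depth) -- a simplification.
# Actions: a small grid of child-order sizes (placeholder assumption).
import torch
import torch.nn as nn

STATE_DIM = 4          # time, inventory, price, depth
N_ACTIONS = 5          # e.g. trade {0, 25%, 50%, 75%, 100%} of a per-step cap (assumed)
GAMMA = 1.0            # undiscounted episodic objective (assumed)

def make_qnet():
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_ACTIONS),
    )

online, target = make_qnet(), make_qnet()
target.load_state_dict(online.state_dict())
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

def ddqn_update(batch):
    """One Double DQN step: the online net selects the next action,
    the target net evaluates it (decoupling selection from evaluation)."""
    s, a, r, s_next, done = batch  # tensors: (B,4), (B,), (B,), (B,4), (B,)
    q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)   # argmax under the online net
        q_next = target(s_next).gather(1, a_star).squeeze(1)  # value under the target net
        y = r + GAMMA * (1.0 - done) * q_next
    loss = nn.functional.smooth_l1_loss(q_sa, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The Double DQN ingredient is visible in `ddqn_update`: the online network picks the next-state action while the target network evaluates it, which mitigates the overestimation bias of vanilla DQN.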
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced mathematical machinery, including stochastic control theory, HJB equations, viscosity solutions, and transient impact models with arbitrage constraints, indicating high mathematical complexity. It also demonstrates strong empirical rigor: a calibrated Queue-Reactive Model simulator for training and evaluation, a standard RL architecture (DDQN), and comparative benchmarking in simulated environments make the approach backtest-ready.
```mermaid
flowchart TD
    Start["Research Goal:<br>Optimal Execution via RL"] --> Inputs["Key Inputs:<br>Limit Order Book Data"]
    Inputs --> Model["Methodology:<br>Queue-Reactive Model Simulation"]
    Model --> StateSpace["State Space Construction:<br>Time, Inventory, Price, Depth"]
    StateSpace --> DQN["Computational Process:<br>Train Double Deep Q-Network"]
    DQN --> Findings["Key Findings:<br>Adaptive Policy & Outperforms Benchmarks"]
```
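To make the simulation stage of the pipeline concrete, here is a toy, single-queue caricature of the Queue-Reactive idea: event intensities depend on the current queue size. The intensity functions and parameters are illustrative placeholders, not the calibrated multi-level model used in the paper.

```python
# Toy single-queue caricature of a Queue-Reactive simulator:
# event arrival rates depend on the current queue size q.
# The intensity shapes and constants are illustrative placeholders, not calibrated values.
import numpy as np

rng = np.random.default_rng(0)

def intensities(q):
    """Arrival rates (events/second) for limit, cancel, and market orders at queue size q."""
    lam_limit  = 1.0 / (1.0 + 0.1 * q)   # fewer joins when the queue is long (assumed shape)
    lam_cancel = 0.05 * q                # cancellations roughly proportional to queue size
    lam_market = 0.5                     # constant market-order flow (assumed)
    return np.array([lam_limit, lam_cancel, lam_market])

def step(q, t):
    """Draw the next event via competing exponential clocks and update the queue."""
    lam = intensities(q)
    total = lam.sum()
    dt = rng.exponential(1.0 / total)
    event = rng.choice(3, p=lam / total)   # 0: limit add, 1: cancel, 2: market order
    if event == 0:
        q += 1
    elif q > 0:
        q -= 1                             # a cancel or market order consumes one unit
    return q, t + dt

q, t = 5, 0.0
for _ in range(1000):
    q, t = step(q, t)
print(f"queue size after {t:.1f}s of simulated flow: {q}")
```

In the full Queue-Reactive Model each price level carries its own state-dependent intensities for limit orders, cancellations, and market orders, which is what allows the simulator to reproduce the transient impact and dynamic order flow responses the abstract refers to.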