ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book
ArXiv ID: 2511.02016
Authors: Patrick Cheridito, Jean-Loup Dupret, Zhexin Wu
Abstract
We present ABIDES-MARL, a framework that combines a new multi-agent reinforcement learning (MARL) methodology with a new realistic limit-order-book (LOB) simulation system to study equilibrium behavior in complex financial market games. The system extends ABIDES-Gym by decoupling state collection from kernel interruption, enabling synchronized learning and decision-making for multiple adaptive agents while maintaining compatibility with standard RL libraries. It preserves key market features such as price-time priority and discrete tick sizes. Methodologically, we use MARL to approximate equilibrium-like behavior in multi-period trading games with a finite number of heterogeneous agents (an informed trader, a liquidity trader, noise traders, and competing market makers), all with individual price impacts. This setting bridges optimal execution and market microstructure by embedding the liquidity trader’s optimization problem within a strategic trading environment. We validate the approach by solving an extended Kyle model within the simulation system, recovering the gradual price discovery phenomenon. We then extend the analysis to a liquidity trader’s problem where market liquidity arises endogenously and show that, at equilibrium, execution strategies shape market-maker behavior and price dynamics. ABIDES-MARL provides a reproducible foundation for analyzing equilibrium and strategic adaptation in realistic markets and contributes toward building economically interpretable agentic AI systems for finance.
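For readers unfamiliar with the Kyle model mentioned in the abstract, the following is a minimal sketch of the textbook single-period version, not the extended multi-period game that the paper solves with MARL. It assumes a fundamental value v ~ N(p0, sigma_v^2), noise order flow u ~ N(0, sigma_u^2), a linear insider strategy x = beta (v - p0), and a linear pricing rule p = p0 + lambda (x + u). The function names `kyle_single_period` and `simulate` are illustrative and do not come from ABIDES-MARL.

```python
import random


def kyle_single_period(sigma_v, sigma_u):
    """Closed-form linear equilibrium of the one-period Kyle model."""
    beta = sigma_u / sigma_v         # insider's trading intensity
    lam = sigma_v / (2.0 * sigma_u)  # market maker's price impact (Kyle's lambda)
    return beta, lam


def simulate(sigma_v, sigma_u, p0=100.0, n=100_000, seed=0):
    """Monte Carlo check that the pricing rule is consistent with the order flow."""
    rng = random.Random(seed)
    beta, lam = kyle_single_period(sigma_v, sigma_u)
    sum_vy = sum_y2 = sum_err2 = 0.0
    for _ in range(n):
        v = rng.gauss(p0, sigma_v)   # fundamental value, known to the insider
        u = rng.gauss(0.0, sigma_u)  # noise traders' net order
        x = beta * (v - p0)          # insider's order
        y = x + u                    # total order flow seen by the market maker
        p = p0 + lam * y             # clearing price
        sum_vy += (v - p0) * y
        sum_y2 += y * y
        sum_err2 += (v - p) ** 2
    return {
        "beta": beta,
        "lambda": lam,
        # Slope of a through-the-origin regression of (v - p0) on order flow.
        "lambda_hat": sum_vy / sum_y2,
        # Variance of v left unexplained by the price after one round of trading.
        "residual_var": sum_err2 / n,
    }
```

Calling `simulate(2.0, 2.0)` should give a regression slope close to the closed-form lambda, and a residual variance close to half of sigma_v^2: in this one-shot setting the price reveals only half of the insider's information. The multi-period extension spreads that revelation over successive trading rounds, which is the gradual price discovery the abstract refers to.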
Keywords: ABIDES, multi-agent reinforcement learning, limit order book, Kyle model, market microstructure
Complexity vs Empirical Score
- Math Complexity: 7.0/10
- Empirical Rigor: 6.0/10
- Quadrant: Holy Grail
- Why: The paper introduces a complex multi-agent reinforcement learning framework and connects it to theoretical models such as the Kyle model, which involves substantial mathematical modeling and equilibrium concepts. Although it presents a simulation system (ABIDES-MARL) with backtest-ready features such as LOB simulation and heterogeneous agents, its validation consists mainly of recovering known theoretical results inside the simulator rather than data-heavy backtesting on real market data.
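The abstract states that the simulator preserves price-time priority and discrete tick sizes. As a rough illustration of what those two features mean, here is a toy one-sided book. It is a sketch under stated assumptions, not ABIDES code: the `ToyBook` class, the `TICK` value of 0.01, and the method names are all hypothetical.

```python
import heapq
from itertools import count

TICK = 0.01  # assumed tick size, for illustration only


def to_ticks(price):
    """Snap a price to the integer tick grid."""
    return round(price / TICK)


class ToyBook:
    """Resting sell orders matched against incoming market buys."""

    def __init__(self):
        # Heap entries are [price_in_ticks, arrival_seq, order_id, remaining_qty].
        # Sorting on (price, arrival) is exactly price-time priority.
        self._asks = []
        self._seq = count()

    def add_ask(self, order_id, price, qty):
        heapq.heappush(self._asks, [to_ticks(price), next(self._seq), order_id, qty])

    def market_buy(self, qty):
        """Fill against the best-priced, earliest-arrived asks first."""
        fills = []
        while qty > 0 and self._asks:
            best = self._asks[0]
            traded = min(qty, best[3])
            fills.append((best[2], best[0], traded))  # (order_id, price_in_ticks, qty)
            best[3] -= traded  # the sort key is unchanged, so the heap stays valid
            qty -= traded
            if best[3] == 0:
                heapq.heappop(self._asks)
        return fills
```

If asks arrive in the order A (5 at 100.02), B (3 at 100.01), C (4 at 100.01), and D (2 at 100.014, which snaps to 100.01), a market buy of 8 fills B, then C, then one unit of D. Price beats time, and among equal prices the earlier arrival is filled first.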
flowchart TD
A["Research Goal<br>Model equilibrium behavior in<br>complex financial market games"] --> B["Methodology: MARL Framework"]
B --> C["Core Components<br>Limit Order Book Simulation &<br>Synchronized Multi-Agent RL"]
C --> D["Key Processes"]
D --> E["Solve Extended Kyle Model"]
D --> F["Simulate Endogenous Liquidity"]
E --> G["Findings & Outcomes"]
F --> G
G --> H["Recovered Gradual Price Discovery"]
G --> I["Equilibrium Strategies Shape<br>Market-Maker Behavior & Price Dynamics"]