Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics

ArXiv ID: 2509.12456

Authors: Rafael Zimmer, Oswaldo Luiz do Valle Costa

Abstract

Reinforcement learning has emerged as a promising framework for developing adaptive, data-driven trading strategies, enabling market makers to optimize decision-making policies through interaction with the limit order book environment. This paper explores the integration of a reinforcement learning agent in a market-making context where the underlying market dynamics are explicitly modeled to capture stylized facts observed in real markets, including clustered order arrival times, non-stationary spreads and return drifts, and stochastic order quantities and price volatility. These mechanisms aim to enhance the stability of the resulting control agent and to incorporate domain-specific knowledge into the policy learning process. Our contributions include a practical implementation of a market-making agent based on the Proximal Policy Optimization (PPO) algorithm, along with a comparative evaluation of the agent's performance under varying market conditions in a simulator-based environment. Comparing financial return and risk metrics against a closed-form optimal solution, our results suggest that the reinforcement learning agent can be used effectively under non-stationary market conditions, and that the proposed simulator-based environment is a valuable tool for training and pre-training reinforcement learning agents in market-making scenarios.

Keywords: reinforcement learning, market making, Proximal Policy Optimization (PPO), limit order book, simulator-based environment, General Equities
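
To make the simulated dynamics concrete, here is a minimal sketch of such an environment, assuming a Gymnasium-style interface and a Hawkes-style self-exciting intensity for clustered order arrivals. The class name, parameter values, and the fill and reward models below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MarketMakingEnv(gym.Env):
    """Minimal market-making simulator sketch: clustered (self-exciting)
    order arrivals, stochastic order quantities, diffusive mid-price,
    and an inventory-penalized PnL reward. Illustrative only."""

    def __init__(self, horizon=1000, dt=1.0, sigma=0.02,
                 mu_arr=1.0, alpha=0.5, beta=2.0, fill_k=1.5):
        super().__init__()
        self.horizon, self.dt, self.sigma = horizon, dt, sigma
        self.mu_arr, self.alpha, self.beta = mu_arr, alpha, beta
        self.fill_k = fill_k  # fill probability decays with quote distance
        # Action: (bid offset, ask offset) from the mid-price.
        self.action_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
        # Observation: mid-price, inventory, arrival intensity, time remaining.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.mid, self.inv, self.cash = 0, 100.0, 0, 0.0
        self.intensity, self.prev_pnl = self.mu_arr, 0.0
        return self._obs(), {}

    def _obs(self):
        return np.array([self.mid, self.inv, self.intensity,
                         self.horizon - self.t], dtype=np.float32)

    def step(self, action):
        bid_off, ask_off = np.clip(action, 0.0, 1.0)
        # Clustered arrivals: Poisson count driven by a Hawkes-style intensity
        # that decays toward its baseline and jumps with each event.
        n = int(self.np_random.poisson(self.intensity * self.dt))
        self.intensity = (self.mu_arr
                          + (self.intensity - self.mu_arr) * np.exp(-self.beta * self.dt)
                          + self.alpha * n)
        for _ in range(n):
            buy = self.np_random.random() < 0.5  # incoming buyer lifts our ask
            off = ask_off if buy else bid_off
            if self.np_random.random() < np.exp(-self.fill_k * off):
                qty = 1 + int(self.np_random.poisson(1.0))  # stochastic quantity
                side = -1 if buy else 1
                self.inv += side * qty
                self.cash -= side * qty * (self.mid + (ask_off if buy else -bid_off))
        # Diffusive mid-price; a time-varying drift would add non-stationarity.
        self.mid += self.sigma * np.sqrt(self.dt) * self.np_random.normal()
        self.t += 1
        pnl = self.cash + self.inv * self.mid
        reward = (pnl - self.prev_pnl) - 0.01 * self.inv ** 2 * self.dt
        self.prev_pnl = pnl
        return self._obs(), float(reward), self.t >= self.horizon, False, {}
```

Non-stationary spreads and return drifts of the kind the paper targets could then be injected by letting the volatility or a drift term follow their own stochastic processes.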

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced stochastic control and Markov Decision Process formalism (high math) and implements a practical PPO agent in a custom simulator with financial risk metrics (high rigor).
```mermaid
flowchart TD
  A["Research Goal:<br>Adaptive Market Making<br>via RL in Non-Stationary LOBs"] --> B["Methodology:<br>Proximal Policy Optimization (PPO) Agent"]
  B --> C["Simulator Environment<br>Inputs: Stylized Facts<br>(Clustered arrivals, Volatility, Drifts)"]
  C --> D["Computational Process:<br>Agent-Environment Interaction Loop"]
  D --> E["Comparative Evaluation:<br>RL Agent vs. Closed-Form Optimal Solution"]
  E --> F["Key Findings:<br>Effective Non-Stationary Adaptation<br>Validated Simulator for Pre-training"]
```
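
The agent-environment interaction loop in the diagram can be reproduced in a few lines; the sketch below uses stable-baselines3's PPO as a stand-in for the paper's agent (an assumption; the paper's own PPO implementation and hyperparameters are not given here) together with the MarketMakingEnv sketch above:

```python
from stable_baselines3 import PPO

# Train a PPO agent on the illustrative simulator.
env = MarketMakingEnv()
model = PPO("MlpPolicy", env, learning_rate=3e-4, gamma=0.99, verbose=0)
model.learn(total_timesteps=200_000)

# Roll out one evaluation episode with the learned policy.
obs, _ = env.reset(seed=0)
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```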
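As for the closed-form baseline in the comparative evaluation step, the standard reference point in the market-making literature is the Avellaneda-Stoikov solution; assuming that is the benchmark meant here, its reservation price and optimal total spread are

$$
r_t = s_t - q_t\,\gamma\,\sigma^2\,(T - t), \qquad
\delta^a_t + \delta^b_t = \gamma\,\sigma^2\,(T - t) + \frac{2}{\gamma}\,\ln\!\left(1 + \frac{\gamma}{k}\right),
$$

where $s_t$ is the mid-price, $q_t$ the signed inventory, $\gamma$ the risk-aversion parameter, $\sigma$ the mid-price volatility, $k$ the fill-intensity decay parameter, and $T - t$ the time remaining in the trading horizon.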