An Impulse Control Approach to Market Making in a Hawkes LOB Market

ArXiv ID: 2510.26438

Authors: Konark Jain, Nick Firoozye, Jonathan Kochems, Philip Treleaven

Abstract

We study the optimal market making problem in a Limit Order Book (LOB) market simulated using a high-fidelity, mutually exciting Hawkes process. Departing from traditional Brownian-driven mid-price models, our setup captures key microstructural properties such as queue dynamics, inter-arrival clustering, and endogenous price impact. Recognizing the realistic constraint that market makers cannot update strategies at every LOB event, we formulate the control problem within an impulse control framework, where interventions occur discretely via limit, cancel, or market orders. This leads to a high-dimensional, non-local Hamilton-Jacobi-Bellman Quasi-Variational Inequality (HJB-QVI), whose solution is analytically intractable and computationally expensive due to the curse of dimensionality. To address this, we propose a novel Reinforcement Learning (RL) approximation inspired by auxiliary control formulations. Using a two-network PPO-based architecture with self-imitation learning, we demonstrate strong empirical performance with limited training, achieving Sharpe ratios above 30 in a realistic simulated LOB. We also solve the HJB-QVI using a deep learning method inspired by Sirignano and Spiliopoulos (2018) and compare its performance with that of the RL agent. Our findings highlight the promise of combining impulse control theory with modern deep RL to tackle optimal execution problems in jump-driven microstructural markets.
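
To make the "mutually exciting Hawkes process" in the abstract concrete, here is a minimal sketch of simulating one with Ogata's thinning algorithm. It is not the paper's simulator: the two generic event types, the exponential kernel with a single shared decay rate, and every parameter value (`mu`, `alpha`, `beta`) are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a multivariate Hawkes process by Ogata's thinning.

    Intensity of type i: mu[i] + sum over past events (s, j) of
    alpha[i, j] * exp(-beta * (t - s)). All values here are illustrative.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    dim = mu.size
    excitation = np.zeros(dim)  # decayed sum of past kernel contributions
    t = 0.0
    events = []
    while True:
        # With alpha >= 0 the intensity only decays between events,
        # so its current total is a valid upper bound for thinning.
        lam_bar = (mu + excitation).sum()
        wait = rng.exponential(1.0 / lam_bar)
        t += wait
        if t >= horizon:
            break
        excitation *= np.exp(-beta * wait)
        lam = mu + excitation
        # Accept the candidate with probability lam.sum() / lam_bar.
        if rng.uniform() * lam_bar <= lam.sum():
            k = int(rng.choice(dim, p=lam / lam.sum()))
            events.append((t, k))
            # An event of type k raises the intensity of every type i.
            excitation += alpha[:, k]
    return events


# Hypothetical parameters: two event types that excite each other.
mu = [0.5, 0.5]
alpha = [[0.3, 0.6], [0.6, 0.3]]
beta = 1.5  # spectral radius of alpha / beta is 0.6 < 1, so the process is stable
events = simulate_hawkes(mu, alpha, beta, horizon=200.0, seed=1)
print(len(events), events[:3])
```

Because each accepted event raises the intensity of both types, arrivals cluster in time instead of spreading out evenly as in a Poisson process, which is the inter-arrival clustering the abstract refers to.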

Keywords: Hawkes process, impulse control, HJB-QVI, PPO-based architecture, market making, Limit Order Book Markets

Complexity vs Empirical Score

  • Math Complexity: 9.2/10
  • Empirical Rigor: 7.8/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematical frameworks including high-dimensional non-local HJB-QVIs and impulse control theory, indicating high mathematical complexity. It demonstrates strong empirical rigor by validating its models through a high-fidelity Hawkes LOB simulator, comparing RL and deep learning solvers, and reporting specific performance metrics like Sharpe ratios.

  flowchart TD
    A["Research Goal<br>Optimal Market Making<br>in Hawkes LOB Market"] --> B["Data & Market Model"]
    B --> C["Problem Formulation<br>Impulse Control HJB-QVI"]
    C --> D{"Computational Approach"}
    D --> E["RL Approximation<br>PPO + Self-Imitation"]
    D --> F["Deep Learning Method<br>HJB Solver"]
    E & F --> G["Performance Comparison<br>Sharpe Ratio > 30"]
    G --> H["Key Findings<br>RL outperforms traditional methods<br>in high-dimensional impulse control"]