Reinforcement Learning in Non-Markov Market-Making
ArXiv ID: 2410.14504
Authors: Unknown
Abstract
We develop a deep reinforcement learning (RL) framework for an optimal market-making (MM) trading problem, focusing specifically on price processes with semi-Markov and Hawkes jump-diffusion dynamics. We begin by discussing the basics of RL and the deep RL framework used, in which we deploy the state-of-the-art Soft Actor-Critic (SAC) algorithm for the deep learning component. SAC is an off-policy entropy-maximization algorithm well suited to complex, high-dimensional problems with continuous state and action spaces, such as optimal MM. We then introduce the optimal MM problem considered, detailing all the deterministic and stochastic processes that go into setting up an environment for simulating this strategy. Here we also give an in-depth overview of the jump-diffusion pricing dynamics used and our method for dealing with adverse selection within the limit order book, and we highlight the working parts of our optimization problem. Next, we discuss training and testing results, with visuals of how important deterministic and stochastic processes such as the bid/ask quotes, trade executions, inventory, and the reward function evolved. We conclude with a discussion of the limitations of these results, which are important caveats for most diffusion models in this setting.
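To make the pricing dynamics concrete, here is a minimal sketch of a mid-price path driven by Brownian diffusion plus self-exciting Hawkes jumps, simulated with Ogata's thinning algorithm. All parameter values (`mu0`, `alpha`, `beta`, `jump_scale`, etc.) are illustrative assumptions, not the calibration used in the paper.

```python
import numpy as np

def simulate_hawkes_jump_diffusion(T=1.0, dt=1e-3, s0=100.0, sigma=0.2,
                                   mu0=5.0, alpha=3.0, beta=6.0,
                                   jump_scale=0.05, seed=0):
    """Mid-price = Brownian diffusion + jumps whose arrival times follow a
    Hawkes process with exponential kernel (Ogata thinning).  Parameters
    are illustrative, not the paper's."""
    rng = np.random.default_rng(seed)

    # --- Hawkes jump times via Ogata's thinning ---------------------------
    jump_times = []
    t, lam = 0.0, mu0          # lam = intensity immediately after time t
    while True:
        w = rng.exponential(1.0 / lam)       # candidate waiting time
        t_cand = t + w
        if t_cand >= T:
            break
        # intensity only decays between events, so lam is a valid bound
        lam_cand = mu0 + (lam - mu0) * np.exp(-beta * w)
        if rng.uniform() * lam <= lam_cand:  # accept candidate as a jump
            jump_times.append(t_cand)
            lam = lam_cand + alpha           # self-excitation kick
        else:
            lam = lam_cand
        t = t_cand

    # --- overlay jumps on a discretized diffusion path --------------------
    n = int(round(T / dt))
    times = np.linspace(0.0, T, n + 1)
    increments = sigma * rng.normal(0.0, np.sqrt(dt), n)
    for tj in jump_times:
        k = min(int(tj / dt), n - 1)
        increments[k] += rng.choice([-1.0, 1.0]) * rng.exponential(jump_scale)
    path = np.empty(n + 1)
    path[0] = s0
    path[1:] = s0 + np.cumsum(increments)
    return times, path, np.array(jump_times)
```

Clustering of jump times is the non-Markov feature: each jump raises the arrival intensity by `alpha`, which then decays at rate `beta`, so the price process depends on the history of past jumps rather than on the current state alone.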
Keywords: Market Making, Soft Actor-Critic (SAC), Hawkes Jump-Diffusion, Limit Order Book, Reinforcement Learning, Equities / High-Frequency Trading
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 4.0/10
- Quadrant: Lab Rats
- Why: The paper employs advanced stochastic calculus and deep reinforcement learning theory, including complex non-Markovian processes like Hawkes jump-diffusions, which drives a high math score. However, the empirical section primarily discusses simulated training and testing results with noted limitations, lacking the backtest-ready, real-market implementation details or statistical metrics typical of high-rigor work.
```mermaid
flowchart TD
A["Research Goal:<br>Optimal Market-Making<br>in Non-Markov Dynamics"] --> B["Methodology:<br>Deep RL using<br>Soft Actor-Critic SAC"]
B --> C["Input Data:<br>Hawkes Jump-Diffusion<br>Limit Order Book"]
C --> D["Computational Process:<br>Training SAC on<br>Stochastic Environment"]
D --> E["Key Findings:<br>Optimal Inventory &<br>Spread Management<br>Limitations Identified"]
```
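The "Training SAC on Stochastic Environment" step above can be sketched as a toy market-making environment. This is not the paper's exact setup: the mid-price here follows a plain diffusion, and fills arrive with an Avellaneda-Stoikov-style intensity that decays exponentially in the quoted offset; all names and parameters are illustrative assumptions.

```python
import numpy as np

class MarketMakingEnv:
    """Toy MM environment sketch: agent quotes bid/ask offsets around a
    diffusing mid-price; fills arrive with intensity A*exp(-k*offset).
    Illustrative assumptions only, not the paper's Hawkes setup."""

    def __init__(self, sigma=0.05, A=1.0, k=1.5, dt=1.0, T=200,
                 inv_penalty=0.01, seed=0):
        self.sigma, self.A, self.k = sigma, A, k
        self.dt, self.T, self.inv_penalty = dt, T, inv_penalty
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t, self.mid = 0, 100.0
        self.inventory, self.cash = 0, 0.0
        return self._obs()

    def _obs(self):
        # state: mid-price, signed inventory, fraction of episode elapsed
        return np.array([self.mid, float(self.inventory), self.t / self.T])

    def step(self, action):
        # action: non-negative (bid_offset, ask_offset) around the mid
        delta_b, delta_a = np.maximum(np.asarray(action, float), 0.0)
        pnl_before = self.cash + self.inventory * self.mid

        # Poisson fills: arrival intensity decays in the quoted offset
        lam_b = self.A * np.exp(-self.k * delta_b)
        lam_a = self.A * np.exp(-self.k * delta_a)
        if self.rng.uniform() < 1.0 - np.exp(-lam_b * self.dt):  # bid hit: buy
            self.inventory += 1
            self.cash -= self.mid - delta_b
        if self.rng.uniform() < 1.0 - np.exp(-lam_a * self.dt):  # ask lifted: sell
            self.inventory -= 1
            self.cash += self.mid + delta_a

        # diffuse the mid-price
        self.mid += self.sigma * np.sqrt(self.dt) * self.rng.normal()
        self.t += 1

        # reward: mark-to-market PnL change minus a running inventory penalty
        pnl_after = self.cash + self.inventory * self.mid
        reward = (pnl_after - pnl_before) - self.inv_penalty * self.inventory**2
        return self._obs(), reward, self.t >= self.T
```

An SAC agent would interact with this environment through the usual `reset`/`step` loop, with the continuous action being the pair of quote offsets; the quadratic inventory penalty in the reward is what pushes the learned policy toward the inventory and spread management highlighted in the findings.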