Reinforcement Learning in High-frequency Market Making

ArXiv ID: 2407.21025

Authors: Unknown

Abstract

This paper establishes a new and comprehensive theoretical analysis for the application of reinforcement learning (RL) to high-frequency market making. We bridge modern RL theory and the continuous-time statistical models of high-frequency financial economics. Unlike most of the existing literature, which is methodological research developing various RL methods for the market making problem, our work is a pilot study providing theoretical analysis. We target the effects of sampling frequency and find an interesting tradeoff between the error and the complexity of the RL algorithm as the time increment $\Delta$ varies: as $\Delta$ becomes smaller, the error shrinks but the complexity grows. We also study the two-player case under the general-sum game framework and establish the convergence of the Nash equilibrium to the continuous-time game equilibrium as $\Delta \rightarrow 0$. The Nash Q-learning algorithm, an online multi-agent RL method, is applied to solve for the equilibrium. Our theory not only helps practitioners choose the sampling frequency, but is also general enough to apply to other high-frequency financial decision-making problems, e.g., optimal execution, whenever the time-discretization of a continuous-time Markov decision process is adopted. Monte Carlo simulation evidence supports all of our theories.
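
The error-vs-complexity tradeoff in $\Delta$ can be illustrated numerically. The sketch below is not the paper's model: an Ornstein-Uhlenbeck signal $dX = -\kappa X\,dt + \sigma\,dW$ stands in for the market dynamics, and the weak error of its Euler discretization stands in for the RL approximation error; the parameter values and error metric are illustrative assumptions.

```python
import math

def euler_variance(T=1.0, delta=0.1, kappa=2.0, sigma=1.0):
    """Closed-form variance of X_T under the Euler scheme
    X_{k+1} = (1 - kappa*delta) X_k + sigma*sqrt(delta)*eps, with X_0 = 0."""
    n_steps = int(round(T / delta))  # decision epochs: grows like 1/delta
    var = 0.0
    for _ in range(n_steps):
        var = (1.0 - kappa * delta) ** 2 * var + sigma**2 * delta
    return var, n_steps

# Exact continuous-time variance: sigma^2 * (1 - exp(-2*kappa*T)) / (2*kappa)
exact = (1.0 - math.exp(-4.0)) / 4.0

for delta in (0.1, 0.01, 0.001):
    var, n_steps = euler_variance(delta=delta)
    print(f"delta={delta:6.3f}  epochs={n_steps:5d}  error={abs(var - exact):.5f}")
```

Shrinking $\Delta$ drives the discretization error down roughly linearly, while the number of decision epochs over the fixed horizon (and hence the complexity the learner faces) grows like $1/\Delta$, mirroring the tradeoff the abstract describes.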

Keywords: High-Frequency Market Making, Reinforcement Learning Theory, Nash Q-Learning, Continuous-Time Models, Game Theory, Equities

Complexity vs Empirical Score

  • Math Complexity: 9.0/10
  • Empirical Rigor: 6.0/10
  • Quadrant: Holy Grail
  • Why: The paper is highly theoretical with dense mathematical analysis of continuous-time models, convergence, and sample complexity, yet it includes Monte Carlo simulations to support the theories, indicating moderate empirical rigor.

```mermaid
flowchart TD
  A["Research Goal<br/>Establish theoretical analysis for RL<br/>in high-frequency market making"] --> B["Methodology: Continuous-Time &<br/>Game Theory Framework"]
  B --> C["Key Analysis: Effect of Sampling Frequency Δ<br/>Tradeoff: Error ↓ vs Complexity ↑ as Δ→0"]
  B --> D["Multi-Agent Extension<br/>General-Sum Game & Nash Q-Learning"]
  C & D --> E["Computational Process<br/>Monte Carlo Simulation"]
  E --> F["Key Outcomes & Findings"]
  F --> G["Convergence Proof<br/>Nash Eq → Continuous-Time Eq as Δ→0"]
  F --> H["Practical Guidance<br/>Optimal Δ selection for practitioners"]
  F --> I["General Applicability<br/>Extends to optimal execution & other HF problems"]
```
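
For the multi-agent branch, a minimal sketch of the Nash Q-learning update in a two-player general-sum setting is given below, assuming small finite state and action spaces. For brevity the stage-game equilibrium search is restricted to pure strategies; the full algorithm (Hu and Wellman's Nash Q-learning) solves for a mixed-strategy equilibrium of the bimatrix game formed by both agents' next-state Q-values. This is a sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def pure_nash_value(Q1_s, Q2_s):
    """Find a pure-strategy Nash equilibrium of the stage game with payoff
    matrices Q1_s, Q2_s (shape [A1, A2]); return both players' values.
    Falls back to the max-sum cell if no pure equilibrium exists."""
    A1, A2 = Q1_s.shape
    for a1 in range(A1):
        for a2 in range(A2):
            if (Q1_s[a1, a2] >= Q1_s[:, a2].max()
                    and Q2_s[a1, a2] >= Q2_s[a1, :].max()):
                return Q1_s[a1, a2], Q2_s[a1, a2]
    a1, a2 = np.unravel_index(np.argmax(Q1_s + Q2_s), Q1_s.shape)
    return Q1_s[a1, a2], Q2_s[a1, a2]

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.95):
    """One Nash Q-learning step: Q1, Q2 have shape [S, A1, A2]. Each agent
    bootstraps on its equilibrium payoff of the next-state stage game."""
    v1, v2 = pure_nash_value(Q1[s_next], Q2[s_next])
    Q1[s, a1, a2] += alpha * (r1 + gamma * v1 - Q1[s, a1, a2])
    Q2[s, a1, a2] += alpha * (r2 + gamma * v2 - Q2[s, a1, a2])
```

The key design point is that each agent bootstraps on the equilibrium value of the next-state stage game rather than on its own greedy maximum; it is this fixed point that the paper shows converges to the continuous-time game equilibrium as Δ→0.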