Painting the market: generative diffusion models for financial limit order book simulation and forecasting

ArXiv ID: 2509.05107

Authors: Alfred Backhouse, Kang Li, Jakob Foerster, Anisoara Calinescu, Stefan Zohren

Abstract

Simulating limit order books (LOBs) has important applications in forecasting and backtesting on financial market data. However, deep generative models struggle in this setting because the data are noisy and complex. Previous work uses autoregressive models, which accumulate errors over longer sequences. We introduce a novel approach: converting LOB data into a structured image format and applying diffusion models with inpainting to generate future LOB states. This method leverages spatio-temporal inductive biases in the order book and enables parallel generation of long sequences, overcoming issues with error accumulation. We also publicly contribute to LOB-Bench, the industry benchmark for LOB generative models, to allow fair comparison between models using Level-2 and Level-3 order book data (without and with message-level data, respectively). We show that our model achieves state-of-the-art performance on LOB-Bench, despite using lower-fidelity data as input. We also show that our method prioritises coherent global structures over local, high-fidelity details, providing significant improvements over existing methods on certain metrics. Overall, our method lays a strong foundation for future research into generative diffusion approaches to LOB modelling.
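
To make the "structured image format" idea concrete, here is a minimal sketch of one way Level-2 snapshots could be stacked into a 2D array, together with a mask marking the future columns to be inpainted. The abstract does not specify the paper's actual encoding, so the layout (asks above bids, bid volumes negated) and the helper names `lob_to_image` and `future_mask` are assumptions made purely for illustration.

```python
import numpy as np


def lob_to_image(bid_vols, ask_vols):
    """Stack Level-2 snapshots into an image (hypothetical layout).

    bid_vols, ask_vols: arrays of shape (T, L) holding the volume at the
    L best price levels for T time steps, with level 0 the best quote.
    Returns an array of shape (2L, T): rows run from the deepest ask down
    to the deepest bid, columns are time steps, bid volumes are negated.
    """
    bid_vols = np.asarray(bid_vols, dtype=float)
    ask_vols = np.asarray(ask_vols, dtype=float)
    # Reverse the ask levels so the best ask sits directly above the best bid.
    per_step = np.concatenate([ask_vols[:, ::-1], -bid_vols], axis=1)
    return per_step.T


def future_mask(n_rows, n_cols, n_known):
    """Mask with 1 for observed (past) columns and 0 for columns to generate."""
    mask = np.zeros((n_rows, n_cols))
    mask[:, :n_known] = 1.0
    return mask


# Toy data: 4 time steps, 2 levels per side (values are made up).
bids = np.array([[5, 3], [6, 2], [4, 4], [7, 1]])
asks = np.array([[2, 8], [3, 7], [1, 9], [2, 6]])

image = lob_to_image(bids, asks)
mask = future_mask(*image.shape, n_known=3)  # last column is the "future"
```

In this layout, time runs along the horizontal axis and price depth along the vertical axis, so a model that generates the masked columns in one pass produces all future steps in parallel rather than one step at a time.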

Keywords: Limit Order Books (LOB), Diffusion Models, Spatio-temporal Inductive Biases, Generative Modeling, Inpainting, Equities / High-Frequency Trading

Complexity vs Empirical Score

  • Math Complexity: 8.0/10
  • Empirical Rigor: 9.0/10
  • Quadrant: Holy Grail
  • Why: The paper draws on advanced mathematics from diffusion models and stochastic processes, and backs its claims with benchmarking on LOB-Bench using Level-2 and Level-3 data, where it reports state-of-the-art performance.
  flowchart TD
    A["Research Goal: Generate & Forecast LOB Data<br>using Diffusion Models"] --> B["Data Preparation: LOB-Bench<br>Level-2 & Level-3 Data"]
    B --> C["Methodology: Transform LOB Data<br>into Structured Image Format"]
    C --> D["Spatio-temporal Diffusion Model<br>with Inpainting"]
    D --> E["Parallel Generation of<br>Long Sequences"]
    E --> F{"Evaluation on<br>LOB-Bench Metrics"}
    F --> G["Outcomes: SOTA Performance<br>Coherent Global Structures"]
    G --> H["Future Foundation for<br>Generative LOB Modelling"]
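
The "diffusion model with inpainting" and "parallel generation" steps in the pipeline can be illustrated with a generic masked reverse-diffusion loop. This is a sketch in the style of standard DDPM inpainting, not the paper's confirmed sampler: the noise schedule, the step count, and the names `make_schedule`, `inpaint`, and `eps_model` are all assumptions, and the noise predictor used here is a dummy stand-in for a trained network.

```python
import numpy as np


def make_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    return betas, alphas, alpha_bar


def inpaint(x_known, mask, eps_model, n_steps=50, seed=0):
    """Generate the masked-out region while keeping the known region fixed.

    x_known: image holding the observed values (unknown entries are ignored).
    mask: same shape, 1 where values are observed, 0 where they are generated.
    eps_model: callable (x, t) -> predicted noise; hypothetical interface.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bar = make_schedule(n_steps)
    x = rng.standard_normal(x_known.shape)
    for t in range(n_steps - 1, -1, -1):
        # Standard DDPM reverse step, applied to the whole image at once.
        eps = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x_generated = mean + np.sqrt(betas[t]) * noise
        # Known region: forward-noise the observations to the matching level.
        if t > 0:
            x_observed = (np.sqrt(alpha_bar[t - 1]) * x_known
                          + np.sqrt(1.0 - alpha_bar[t - 1]) * rng.standard_normal(x.shape))
        else:
            x_observed = x_known
        # Combine: observed pixels where mask is 1, generated pixels elsewhere.
        x = mask * x_observed + (1.0 - mask) * x_generated
    return x


# Toy example: 4x6 image, first 4 columns observed, last 2 generated.
x_known = np.arange(24, dtype=float).reshape(4, 6) / 24.0
mask = np.zeros((4, 6))
mask[:, :4] = 1.0


def dummy_eps(x, t):
    # Placeholder for a trained noise-prediction network.
    return np.zeros_like(x)


sample = inpaint(x_known, mask, dummy_eps)
```

Every masked column is updated in the same reverse step, so the length of the generated horizon does not change the number of sequential model calls, which is the property that sidesteps step-by-step error accumulation in autoregressive generation.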