DiffVolume: Diffusion Models for Volume Generation in Limit Order Books
ArXiv ID: 2508.08698
Authors: Zhuohan Wang, Carmine Ventre
Abstract
Modeling limit order book (LOB) dynamics is a fundamental problem in market microstructure research. In particular, generating high-dimensional volume snapshots with strong temporal and liquidity-dependent patterns remains a challenging task, despite recent work exploring the application of Generative Adversarial Networks to LOBs. In this work, we propose a conditional diffusion model for the generation of future LOB volume snapshots (DiffVolume). We evaluate our model across three axes: (1) Realism, where we show that DiffVolume, conditioned on past volume history and time of day, better reproduces statistical properties such as marginal distribution, spatial correlation, and autocorrelation decay; (2) Counterfactual generation, where additionally conditioning on a target future liquidity profile allows controllable generation under hypothetical liquidity scenarios; and (3) Downstream prediction, where we show that the synthetic counterfactual data from our model improves the performance of future liquidity forecasting models. Together, these results suggest that DiffVolume provides a powerful and flexible framework for realistic and controllable LOB volume generation.
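Two of the realism statistics named in the abstract, spatial correlation across price levels and autocorrelation decay over time, can be illustrated with a minimal NumPy sketch. The function names, the array layout (rows are time steps, columns are price levels), and the synthetic AR(1) data are assumptions made for illustration; they are not taken from the paper, whose exact metric definitions are not given here.

```python
import numpy as np

def spatial_correlation(volumes):
    """Pearson correlation between price levels; `volumes` has shape (T, L)."""
    return np.corrcoef(volumes, rowvar=False)

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

# Synthetic stand-in for LOB volumes (assumption): an AR(1) process on
# log-volume, simulated independently for each of L price levels.
rng = np.random.default_rng(0)
T, L = 500, 10
noise = rng.normal(size=(T, L))
log_vol = np.zeros((T, L))
for t in range(1, T):
    log_vol[t] = 0.9 * log_vol[t - 1] + noise[t]
volumes = np.exp(log_vol)

corr = spatial_correlation(volumes)               # (L, L) matrix
acf = autocorrelation(volumes[:, 0], max_lag=20)  # best-level volume, lags 0..20
```

In a realism comparison one would compute the same statistics on real and generated snapshots and compare them, for example by the distance between the two correlation matrices. Because the levels here are simulated independently, the off-diagonal entries of `corr` only reflect sampling noise.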
Keywords: limit order books, diffusion models, generative adversarial networks, liquidity modeling, counterfactual generation, Equities
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced diffusion model theory with SDEs and detailed mathematical derivations, indicating high math complexity; it also presents concrete evaluation on realism, counterfactual generation, and downstream prediction tasks, suggesting strong empirical rigor.
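For orientation on the SDE view mentioned above, the standard variance-preserving formulation of score-based diffusion is written below. This is the textbook form, stated as an assumption; the paper's own SDE, noise schedule, and notation may differ. Here $x_t$ is the noised volume snapshot, $c$ stands for the conditioning information (past volume history, time of day, and optionally a target liquidity profile), $\beta(t)$ is a noise schedule, and $w_t$, $\bar{w}_t$ are forward and reverse-time Brownian motions.

```latex
\begin{align}
  \mathrm{d}x_t &= -\tfrac{1}{2}\beta(t)\,x_t\,\mathrm{d}t
                   + \sqrt{\beta(t)}\,\mathrm{d}w_t, \\
  \mathrm{d}x_t &= \Big[-\tfrac{1}{2}\beta(t)\,x_t
                   - \beta(t)\,\nabla_{x_t}\log p_t(x_t \mid c)\Big]\mathrm{d}t
                   + \sqrt{\beta(t)}\,\mathrm{d}\bar{w}_t .
\end{align}
```

The first line is the forward (noising) process. The second is the reverse-time process used for sampling, in which a neural network approximates the conditional score $\nabla_{x_t}\log p_t(x_t \mid c)$.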
flowchart TD
A["Research Goal: Generate<br>high-fidelity LOB volume snapshots<br>with temporal/liquidity patterns"] --> B["Data: Historical LOB volume data<br>+ Time of Day + Liquidity Profiles"]
B --> C["Methodology: Conditional Diffusion Model<br>DiffVolume"]
C --> D["Generative Process:<br>Samples future volume snapshots<br>conditioned on past history"]
D --> E{"Key Outcomes"}
E --> F["1. Realism: Superior statistical<br>matches to marginal/spatial/autocorrelation"]
E --> G["2. Counterfactuals: Controllable generation<br>under hypothetical liquidity scenarios"]
E --> H["3. Downstream: Synthetic data improves<br>future liquidity forecasting models"]
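The data and methodology steps in the flowchart can be made concrete with a small sketch of two building blocks: assembling a conditioning vector from past snapshots and time of day, and applying the closed-form forward noising step of a discrete-time diffusion model. Everything here is an assumption for illustration: the linear beta schedule is the common DDPM default, the 23400-second session length corresponds to a 6.5-hour trading day, and `build_condition` and `forward_noise` are hypothetical helper names, not the paper's implementation.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar[t]) * x0, (1 - alpha_bar[t]) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def build_condition(past_volumes, seconds_since_open, session_seconds=23400.0):
    """Flatten past snapshots and append a cyclical time-of-day encoding."""
    phase = 2.0 * np.pi * seconds_since_open / session_seconds
    time_feat = np.array([np.sin(phase), np.cos(phase)])
    return np.concatenate([past_volumes.ravel(), time_feat])

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal retention per step

x0 = rng.standard_normal(20)          # stand-in for a normalized future snapshot
past = rng.standard_normal((5, 20))   # stand-in for 5 past snapshots
cond = build_condition(past, seconds_since_open=3600.0)
x_t, eps = forward_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

During training, a denoising network would receive `x_t`, the step index, and `cond`, and would be fit to predict `eps`. For counterfactual generation, a target future liquidity profile would be appended to `cond` in the same way.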