TRADES: Generating Realistic Market Simulations with Diffusion Models

ArXiv ID: 2502.07071

Authors: Unknown

Abstract

Financial markets are complex systems characterized by high statistical noise, nonlinearity, volatility, and constant evolution. Thus, modeling them is extremely hard. Here, we address the task of generating realistic and responsive Limit Order Book (LOB) market simulations, which are fundamental for calibrating and testing trading strategies, performing market impact experiments, and generating synthetic market data. We propose a novel TRAnsformer-based Denoising Diffusion Probabilistic Engine for LOB Simulations (TRADES). TRADES generates realistic order flows as time series conditioned on the state of the market, leveraging a transformer-based architecture that captures the temporal and spatial characteristics of high-frequency market data. There is a notable absence of quantitative metrics for evaluating generative market simulation models in the literature. To tackle this problem, we adapt the predictive score, a metric measured as an MAE, to market data by training a stock price predictive model on synthetic data and testing it on real data. We compare TRADES with previous works on two stocks, reporting a ×3.27 and ×3.48 improvement over SoTA according to the predictive score, demonstrating that we generate useful synthetic market data for financial downstream tasks. Furthermore, we assess TRADES’s market simulation realism and responsiveness, showing that it effectively learns the conditional data distribution and successfully reacts to an experimental agent, paving the way for calibrations and evaluations of trading strategies and market impact experiments. To perform the experiments, we developed DeepMarket, the first open-source Python framework for LOB market simulation with deep learning. In our repository, we include a synthetic LOB dataset composed of TRADES’s generated simulations.
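
The predictive score described in the abstract follows a "train on synthetic, test on real" recipe: fit a price predictor on generated data, then measure its MAE on real data, so a lower score suggests more useful synthetic data. The sketch below illustrates only that recipe. It is not the paper's evaluation code: the AR(1) least-squares predictor stands in for the stock price predictive model, and the toy random-walk "real" series and the helper names (`fit_ar1`, `predictive_score`, `random_walk`) are assumptions made for illustration.

```python
import random
from statistics import fmean

def fit_ar1(prices):
    """Least-squares fit of p[t+1] ~ a + b * p[t]; returns (a, b)."""
    x, y = prices[:-1], prices[1:]
    mx, my = fmean(x), fmean(y)
    var_x = sum((xi - mx) ** 2 for xi in x)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = cov_xy / var_x
    return my - b * mx, b

def predictive_score(synthetic, real):
    """Train on synthetic prices, test on real ones; return the MAE."""
    a, b = fit_ar1(synthetic)  # stand-in predictor (assumption)
    errors = [abs(a + b * p - nxt) for p, nxt in zip(real[:-1], real[1:])]
    return fmean(errors)

def random_walk(rng, start, step_std, n):
    """Toy mid-price path: a Gaussian random walk of length n."""
    path = [start]
    for _ in range(n - 1):
        path.append(path[-1] + rng.gauss(0.0, step_std))
    return path

rng = random.Random(0)
real = random_walk(rng, 100.0, 0.1, 500)      # toy stand-in for real prices
good_syn = random_walk(rng, 100.0, 0.1, 500)  # same dynamics as "real"
# Unrealistic generator: wrong price level and no temporal structure.
bad_syn = [90.0 + rng.gauss(0.0, 5.0) for _ in range(500)]

score_good = predictive_score(good_syn, real)
score_bad = predictive_score(bad_syn, real)
print(f"good: {score_good:.3f}  bad: {score_bad:.3f}")
```

In this toy setup the generator that mimics the real dynamics yields the lower MAE, which is the ordering the metric is meant to expose.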

Keywords: Limit Order Book, Denoising Diffusion Probabilistic Models, Transformer, Market Simulation, Market Impact, Stocks

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced deep learning architectures (diffusion models and transformers) and formal statistical metrics (predictive score based on MAE), requiring sophisticated mathematical foundations. It demonstrates high empirical rigor by releasing an open-source framework (DeepMarket), a synthetic dataset, and quantitative backtesting-style experiments comparing against state-of-the-art methods on real market data.
  flowchart TD
    A["Research Goal:<br>Generate Realistic &<br>Responsive LOB Simulations"] --> B["Data Input:<br>High-Frequency<br>Real Market Data (LOB)"]
    B --> C["Methodology:<br>TRADES Model<br>Transformer-based Denoising Diffusion Probabilistic Engine"]
    C --> D["Computational Process:<br>Conditional Generation<br>State of Market → Order Flows"]
    D --> E["Evaluation:<br>Adapted Predictive Score<br>(MAE: Synthetic Train → Real Test)"]
    E --> F["Key Findings:<br>×3.27 & ×3.48 SoTA Improvement<br>High Realism & Responsiveness<br>Released: DeepMarket Framework"]
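
The methodology step in the flowchart rests on a denoising diffusion probabilistic model. As a rough illustration of the idea, the sketch below implements the standard DDPM forward (noising) process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, which is the process a denoiser is trained to reverse. This summary does not state TRADES's noise schedule or order encoding, so the linear schedule, its endpoint values, the 1000 steps, and the four-feature order vector are all assumptions, and the transformer denoiser and market-state conditioning are omitted.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (values assumed)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I); t is 0-indexed."""
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical normalized order features:
# (inter-arrival time, size, price offset, direction).
order = [0.5, 1.2, -0.3, 1.0]
slightly_noised = q_sample(order, 0, alpha_bars, rng)   # almost unchanged
fully_noised = q_sample(order, 999, alpha_bars, rng)    # close to pure noise
print(slightly_noised)
print(fully_noised)
```

Generation runs this process in reverse: starting from noise, a learned denoiser conditioned on the market state removes noise step by step until an order emerges.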