Learning the Spoofability of Limit Order Books With Interpretable Probabilistic Neural Networks

ArXiv ID: 2504.15908

Authors: Unknown

Abstract

This paper investigates real-time detection of spoofing activity in limit order books, focusing on cryptocurrency centralized exchanges. We first introduce novel order flow variables based on multi-scale Hawkes processes that account for both the size of new limit orders and their placement distance from the current best prices. Using a Level-3 data set, we train a neural network model to predict the conditional probability distribution of mid-price movements based on these features. Our empirical analysis highlights the critical role of the posting distance of limit orders in the price formation process, showing that spoofing detection models that do not take the posting distance into account are inadequate to describe the data. Next, we propose a spoofing detection framework based on the probabilistic market manipulation gain of a spoofing agent and use the previously trained neural network to compute the expected gain. Running this algorithm on all submitted limit orders in the period 2024-12-04 to 2024-12-07, we find that 31% of large orders could spoof the market. Because of its simple neuronal architecture, our model can be run in real time. This work contributes to enhancing market integrity by providing a robust tool for monitoring and mitigating spoofing in both cryptocurrency exchanges and traditional financial markets.
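
To make the idea of size- and distance-aware multi-scale order flow features concrete, here is a minimal sketch in Python. It is not the paper's construction: the exponential kernels, the decay rates, the mark function `size * exp(-distance / dist_scale)`, and the names `LimitOrder` and `multiscale_features` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LimitOrder:
    t: float         # arrival time in seconds
    size: float      # order size
    distance: float  # posting distance from the best price, in ticks (>= 0)


def multiscale_features(orders, decays=(0.1, 1.0, 10.0), dist_scale=5.0):
    """Return one exponentially decayed, mark-weighted sum per decay rate.

    Assumed (hypothetical) mark: size * exp(-distance / dist_scale), so
    orders posted far from the best price contribute less.
    Orders must be sorted by arrival time.
    """
    state = [0.0] * len(decays)
    last_t = None
    for o in orders:
        if last_t is not None:
            dt = o.t - last_t
            # Decay each time scale independently since the previous event.
            state = [s * math.exp(-b * dt) for s, b in zip(state, decays)]
        mark = o.size * math.exp(-o.distance / dist_scale)
        state = [s + mark for s in state]
        last_t = o.t
    return state
```

Each entry of the returned list summarizes recent order flow at one time scale, evaluated at the arrival time of the last order. A feature vector of this kind could serve as input to a model that predicts the distribution of mid-price movements.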

Keywords: Spoofing Detection, Limit Order Book, Hawkes Processes, Neural Networks, Market Manipulation, Cryptocurrency / Market Microstructure

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematical tools including multi-scale Hawkes processes and probabilistic neural networks for feature engineering and modeling, while also demonstrating high empirical rigor through backtesting on high-frequency Level-3 cryptocurrency data over multiple days and providing concrete implementation details for real-time detection.
  flowchart TD
    A["Research Goal:<br>Detect spoofing in limit order books<br>in real-time"] --> B["Data & Methodology<br>(Level-3 Crypto Data)"]
    B --> C["Novel Feature Engineering:<br>Multi-scale Hawkes processes<br>including posting distance"]
    C --> D["Probabilistic Neural Network:<br>Predicts mid-price movement<br>distributions"]
    D --> E["Detection Framework:<br>Calculates expected<br>manipulation gain"]
    E --> F["Key Findings & Outcomes"]
    
    subgraph F [" "]
        F1["31% of large orders<br>identified as potential spoofing"]
        F2["Posting distance is critical<br>for accurate detection"]
        F3["Model architecture enables<br>real-time deployment"]
    end
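
The detection step in the flowchart (expected manipulation gain) can be illustrated with a toy calculation. The sketch below assumes a model that outputs a discrete distribution over mid-price moves (in ticks) with and without a candidate order in the book. The gain formula, the function names, and the `cost` parameter are hypothetical simplifications, not the paper's definition.

```python
def expected_move(dist):
    """Expected mid-price move for a {move_in_ticks: probability} mapping."""
    total = sum(dist.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(move * p for move, p in dist.items())


def expected_spoof_gain(p_base, p_spoof, position, cost=0.0):
    """Toy expected gain of a spoofing agent (assumed formula).

    p_base:   predicted move distribution without the candidate order
    p_spoof:  predicted move distribution with the candidate order present
    position: signed inventory that benefits from an upward move
    cost:     assumed expected cost of posting the order (e.g. fill risk)
    """
    shift = expected_move(p_spoof) - expected_move(p_base)
    return position * shift - cost


# Example: the candidate order tilts the predicted distribution upward.
p_base = {-1: 0.3, 0: 0.4, 1: 0.3}
p_spoof = {-1: 0.2, 0: 0.4, 1: 0.4}
gain = expected_spoof_gain(p_base, p_spoof, position=10.0, cost=0.5)
flagged = gain > 0.0  # flag the order as potentially spoofing
```

Under these assumptions an order is flagged when posting it would shift the predicted price distribution enough to yield a positive expected gain. The threshold of zero is itself an illustrative choice.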