A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books

ArXiv ID: 2507.14960

Authors: Ivan Letteri

Abstract

The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours. An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark. These findings underscore the effectiveness of outlier-driven strategies and provide insights into the trade-offs between model complexity, trade frequency, and performance. This study contributes to the growing corpus of research on cryptocurrency market microstructure by furnishing a rigorous benchmark of anomaly detection models and highlighting their potential for augmenting algorithmic trading and risk management.
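The abstract names Empirical Covariance (EC) as the best-performing detector. As a rough illustration only, an EC-style detector fits the sample mean and the maximum-likelihood (divide-by-n) covariance of per-snapshot order book features, then scores each snapshot by its squared Mahalanobis distance. This is the same quantity that scikit-learn's `EmpiricalCovariance.mahalanobis` returns. The sketch below is a dependency-free version under stated assumptions: the two features (bid-ask spread and order book imbalance), the synthetic data, and the chi-square cutoff are hypothetical choices, not the paper's feature set, data, or threshold.

```python
import math
import random
from statistics import fmean


def empirical_covariance(rows):
    """Sample mean and maximum-likelihood (divide-by-n) covariance of feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for a few features)."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis_sq(rows, mean, precision):
    """Squared Mahalanobis distance of each row from the fitted mean."""
    scores = []
    for r in rows:
        diff = [x - m for x, m in zip(r, mean)]
        scores.append(sum(diff[i] * precision[i][j] * diff[j]
                          for i in range(len(diff)) for j in range(len(diff))))
    return scores


def flag_outliers(rows, threshold):
    """Return indices whose squared distance exceeds `threshold`, plus all scores."""
    mean, cov = empirical_covariance(rows)
    scores = mahalanobis_sq(rows, mean, invert(cov))
    return [i for i, s in enumerate(scores) if s > threshold], scores


# Hypothetical synthetic snapshots: (bid-ask spread, order book imbalance).
rng = random.Random(0)
snapshots = [(rng.gauss(1.0, 0.1), rng.gauss(0.0, 0.2)) for _ in range(200)]
snapshots.append((3.0, 0.9))  # one injected extreme snapshot at index 200

# Hypothetical cutoff: 99% quantile of chi-square with 2 degrees of freedom,
# which for 2 dof equals -2 * ln(0.01) (about 9.21).
threshold = -2.0 * math.log(0.01)
flagged, scores = flag_outliers(snapshots, threshold)
print(flagged)
```

A fixed chi-square cutoff is only well calibrated when the features are roughly Gaussian; heavy-tailed LOB features will raise the flag rate. A single extreme snapshot also inflates the covariance it is measured against, which is one motivation for the robust statistical estimators the study compares alongside EC.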

Keywords: Cryptocurrency Market, Limit Order Book (LOB), Outlier Detection, Anomaly Detection, Algorithmic Trading

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Holy Grail
  flowchart TD
    A["Research Goal:<br>Detect Outliers in BTC LOBs"] --> B["Data Input:<br>26,204 LOB Records"]
    B --> C["Methodology:<br>AITA-OBS Framework"]
    C --> D["Model Evaluation:<br>13 Statistical & ML Models"]
    D --> E["Backtesting:<br>Performance Benchmarking"]
    E --> F{"Key Outcomes"}
    F --> G["Top Model: Empirical<br>Covariance (EC)"]
    F --> H["Result: 6.70% Gain,<br>Outperforms Buy-and-Hold"]
    F --> I["Insight: Potential for Algo<br>Trading & Risk Management"]
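The backtesting step compares an outlier-driven strategy against a Buy-and-Hold benchmark. The return arithmetic can be sketched as below, assuming a long-only strategy that is either fully invested or flat in each period. The rule that maps outlier flags to positions (`positions_from_flags` and its `hold` parameter), the proportional per-switch `fee`, and the toy prices are all hypothetical; they do not reproduce the paper's trading rules, data, or its reported 6.70% gain.

```python
def buy_and_hold_return(prices):
    """Simple return from buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0


def positions_from_flags(flags, n_periods, hold=1):
    """Hypothetical rule: go long for `hold` periods starting at each flagged index."""
    pos = [0] * n_periods
    for i in flags:
        for t in range(i, min(i + hold, n_periods)):
            pos[t] = 1
    return pos


def strategy_return(prices, positions, fee=0.0):
    """Compounded return of a 0/1 long-only strategy.

    positions[t] is the exposure held from prices[t] to prices[t + 1].
    A hypothetical proportional `fee` is charged each time the position changes;
    no fee is charged for an open position at the end of the series.
    """
    if len(positions) != len(prices) - 1:
        raise ValueError("need exactly one position per price interval")
    equity, prev = 1.0, 0
    for t, pos in enumerate(positions):
        if pos != prev:
            equity *= 1.0 - fee
        if pos:
            equity *= prices[t + 1] / prices[t]
        prev = pos
    return equity - 1.0


prices = [100.0, 102.0, 99.0, 101.0, 105.0]  # toy prices, not from the paper
positions = positions_from_flags([0, 2, 3], n_periods=len(prices) - 1)
print(positions)                                     # [1, 0, 1, 1]
print(round(strategy_return(prices, positions), 4))  # 0.0818
print(round(buy_and_hold_return(prices), 4))         # 0.05
```

Here the strategy beats Buy-and-Hold only because it sits out the drop from 102 to 99; with a nonzero `fee`, frequent switching erodes that edge, which illustrates the trade-off between trade frequency and performance mentioned in the abstract.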