Forecasting Liquidity Withdraw with Machine Learning Models

ArXiv ID: 2509.22985

Authors: Haochuan Wang

Abstract

Liquidity withdrawal is a critical indicator of market fragility. In this project, I test a framework for forecasting liquidity withdrawal at the individual-stock level, ranging from less liquid stocks to highly liquid large-cap tickers, and evaluate the relative performance of competing model classes in predicting short-horizon order book stress. We introduce the Liquidity Withdrawal Index (LWI) – defined as the ratio of order cancellations to the sum of standing depth and new additions at the best quotes – as a bounded, interpretable measure of transient liquidity removal. Using Nasdaq market-by-order (MBO) data, we compare a spectrum of approaches: linear benchmarks (AR, HAR), and non-linear tree ensembles (XGBoost), across horizons ranging from 250 ms to 5 s. Beyond predictive accuracy, our results provide insights into order placement and cancellation dynamics, identify regimes where linear versus non-linear signals dominate, and highlight how early-warning indicators of liquidity withdrawal can inform both market surveillance and execution.
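
As a rough illustration of the LWI definition in the abstract, the sketch below computes the ratio of cancellations to standing depth plus new additions for a single interval on one side of the book. The function name `liquidity_withdrawal_index`, its argument names, the choice to measure standing depth at the start of the interval, and the zero-denominator convention are all assumptions made for illustration; none of them are taken from the paper.

```python
def liquidity_withdrawal_index(cancelled, standing_depth, added):
    """Hypothetical helper: LWI = cancelled / (standing_depth + added).

    All inputs are share volumes at the best quote over one interval.
    standing_depth is assumed to be the depth resting at the interval start.
    """
    if min(cancelled, standing_depth, added) < 0:
        raise ValueError("volumes must be non-negative")
    available = standing_depth + added
    if available == 0:
        # Assumed convention: with nothing available, nothing was withdrawn.
        return 0.0
    return cancelled / available


# Made-up numbers: 300 shares cancelled against 1000 resting plus 200 added.
lwi = liquidity_withdrawal_index(cancelled=300, standing_depth=1000, added=200)
print(lwi)  # 0.25
```

Under these assumptions the ratio stays in [0, 1] whenever cancellations cannot exceed the volume that was available to cancel, which is one plausible reading of "bounded" in the abstract.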

Keywords: Liquidity Withdrawal Index (LWI), Order book dynamics, XGBoost, Market microstructure, High-frequency trading, Equities (Individual Stocks)

Complexity vs Empirical Score

  • Math Complexity: 6.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Holy Grail
  • Why: The paper compares linear econometric benchmarks (AR/HAR) with gradient-boosted trees (XGBoost) and applies feature selection (mutual information, LASSO) to high-frequency Nasdaq MBO data, with a detailed methodology and backtest-ready evaluation using walk-forward cross-validation.

  flowchart TD
    A["Research Goal<br>Forecast Liquidity Withdrawal"] --> B["Methodology<br>Liquidity Withdrawal Index LWI"]
    A --> C["Data Input<br>Nasdaq Market-by-Order MBO"]
    B --> D["Model Comparison"]
    C --> D
    D --> E["Linear Benchmarks<br>AR / HAR"]
    D --> F["Non-Linear Ensemble<br>XGBoost"]
    E & F --> G["Computational Process<br>Forecast Horizons 250ms to 5s"]
    G --> H{"Key Findings & Outcomes"}
    H --> I["Predictive Accuracy<br>Model Class Dominance"]
    H --> J["Market Insights<br>Order Cancellation Dynamics"]
    H --> K["Applications<br>Surveillance & Execution"]
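
The walk-forward cross-validation named in the score rationale can be sketched as follows. This is a minimal illustration only: the expanding-window scheme, the fold count, the AR(1) benchmark, and the synthetic series standing in for LWI are all assumptions, since the summary does not state the paper's actual fold layout or lag structure. The helper names `walk_forward_splits` and `fit_ar1` are hypothetical.

```python
import numpy as np


def walk_forward_splits(n_obs, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs; training always precedes testing."""
    test_size = (n_obs - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * test_size
        yield np.arange(0, train_end), np.arange(train_end, train_end + test_size)


def fit_ar1(y):
    """OLS fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]


# Synthetic stand-in for an LWI series (not real data).
rng = np.random.default_rng(0)
y = np.empty(500)
y[0] = 0.3
for t in range(1, 500):
    y[t] = 0.1 + 0.6 * y[t - 1] + 0.02 * rng.standard_normal()

errors = []
for train_idx, test_idx in walk_forward_splits(len(y), n_folds=4, min_train=100):
    a, b = fit_ar1(y[train_idx])
    # One-step-ahead forecasts: predict y[t] from the observed y[t-1].
    pred = a + b * y[test_idx - 1]
    errors.append(float(np.mean((y[test_idx] - pred) ** 2)))

print(errors)  # out-of-sample MSE per fold
```

Each fold refits the model on all data before the test window, so no future observation leaks into training. A rolling (fixed-length) window would be an equally reasonable variant.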