Forecasting Liquidity Withdraw with Machine Learning Models
ArXiv ID: 2509.22985
Authors: Haochuan Wang
Abstract
Liquidity withdrawal is a critical indicator of market fragility. In this project, I test a framework for forecasting liquidity withdrawal at the individual-stock level, ranging from less liquid stocks to highly liquid large-cap tickers, and evaluate the relative performance of competing model classes in predicting short-horizon order book stress. We introduce the Liquidity Withdrawal Index (LWI) – defined as the ratio of order cancellations to the sum of standing depth and new additions at the best quotes – as a bounded, interpretable measure of transient liquidity removal. Using Nasdaq market-by-order (MBO) data, we compare a spectrum of approaches: linear benchmarks (AR, HAR) and non-linear tree ensembles (XGBoost), across horizons ranging from 250 ms to 5 s. Beyond predictive accuracy, our results provide insights into order placement and cancellation dynamics, identify regimes where linear versus non-linear signals dominate, and highlight how early-warning indicators of liquidity withdrawal can inform both market surveillance and execution.
Keywords: Liquidity Withdrawal Index (LWI), Order book dynamics, XGBoost, Market microstructure, High-frequency trading, Equities (Individual Stocks)
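The LWI definition in the abstract (cancellations divided by standing depth plus new additions at the best quotes) can be illustrated with a minimal sketch. The function name, the per-window aggregation, and the convention of returning 0 when the denominator is empty are assumptions made here for illustration; the paper's exact windowing and edge-case handling may differ.

```python
import numpy as np

def liquidity_withdrawal_index(cancels, depth, adds):
    """LWI = cancels / (depth + adds), computed per time window.

    Inputs are non-negative volumes at the best quotes, one value per window.
    Assumption: `depth` is the standing volume at the start of the window.
    """
    cancels = np.asarray(cancels, dtype=float)
    denom = np.asarray(depth, dtype=float) + np.asarray(adds, dtype=float)
    lwi = np.zeros_like(denom)
    # Assumption: LWI is set to 0 when there is no liquidity to withdraw.
    np.divide(cancels, denom, out=lwi, where=denom > 0)
    return lwi

# Hypothetical volumes for three consecutive windows.
lwi = liquidity_withdrawal_index(
    cancels=[30, 0, 50],
    depth=[100, 0, 150],
    adds=[20, 0, 50],
)
print(lwi)
```

Under the stated convention for `depth`, cancelled volume in a window cannot exceed what was standing plus what was added, which is what would keep the ratio between 0 and 1.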
Complexity vs Empirical Score
- Math Complexity: 6.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Holy Grail
- Why: The paper combines linear econometric benchmarks (AR/HAR) with a gradient-boosted tree model (XGBoost) and applies feature selection (MI, LASSO) to high-frequency Nasdaq MBO data, with a detailed methodology and backtest-ready evaluation based on walk-forward cross-validation.
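To make the evaluation setup concrete, here is a rough sketch of a HAR-style linear benchmark scored with walk-forward splits. It uses only NumPy and synthetic data. The lag windows `(1, 5, 20)`, the fold sizes, the helper names, and the simulated series are all assumptions for illustration, not the paper's configuration or results.

```python
import numpy as np

def har_features(y, windows=(1, 5, 20)):
    """Build HAR regressors: an intercept plus trailing means of y.

    Row i uses values up to time t = i + max(windows) - 1 and is paired
    with the one-step-ahead target y[t + 1].
    """
    y = np.asarray(y, dtype=float)
    max_w = max(windows)
    rows = []
    for t in range(max_w - 1, len(y) - 1):
        rows.append([1.0] + [y[t - w + 1 : t + 1].mean() for w in windows])
    return np.array(rows), y[max_w:]

def walk_forward_mse(X, target, n_train, n_test):
    """Fit OLS on a rolling training window, score on the next n_test rows."""
    errors = []
    start = 0
    while start + n_train + n_test <= len(target):
        tr = slice(start, start + n_train)
        te = slice(start + n_train, start + n_train + n_test)
        beta, *_ = np.linalg.lstsq(X[tr], target[tr], rcond=None)
        errors.append(np.mean((X[te] @ beta - target[te]) ** 2))
        start += n_test  # slide forward so test folds never overlap
    return np.array(errors)

# Synthetic, mean-reverting stand-in for an LWI series (not real data).
rng = np.random.default_rng(0)
y = np.empty(600)
y[0] = 0.3
for t in range(1, len(y)):
    y[t] = 0.3 + 0.6 * (y[t - 1] - 0.3) + 0.05 * rng.standard_normal()
y = np.clip(y, 0.0, 1.0)

X, target = har_features(y)
fold_mse = walk_forward_mse(X, target, n_train=200, n_test=50)
print(len(fold_mse), fold_mse.mean())
```

Each fold trains only on observations that precede its test block, so no future information leaks into the fit. A non-linear model could replace the least-squares step inside the loop while keeping the same splits, which is one way to compare model classes on equal footing.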
flowchart TD
A["Research Goal<br>Forecast Liquidity Withdrawal"] --> B["Methodology<br>Liquidity Withdrawal Index LWI"]
A --> C["Data Input<br>Nasdaq Market-by-Order MBO"]
B --> D["Model Comparison"]
C --> D
D --> E["Linear Benchmarks<br>AR / HAR"]
D --> F["Non-Linear Ensemble<br>XGBoost"]
E & F --> G["Computational Process<br>Forecast Horizons 250ms to 5s"]
G --> H{"Key Findings & Outcomes"}
H --> I["Predictive Accuracy<br>Model Class Dominance"]
H --> J["Market Insights<br>Order Cancellation Dynamics"]
H --> K["Applications<br>Surveillance & Execution"]