Interpretable ML for High-Frequency Execution

ArXiv ID: 2307.04863

Authors: Unknown

Abstract

Order placement tactics play a crucial role in high-frequency trading algorithms, and their design is based on understanding the dynamics of the order book. Using high-quality high-frequency data and a set of microstructural features, we exhibit strong state-dependence properties of the fill probability function. We train a neural network to infer the fill probability function for a fixed horizon. Since we aim at providing a high-frequency execution framework, we use a simple architecture. A weighting method is applied to the loss function so that the model learns from censored data. By comparing numerical results obtained on both digital asset centralized exchanges (CEXs) and stock markets, we are able to analyze dissimilarities between the feature importances of the fill probability of small-tick crypto pairs and Euronext equities. The practical use of this model is illustrated with a fixed-time-horizon execution problem in which both the decision between posting a limit order and executing immediately and the optimal placement distance are characterized. We discuss the importance of accurately estimating the clean-up cost incurred in the case of non-execution, and we show that it can be well approximated by a smooth function of market features. We finally assess the performance of our model with a backtesting approach that avoids the insertion of hypothetical orders and makes it possible to test the order placement algorithm with orders that realistically impact the price formation process.

Keywords: High-frequency trading, Order book dynamics, Neural network, Fill probability, Limit order placement, Digital assets (Crypto) / Equities
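
The post-or-execute decision described in the abstract can be illustrated with a small sketch. It is not the paper's formulation: the exponential `fill_probability` function stands in for the trained network, and the parameters `k`, `a`, the `imbalance` feature, and the cost convention (ticks relative to the mid price, for a buyer) are all assumptions made for illustration. The idea is to compare the expected cost of a limit order at each candidate distance, including the clean-up cost paid on non-execution, with the cost of crossing the spread immediately.

```python
import numpy as np

def fill_probability(delta, imbalance, k=1.5, a=0.9):
    """Hypothetical stand-in for a trained model: P(fill within the horizon).

    Decays with the placement distance `delta` (in ticks) and shifts with a
    toy order-book imbalance feature in [-1, 1]. Not the paper's model.
    """
    p = a * np.exp(-k * np.asarray(delta, dtype=float)) * (1.0 + 0.2 * imbalance)
    return np.clip(p, 0.0, 1.0)

def choose_action(deltas, imbalance, half_spread, cleanup_cost):
    """Pick between a limit order at the best distance and immediate execution.

    Costs are in ticks relative to the mid price, for a buyer (assumed convention):
      filled limit order at distance delta -> cost -delta (bought below mid)
      unfilled limit order                 -> cost cleanup_cost
      immediate execution                  -> cost half_spread
    """
    deltas = np.asarray(deltas, dtype=float)
    p = fill_probability(deltas, imbalance)
    expected_cost = p * (-deltas) + (1.0 - p) * cleanup_cost
    best = int(np.argmin(expected_cost))
    if expected_cost[best] < half_spread:
        return "limit", float(deltas[best]), float(expected_cost[best])
    return "market", 0.0, float(half_spread)

if __name__ == "__main__":
    grid = np.linspace(0.5, 5.0, 10)  # candidate distances in ticks
    print(choose_action(grid, imbalance=0.0, half_spread=0.5, cleanup_cost=0.6))
    print(choose_action(grid, imbalance=0.0, half_spread=0.5, cleanup_cost=100.0))
```

In this toy setting, a large clean-up cost pushes the rule toward immediate execution, which is why the abstract stresses estimating that cost accurately.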

Complexity vs Empirical Score

  • Math Complexity: 6.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced statistical methods (survival analysis, neural networks with censored data weighting) and introduces new microstructure features, indicating moderate-to-high mathematical complexity. It demonstrates strong empirical rigor through backtesting on real market data (CEX and equities), analysis of feature importances, and a realistic order placement router that accounts for market impact.
  flowchart TD
    A["Research Goal:<br/>Model Fill Probability for Limit Order Placement"] --> B{"Data Sources"};
    B --> C["High-Frequency Data:<br/>Crypto CEX & Euronext Equities"];
    C --> D["Computational Process:<br/>NN Training with Censored Loss"];
    D --> E["Outcome 1:<br/>Learned Fill Probability Model"];
    E --> F["Outcome 2:<br/>Backtested Execution Strategy"];
    D --> F;
    F --> G["Outcome 3:<br/>Feature Importance Differences<br/>Crypto vs. Equities"];
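
The "censored loss" step can be sketched with inverse probability of censoring weighting (IPCW), a standard survival-analysis technique. This summary does not specify the paper's exact weighting scheme, so IPCW is used here as an assumed stand-in. An order cancelled before the horizon is censored: whether it would have been filled is unknown, so it gets weight zero, and the remaining orders are up-weighted by the inverse of a Kaplan-Meier estimate of the probability of not yet being cancelled. The function names and toy data below are hypothetical.

```python
import numpy as np

def censoring_survival(times, filled, t, strict=False):
    """Kaplan-Meier estimate of G(t) = P(cancel time > t).

    Cancellations (filled == False) play the role of events here.
    With strict=True, only cancellation times strictly before t are used (G(t-)).
    """
    g = 1.0
    for u in np.unique(times[~filled]):
        if u < t or (u == t and not strict):
            at_risk = np.sum(times >= u)
            cancelled_at_u = np.sum((times == u) & ~filled)
            g *= 1.0 - cancelled_at_u / at_risk
    return g

def ipcw_weights(times, filled, horizon):
    """Per-order weights for the binary target 'filled within horizon'."""
    w = np.zeros(len(times))
    for i, (t, f) in enumerate(zip(times, filled)):
        if t > horizon:
            # Still live at the horizon: label 0 is known.
            w[i] = 1.0 / censoring_survival(times, filled, horizon)
        elif f:
            # Filled within the horizon: label 1 is known.
            w[i] = 1.0 / censoring_survival(times, filled, t, strict=True)
        # Cancelled before the horizon: label unknown, weight stays 0.
    return w

def weighted_bce(p, y, w, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the total weight."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum(w * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p))) / np.sum(w)

# Toy data (hypothetical): time to fill or cancel, and whether the order was filled.
times = np.array([0.2, 0.5, 0.7, 1.5, 2.0])
filled = np.array([True, False, True, False, True])
horizon = 1.0

y = (filled & (times <= horizon)).astype(float)  # 1 if filled within the horizon
w = ipcw_weights(times, filled, horizon)
p_model = np.full(len(times), 0.4)               # placeholder model predictions
loss = weighted_bce(p_model, y, w)
```

In a training loop, `w` would multiply the per-sample loss of the network; the weights are computed once from the data and do not depend on the model parameters.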