High-Frequency Trading Liquidity Analysis | Application of Machine Learning Classification

ArXiv ID: 2408.10016

Authors: Unknown

Abstract

This research presents a comprehensive framework for analyzing liquidity in financial markets, particularly in the context of high-frequency trading. By leveraging advanced machine learning classification techniques, including Logistic Regression, Support Vector Machine, and Random Forest, the study aims to predict minute-level price movements using an extensive set of liquidity metrics derived from the Trade and Quote (TAQ) data. The findings reveal that employing a broad spectrum of liquidity measures yields higher predictive accuracy compared to models utilizing a reduced subset of features. Key liquidity metrics, such as Liquidity Ratio, Flow Ratio, and Turnover, consistently emerged as significant predictors across all models, with the Random Forest algorithm demonstrating superior accuracy. This study not only underscores the critical role of liquidity in market stability and transaction costs but also highlights the complexities involved in short-interval market predictions. The research suggests that a comprehensive set of liquidity measures is essential for accurate prediction, and proposes future work to validate these findings across different stock datasets to assess their generalizability.
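To make the prediction target concrete, here is a minimal sketch of how trade records could be rolled up into minute bars and labeled with the direction of the next minute's price move. The record layout, the sample values, and the helper names `minute_bars` and `label_next_move` are all hypothetical. Turnover is taken to be dollar volume (price times shares), which is an assumption rather than the paper's stated definition.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trade records: (timestamp, price, shares), already in time order.
# Real TAQ data carries many more fields; these values are made up.
trades = [
    ("2024-01-02 09:30:05", 100.00, 200),
    ("2024-01-02 09:30:40", 100.10, 100),
    ("2024-01-02 09:31:15", 100.05, 300),
    ("2024-01-02 09:31:50", 99.90, 100),
    ("2024-01-02 09:32:20", 100.20, 400),
]

def minute_bars(trades):
    """Aggregate time-ordered trades into per-minute bars."""
    buckets = defaultdict(list)
    for ts, price, shares in trades:
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        buckets[minute].append((price, shares))
    bars = []
    for minute in sorted(buckets):
        ticks = buckets[minute]
        bars.append({
            "minute": minute,
            "close": ticks[-1][0],  # last trade price within the minute
            # Assumed definition: turnover as dollar volume over the minute.
            "turnover": sum(p * s for p, s in ticks),
        })
    return bars

def label_next_move(bars):
    """Label each bar 1 if the next minute closes higher, else 0."""
    return [1 if nxt["close"] > cur["close"] else 0
            for cur, nxt in zip(bars, bars[1:])]

bars = minute_bars(trades)
labels = label_next_move(bars)  # one label per bar except the last
```

The last bar has no label because its next-minute close is not yet known, so a feature matrix built from `bars` would drop that final row before training.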

Keywords: High-Frequency Trading, Liquidity Metrics, Random Forest, Trade and Quote (TAQ) Data, Price Movement Prediction, Equities

Complexity vs Empirical Score

  • Math Complexity: 3.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Street Traders
  • Why: The paper relies on standard machine learning classification and statistical feature analysis with minimal advanced mathematical derivations, but it is heavily empirical, utilizing extensive TAQ data, minute-level sampling, and cross-validation to backtest predictive models for price movement.

flowchart TD
    A["Research Goal<br>Predict minute-level price movements<br>using liquidity metrics"] --> B["Data Input<br>Trade and Quote TAQ Data"]
    B --> C["Methodology Feature Engineering<br>Derive liquidity metrics<br>Liquidity Ratio, Flow Ratio, Turnover"]
    C --> D["Computational Process<br>ML Classification Models<br>Logistic Regression, SVM, Random Forest"]
    D --> E{"Model Training &<br>Evaluation"}
    E --> F["Key Finding 1<br>Full feature set yields<br>higher predictive accuracy"]
    E --> G["Key Finding 2<br>Random Forest demonstrates<br>superior accuracy"]
    E --> H["Key Finding 3<br>Liquidity metrics are<br>essential for prediction"]
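
The model-comparison step in the flow above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the paper's code. The data is synthetic rather than TAQ, the column names are placeholders, and the hyperparameters are library defaults. The label is built so that three of the four columns carry signal, so the gap between the full and reduced feature sets here is true by construction and is not evidence for the paper's finding.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for minute-level liquidity features (NOT real TAQ data).
feature_names = ["liquidity_ratio", "flow_ratio", "turnover", "noise"]
X = rng.normal(size=(600, len(feature_names)))
# Synthetic label: 1 ("up") when a noisy sum of the first three columns is positive.
y = (X[:, 0] + X[:, 1] + X[:, 2] + 0.5 * rng.normal(size=600) > 0).astype(int)

# shuffle=False keeps row order, so test rows come after training rows in time.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

def evaluate(columns):
    """Fit each model on the chosen columns and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train[:, columns], y_train)
        scores[name] = model.score(X_test[:, columns], y_test)
    return scores

full_scores = evaluate([0, 1, 2, 3])  # all four columns
reduced_scores = evaluate([0])        # a single column

for name in models:
    print(f"{name}: full={full_scores[name]:.3f} reduced={reduced_scores[name]:.3f}")
```

For the random forest, `models["random_forest"].feature_importances_` is one way to inspect which columns a fitted model relied on. Note that `evaluate` refits the models in place, so the importances reflect whichever feature set was fitted last.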