ClusterLOB: Enhancing Trading Strategies by Clustering Orders in Limit Order Books

ArXiv ID: 2504.20349

Authors: Yichi Zhang, Mihai Cucuringu, Alexander Y. Shestopaloff, Stefan Zohren

Abstract

In the rapidly evolving world of financial markets, understanding the dynamics of the limit order book (LOB) is crucial for unraveling market microstructure and participant behavior. We introduce ClusterLOB as a method to cluster individual market events in a stream of market-by-order (MBO) data into different groups. To do so, each market event is augmented with six time-dependent features. By applying the K-means++ clustering algorithm to the resulting order features, we are then able to assign each new order to one of three distinct clusters, which we identify as directional, opportunistic, and market-making participants, each capturing unique trading behaviors. Our experiments use one year of MBO data containing small-tick, medium-tick, and large-tick stocks from NASDAQ. To validate the usefulness of our clustering, we compute order flow imbalances for each cluster within 30-minute buckets during the trading day. We treat each cluster’s imbalance as a signal that provides insights into trading strategies and participants’ responses to varying market conditions. To assess the effectiveness of these signals, we identify the trading strategy with the highest Sharpe ratio in the training dataset, and demonstrate that its performance in the test dataset is superior to benchmark trading strategies that do not incorporate clustering. We also evaluate trading strategies based on order flow imbalance decompositions across different market event types, including add, cancel, and trade events, to assess their robustness in various market conditions. This work establishes a robust framework for clustering market participant behavior, which helps us better understand market microstructure and informs the development of more effective predictive trading signals with practical applications in algorithmic trading and quantitative finance.
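
To make the clustering step concrete, here is a minimal pure-Python sketch of K-means++ seeding followed by standard Lloyd iterations, applied to six-dimensional feature vectors. It is an illustration only, not the authors' implementation: the abstract does not name the six features, so the data below are synthetic, and the names `sq_dist`, `kmeans_pp`, `centres`, and `points` are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp(points, k, n_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centroids, labels). Assumes the points are not all identical.
    """
    rng = random.Random(seed)

    # K-means++ seeding: the first centroid is drawn uniformly; each further
    # centroid is drawn with probability proportional to its squared distance
    # from the nearest centroid chosen so far.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Synthetic stand-in for "orders augmented with six features" (made-up data).
rng = random.Random(42)
centres = [(5, 0, 0, 0, 0, 0), (0, 5, 0, 0, 0, 0), (0, 0, 5, 0, 0, 0)]
points = [tuple(rng.gauss(m, 0.5) for m in c) for c in centres for _ in range(40)]

centroids, labels = kmeans_pp(points, k=3)
print("cluster sizes:", [labels.count(j) for j in range(3)])
```

Once centroids are fitted, a new order can be assigned to a cluster by computing its feature vector and taking the nearest centroid, which is what the assignment step does for each point. Interpreting the clusters as directional, opportunistic, or market-making is a separate step that this sketch does not attempt.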

Keywords: Limit Order Book (LOB), Market Microstructure, K-means Clustering, Order Flow Imbalance, Algorithmic Trading, Equities

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Street Traders
  • Why: The paper employs standard statistical methods (K-means clustering) and simple signals (order flow imbalance) without advanced mathematical derivations, resulting in moderate math complexity; however, it demonstrates high empirical rigor through the use of one year of MBO data from NASDAQ, explicit backtesting protocols, Sharpe ratio optimization, and the provision of source code for reproducibility.
  flowchart TD
    A["Research Goal"] --> B["Data Preparation"]
    B --> C["Feature Engineering"]
    C --> D["Clustering K-means++"]
    D --> E["Signal Generation"]
    E --> F["Strategy Evaluation"]
    F --> G["Key Findings"]

    subgraph A ["Research Goal"]
        A1["How can clustering orders in<br>Limit Order Books enhance<br>trading strategies?"]
    end

    subgraph B ["Data Inputs"]
        B1["1 Year NASDAQ MBO Data"]
        B2["Small/Medium/Large-Tick Stocks"]
    end

    subgraph C ["Methodology"]
        C1["Augment Orders with<br>6 Time-Dependent Features"]
    end

    subgraph D ["Computational Process"]
        D1["K-means++ Clustering<br>3 Clusters Identified:"]
        D2["Directional<br>Opportunistic<br>Market-Making"]
    end

    subgraph E ["Signal Generation"]
        E1["Order Flow Imbalance<br>per Cluster (30-min buckets)"]
    end

    subgraph F ["Validation"]
        F1["Training: Optimize Sharpe Ratio"]
        F2["Testing: Benchmark Comparison<br>(vs Non-Clustered Strategies)"]
    end

    subgraph G ["Key Outcomes"]
        G1["Superior Performance with Clustering"]
        G2["Robust Framework for<br>Market Microstructure Analysis"]
        G3["Practical Application in<br>Algorithmic Trading"]
    end
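
The signal-generation and validation stages in the flowchart can be sketched as follows. This is a hedged illustration rather than the paper's code. The abstract does not give the exact order flow imbalance formula, so the normalized form (buy volume minus sell volume, divided by their sum) is an assumption, as are the event tuple layout, the function names `bucket_ofi`, `sharpe`, and `pick_best`, and all numbers in the toy data. The step that turns an imbalance signal into strategy returns is not specified in the summary and is left out.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def bucket_ofi(events, bucket_seconds=30 * 60):
    """Per-cluster order flow imbalance in fixed time buckets.

    `events` is an iterable of (seconds_since_open, cluster, side, size),
    with side = +1 for buy-side and -1 for sell-side volume (assumed layout).
    Returns {(bucket_index, cluster): ofi}, where ofi lies in [-1, 1].
    """
    buy = defaultdict(float)
    sell = defaultdict(float)
    for t, cluster, side, size in events:
        key = (int(t // bucket_seconds), cluster)
        if side > 0:
            buy[key] += size
        else:
            sell[key] += size
    keys = set(buy) | set(sell)
    return {
        k: (buy[k] - sell[k]) / (buy[k] + sell[k])
        for k in keys
        if buy[k] + sell[k] > 0
    }


def sharpe(returns, periods_per_year=1.0):
    """Mean return over sample standard deviation, optionally annualized."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd * math.sqrt(periods_per_year)


def pick_best(train_returns):
    """Name of the strategy with the highest in-sample Sharpe ratio."""
    return max(train_returns, key=lambda name: sharpe(train_returns[name]))


# Toy events (made-up): three in the first 30-minute bucket, two in the second.
events = [
    (60, 0, +1, 300), (600, 0, -1, 100),   # bucket 0, cluster 0
    (900, 1, -1, 200),                     # bucket 0, cluster 1
    (2000, 0, +1, 50), (2100, 0, -1, 50),  # bucket 1, cluster 0
]
ofi = bucket_ofi(events)
print(ofi[(0, 0)], ofi[(0, 1)], ofi[(1, 0)])  # 0.5 -1.0 0.0

# Hypothetical per-period returns on the training set for two strategies.
train = {
    "strategy_a": [0.010, 0.020, 0.015, 0.010],
    "strategy_b": [0.010, -0.020, 0.030, -0.010],
}
best = pick_best(train)
print(best)  # strategy_a
```

Under this protocol the strategy is chosen on training data only. Its Sharpe ratio on the held-out test period is then computed with the same `sharpe` function and compared against benchmarks that do not use clustering.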