K-Means Clustering

ClusterLOB: Enhancing Trading Strategies by Clustering Orders in Limit Order Books

ArXiv ID: 2504.20349
Authors: Yichi Zhang, Mihai Cucuringu, Alexander Y. Shestopaloff, Stefan Zohren

Abstract: In the rapidly evolving world of financial markets, understanding the dynamics of the limit order book (LOB) is crucial for unraveling market microstructure and participant behavior. We introduce ClusterLOB as a method to cluster individual market events in a stream of market-by-order (MBO) data into different groups. To do so, each market event is augmented with six time-dependent features. By applying the K-means++ clustering algorithm to the resulting order features, we are then able to assign each new order to one of three distinct clusters, which we identify as directional, opportunistic, and market-making participants, each capturing unique trading behaviors. Our experiments are performed on one year of MBO data containing small-tick, medium-tick, and large-tick stocks from NASDAQ. To validate the usefulness of our clustering, we compute order flow imbalances across each cluster within 30-minute buckets during the trading day. We treat each cluster's imbalance as a signal that provides insights into trading strategies and participants' responses to varying market conditions. To assess the effectiveness of these signals, we identify the trading strategy with the highest Sharpe ratio in the training dataset, and demonstrate that its performance in the test dataset is superior to benchmark trading strategies that do not incorporate clustering. We also evaluate trading strategies based on order flow imbalance decompositions across different market event types, including add, cancel, and trade events, to assess their robustness in various market conditions. This work establishes a robust framework for clustering market participant behavior, which helps us to better understand market microstructure and informs the development of more effective predictive trading signals with practical applications in algorithmic trading and quantitative finance. ...
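As a rough illustration of the pipeline this abstract describes, the sketch below seeds K-means with the k-means++ rule, assigns synthetic orders to three clusters, and computes a per-cluster order flow imbalance in 30-minute buckets. The abstract does not specify the six features or the exact imbalance formula, so the random features, the `(buy - sell) / (buy + sell)` definition, the 13-bucket session, and the function names `kmeans_pp` and `cluster_ofi` are all assumptions made for illustration.

```python
import numpy as np

def kmeans_pp(X, k, n_iter=50, seed=0):
    """K-means with k-means++ seeding. Returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    # k-means++ seeding: each new centre is drawn with probability
    # proportional to the squared distance to the nearest chosen centre.
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centres = np.array(centres)
    # Lloyd iterations: assign points, then recompute centres.
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment against the returned centres.
    dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return centres, dist.argmin(axis=1)

def cluster_ofi(labels, side, size, bucket, k, n_buckets):
    """Assumed imbalance per cluster and bucket: (buy - sell) / (buy + sell)."""
    signed = np.zeros((k, n_buckets))
    total = np.zeros((k, n_buckets))
    np.add.at(signed, (labels, bucket), side * size)
    np.add.at(total, (labels, bucket), size)
    return np.divide(signed, total, out=np.zeros_like(signed), where=total > 0)

# Synthetic stand-in for MBO events (hypothetical data, not from the paper).
rng = np.random.default_rng(42)
n = 3000
features = rng.normal(size=(n, 6))           # six placeholder features
features[:1000] += 3.0                       # give the data some structure
features[1000:2000] -= 3.0
side = rng.choice([-1, 1], size=n)           # +1 buy, -1 sell
size = rng.integers(1, 500, size=n).astype(float)
bucket = rng.integers(0, 13, size=n)         # 13 half-hour buckets in 6.5 h

Z = (features - features.mean(axis=0)) / features.std(axis=0)
centres, labels = kmeans_pp(Z, k=3)
ofi = cluster_ofi(labels, side, size, bucket, k=3, n_buckets=13)
```

Each row of `ofi` is one cluster's imbalance series, which is the kind of per-cluster signal the abstract treats as an input to a trading strategy.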

Dynamic Investment Strategies Through Market Classification and Volatility: A Machine Learning Approach

ArXiv ID: 2504.02841
Authors: Unknown

Abstract: This study introduces a dynamic investment framework to enhance portfolio management in volatile markets, offering clear advantages over traditional static strategies. It evaluates four conventional approaches (equal weighted, minimum variance, maximum diversification, and equal risk contribution) under dynamic conditions. Using K-means clustering, the market is segmented into ten volatility-based states, with transitions forecasted by a Bayesian Markov switching model employing Dirichlet priors and Gibbs sampling. This enables real-time asset allocation adjustments. Tested across two asset sets, the dynamic portfolio consistently achieves significantly higher risk-adjusted returns and substantially higher total returns, outperforming most static methods. By integrating classical optimization with machine learning and Bayesian techniques, this research provides a robust strategy for optimizing investment outcomes in unpredictable market environments. ...
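The role of the Dirichlet prior in forecasting state transitions can be sketched in a few lines. The example below assumes the volatility states have already been labelled (for instance by K-means) and shows only the conjugate update for the transition rows: observed transition counts plus a symmetric Dirichlet prior. It is not the paper's model; a full Markov switching sampler would typically alternate a step like `sample_transition_matrix` with further steps that are omitted here. The function names, the three-state toy sequence, and `alpha=1.0` are hypothetical.

```python
import numpy as np

def transition_counts(states, n_states):
    """Count observed transitions i -> j in a sequence of state labels."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    return counts

def transition_posterior_mean(states, n_states, alpha=1.0):
    """Posterior mean of each transition row under a Dirichlet(alpha) prior."""
    post = transition_counts(states, n_states) + alpha
    return post / post.sum(axis=1, keepdims=True)

def sample_transition_matrix(states, n_states, rng, alpha=1.0):
    """One posterior draw per row, as a Gibbs sampler would take."""
    post = transition_counts(states, n_states) + alpha
    return np.vstack([rng.dirichlet(row) for row in post])

# Toy sequence of three volatility states (hypothetical labels).
states = np.array([0, 0, 1, 1, 1, 2, 2, 0, 0, 1])
P = transition_posterior_mean(states, n_states=3)
draw = sample_transition_matrix(states, 3, np.random.default_rng(0))

# Forecast distribution of the next state given the latest observed one.
next_state_probs = P[states[-1]]
```

The prior keeps every transition probability strictly positive, which matters when some of the ten states in the paper's setting are rarely visited.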

Tactical Asset Allocation with Macroeconomic Regime Detection

ArXiv ID: 2503.11499
Authors: Unknown

Abstract: This paper extends the tactical asset allocation literature by incorporating regime modeling using techniques from machine learning. We propose a novel model that classifies current regimes, forecasts the distribution of future regimes, and integrates these forecasts with the historical performance of individual assets to optimize portfolio allocations. Utilizing a macroeconomic data set from the FRED-MD database, our approach employs a modified k-means algorithm to ensure consistent regime classification over time. We then leverage these regime predictions to estimate expected returns and volatilities, which are subsequently mapped into portfolio allocations using various sizing schemes. Our method outperforms traditional benchmarks such as equal-weight, buy-and-hold, and random regime models. Additionally, we are the first to apply a regime detection model from a large macroeconomic dataset to tactical asset allocation, demonstrating significant improvements in portfolio performance. Our work presents several key contributions, including a novel data-driven regime detection algorithm tailored for uncertainty in forecasted regimes and applying the FRED-MD data set for tactical asset allocation. ...

Optimizing Portfolio Performance through Clustering and Sharpe Ratio-Based Optimization: A Comparative Backtesting Approach

ArXiv ID: 2501.12074
Authors: Unknown

Abstract: Optimizing portfolio performance is a fundamental challenge in financial modeling, requiring the integration of advanced clustering techniques and data-driven optimization strategies. This paper introduces a comparative backtesting approach that combines clustering-based portfolio segmentation and Sharpe ratio-based optimization to enhance investment decision-making. First, we segment a diverse set of financial assets into clusters based on their historical log-returns using K-Means clustering. This segmentation enables the grouping of assets with similar return characteristics, facilitating targeted portfolio construction. Next, for each cluster, we apply a Sharpe ratio-based optimization model to derive optimal weights that maximize risk-adjusted returns. Unlike traditional mean-variance optimization, this approach directly incorporates the trade-off between returns and volatility, resulting in a more balanced allocation of resources within each cluster. The proposed framework is evaluated through a backtesting study using historical data spanning multiple asset classes. Optimized portfolios for each cluster are constructed and their cumulative returns are compared over time against a traditional equal-weighted benchmark portfolio. ...
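The per-cluster optimization step can be illustrated with the textbook unconstrained maximum-Sharpe (tangency) portfolio, whose weights are proportional to the inverse covariance matrix times the mean excess return. The abstract does not state the paper's constraints or solver, so this closed form, the zero risk-free rate, the synthetic log-returns, and the helper names are assumptions; short positions are allowed in this sketch.

```python
import numpy as np

def max_sharpe_weights(returns, risk_free=0.0):
    """Unconstrained tangency weights: w proportional to inv(cov) @ mu."""
    mu = returns.mean(axis=0) - risk_free
    cov = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(cov, mu)
    if raw.sum() <= 0:
        # Normalising by a non-positive sum would flip or blow up the weights.
        raise ValueError("tangency portfolio is not well defined for this sample")
    return raw / raw.sum()

def sharpe(weights, returns, risk_free=0.0):
    """In-sample Sharpe ratio per period for fixed weights."""
    mu = returns.mean(axis=0) - risk_free
    cov = np.cov(returns, rowvar=False)
    return (weights @ mu) / np.sqrt(weights @ cov @ weights)

# Synthetic daily log-returns for the assets of one cluster (hypothetical).
rng = np.random.default_rng(7)
cluster_returns = rng.normal(
    loc=[0.002, 0.001, 0.0015], scale=[0.01, 0.02, 0.015], size=(1000, 3)
)

w_opt = max_sharpe_weights(cluster_returns)
w_eq = np.full(3, 1.0 / 3.0)                 # equal-weighted benchmark
sr_opt = sharpe(w_opt, cluster_returns)
sr_eq = sharpe(w_eq, cluster_returns)
```

In-sample, `sr_opt` cannot be lower than `sr_eq`; whether the advantage survives out of sample is what a backtest like the one described in the abstract is meant to check.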

Online High-Frequency Trading Stock Forecasting with Automated Feature Clustering and Radial Basis Function Neural Networks

ArXiv ID: 2412.16160
Authors: Unknown

Abstract: This study presents an autonomous experimental machine learning protocol for high-frequency trading (HFT) stock price forecasting that involves a dual competitive feature importance mechanism and clustering via shallow neural network topology for fast training. By incorporating the k-means algorithm into the radial basis function neural network (RBFNN), the proposed method addresses the challenges of manual clustering and the reliance on potentially uninformative features. More specifically, our approach involves a dual competitive mechanism for feature importance, combining the mean-decrease impurity (MDI) method and a gradient descent (GD) based feature importance mechanism. This approach, tested on HFT Level 1 order book data for 20 S&P 500 stocks, enhances the forecasting ability of the RBFNN regressor. Our findings suggest that an autonomous approach to feature selection and clustering is crucial, as each stock requires a different input feature space. Overall, by automating the feature selection and clustering processes, we remove the need for manual topological grid search and provide a more efficient way to predict the LOB mid-price. ...
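One common way to combine k-means with an RBF network is to use the cluster centres as the hidden units' centres. The sketch below does that on synthetic data and fits the output layer by ordinary least squares. It leaves out the paper's MDI and gradient-descent feature importance mechanism and its online training; the fixed `width`, the least-squares fit, the toy target, and the helper names are assumptions.

```python
import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns the cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - centres[None]) ** 2).sum(axis=-1).argmin(axis=1)
        centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
    return centres

def rbf_design(X, centres, width):
    """Gaussian hidden-layer activations plus a bias column."""
    d2 = ((X[:, None] - centres[None]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

# Synthetic regression problem standing in for mid-price forecasting.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + rng.normal(scale=0.05, size=400)

centres = lloyd(X, k=15)                      # hidden units placed by k-means
Phi = rbf_design(X, centres, width=1.0)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # linear output layer
mse = np.mean((Phi @ w - y) ** 2)
```

Placing the centres with k-means removes the need to choose them by hand, which is the manual step the abstract says the method automates.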

A K-means Algorithm for Financial Market Risk Forecasting

ArXiv ID: 2405.13076
Authors: Unknown

Abstract: Financial market risk forecasting involves applying mathematical models, historical data analysis and statistical methods to estimate the impact of future market movements on investments. This process is crucial for investors to develop strategies, financial institutions to manage assets and regulators to formulate policy. Current financial market risk prediction suffers from high error rates and low precision. The K-means algorithm from machine learning is an effective risk prediction technique for financial markets. This study uses the K-means algorithm to develop a financial market risk prediction system, which significantly improves the accuracy and efficiency of financial market risk prediction. Ultimately, the outcomes of the experiments confirm that the K-means algorithm operates with user-friendly simplicity and achieves a 94.61% accuracy rate ...