Online High-Frequency Trading Stock Forecasting with Automated Feature Clustering and Radial Basis Function Neural Networks
arXiv ID: 2412.16160
Authors: Unknown
Abstract
This study presents an autonomous experimental machine learning protocol for high-frequency trading (HFT) stock price forecasting that combines a dual competitive feature-importance mechanism with automated clustering in a shallow, fast-to-train neural network topology. By incorporating the k-means algorithm into the radial basis function neural network (RBFNN), the proposed method addresses the challenges of manual clustering and of relying on potentially uninformative features. More specifically, the approach uses a dual competitive mechanism for feature importance, combining the mean decrease in impurity (MDI) method with a gradient descent (GD) based feature-importance mechanism. Tested on HFT Level 1 order book data for 20 S&P 500 stocks, this approach enhances the forecasting ability of the RBFNN regressor. The findings suggest that an autonomous approach to feature selection and clustering is crucial, as each stock requires a different input feature space. Overall, by automating the feature selection and clustering processes, the method removes the need for a manual topological grid search and provides a more efficient way to predict the limit order book's (LOB) mid-price.
Keywords: High-frequency trading, Radial basis function neural network, Feature selection, Order book forecasting, K-means clustering, Equities
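The prediction target described in the abstract, the LOB mid-price, is simply the midpoint of the best bid and best ask available in Level 1 order book data. A minimal illustration (field names are generic, not taken from the paper's dataset):

```python
def mid_price(best_bid: float, best_ask: float) -> float:
    """Mid-price of a limit order book: midpoint of the top-of-book quotes."""
    return (best_bid + best_ask) / 2.0

# Example: a book quoted 100.10 bid / 100.14 ask has a mid-price of 100.12.
print(mid_price(100.10, 100.14))
```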
Complexity vs Empirical Score
- Math Complexity: 6.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Holy Grail
- Why: The paper employs advanced ML concepts (neural network topology, gradient descent, statistical measures) but lacks deep theoretical derivations, while its empirical rigor is supported by testing on real HFT data from 20 S&P 500 stocks with clear performance metrics.
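The core mechanism the paper describes, a k-means step that places the RBFNN's hidden-layer centers automatically so no manual topological grid search is needed, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the hyperparameters, the bias column, and the closed-form least-squares output fit are all assumptions:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (k, n_features) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):                 # keep old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

class RBFNN:
    """Gaussian RBF hidden layer + linear output fitted by least squares."""

    def __init__(self, k=8, gamma=1.0):
        self.k, self.gamma = k, gamma

    def _design(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centers[None, :, :], axis=2)
        phi = np.exp(-self.gamma * dists ** 2)          # Gaussian activations
        return np.hstack([phi, np.ones((len(X), 1))])   # bias column

    def fit(self, X, y):
        self.centers = kmeans(X, self.k)    # automated center placement
        self.w, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.w
```

Because the hidden layer is fixed once the centers are chosen, only the linear output weights need fitting, which keeps training fast relative to end-to-end gradient training of a deep network.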
```mermaid
flowchart TD
A["Research Goal:<br>Automate HFT Stock<br>Forecasting"] --> B["Input: HFT Level 1<br>Order Book Data<br>(20 S&P 500 Stocks)"]
B --> C["Methodology:<br>Dual Competitive<br>Feature Importance"]
C --> C1["(1) Mean Decrease<br>Impurity (MDI)"]
C --> C2["(2) Gradient Descent<br>Based Importance"]
C1 & C2 --> D["K-Means Clustering<br>Integrated into RBFNN"]
D --> E["Computational Process:<br>Autonomous Feature<br>Selection & Topology"]
E --> F["Key Outcome:<br>Improved Mid-Price<br>Forecasting Accuracy"]
```
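The dual competitive feature-importance step in the flowchart can be illustrated with a simplified stand-in. Assumptions to note: MDI is approximated here by the best single-split variance reduction per feature (a one-level tree, not the paper's forest-based MDI), the GD signal by the absolute weights of a linear model trained with gradient descent, and the union rule for combining the two rankings is also a guess at how the "competition" might resolve:

```python
import numpy as np

def mdi_importance(X, y, n_thresholds=10):
    """Per-feature impurity (variance) decrease from the best single split."""
    base = y.var()
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, n_thresholds)):
            lo, hi = y[X[:, j] <= t], y[X[:, j] > t]
            if len(lo) and len(hi):
                child = (len(lo) * lo.var() + len(hi) * hi.var()) / len(y)
                imp[j] = max(imp[j], base - child)
    return imp / (imp.sum() + 1e-12)

def gd_importance(X, y, lr=0.01, epochs=500):
    """|weights| of a linear model trained by full-batch gradient descent."""
    Xs = (X - X.mean(0)) / (X.std(0) + 1e-12)    # standardize for comparability
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * Xs.T @ (Xs @ w - y) / len(y)   # MSE gradient step
    return np.abs(w) / (np.abs(w).sum() + 1e-12)

def select_features(X, y, top=3):
    """Keep the union of each mechanism's top-ranked features."""
    mdi, gd = mdi_importance(X, y), gd_importance(X, y)
    keep = set(np.argsort(mdi)[-top:]) | set(np.argsort(gd)[-top:])
    return sorted(keep)
```

Running such a selector per stock would yield a different surviving feature set for each one, consistent with the paper's finding that each stock requires its own input feature space.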