Combined machine learning for stock selection strategy based on dynamic weighting methods
ArXiv ID: 2508.18592
Authors: Lin Cai, Zhiyang He, Caiya Zhang
Abstract
This paper proposes a novel stock selection strategy framework based on combined machine learning algorithms. Two types of weighting methods for three representative machine learning algorithms are developed to predict the returns of the stock selection strategy: static weighting based on model evaluation metrics, and dynamic weighting based on Information Coefficients (IC). Using CSI 300 index data, we empirically evaluate the strategy’s backtested performance and model predictive accuracy. The main results are as follows: (1) The combined machine learning strategy significantly outperforms single-model approaches in backtested returns. (2) IC-based weighting (particularly IC_Mean) is more competitive than evaluation-metric-based weighting in both backtested returns and predictive performance. (3) Factor screening substantially enhances the performance of combined machine learning strategies.
Keywords: Machine Learning Ensembles, Stock Selection, Dynamic Weighting, Information Coefficient, Backtesting, Equities
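The IC-based dynamic weighting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that IC is the Spearman rank correlation between a model's predicted returns and the realized returns in one period, that IC_Mean is the average IC over a trailing window, and that negative mean ICs are clipped to zero before normalizing. The function names (`rank_ic`, `ic_mean_weights`, `combine`) and the `window` parameter are hypothetical.

```python
import numpy as np


def rank_ic(pred, realized):
    """Rank IC for one period: correlation between the ranks of predicted
    and realized returns across stocks.

    Assumption: ties are not averaged (argsort-based ordinal ranks), and
    neither input is constant (a constant input would yield NaN).
    """
    pred_rank = np.argsort(np.argsort(pred)).astype(float)
    real_rank = np.argsort(np.argsort(realized)).astype(float)
    return float(np.corrcoef(pred_rank, real_rank)[0, 1])


def ic_mean_weights(ic_history, window=12):
    """Turn a history of per-period ICs into model weights.

    ic_history: array-like of shape (n_periods, n_models).
    window: number of most recent periods to average (hypothetical default).
    """
    recent = np.asarray(ic_history, dtype=float)[-window:]
    mean_ic = recent.mean(axis=0)
    # Assumption: a model with negative mean IC gets zero weight.
    clipped = np.clip(mean_ic, 0.0, None)
    total = clipped.sum()
    if total == 0.0:
        # Fall back to equal weights when no model has positive mean IC.
        return np.full(mean_ic.shape, 1.0 / mean_ic.size)
    return clipped / total


def combine(preds, weights):
    """Weighted average of model predictions.

    preds: array-like of shape (n_models, n_stocks).
    weights: array-like of shape (n_models,).
    """
    return np.asarray(weights, dtype=float) @ np.asarray(preds, dtype=float)
```

Recomputing the weights at each rebalancing date from the latest IC history is what makes the scheme dynamic: a model whose recent predictions rank stocks poorly loses influence in the combined forecast.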
Complexity vs Empirical Score
- Math Complexity: 5.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper uses formal statistical concepts like IC (Information Coefficient) and involves ensemble weighting logic, placing it in the middle-to-high range for math, while it demonstrates strong empirical rigor through backtesting on CSI 300 data, comparing multiple methods, and discussing feature selection.
flowchart TD
A["Research Goal:<br>Develop Novel Stock Selection Strategy<br>using Combined ML & Dynamic Weighting"] --> B["Data: CSI 300 Index Stocks<br>with Technical & Fundamental Factors"]
B --> C["Methodology:<br>Two Weighting Schemes"]
C --> D["Static Weighting<br>Based on Model Evaluation Metrics"]
C --> E["Dynamic Weighting<br>Based on Information Coefficient (IC)"]
D & E --> F["Ensemble Execution:<br>Combined ML Algorithms<br>(e.g., Random Forest, XGBoost, LightGBM)"]
F --> G["Performance Comparison<br>via Backtesting & Predictive Accuracy"]
G --> H{"Key Outcomes"}
H --> I["1. Combined ML > Single Models<br>in Backtested Returns"]
H --> J["2. IC_Mean Weighting<br>Best Performance (Returns & Accuracy)"]
H --> K["3. Factor Screening<br>Critically Enhances Strategy"]
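For contrast, the static weighting branch of the flowchart could look like the sketch below. The source says only that static weights are based on model evaluation metrics. Using inverse validation mean squared error as that metric is an assumption made here for illustration, and `static_weights` is a hypothetical name.

```python
import numpy as np


def static_weights(y_true, preds):
    """Fixed model weights proportional to 1 / validation MSE.

    y_true: array-like of shape (n_samples,), realized returns.
    preds: array-like of shape (n_models, n_samples), model predictions.
    Assumption: MSE is the evaluation metric; the source does not name one.
    """
    y_true = np.asarray(y_true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    mse = ((preds - y_true) ** 2).mean(axis=1)
    # Small constant guards against division by zero for a perfect fit.
    inverse = 1.0 / (mse + 1e-12)
    return inverse / inverse.sum()
```

Weights computed this way are fitted once on a validation set and then held fixed, whereas IC-based weights are refreshed as new return data arrives.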