Ensemble Learning

Hybrid Quantum-Classical Ensemble Learning for S&P 500 Directional Prediction

ArXiv ID: 2512.15738
Authors: Abraham Itzhak Weinberg

Abstract: Financial market prediction is a challenging application of machine learning, where even small improvements in directional accuracy can yield substantial value. Most models struggle to exceed 55–57% accuracy due to high noise, non-stationarity, and market efficiency. We introduce a hybrid ensemble framework combining quantum sentiment analysis, Decision Transformer architecture, and strategic model selection, achieving 60.14% directional accuracy on S&P 500 prediction, a 3.10% improvement over individual models. Our framework addresses three limitations of prior approaches. First, architecture diversity dominates dataset diversity: combining different learning algorithms (LSTM, Decision Transformer, XGBoost, Random Forest, Logistic Regression) on the same data outperforms training identical architectures on multiple datasets (60.14% vs. 52.80%), confirmed by correlation analysis ($r>0.6$ among same-architecture models). Second, a 4-qubit variational quantum circuit enhances sentiment analysis, providing +0.8% to +1.5% gains per model. Third, smart filtering excludes weak predictors (accuracy $<52\%$), improving ensemble performance (Top-7 models: 60.14% vs. all 35 models: 51.2%). We evaluate on 2020–2023 market data across seven instruments, covering diverse regimes including the COVID-19 crash and inflation-driven correction. McNemar’s test confirms statistical significance ($p<0.05$). Preliminary backtesting with confidence-based filtering (6+ model consensus) yields a Sharpe ratio of 1.2 versus buy-and-hold’s 0.8, demonstrating practical trading potential. ...
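The "smart filtering" step in this abstract can be illustrated with a minimal sketch. The 52% accuracy floor and the Top-7 cut come from the abstract; the majority-vote aggregation, the function name `filter_and_vote`, and the toy accuracies and predictions are assumptions made for illustration, not the paper's implementation.

```python
from typing import Dict, List


def filter_and_vote(
    val_accuracy: Dict[str, float],
    predictions: Dict[str, List[int]],
    min_accuracy: float = 0.52,  # floor taken from the abstract
    top_k: int = 7,              # "Top-7 models" in the abstract
) -> List[int]:
    """Majority vote over the best top_k models that clear the accuracy floor."""
    # Drop weak predictors (validation accuracy below the floor).
    kept = [m for m, acc in val_accuracy.items() if acc >= min_accuracy]
    # Rank the survivors by validation accuracy, best first, and keep at most top_k.
    kept = sorted(kept, key=lambda m: val_accuracy[m], reverse=True)[:top_k]
    if not kept:
        raise ValueError("no model passed the accuracy filter")
    n_samples = len(predictions[kept[0]])
    votes = []
    for i in range(n_samples):
        ups = sum(predictions[m][i] for m in kept)
        # Predict "up" (1) only on a strict majority; ties fall back to "down" (0).
        votes.append(1 if 2 * ups > len(kept) else 0)
    return votes


# Hypothetical validation accuracies and 0/1 direction predictions.
val_accuracy = {"lstm": 0.57, "xgb": 0.55, "rf": 0.53, "logreg": 0.50}
predictions = {
    "lstm": [1, 0, 1, 1],
    "xgb": [1, 1, 0, 1],
    "rf": [0, 0, 1, 1],
    "logreg": [0, 1, 0, 0],
}
ensemble = filter_and_vote(val_accuracy, predictions)
print(ensemble)
```

Here `logreg` falls below the floor and is excluded before voting, so only three models contribute to the ensemble direction.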

Scaling Conditional Autoencoders for Portfolio Optimization via Uncertainty-Aware Factor Selection

ArXiv ID: 2511.17462
Authors: Ryan Engel, Yu Chen, Pawel Polak, Ioana Boier

Abstract: Conditional Autoencoders (CAEs) offer a flexible, interpretable approach for estimating latent asset-pricing factors from firm characteristics. However, existing studies usually limit the latent factor dimension to around K=5 due to concerns that larger K can degrade performance. To overcome this challenge, we propose a scalable framework that couples a high-dimensional CAE with an uncertainty-aware factor selection procedure. We employ three models for quantile prediction: zero-shot Chronos, a pretrained time-series foundation model (ZS-Chronos), gradient-boosted quantile regression trees using XGBoost and RAPIDS (Q-Boost), and an I.I.D bootstrap-based sample mean model (IID-BS). For each model, we rank factors by forecast uncertainty and retain the top-k most predictable factors for portfolio construction, where k denotes the selected subset of factors. This pruning strategy delivers substantial gains in risk-adjusted performance across all forecasting models. Furthermore, due to each model’s uncorrelated predictions, a performance-weighted ensemble consistently outperforms individual models with higher Sharpe, Sortino, and Omega ratios. ...
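As a rough illustration of the selection and weighting ideas described in this abstract, the sketch below ranks factors by the width of a quantile forecast interval and keeps the k narrowest, then forms ensemble weights proportional to a past performance score. Using interval width as the uncertainty measure, clipping negative scores to zero, and all names and numbers are assumptions; the paper may define these differently.

```python
from typing import Dict, List, Tuple


def select_predictable_factors(
    quantile_forecasts: Dict[str, Tuple[float, float, float]],
    k: int,
) -> List[str]:
    """Return the k factors with the narrowest (low, median, high) forecast interval."""
    # Assumed uncertainty measure: distance between the upper and lower quantile.
    width = {name: q[2] - q[0] for name, q in quantile_forecasts.items()}
    return sorted(width, key=lambda name: width[name])[:k]


def performance_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Weights proportional to the positive part of each model's past score."""
    clipped = {m: max(s, 0.0) for m, s in scores.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No model has a positive score: fall back to equal weights.
        return {m: 1.0 / len(scores) for m in scores}
    return {m: s / total for m, s in clipped.items()}


# Hypothetical (low, median, high) quantile forecasts for three latent factors.
forecasts = {
    "f1": (-0.02, 0.01, 0.04),
    "f2": (-0.10, 0.00, 0.12),
    "f3": (0.00, 0.01, 0.02),
}
selected = select_predictable_factors(forecasts, k=2)

# Hypothetical past performance scores for the three forecasting models.
weights = performance_weights({"zs_chronos": 1.0, "q_boost": 2.0, "iid_bs": 1.0})
print(selected, weights)
```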

A three-step machine learning approach to predict market bubbles with financial news

ArXiv ID: 2510.16636
Authors: Abraham Atsiwo

Abstract: This study presents a three-step machine learning framework to predict bubbles in the S&P 500 stock market by combining financial news sentiment with macroeconomic indicators. Building on traditional econometric approaches, the proposed approach predicts bubble formation by integrating textual and quantitative data sources. In the first step, bubble periods in the S&P 500 index are identified using a right-tailed unit root test, a widely recognized real-time bubble detection method. The second step extracts sentiment features from large-scale financial news articles using natural language processing (NLP) techniques, which capture investors’ expectations and behavioral patterns. In the final step, ensemble learning methods are applied to predict bubble occurrences based on high sentiment-based and macroeconomic predictors. Model performance is evaluated through k-fold cross-validation and compared against benchmark machine learning algorithms. Empirical results indicate that the proposed three-step ensemble approach significantly improves predictive accuracy and robustness, providing valuable early warning insights for investors, regulators, and policymakers in mitigating systemic financial risks. ...

RegimeFolio: A Regime Aware ML System for Sectoral Portfolio Optimization in Dynamic Markets

ArXiv ID: 2510.14986
Authors: Yiyao Zhang, Diksha Goel, Hussain Ahmad, Claudia Szabo

Abstract: Financial markets are inherently non-stationary, with shifting volatility regimes that alter asset co-movements and return distributions. Standard portfolio optimization methods, typically built on stationarity or regime-agnostic assumptions, struggle to adapt to such changes. To address these challenges, we propose RegimeFolio, a novel regime-aware and sector-specialized framework that, unlike existing regime-agnostic models such as DeepVol and DRL optimizers, integrates explicit volatility regime segmentation with sector-specific ensemble forecasting and adaptive mean-variance allocation. This modular architecture ensures forecasts and portfolio decisions remain aligned with current market conditions, enhancing robustness and interpretability in dynamic markets. RegimeFolio combines three components: (i) an interpretable VIX-based classifier for market regime detection; (ii) regime and sector-specific ensemble learners (Random Forest, Gradient Boosting) to capture conditional return structures; and (iii) a dynamic mean-variance optimizer with shrinkage-regularized covariance estimates for regime-aware allocation. We evaluate RegimeFolio on 34 large cap U.S. equities from 2020 to 2024. The framework achieves a cumulative return of 137 percent, a Sharpe ratio of 1.17, a 12 percent lower maximum drawdown, and a 15 to 20 percent improvement in forecast accuracy compared to conventional and advanced machine learning benchmarks. These results show that explicitly modeling volatility regimes in predictive learning and portfolio allocation enhances robustness and leads to more dependable decision-making in real markets. ...
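Two of the components named in this abstract, a VIX-based regime classifier and a shrinkage-regularized covariance estimate, can be sketched in a few lines. The thresholds of 15 and 25, the regime labels, the diagonal shrinkage target, and the fixed intensity `delta` are all assumptions chosen for illustration; the abstract does not state the values or estimator RegimeFolio uses.

```python
from typing import List


def vix_regime(vix_level: float, low: float = 15.0, high: float = 25.0) -> str:
    """Map a VIX level to a volatility regime using two (hypothetical) thresholds."""
    if vix_level < low:
        return "low_vol"
    if vix_level < high:
        return "medium_vol"
    return "high_vol"


def shrink_covariance(sample_cov: List[List[float]], delta: float) -> List[List[float]]:
    """Shrink a sample covariance toward its own diagonal with intensity delta in [0, 1]."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n = len(sample_cov)
    shrunk = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                # Variances are left unchanged by a diagonal target.
                row.append(sample_cov[i][j])
            else:
                # Covariances are pulled toward zero.
                row.append((1.0 - delta) * sample_cov[i][j])
        shrunk.append(row)
    return shrunk


regimes = [vix_regime(v) for v in (12.0, 20.0, 40.0)]
cov = shrink_covariance([[0.04, 0.02], [0.02, 0.09]], delta=0.5)
print(regimes, cov)
```

In a regime-aware pipeline, the regime label would select which fitted forecaster and which covariance estimate feed the mean-variance optimizer for that period.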

Financial fraud detection system based on improved random forest and gradient boosting machine (GBM)

ArXiv ID: 2502.15822
Authors: Unknown

Abstract: This paper proposes a financial fraud detection system based on improved Random Forest (RF) and Gradient Boosting Machine (GBM). Specifically, the system introduces a novel model architecture called GBM-SSRF (Gradient Boosting Machine with Simplified and Strengthened Random Forest), which cleverly combines the powerful optimization capabilities of the gradient boosting machine (GBM) with improved randomization. The computational efficiency and feature extraction capabilities of the Simplified and Strengthened Random Forest (SSRF) forest significantly improve the performance of financial fraud detection. Although the traditional random forest model has good classification capabilities, it has high computational complexity when faced with large-scale data and has certain limitations in feature selection. As a commonly used ensemble learning method, the GBM model has significant advantages in optimizing performance and handling nonlinear problems. However, GBM takes a long time to train and is prone to overfitting problems when data samples are unbalanced. In response to these limitations, this paper optimizes the random forest based on the structure, reducing the computational complexity and improving the feature selection ability through the structural simplification and enhancement of the random forest. In addition, the optimized random forest is embedded into the GBM framework, and the model can maintain efficiency and stability with the help of GBM’s gradient optimization capability. Experiments show that the GBM-SSRF model not only has good performance, but also has good robustness and generalization capabilities, providing an efficient and reliable solution for financial fraud detection. ...

Composing Ensembles of Instrument-Model Pairs for Optimizing Profitability in Algorithmic Trading

ArXiv ID: 2411.13559
Authors: Unknown

Abstract: Financial markets are nonlinear with complexity, where different types of assets are traded between buyers and sellers, each having a view to maximize their Return on Investment (ROI). Forecasting market trends is a challenging task since various factors like stock-specific news, company profiles, public sentiments, and global economic conditions influence them. This paper describes a daily price directional predictive system of financial instruments, addressing the difficulty of predicting short-term price movements. This paper will introduce the development of a novel trading system methodology by proposing a two-layer Composing Ensembles architecture, optimized through grid search, to predict whether the price will rise or fall the next day. This strategy was back-tested on a wide range of financial instruments and time frames, demonstrating an improvement of 20% over the benchmark, representing a standard investment strategy. ...

Blending Ensemble for Classification with Genetic-algorithm generated Alpha factors and Sentiments (GAS)

ArXiv ID: 2411.03035
Authors: Unknown

Abstract: With the increasing maturity and expansion of the cryptocurrency market, understanding and predicting its price fluctuations has become an important issue in the field of financial engineering. This article introduces an innovative Genetic Algorithm-generated Alpha Sentiment (GAS) blending ensemble model specifically designed to predict Bitcoin market trends. The model integrates advanced ensemble learning methods, feature selection algorithms, and in-depth sentiment analysis to effectively capture the complexity and variability of daily Bitcoin trading data. The GAS framework combines 34 Alpha factors with 8 news economic sentiment factors to provide deep insights into Bitcoin price fluctuations by accurately analyzing market sentiment and technical indicators. The core of this study is using a stacked model (including LightGBM, XGBoost, and Random Forest Classifier) for trend prediction which demonstrates excellent performance in traditional buy-and-hold strategies. In addition, this article also explores the effectiveness of using genetic algorithms to automate alpha factor construction as well as enhancing predictive models through sentiment analysis. Experimental results show that the GAS model performs competitively in daily Bitcoin trend prediction especially when analyzing highly volatile financial assets with rich data. ...

Utilizing the LightGBM Algorithm for Operator User Credit Assessment Research

ArXiv ID: 2403.14483
Authors: Unknown

Abstract: Mobile Internet user credit assessment is an important way for communication operators to establish decisions and formulate measures, and it is also a guarantee for operators to obtain expected benefits. However, credit evaluation methods have long been monopolized by financial industries such as banks and credit. As supporters and providers of platform network technology and network resources, communication operators are also builders and maintainers of communication networks. Internet data improves the user’s credit evaluation strategy. This paper uses the massive data provided by communication operators to carry out research on the operator’s user credit evaluation model based on the fusion LightGBM algorithm. First, for the massive data related to user evaluation provided by operators, key features are extracted by data preprocessing and feature engineering methods, and a multi-dimensional feature set with statistical significance is constructed; then, linear regression, decision tree, LightGBM, and other machine learning algorithms build multiple basic models to find the best basic model; finally, integrates Averaging, Voting, Blending, Stacking and other integrated algorithms to refine multiple fusion models, and finally establish the most suitable fusion model for operator user evaluation. ...

A Novel Decision Ensemble Framework: Customized Attention-BiLSTM and XGBoost for Speculative Stock Price Forecasting

ArXiv ID: 2401.11621
Authors: Unknown

Abstract: Forecasting speculative stock prices is essential for effective investment risk management that drives the need for the development of innovative algorithms. However, the speculative nature, volatility, and complex sequential dependencies within financial markets present inherent challenges which necessitate advanced techniques. This paper proposes a novel framework, CAB-XDE (customized attention BiLSTM-XGB decision ensemble), for predicting the daily closing price of speculative stock Bitcoin-USD (BTC-USD). CAB-XDE framework integrates a customized bi-directional long short-term memory (BiLSTM) with the attention mechanism and the XGBoost algorithm. The customized BiLSTM leverages its learning capabilities to capture the complex sequential dependencies and speculative market trends. Additionally, the new attention mechanism dynamically assigns weights to influential features, thereby enhancing interpretability, and optimizing effective cost measures and volatility forecasting. Moreover, XGBoost handles nonlinear relationships and contributes to the proposed CAB-XDE framework robustness. Additionally, the weight determination theory-error reciprocal method further refines predictions. This refinement is achieved by iteratively adjusting model weights. It is based on discrepancies between theoretical expectations and actual errors in individual customized attention BiLSTM and XGBoost models to enhance performance. Finally, the predictions from both XGBoost and customized attention BiLSTM models are concatenated to achieve diverse prediction space and are provided to the ensemble classifier to enhance the generalization capabilities of CAB-XDE. The proposed CAB-XDE framework is empirically validated on volatile Bitcoin market, sourced from Yahoo Finance and outperforms state-of-the-art models with a MAPE of 0.0037, MAE of 84.40, and RMSE of 106.14. ...
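The error reciprocal idea mentioned in this abstract is commonly implemented by weighting each model in inverse proportion to its error, so the more accurate model gets the larger weight. The sketch below shows that one-shot form for two models; the function names and the error and price values are hypothetical, and the paper's iterative adjustment and final ensemble classifier are not reproduced here.

```python
from typing import List


def error_reciprocal_weights(errors: List[float]) -> List[float]:
    """Weights proportional to 1 / error, normalized to sum to one."""
    if any(e <= 0.0 for e in errors):
        raise ValueError("errors must be strictly positive")
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(predictions: List[float], weights: List[float]) -> float:
    """Weighted average of the individual model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical validation errors (e.g. MAE) for a BiLSTM and an XGBoost model.
weights = error_reciprocal_weights([2.0, 6.0])
# Hypothetical next-day closing price forecasts from the same two models.
blended = combine([100.0, 104.0], weights)
print(weights, blended)
```

With errors of 2 and 6, the first model receives three times the weight of the second, which pulls the blended forecast toward its prediction.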

Combining predictive distributions of electricity prices: Does minimizing the CRPS lead to optimal decisions in day-ahead bidding?

ArXiv ID: 2308.15443
Authors: Unknown

Abstract: Probabilistic price forecasting has recently gained attention in power trading because decisions based on such predictions can yield significantly higher profits than those made with point forecasts alone. At the same time, methods are being developed to combine predictive distributions, since no model is perfect and averaging generally improves forecasting performance. In this article we address the question of whether using CRPS learning, a novel weighting technique minimizing the continuous ranked probability score (CRPS), leads to optimal decisions in day-ahead bidding. To this end, we conduct an empirical study using hourly day-ahead electricity prices from the German EPEX market. We find that increasing the diversity of an ensemble can have a positive impact on accuracy. At the same time, the higher computational cost of using CRPS learning compared to an equal-weighted aggregation of distributions is not offset by higher profits, despite significantly more accurate predictions. ...
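For readers unfamiliar with the score being minimized, the CRPS of a sample-based predictive distribution can be estimated with the standard identity CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast and y is the observed value. The sketch below implements only this score; it is not the CRPS learning weighting scheme studied in the paper, and the sample values are made up.

```python
from typing import List


def crps_from_samples(samples: List[float], observation: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast sample")
    # First term: mean absolute distance between forecast samples and the outcome.
    accuracy_term = sum(abs(x - observation) for x in samples) / n
    # Second term: mean absolute distance between all pairs of forecast samples.
    spread_term = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy_term - 0.5 * spread_term


# Hypothetical price scenarios (EUR/MWh) and the realized price.
score = crps_from_samples([1.0, 2.0, 3.0], observation=2.0)
# With a single sample the spread term vanishes and CRPS equals the absolute error.
point_score = crps_from_samples([2.0], observation=5.0)
print(score, point_score)
```

The second call shows why CRPS is often described as a generalization of absolute error to probabilistic forecasts.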