Forecasting the U.S. Treasury Yield Curve: A Distributionally Robust Machine Learning Approach

ArXiv ID: 2601.04608 · Authors: Jinjun Liu, Ming-Yen Cheng

Abstract: We study U.S. Treasury yield curve forecasting under distributional uncertainty and recast forecasting as an operations research and managerial decision problem. Rather than minimizing average forecast error, the forecaster selects a decision rule that minimizes worst-case expected loss over an ambiguity set of forecast error distributions. To this end, we propose a distributionally robust ensemble forecasting framework that integrates parametric factor models with high-dimensional nonparametric machine learning models through adaptive forecast combinations. The framework consists of three machine learning components. First, a rolling-window Factor-Augmented Dynamic Nelson-Siegel model captures level, slope, and curvature dynamics using principal components extracted from economic indicators. Second, Random Forest models capture nonlinear interactions among macro-financial drivers and lagged Treasury yields. Third, distributionally robust forecast combination schemes aggregate heterogeneous forecasts under moment uncertainty, penalizing downside tail risk via expected shortfall and stabilizing second-moment estimation through ridge-regularized covariance matrices. The severity of the worst-case criterion is adjustable, allowing the forecaster to regulate the trade-off between robustness and statistical efficiency. Using monthly data, we evaluate out-of-sample forecasts across maturities and horizons from one to twelve months ahead. Adaptive combinations deliver superior performance at short horizons, while Random Forest forecasts dominate at longer horizons. Extensions to global sovereign bond yields confirm the stability and generalizability of the proposed framework. ...

January 8, 2026 · 2 min · Research Team
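
The ridge-regularized covariance idea mentioned in the yield-curve abstract above can be illustrated with a generic minimum-variance forecast combination. This is a minimal sketch under stated assumptions, not the authors' scheme: the function names, the `ridge` parameter, and the sample error values are hypothetical, and the expected-shortfall penalty described in the abstract is left out.

```python
import numpy as np

def ridge_min_variance_weights(errors, ridge=0.1):
    """Combination weights from past forecast errors.

    errors: array of shape (T, K) with T past periods and K >= 2 models.
    ridge:  hypothetical tuning parameter added to the covariance diagonal.
    """
    errors = np.asarray(errors, dtype=float)
    k = errors.shape[1]
    cov = np.cov(errors, rowvar=False)      # K x K sample covariance of errors
    cov_reg = cov + ridge * np.eye(k)       # ridge term stabilizes the inversion
    raw = np.linalg.solve(cov_reg, np.ones(k))
    return raw / raw.sum()                  # normalize so the weights sum to one

def combine(forecasts, weights):
    """Weighted average of the K model forecasts for one target."""
    return float(np.dot(forecasts, weights))

# Illustrative (made-up) errors: model 0 is much less noisy than model 1.
past_errors = [[0.10, 0.50], [-0.10, -0.40], [0.05, 0.60], [-0.05, -0.70]]
weights = ridge_min_variance_weights(past_errors, ridge=0.1)
combined = combine([4.20, 4.35], weights)
```

With a small `ridge` the less noisy model receives most of the weight; as `ridge` grows the weights move toward an equal-weighted average, which is one simple way to trade statistical efficiency for stability.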

Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions

ArXiv ID: 2512.02036 · Authors: Juan C. King, Jose M. Amigo

Abstract: The aim of this paper is the analysis and selection of stock trading systems that combine different models with data of different nature, such as financial and microeconomic information. Specifically, based on previous work by the authors and applying advanced techniques of Machine Learning and Deep Learning, our objective is to formulate trading algorithms for the stock market with empirically tested statistical advantages, thus improving results published in the literature. Our approach integrates Long Short-Term Memory (LSTM) networks with algorithms based on decision trees, such as Random Forest and Gradient Boosting. While the former analyze price patterns of financial assets, the latter are fed with economic data of companies. Numerical simulations of algorithmic trading with data from international companies and 10-weekday predictions confirm that an approach based on both fundamental and technical variables can outperform the usual approaches, which do not combine those two types of variables. In doing so, Random Forest turned out to be the best performer among the decision trees. We also discuss how the prediction performance of such a hybrid approach can be boosted by selecting the technical variables. ...

November 20, 2025 · 2 min · Research Team

A Practical Machine Learning Approach for Dynamic Stock Recommendation

ArXiv ID: 2511.12129 · Authors: Hongyang Yang, Xiao-Yang Liu, Qingwei Wu

Abstract: Stock recommendation is vital to investment companies and investors. However, no single stock selection strategy will always win, and analysts may not have enough time to check all stocks in the S&P 500 (Standard & Poor’s 500). In this paper, we propose a practical scheme that recommends stocks from the S&P 500 using machine learning. Our basic idea is to buy and hold the top 20% of stocks dynamically. First, we select representative stock indicators with good explanatory power. Second, we take five frequently used machine learning methods, including linear regression, ridge regression, stepwise regression, random forest, and generalized boosted regression, to model stock indicators and quarterly log-return in a rolling window. Third, we choose the model with the lowest mean squared error in each period to rank stocks. Finally, we test the selected stocks by conducting portfolio allocation methods such as equally weighted, mean-variance, and minimum-variance. Our empirical results show that the proposed scheme outperforms the long-only strategy on the S&P 500 index in terms of Sharpe ratio and cumulative returns. This work is fully open-sourced on GitHub (https://github.com/AI4Finance-Foundation/Dynamic-Stock-Recommendation-Machine_Learning-Published-Paper-IEEE). ...

November 15, 2025 · 2 min · Research Team
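
The selection step summarized in the stock recommendation abstract above (pick the model with the lowest mean squared error in a period, then hold the top 20% of stocks by predicted return) can be sketched as follows. This is a rough illustration, not the authors' code: the function names, model labels, tickers, and all numbers are hypothetical.

```python
def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def pick_model(validation_preds, actual):
    """Return the name of the model whose validation predictions have the lowest MSE."""
    return min(validation_preds, key=lambda name: mse(validation_preds[name], actual))

def top_fraction(predicted_returns, fraction=0.2):
    """Return the tickers in the top `fraction` by predicted return (at least one)."""
    n_keep = max(1, int(len(predicted_returns) * fraction))
    ranked = sorted(predicted_returns, key=predicted_returns.get, reverse=True)
    return ranked[:n_keep]

# Made-up validation data for one period.
actual = [0.012, 0.018, -0.008]
validation_preds = {
    "ridge": [0.010, 0.020, -0.010],
    "random_forest": [0.050, -0.030, 0.040],
}
best = pick_model(validation_preds, actual)

# Made-up next-quarter predictions from the selected model.
predicted_returns = {
    "A": 0.05, "B": -0.01, "C": 0.03, "D": 0.08, "E": 0.00,
    "F": 0.02, "G": -0.02, "H": 0.01, "I": 0.04, "J": 0.06,
}
portfolio = top_fraction(predicted_returns, fraction=0.2)
```

In a rolling-window setup, both steps would be repeated each period with refreshed training and validation data, so the chosen model can change from one period to the next.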

Binary Tree Option Pricing Under Market Microstructure Effects: A Random Forest Approach

ArXiv ID: 2507.16701 · Authors: Akash Deep, Chris Monico, W. Brent Lindquist, Svetlozar T. Rachev, Frank J. Fabozzi

Abstract: We propose a machine learning-based extension of the classical binomial option pricing model that incorporates key market microstructure effects. Traditional models assume frictionless markets, overlooking empirical features such as bid-ask spreads, discrete price movements, and serial return correlations. Our framework augments the binomial tree with path-dependent transition probabilities estimated via Random Forest classifiers trained on high-frequency market data. This approach preserves no-arbitrage conditions while embedding real-world trading dynamics into the pricing model. Using 46,655 minute-level observations of SPY from January to June 2025, we achieve an AUC of 88.25% in forecasting one-step price movements. Order flow imbalance is identified as the most influential predictor, contributing 43.2% to feature importance. After resolving time-scaling inconsistencies in tree construction, our model yields option prices that deviate by 13.79% from Black-Scholes benchmarks, highlighting the impact of microstructure on fair value estimation. While computational limitations restrict the model to short-term derivatives, our results offer a robust, data-driven alternative to classical pricing methods grounded in empirical market behavior. ...

July 22, 2025 · 2 min · Research Team

A Regression-Based Share Market Prediction Model for Bangladesh

ArXiv ID: 2507.18643 · Authors: Syeda Tasnim Fabiha, Rubaiyat Jahan Mumu, Farzana Aktar, B M Mainul Hossain

Abstract: The share market is one of the most important sectors of a country's economic development. Every day, almost all companies issue their shares, and investors buy and sell shares of these companies. Generally, investors want to buy shares of the companies whose market liquidity is comparatively greater. Market liquidity depends on the average price of a share. In this paper, a thorough linear regression analysis has been performed on the stock market data of the Dhaka Stock Exchange. Later, the linear model has been compared with random forest based on different metrics, showing better results for the random forest model. However, the individual significance of different factors for the variability of stock price has been identified and explained. This paper also shows that the time series data is not capable of generating a predictive linear model for analysis. ...

July 10, 2025 · 2 min · Research Team

Empirical Models of the Time Evolution of SPX Option Prices

ArXiv ID: 2506.17511 · Authors: Alessio Brini, David A. Hsieh, Patrick Kuiper, Sean Moushegian, David Ye

Abstract: The key objective of this paper is to develop an empirical model for pricing SPX options that can be simulated over future paths of the SPX. To accomplish this, we formulate and rigorously evaluate several statistical models, including neural network, random forest, and linear regression. These models use the observed characteristics of the options as inputs – their price, moneyness and time-to-maturity, as well as a small set of external inputs, such as the SPX and its past history, dividend yield, and the risk-free rate. Model evaluation is performed on historical options data, spanning 30 years of daily observations. Significant effort is given to understanding the data and ensuring explainability for the neural network. A neural network model with two hidden layers and four neurons per layer, trained with minimal hyperparameter tuning, performs well against the theoretical Black-Scholes-Merton model for European options, as well as two other empirical models based on the random forest and the linear regression. It delivers arbitrage-free option prices without requiring these conditions to be imposed. ...

June 20, 2025 · 2 min · Research Team

Financial fraud detection system based on improved random forest and gradient boosting machine (GBM)

ArXiv ID: 2502.15822 · Authors: Unknown

Abstract: This paper proposes a financial fraud detection system based on improved Random Forest (RF) and Gradient Boosting Machine (GBM). Specifically, the system introduces a novel model architecture called GBM-SSRF (Gradient Boosting Machine with Simplified and Strengthened Random Forest), which combines the powerful optimization capabilities of the gradient boosting machine (GBM) with improved randomization. The computational efficiency and feature extraction capabilities of the Simplified and Strengthened Random Forest (SSRF) significantly improve the performance of financial fraud detection. Although the traditional random forest model has good classification capabilities, it has high computational complexity when faced with large-scale data and has certain limitations in feature selection. As a commonly used ensemble learning method, the GBM model has significant advantages in optimizing performance and handling nonlinear problems. However, GBM takes a long time to train and is prone to overfitting when data samples are unbalanced. In response to these limitations, this paper optimizes the structure of the random forest, reducing computational complexity and improving feature selection ability through structural simplification and enhancement. In addition, the optimized random forest is embedded into the GBM framework, so that the model can maintain efficiency and stability with the help of GBM’s gradient optimization capability. Experiments show that the GBM-SSRF model not only has good performance, but also has good robustness and generalization capabilities, providing an efficient and reliable solution for financial fraud detection. ...

February 20, 2025 · 2 min · Research Team

Risk-Adjusted Performance of Random Forest Models in High-Frequency Trading

ArXiv ID: 2412.15448 · Authors: Unknown

Abstract: Because of the theoretical challenges posed by the Efficient Market Hypothesis to technical analysis, the effectiveness of technical indicators in high-frequency trading remains inadequately explored, particularly at the minute-level frequency, where effects of the microstructure of the market dominate. This study evaluates the integration of traditional technical indicators with random forest regression models using minute-level SPY data, analyzing 13 distinct model configurations. Our empirical results reveal a stark contrast between in-sample and out-of-sample performance, with $R^2$ values deteriorating from 0.749–0.812 during training to negative values in testing. A feature importance analysis demonstrates that primary price-based features dominate the predictions made by the model, accounting for over 60% of the importance, while established technical indicators, such as RSI and Bollinger Bands, account for only 14%–15%. Although the indicator-enhanced models achieved superior risk-adjusted metrics, with Rachev ratios between 0.919 and 0.961, they consistently underperformed a simple buy-and-hold strategy, generating returns ranging from -2.4% to -3.9%. These findings challenge conventional assumptions about the usefulness of technical indicators in algorithmic trading, suggesting that in high-frequency contexts, they may be more relevant to risk management than to predicting returns. For practitioners and researchers, our findings indicate that successful high-frequency trading strategies should focus on adaptive feature selection and regime-specific modeling rather than relying on traditional technical indicators, and they underscore the critical importance of robust out-of-sample testing in model development. ...

December 19, 2024 · 2 min · Research Team
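
For readers unfamiliar with the Rachev ratio reported in the high-frequency trading abstract above, the following sketch computes an empirical version, assuming the common definition as expected tail gain divided by expected tail loss. The tail levels `alpha` and `beta`, the function name, and the sample returns are assumptions for illustration; the abstract does not state which levels the study used.

```python
def rachev_ratio(returns, alpha=0.05, beta=0.05, risk_free=0.0):
    """Empirical Rachev ratio: mean of the best alpha share of excess returns
    divided by the magnitude of the mean of the worst beta share."""
    excess = sorted(r - risk_free for r in returns)
    n = len(excess)
    n_gain = max(1, int(n * alpha))    # number of observations in the upper tail
    n_loss = max(1, int(n * beta))     # number of observations in the lower tail
    tail_gain = sum(excess[-n_gain:]) / n_gain
    tail_loss = -sum(excess[:n_loss]) / n_loss
    if tail_loss <= 0:
        raise ValueError("lower tail contains no losses; ratio is undefined")
    return tail_gain / tail_loss

# Made-up return series with symmetric tails at the 20% level.
sample_returns = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.01, 0.02, 0.02, 0.03, 0.03]
ratio = rachev_ratio(sample_returns, alpha=0.2, beta=0.2)
```

Under this definition, a value below 1 means the average extreme gain is smaller than the average extreme loss at the chosen tail levels.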

Hunting Tomorrow's Leaders: Using Machine Learning to Forecast S&P 500 Additions & Removal

ArXiv ID: 2412.12539 · Authors: Unknown

Abstract: This study applies machine learning to predict S&P 500 membership changes: key events that profoundly impact investor behavior and market dynamics. Quarterly data from WRDS datasets (2013 onwards) was used, incorporating features such as industry classification, financial data, market data, and corporate governance indicators. Using a Random Forest model, we achieved a test F1 score of 0.85, outperforming logistic regression and SVC models. This research not only showcases the power of machine learning for financial forecasting but also emphasizes model transparency through SHAP analysis and feature engineering. The model’s real-world applicability is demonstrated with predicted changes for Q3 2023, such as the addition of Uber (UBER) and the removal of SolarEdge Technologies (SEDG). By incorporating these predictions into a trading strategy, i.e., buying stocks announced for addition and shorting those marked for removal, we anticipate capturing alpha and enhancing investment decision-making, offering valuable insights into index dynamics. ...

December 17, 2024 · 2 min · Research Team

SARF: Enhancing Stock Market Prediction with Sentiment-Augmented Random Forest

ArXiv ID: 2410.07143 · Authors: Unknown

Abstract: Stock trend forecasting, a challenging problem in the financial domain, involves extensive data and related indicators. Relying solely on empirical analysis often yields unsustainable and ineffective results. Machine learning researchers have demonstrated that the application of the random forest algorithm can enhance predictions in this context, playing a crucial auxiliary role in forecasting stock trends. This study introduces a new approach to stock market prediction by integrating sentiment analysis using the FinGPT generative AI model with the traditional Random Forest model. The proposed technique aims to optimize the accuracy of stock price forecasts by leveraging the nuanced understanding of financial sentiments provided by FinGPT. We present a new methodology called “Sentiment-Augmented Random Forest” (SARF), which incorporates sentiment features into the Random Forest framework. Our experiments demonstrate that SARF outperforms conventional Random Forest and LSTM models with an average accuracy improvement of 9.23% and lower prediction errors in predicting stock market movements. ...

September 22, 2024 · 2 min · Research Team