XGBoost Forecasting of NEPSE Index Log Returns with Walk Forward Validation

XGBoost Forecasting of NEPSE Index Log Returns with Walk Forward Validation ArXiv ID: 2601.08896 “View on arXiv” Authors: Sahaj Raj Malla, Shreeyash Kayastha, Rumi Suwal, Harish Chandra Bhandari, Rajendra Adhikari Abstract This study develops a robust machine learning framework for one-step-ahead forecasting of daily log-returns in the Nepal Stock Exchange (NEPSE) Index using the XGBoost regressor. A comprehensive feature set is engineered, including lagged log-returns (up to 30 days) and established technical indicators such as short- and medium-term rolling volatility measures and the 14-period Relative Strength Index. Hyperparameter optimization is performed using Optuna with time-series cross-validation on the initial training segment. Out-of-sample performance is rigorously assessed via walk-forward validation under both expanding and fixed-length rolling window schemes across multiple lag configurations, simulating real-world deployment and avoiding lookahead bias. Predictive accuracy is evaluated using root mean squared error, mean absolute error, coefficient of determination (R-squared), and directional accuracy on both log-returns and reconstructed closing prices. Empirical results show that the optimal configuration, an expanding window with 20 lags, outperforms tuned ARIMA and Ridge regression benchmarks, achieving the lowest log-return RMSE (0.013450) and MAE (0.009814) alongside a directional accuracy of 65.15%. While the R-squared remains modest, consistent with the noisy nature of financial returns, primary emphasis is placed on relative error reduction and directional prediction. Feature importance analysis and visual inspection further enhance interpretability. These findings demonstrate the effectiveness of gradient boosting ensembles in modeling nonlinear dynamics in volatile emerging market time series and establish a reproducible benchmark for NEPSE Index forecasting. ...
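The walk-forward scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `walk_forward_splits` and `directional_accuracy`, their parameters, and the convention that a zero return counts as "not up" are assumptions made for illustration. Any forecaster (XGBoost in the paper) would be fitted on each training window where the comment indicates.

```python
from typing import Iterator, Optional, Sequence, Tuple


def walk_forward_splits(
    n_obs: int, initial_train: int, window: Optional[int] = None
) -> Iterator[Tuple[range, int]]:
    """Yield (train_indices, test_index) pairs for one-step-ahead evaluation.

    window=None gives an expanding window (all history before t);
    window=k gives a fixed-length rolling window of the last k observations.
    Training indices always precede the test index, so no future data is used.
    """
    if not 0 < initial_train < n_obs:
        raise ValueError("initial_train must be between 1 and n_obs - 1")
    for t in range(initial_train, n_obs):
        start = 0 if window is None else max(0, t - window)
        yield range(start, t), t


def directional_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of steps where predicted and realized returns agree in sign.

    Assumption: a return of exactly zero is treated as "not up".
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual, predicted))
    return hits / len(actual)


if __name__ == "__main__":
    # Rolling window of 3 observations over a toy series of length 6.
    for train_idx, test_idx in walk_forward_splits(n_obs=6, initial_train=3, window=3):
        # Fit a model on train_idx here, then predict the return at test_idx.
        print(list(train_idx), "->", test_idx)
```

Passing `window=None` reproduces the expanding scheme, so the same loop can compare both window types across lag configurations.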

January 13, 2026 · 3 min · Research Team

Risk-Aware Financial Forecasting Enhanced by Machine Learning and Intuitionistic Fuzzy Multi-Criteria Decision-Making

Risk-Aware Financial Forecasting Enhanced by Machine Learning and Intuitionistic Fuzzy Multi-Criteria Decision-Making ArXiv ID: 2512.17936 “View on arXiv” Authors: Safiye Turgay, Serkan Erdoğan, Željko Stević, Orhan Emre Elma, Tevfik Eren, Zhiyuan Wang, Mahmut Baydaş Abstract In the face of increasing financial uncertainty and market complexity, this study presents a novel risk-aware financial forecasting framework that integrates advanced machine learning techniques with intuitionistic fuzzy multi-criteria decision-making (MCDM). Tailored to the BIST 100 index and validated through a case study of a major defense company in Türkiye, the framework fuses structured financial data, unstructured text data, and macroeconomic indicators to enhance predictive accuracy and robustness. It incorporates a hybrid suite of models, including extreme gradient boosting (XGBoost), a long short-term memory (LSTM) network, and a graph neural network (GNN), to deliver probabilistic forecasts with quantified uncertainty. The empirical results demonstrate high forecasting accuracy, with a net profit mean absolute percentage error (MAPE) of 3.03% and narrow 95% confidence intervals for key financial indicators. The risk-aware analysis indicates a favorable risk-return profile, with a Sharpe ratio of 1.25 and a higher Sortino ratio of 1.80, suggesting relatively low downside volatility and robust performance under market fluctuations. Sensitivity analysis shows that the key financial indicator predictions are highly sensitive to variations in inflation, interest rates, sentiment, and exchange rates. Additionally, using an intuitionistic fuzzy MCDM approach, combining entropy weighting, evaluation based on distance from the average solution (EDAS), and the measurement of alternatives and ranking according to compromise solution (MARCOS) methods, the tabular data learning network (TabNet) outperforms the other models and is identified as the most suitable candidate for deployment. Overall, the findings of this work highlight the importance of integrating advanced machine learning, risk quantification, and fuzzy MCDM methodologies in financial forecasting, particularly in emerging markets. ...

December 11, 2025 · 3 min · Research Team

Scaling Conditional Autoencoders for Portfolio Optimization via Uncertainty-Aware Factor Selection

Scaling Conditional Autoencoders for Portfolio Optimization via Uncertainty-Aware Factor Selection ArXiv ID: 2511.17462 “View on arXiv” Authors: Ryan Engel, Yu Chen, Pawel Polak, Ioana Boier Abstract Conditional Autoencoders (CAEs) offer a flexible, interpretable approach for estimating latent asset-pricing factors from firm characteristics. However, existing studies usually limit the latent factor dimension to around K=5 due to concerns that larger K can degrade performance. To overcome this challenge, we propose a scalable framework that couples a high-dimensional CAE with an uncertainty-aware factor selection procedure. We employ three models for quantile prediction: zero-shot Chronos, a pretrained time-series foundation model (ZS-Chronos), gradient-boosted quantile regression trees using XGBoost and RAPIDS (Q-Boost), and an I.I.D bootstrap-based sample mean model (IID-BS). For each model, we rank factors by forecast uncertainty and retain the top-k most predictable factors for portfolio construction, where k denotes the selected subset of factors. This pruning strategy delivers substantial gains in risk-adjusted performance across all forecasting models. Furthermore, due to each model’s uncorrelated predictions, a performance-weighted ensemble consistently outperforms individual models with higher Sharpe, Sortino, and Omega ratios. ...

November 21, 2025 · 2 min · Research Team

CBDC Stress Test in a Dual-Currency Setting

CBDC Stress Test in a Dual-Currency Setting ArXiv ID: 2511.13384 “View on arXiv” Authors: Catalin Dumitrescu Abstract This study explores the potential impact of introducing a Central Bank Digital Currency (CBDC) on financial stability in an emerging dual-currency economy (Romania), where the domestic currency (RON) coexists with the euro. It develops an integrated analytical framework combining econometrics, machine learning, and behavioural modelling. CBDC adoption probabilities are estimated using XGBoost and logistic regression models trained on behavioural and macro-financial indicators rather than survey data. Liquidity stress simulations assess how banks would respond to deposit withdrawals resulting from CBDC adoption, while VAR, MSVAR, and SVAR models capture the macro-financial transmission of liquidity shocks into credit contraction and changes in monetary conditions. The findings indicate that CBDC uptake (co-circulating Digital RON and Digital EUR) would be moderate at issuance, amounting to around EUR 1 billion, primarily driven by digital readiness and trust in the central bank. The study concludes that a non-remunerated, capped CBDC, designed primarily as a means of payment rather than a store of value, can be introduced without compromising financial stability. In dual-currency economies, differentiated holding limits for domestic and foreign digital currencies (e.g., Digital RON versus Digital Euro) are crucial to prevent uncontrolled euroisation and preserve monetary sovereignty. A prudent design with moderate caps, non-remuneration, and macroprudential coordination can transform CBDC into a digital liquidity buffer and a complementary monetary policy instrument that enhances resilience and inclusion rather than destabilising the financial system. ...

November 17, 2025 · 2 min · Research Team

An extreme Gradient Boosting (XGBoost) Trees approach to Detect and Identify Unlawful Insider Trading (UIT) Transactions

An extreme Gradient Boosting (XGBoost) Trees approach to Detect and Identify Unlawful Insider Trading (UIT) Transactions ArXiv ID: 2511.08306 “View on arXiv” Authors: Krishna Neupane, Igor Griva Abstract Corporate insiders have control of material non-public preferential information (MNPI). Occasionally, the insiders strategically bypass legal and regulatory safeguards to exploit MNPI in their execution of securities trading. Because of the large volume of transactions, detecting unlawful insider trading is an arduous task for humans, who must examine the data and identify underlying patterns in the insiders’ behavior. On the other hand, innovative machine learning architectures have shown promising results for analyzing large-scale and complex data with hidden patterns. One such popular technique is eXtreme Gradient Boosting (XGBoost), a state-of-the-art supervised classifier. We therefore apply XGBoost to ease the identification and detection of unlawful activities. The results demonstrate that XGBoost can identify unlawful transactions with a high accuracy of 97 percent and can provide a ranking of the features that play the most important role in detecting fraudulent activities. ...

November 11, 2025 · 2 min · Research Team

Forecasting Liquidity Withdraw with Machine Learning Models

Forecasting Liquidity Withdraw with Machine Learning Models ArXiv ID: 2509.22985 “View on arXiv” Authors: Haochuan Wang Abstract Liquidity withdrawal is a critical indicator of market fragility. In this project, I test a framework for forecasting liquidity withdrawal at the individual-stock level, ranging from less liquid stocks to highly liquid large-cap tickers, and evaluate the relative performance of competing model classes in predicting short-horizon order book stress. We introduce the Liquidity Withdrawal Index (LWI), defined as the ratio of order cancellations to the sum of standing depth and new additions at the best quotes, as a bounded, interpretable measure of transient liquidity removal. Using Nasdaq market-by-order (MBO) data, we compare a spectrum of approaches, from linear benchmarks (AR, HAR) to non-linear tree ensembles (XGBoost), across horizons ranging from 250 ms to 5 s. Beyond predictive accuracy, our results provide insights into order placement and cancellation dynamics, identify regimes where linear versus non-linear signals dominate, and highlight how early-warning indicators of liquidity withdrawal can inform both market surveillance and execution. ...
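The LWI definition quoted in this abstract (cancellations divided by standing depth plus new additions at the best quotes) can be written as a short Python sketch. The function name, argument names, and the choice to return 0.0 when there is no liquidity to withdraw are assumptions for illustration; the paper's exact measurement window and units are not given in the abstract.

```python
def liquidity_withdrawal_index(
    cancelled: float, standing_depth: float, added: float
) -> float:
    """Ratio of cancelled volume to (standing depth + newly added volume).

    All three inputs are volumes at the best quotes over one interval.
    Assumption: when the denominator is zero, the index is reported as 0.0.
    """
    if cancelled < 0 or standing_depth < 0 or added < 0:
        raise ValueError("volumes must be non-negative")
    available = standing_depth + added
    if available == 0:
        return 0.0
    return cancelled / available


if __name__ == "__main__":
    # 30 units cancelled against 100 resting and 50 newly added units.
    print(liquidity_withdrawal_index(cancelled=30, standing_depth=100, added=50))
```

If cancellations can only remove volume that was resting or added in the same interval, the ratio stays between 0 and 1, which matches the abstract's description of the index as bounded.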

September 26, 2025 · 2 min · Research Team

Improving S&P 500 Volatility Forecasting through Regime-Switching Methods

Improving S&P 500 Volatility Forecasting through Regime-Switching Methods ArXiv ID: 2510.03236 “View on arXiv” Authors: Ava C. Blake, Nivika A. Gandhi, Anurag R. Jakkula Abstract Accurate prediction of financial market volatility is critical for risk management, derivatives pricing, and investment strategy. In this study, we propose a multitude of regime-switching methods to improve the prediction of S&P 500 volatility by capturing structural changes in the market across time. We use eleven years of SPX data, from May 1st, 2014 to May 27th, 2025, to compute daily realized volatility (RV) from 5-minute intraday log returns, adjusted for irregular trading days. To enhance forecast accuracy, we engineered features to capture both historical dynamics and forward-looking market sentiment across regimes. The regime-switching methods include a soft Markov switching algorithm to estimate soft-regime probabilities, a distributional spectral clustering method that uses XGBoost to assign clusters at prediction time, and a coefficient-based soft regime algorithm that extracts HAR coefficients from time segments identified through the Mood test, clusters them through a Bayesian GMM to obtain soft regime weights, and uses XGBoost to predict regime probabilities. Models were evaluated across three time periods: before, during, and after the COVID-19 pandemic. The coefficient-based clustering algorithm outperformed all other models, including the baseline autoregressive model, during all time periods. Additionally, each model was evaluated on its recursive forecasting performance for 5- and 10-day horizons during each time period. The findings of this study demonstrate the value of regime-aware modeling frameworks and soft clustering approaches in improving volatility forecasting, especially during periods of heightened uncertainty and structural change. ...
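The daily realized volatility mentioned in this abstract is commonly computed as the square root of the sum of squared intraday log returns. The Python sketch below follows that common definition; the abstract does not state the authors' exact formula or their adjustment for irregular trading days, so the function name `realized_volatility` and the lack of any scaling or annualization are assumptions.

```python
import math
from typing import Sequence


def realized_volatility(prices: Sequence[float]) -> float:
    """Daily realized volatility from equally spaced intraday prices.

    Computes log returns between consecutive prices (e.g. 5-minute bars)
    and returns the square root of the sum of their squares.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices to form a return")
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


if __name__ == "__main__":
    # Toy intraday price path for one trading day (hypothetical values).
    bars = [100.0, 100.4, 100.1, 100.6, 100.3]
    print(realized_volatility(bars))
```

Shortened trading days simply contribute fewer bars under this definition; whether and how to rescale such days is a separate modeling choice.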

September 21, 2025 · 2 min · Research Team

Benchmarking Classical and Quantum Models for DeFi Yield Prediction on Curve Finance

Benchmarking Classical and Quantum Models for DeFi Yield Prediction on Curve Finance ArXiv ID: 2508.02685 “View on arXiv” Authors: Chi-Sheng Chen, Aidan Hung-Wen Tsai Abstract The rise of decentralized finance (DeFi) has created a growing demand for accurate yield and performance forecasting to guide liquidity allocation strategies. In this study, we benchmark six models, XGBoost, Random Forest, LSTM, Transformer, quantum neural networks (QNN), and quantum support vector machines with quantum feature maps (QSVM-QNN), on one year of historical data from 28 Curve Finance pools. We evaluate model performance on test MAE, RMSE, and directional accuracy. Our results show that classical ensemble models, particularly XGBoost and Random Forest, consistently outperform both deep learning and quantum models. XGBoost achieves the highest directional accuracy (71.57%) with a test MAE of 1.80, while Random Forest attains the lowest test MAE of 1.77 and 71.36% accuracy. In contrast, quantum models underperform with directional accuracy below 50% and higher errors, highlighting current limitations in applying quantum machine learning to real-world DeFi time series data. This work offers a reproducible benchmark and practical insights into model suitability for DeFi applications, emphasizing the robustness of classical methods over emerging quantum approaches in this domain. ...

July 22, 2025 · 2 min · Research Team

Can We Reliably Predict the Fed's Next Move? A Multi-Modal Approach to U.S. Monetary Policy Forecasting

Can We Reliably Predict the Fed’s Next Move? A Multi-Modal Approach to U.S. Monetary Policy Forecasting ArXiv ID: 2506.22763 “View on arXiv” Authors: Fiona Xiao Jingyi, Lili Liu Abstract Forecasting central bank policy decisions remains a persistent challenge for investors, financial institutions, and policymakers due to the wide-reaching impact of monetary actions. In particular, anticipating shifts in the U.S. federal funds rate is vital for risk management and trading strategies. Traditional methods relying only on structured macroeconomic indicators often fall short in capturing the forward-looking cues embedded in central bank communications. This study examines whether predictive accuracy can be enhanced by integrating structured data with unstructured textual signals from Federal Reserve communications. We adopt a multi-modal framework, comparing traditional machine learning models, transformer-based language models, and deep learning architectures in both unimodal and hybrid settings. Our results show that hybrid models consistently outperform unimodal baselines. The best performance is achieved by combining TF-IDF features of FOMC texts with economic indicators in an XGBoost classifier, reaching a test AUC of 0.83. FinBERT-based sentiment features marginally improve ranking but perform worse in classification, especially under class imbalance. SHAP analysis reveals that sparse, interpretable features align more closely with policy-relevant signals. These findings underscore the importance of integrating textual and structured signals transparently. For monetary policy forecasting, simpler hybrid models can offer both accuracy and interpretability, delivering actionable insights for researchers and decision-makers. ...

June 28, 2025 · 2 min · Research Team

Hybrid Models for Financial Forecasting: Combining Econometric, Machine Learning, and Deep Learning Models

Hybrid Models for Financial Forecasting: Combining Econometric, Machine Learning, and Deep Learning Models ArXiv ID: 2505.19617 “View on arXiv” Authors: Dominik Stempień, Robert Ślepaczuk Abstract This research systematically develops and evaluates various hybrid modeling approaches by combining traditional econometric models (ARIMA and ARFIMA models) with machine learning and deep learning techniques (SVM, XGBoost, and LSTM models) to forecast financial time series. The empirical analysis is based on two distinct financial assets: the S&P 500 index and Bitcoin. By incorporating over two decades of daily data for the S&P 500 and almost ten years of Bitcoin data, the study provides a comprehensive evaluation of forecasting methodologies across different market conditions and periods of financial distress. Model training and hyperparameter tuning are performed using a novel three-fold dynamic cross-validation method. The models are evaluated using both forecast error metrics and trading performance indicators. The findings indicate that the proper construction of hybrid models plays a crucial role in developing profitable trading strategies that outperform their individual components and the benchmark Buy&Hold strategy. The most effective hybrid model architecture was achieved by combining the econometric ARIMA model with either SVM or LSTM, under the assumption of a non-additive relationship between the linear and nonlinear components. ...

May 26, 2025 · 2 min · Research Team