
Noise-proofing Universal Portfolio Shrinkage

ArXiv ID: 2511.10478 · Authors: Paul Ruelloux, Christian Bongiorno, Damien Challet

Abstract: We enhance the Universal Portfolio Shrinkage Approximator (UPSA) of Kelly et al. (2023) by making it more robust to estimation noise and covariate shift. UPSA optimizes the realized Sharpe ratio over a relatively small calibration window, leveraging ridge penalties and cross-validation to yield better portfolios. Yet it still suffers from the staggering amount of noise in financial data. We propose two methods to make UPSA more robust and more efficient: time-averaging of the optimal penalty weights, and using the Average Oracle correlation eigenvalues to make covariance matrices less noisy and more robust to covariate shift. Combining these two long-term averages outperforms UPSA by a large margin in most specifications. ...
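The time-averaging idea can be illustrated with a minimal numpy sketch: pick a shrinkage weight by cross-validation on each rolling window, then average the weights across windows instead of trusting each noisy per-window estimate. The linear shrinkage toward a scaled identity is a simple stand-in here, not UPSA's actual penalty family.

```python
import numpy as np

rng = np.random.default_rng(0)

def shrunk_cov(S, lam):
    """Linear shrinkage toward the scaled identity (a simple stand-in
    for UPSA's richer penalized estimators)."""
    d = S.shape[0]
    target = np.trace(S) / d * np.eye(d)
    return (1 - lam) * S + lam * target

def cv_lambda(X, grid=np.linspace(0, 1, 21)):
    """Pick the shrinkage weight minimizing out-of-sample Frobenius
    error on a single train/test split of the window."""
    n = len(X)
    tr, te = X[: n // 2], X[n // 2:]
    S_tr, S_te = np.cov(tr.T), np.cov(te.T)
    errs = [np.linalg.norm(shrunk_cov(S_tr, l) - S_te) for l in grid]
    return grid[int(np.argmin(errs))]

# Time-averaging: estimate lambda on each window, then average.
d, T, W = 10, 600, 4
X = rng.standard_normal((T, d)) @ rng.standard_normal((d, d)) * 0.1
lams = [cv_lambda(X[i * (T // W):(i + 1) * (T // W)]) for i in range(W)]
lam_bar = float(np.mean(lams))
```

The averaged weight `lam_bar` is then applied out-of-sample in place of the latest per-window estimate.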

November 13, 2025 · 2 min · Research Team

Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus

ArXiv ID: 2509.13923 · Authors: Lamia Lamrani, Benoît Collins, Jean-Philippe Bouchaud

Abstract: Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper, we derive the expected Frobenius error of the holdout method, a cross-validation procedure with a single train-test split, for a generic rotationally invariant multiplicative noise model, thereby extending previous results to non-Gaussian data distributions. Our approach uses the Weingarten calculus and the Ledoit-Péché formula to derive the oracle eigenvalues in the high-dimensional limit. When the population covariance matrix follows an inverse Wishart distribution, we approximate the expected holdout error, first with a linear shrinkage, then with a quadratic shrinkage approximating the oracle eigenvalues. Under the linear approximation, we find that the optimal train-test split ratio is proportional to the square root of the matrix dimension. We then run Monte Carlo simulations of the holdout error for different distributions of the norm of the noise (Gaussian, Student, and Laplace) and observe that the quadratic approximation yields a substantial improvement, especially around the optimal train-test split ratio. We also observe that a higher fourth moment of the Euclidean norm of the noise vector sharpens the holdout error curve near the optimal split and lowers the ideal train-test ratio, making the choice of split ratio more important when performing the holdout method. ...
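The holdout estimator the paper analyzes can be sketched in a few lines: keep the eigenvectors of the train-set covariance, but re-estimate each eigenvalue on the test set, which mimics the oracle nonlinear shrinkage. This is a generic illustration, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(1)

def holdout_eigenvalues(X, split):
    """Single-split cross-validated eigenvalues: train eigenvectors,
    test-set eigenvalues u_k' S_te u_k."""
    X_tr, X_te = X[:split], X[split:]
    S_tr = X_tr.T @ X_tr / len(X_tr)
    S_te = X_te.T @ X_te / len(X_te)
    _, U = np.linalg.eigh(S_tr)          # columns are eigenvectors
    lam_cv = np.einsum("ik,ij,jk->k", U, S_te, U)
    return U, lam_cv

d, n = 50, 400
X = rng.standard_normal((n, d))          # true covariance = identity
U, lam_cv = holdout_eigenvalues(X, split=300)
S_cv = (U * lam_cv) @ U.T                # reassembled estimator
```

With an identity population covariance, the cross-validated eigenvalues scatter around 1, while the raw train-set eigenvalues would be spread out by Marchenko-Pastur noise.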

September 17, 2025 · 2 min · Research Team

skfolio: Portfolio Optimization in Python

ArXiv ID: 2507.04176 · Authors: Carlo Nicolini, Matteo Manzi, Hugo Delatte

Abstract: Portfolio optimization is a fundamental challenge in quantitative finance, requiring robust computational tools that integrate statistical rigor with practical implementation. We present skfolio, an open-source Python library for portfolio construction and risk management that seamlessly integrates with the scikit-learn ecosystem. skfolio provides a unified framework for diverse allocation strategies, from classical mean-variance optimization to modern clustering-based methods, state-of-the-art financial estimators with native interfaces, and advanced cross-validation techniques tailored for financial time series. By adhering to scikit-learn's fit-predict-transform paradigm, the library enables researchers and practitioners to leverage machine learning workflows for portfolio optimization, promoting reproducibility and transparency in quantitative finance. ...
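The fit-predict paradigm the abstract describes can be illustrated with a toy numpy estimator: `fit` learns portfolio weights from a returns matrix, `predict` maps new returns to portfolio returns. The class and method names below are illustrative of the scikit-learn pattern, not skfolio's actual API.

```python
import numpy as np

class MinVariance:
    """Toy minimum-variance estimator in the scikit-learn fit/predict
    style (illustrative names, not skfolio's API)."""

    def fit(self, X):
        # X: (n_observations, n_assets) asset returns
        S = np.cov(X.T)
        inv = np.linalg.pinv(S)
        ones = np.ones(S.shape[0])
        # closed-form fully-invested minimum-variance weights
        self.weights_ = inv @ ones / (ones @ inv @ ones)
        return self

    def predict(self, X):
        # portfolio returns implied by the fitted weights
        return X @ self.weights_

rng = np.random.default_rng(2)
X = rng.standard_normal((250, 5)) * 0.01
model = MinVariance().fit(X)
ptf = model.predict(X)
```

Because the estimator exposes `fit` and `predict`, it slots directly into scikit-learn-style pipelines and cross-validation loops, which is the interoperability point the library makes.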

July 5, 2025 · 2 min · Research Team

Hybrid Models for Financial Forecasting: Combining Econometric, Machine Learning, and Deep Learning Models

ArXiv ID: 2505.19617 · Authors: Dominik Stempień, Robert Ślepaczuk

Abstract: This research systematically develops and evaluates hybrid modeling approaches that combine traditional econometric models (ARIMA and ARFIMA) with machine learning and deep learning techniques (SVM, XGBoost, and LSTM) to forecast financial time series. The empirical analysis is based on two distinct financial assets: the S&P 500 index and Bitcoin. By incorporating over two decades of daily data for the S&P 500 and almost ten years of Bitcoin data, the study provides a comprehensive evaluation of forecasting methodologies across different market conditions and periods of financial distress. Model training and hyperparameter tuning use a novel three-fold dynamic cross-validation method. Model performance is evaluated using both forecast-error metrics and trading performance indicators. The findings indicate that the construction process of hybrid models plays a crucial role in developing profitable trading strategies that outperform both their individual components and the benchmark Buy&Hold strategy. The most effective hybrid architecture combined the econometric ARIMA model with either SVM or LSTM, under the assumption of a non-additive relationship between the linear and nonlinear components. ...
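The "non-additive" combination can be sketched with numpy: instead of fitting a nonlinear model only to the linear model's residuals (the additive scheme), the second stage sees both the raw lags and the linear forecast as inputs. The AR(2) linear stage and the feature choices below are simplifying assumptions standing in for the paper's ARIMA and ML components.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic series with both linear and nonlinear structure
T = 500
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.2 * np.tanh(y[t - 2]) + 0.1 * rng.standard_normal()

# Stage 1: linear AR(2) component (stand-in for ARIMA)
X_lin = np.column_stack([y[1:-1], y[:-2]])
target = y[2:]
coef, *_ = np.linalg.lstsq(X_lin, target, rcond=None)
lin_pred = X_lin @ coef

# Stage 2: non-additive combination -- the nonlinear stage receives the
# lags AND the linear forecast, rather than fitting residuals alone.
X_nl = np.column_stack([X_lin, lin_pred, np.tanh(X_lin)])
coef_nl, *_ = np.linalg.lstsq(X_nl, target, rcond=None)
hyb_pred = X_nl @ coef_nl

mse_lin = float(np.mean((target - lin_pred) ** 2))
mse_hyb = float(np.mean((target - hyb_pred) ** 2))
```

Since the hybrid's feature set nests the linear one, its in-sample error cannot exceed the linear model's; the paper's point is that this gain survives out of sample when the combination is built carefully.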

May 26, 2025 · 2 min · Research Team

Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation

ArXiv ID: 2503.15186 · Authors: Unknown

Abstract: Cross-validation is a statistical tool that can be used to improve large covariance matrix estimation. Although its efficiency is observed in practical applications, and a convergence result towards the error of the nonlinear shrinkage is available in the high-dimensional regime, formal proofs that account for finite-sample-size effects are currently lacking. To make the analysis tractable, we focus on the holdout method, a single iteration of cross-validation, rather than the traditional $k$-fold approach. We derive a closed-form expression for the expected estimation error when the population matrix follows a white inverse-Wishart distribution, and we observe that the optimal train-test split scales as the square root of the matrix dimension. For general population matrices, we connect the error to the variance of the eigenvalue distribution, though approximations are necessary. In this framework, and in the high-dimensional asymptotic regime, both the holdout and $k$-fold cross-validation methods converge to the optimal estimator when the train-test ratio scales with the square root of the matrix dimension, which is consistent with existing theory. ...
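The square-root scaling translates into a simple sizing rule: choose the split so that n_train/n_test ≈ c·√d. The constant `c` below is a free parameter chosen for illustration; the paper derives the scaling, not this particular constant.

```python
import numpy as np

def sqrt_rule_split(n_obs, dim, c=1.0):
    """Size a single holdout split so the train/test ratio is
    approximately c * sqrt(dim) (c is an illustrative constant)."""
    ratio = c * np.sqrt(dim)
    n_test = max(1, int(round(n_obs / (1 + ratio))))
    return n_obs - n_test, n_test

# With 1000 observations and dimension 100, the target ratio is
# sqrt(100) = 10, so roughly 1000 / 11 ~ 91 test observations.
n_train, n_test = sqrt_rule_split(1000, 100)
```

In contrast to the conventional fixed 80/20 split, this rule deliberately shrinks the test set as the dimension grows, because estimating eigenvectors well (train side) becomes the binding constraint.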

March 19, 2025 · 2 min · Research Team

Regularization for electricity price forecasting

ArXiv ID: 2404.03968 · Authors: Unknown

Abstract: The most commonly used forms of regularization define the penalty function as an L1 or L2 norm. However, numerous alternative approaches remain untested in practical applications. In this study, we apply ten different penalty functions to predict electricity prices and evaluate their performance under two different model structures and in two distinct electricity markets. The study reveals that LQ and elastic net consistently produce more accurate forecasts than other regularization types; in particular, they were the only penalty functions that consistently outperformed the most commonly used LASSO. Furthermore, the results suggest that cross-validation outperforms the Bayesian information criterion for parameter optimization and performs as well as models with ex-post parameter selection. ...
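The penalty families compared here are easiest to contrast through their proximal operators, the per-coefficient update each penalty induces in solvers like coordinate descent or proximal gradient. A minimal numpy sketch of the three standard cases (the paper tests ten penalties; these are the common reference points):

```python
import numpy as np

def prox_l1(z, t):
    """Soft-thresholding: proximal operator of t * ||x||_1 (LASSO).
    Small coefficients are set exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_l2sq(z, t):
    """Proximal operator of (t/2) * ||x||_2^2 (ridge): pure shrinkage,
    never exact zeros."""
    return z / (1.0 + t)

def prox_elastic_net(z, t, alpha=0.5):
    """Elastic net combines both: soft-threshold, then shrink."""
    return prox_l1(z, t * alpha) / (1.0 + t * (1 - alpha))

z = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
sparse = prox_l1(z, 1.0)          # zeros out the small entries
blended = prox_elastic_net(z, 1.0)
```

The practical difference is visible immediately: LASSO zeroes small coefficients outright, ridge only shrinks them, and the elastic net interpolates, which is one reason it can beat plain LASSO when predictors are correlated.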

April 5, 2024 · 2 min · Research Team

Forecasting the Performance of US Stock Market Indices During COVID-19: RF vs LSTM

ArXiv ID: 2306.03620 · Authors: Unknown

Abstract: The US stock market experienced instability following the 2007-2009 recession, and COVID-19 posed a further significant challenge to US stock traders and investors. To mitigate risk and improve profits, traders and investors need forecasting models that account for the effects of the pandemic. With the COVID-19 pandemic and the post-recession period in view, two machine learning models, Random Forest and LSTM, are used to forecast two major US stock market indices. Historical prices after the Great Recession are used to develop the models and forecast index returns. Cross-validation is used to evaluate model performance during training. Additionally, hyperparameter optimization, regularization techniques such as dropout and weight decay, and preprocessing improve the performance of the machine learning models. Using high-accuracy machine learning techniques, traders and investors can forecast stock market behavior, stay ahead of their competition, and improve profitability. Keywords: COVID-19, LSTM, S&P500, Random Forest, Russell 2000, Forecasting, Machine Learning, Time Series. JEL Code: C6, C8, G4. ...
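Cross-validation on price series must respect time order: each fold trains on past data only and tests on the block that follows. A minimal expanding-window split generator, written in plain numpy (the fold counts and sizes are illustrative, not the paper's settings):

```python
import numpy as np

def walk_forward_splits(n_obs, n_folds, min_train):
    """Expanding-window splits for time series: fold k trains on all
    observations before a cutoff and tests on the next block, so the
    model never sees the future."""
    test_size = (n_obs - min_train) // n_folds
    for k in range(n_folds):
        cut = min_train + k * test_size
        train_idx = np.arange(cut)
        test_idx = np.arange(cut, min(cut + test_size, n_obs))
        yield train_idx, test_idx

# 100 observations, 3 folds, at least 40 training points in fold 1
splits = list(walk_forward_splits(100, 3, 40))
```

Shuffled k-fold splits would leak future prices into training, inflating apparent accuracy; walk-forward evaluation avoids that by construction.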

June 6, 2023 · 2 min · Research Team