
Squeezed Covariance Matrix Estimation: Analytic Eigenvalue Control

Squeezed Covariance Matrix Estimation: Analytic Eigenvalue Control ArXiv ID: 2512.23021 Authors: Layla Abu Khalaf, William Smyth Abstract We revisit Gerber's Informational Quality (IQ) framework, a data-driven approach for constructing correlation matrices from co-movement evidence, and address two obstacles that limit its use in portfolio optimization: guaranteeing positive semidefiniteness (PSD) and controlling spectral conditioning. We introduce a squeezing identity that represents IQ estimators as a convex-like combination of structured channel matrices, and propose an atomic-IQ parameterization in which each channel-class matrix is built from PSD atoms with a single class-level normalization. This yields constructive PSD guarantees over an explicit feasibility region, avoiding reliance on ex-post projection. To regulate conditioning, we develop an analytic eigen floor that targets either a minimum eigenvalue or a desired condition number and, when necessary, repairs PSD violations in closed form while remaining compatible with the squeezing identity. In long-only tangency backtests with transaction costs, atomic-IQ improves out-of-sample Sharpe ratios and delivers a more stable risk profile relative to a broad set of standard covariance estimators. ...

December 28, 2025 · 2 min · Research Team
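The analytic eigen floor described above targets either a minimum eigenvalue or a desired condition number. A minimal numpy sketch of a generic spectral floor of this kind (the `kappa_max` threshold and the repair rule are illustrative assumptions, not the paper's exact formulas):

```python
import numpy as np

def eigen_floor(cov, kappa_max=1e4):
    """Lift small eigenvalues so the condition number is at most kappa_max.

    A generic spectral-repair sketch: eigenvalues below lambda_max / kappa_max
    (including any negative, non-PSD ones) are raised to that floor in closed
    form, and the matrix is rebuilt from the repaired spectrum.
    """
    vals, vecs = np.linalg.eigh(cov)            # cov assumed symmetric
    floor = vals.max() / kappa_max              # enforce lambda_max / lambda_min <= kappa_max
    vals_repaired = np.maximum(vals, floor)
    return (vecs * vals_repaired) @ vecs.T      # scale columns, then recombine

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
noisy = A @ A.T                                 # rank 5: singular, ill-conditioned
fixed = eigen_floor(noisy, kappa_max=1e4)
```

Lifting eigenvalues below `lambda_max / kappa_max` caps the condition number and repairs small negative eigenvalues in a single pass, with no iterative projection.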

Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus

Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus ArXiv ID: 2509.13923 Authors: Lamia Lamrani, Benoît Collins, Jean-Philippe Bouchaud Abstract Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper, we derive the expected Frobenius error of the holdout method, a particular cross-validation procedure that involves a single train-test split, for a generic rotationally invariant multiplicative noise model, thereby extending previous results to non-Gaussian data distributions. Our approach uses the Weingarten calculus and the Ledoit-Péché formula to derive the oracle eigenvalues in the high-dimensional limit. When the population covariance matrix follows an inverse Wishart distribution, we approximate the expected holdout error, first with a linear shrinkage and then with a quadratic shrinkage, to approximate the oracle eigenvalues. Under the linear approximation, we find that the optimal train-test split ratio is proportional to the square root of the matrix dimension. We then run Monte Carlo simulations of the holdout error for different distributions of the norm of the noise, such as the Gaussian, Student, and Laplace distributions, and observe that the quadratic approximation yields a substantial improvement, especially around the optimal train-test split ratio. We also observe that a higher fourth-order moment of the Euclidean norm of the noise vector sharpens the holdout error curve near the optimal split and lowers the ideal train-test ratio, making the choice of the train-test ratio more important when performing the holdout method. ...

September 17, 2025 · 2 min · Research Team
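The holdout estimator whose error the paper analyzes admits a compact sketch: keep the eigenvectors of the train-split sample covariance, but re-estimate each eigenvalue as the test-split variance along that eigenvector. This is a generic illustration of the estimator under study; the paper's contribution is the analytic expected Frobenius error via Weingarten calculus, which this snippet does not reproduce.

```python
import numpy as np

def holdout_shrink(X_train, X_test):
    """Single-split (holdout) eigenvalue correction.

    Eigenvectors come from the train sample covariance; each eigenvalue is
    replaced by the test-sample variance along the corresponding eigenvector.
    """
    S_train = np.cov(X_train, rowvar=False)
    S_test = np.cov(X_test, rowvar=False)
    _, U = np.linalg.eigh(S_train)
    lam = np.einsum("ji,jk,ki->i", U, S_test, U)   # diag(U.T @ S_test @ U)
    return (U * lam) @ U.T

rng = np.random.default_rng(1)
p, n_train, n_test = 20, 120, 60
L = np.sqrt(np.diag(np.linspace(0.5, 3.0, p)))     # true covariance: diagonal
X_train = rng.standard_normal((n_train, p)) @ L
X_test = rng.standard_normal((n_test, p)) @ L
S_ho = holdout_shrink(X_train, X_test)
```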

Investment Portfolio Optimization Based on Modern Portfolio Theory and Deep Learning Models

Investment Portfolio Optimization Based on Modern Portfolio Theory and Deep Learning Models ArXiv ID: 2508.14999 Authors: Maciej Wysocki, Paweł Sakowski Abstract This paper investigates the important problem of appropriate variance-covariance matrix estimation in Modern Portfolio Theory. We propose a novel framework for variance-covariance matrix estimation for portfolio optimization based on deep learning models. We employ long short-term memory (LSTM) recurrent neural networks (RNNs) along with two probabilistic deep learning models, DeepVAR and GPVAR, for the task of one-day-ahead multivariate forecasting. We then use these forecasts to optimize portfolios of stocks and cryptocurrencies. Our analysis presents results across different combinations of observation windows and rebalancing periods to compare the performance of classical and deep learning variance-covariance estimation methods. The study concludes that although strategy (portfolio) performance differed significantly across parameter combinations, the best results in terms of the information ratio and annualized returns are generally obtained with the LSTM-RNN models. Moreover, longer observation windows translate into better performance of the deep learning models, indicating that these methods require longer windows to efficiently capture the long-term dependencies of the variance-covariance matrix structure. Strategies with less frequent rebalancing typically perform better than those with the shortest rebalancing windows across all considered methods. ...

August 20, 2025 · 2 min · Research Team
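Whatever model produces the one-day-ahead variance-covariance forecast, the downstream optimization step is standard. A minimal sketch of unconstrained minimum-variance weights computed from a forecast covariance (the numbers below are made up for illustration; the paper's actual strategies may use other objectives and constraints):

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form unconstrained minimum-variance portfolio:
    w = cov^{-1} 1 / (1' cov^{-1} 1), so weights sum to one.
    """
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)   # avoid forming the explicit inverse
    return w / w.sum()

# Hypothetical one-day-ahead covariance forecast for three assets
cov_forecast = np.array([[0.040, 0.006, 0.000],
                         [0.006, 0.090, 0.012],
                         [0.000, 0.012, 0.160]])
w = min_variance_weights(cov_forecast)
```

As expected, the lowest-variance asset receives the largest weight.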

Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation

Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation ArXiv ID: 2503.15186 Authors: Unknown Abstract Cross-validation is a statistical tool that can be used to improve large covariance matrix estimation. Although its efficiency is observed in practical applications, and a convergence result towards the error of the nonlinear shrinkage is available in the high-dimensional regime, formal proofs that account for finite-sample effects are currently lacking. To carry out an analytical analysis, we focus on the holdout method, a single iteration of cross-validation, rather than the traditional $k$-fold approach. We derive a closed-form expression for the expected estimation error when the population matrix follows a white inverse Wishart distribution, and we observe that the optimal train-test split scales as the square root of the matrix dimension. For general population matrices, we connect the error to the variance of the eigenvalue distribution, but approximations are necessary. In this framework, and in the high-dimensional asymptotic regime, both the holdout and $k$-fold cross-validation methods converge to the optimal estimator when the train-test ratio scales with the square root of the matrix dimension, which is consistent with existing theory. ...

March 19, 2025 · 2 min · Research Team
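The square-root scaling of the optimal split can be turned into a simple sizing rule: choose the train/test ratio proportional to the square root of the matrix dimension. The proportionality constant `c` below is a placeholder assumption, not a value derived in the paper:

```python
import numpy as np

def optimal_holdout_sizes(n, p, c=1.0):
    """Split n observations of a p-dimensional series for the holdout method,
    using the sqrt-dimension rule: n_train / n_test ~ c * sqrt(p).
    The constant c is illustrative only.
    """
    ratio = c * np.sqrt(p)
    n_test = max(1, round(n / (1.0 + ratio)))
    return n - n_test, n_test

print(optimal_holdout_sizes(1000, 100))   # ratio 10 when c = 1
```

The test set shrinks as the dimension grows: with more eigenvector directions to estimate, the rule devotes relatively more data to the train split.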

High-dimensional covariance matrix estimators on simulated portfolios with complex structures

High-dimensional covariance matrix estimators on simulated portfolios with complex structures ArXiv ID: 2412.08756 Authors: Unknown Abstract We study the allocation of synthetic portfolios under hierarchical nested, one-factor, and diagonal structures of the population covariance matrix in a high-dimensional scenario. The noise reduction approaches for the sample realizations are based on random matrices, free probability, deterministic equivalents, and their combination with a data science hierarchical method known as two-step covariance estimators. The financial performance metrics from the simulations are compared with empirical data from companies comprising the S&P 500 index using a moving window and walk-forward analysis. The portfolio allocation strategies analyzed include the minimum variance portfolio (both with and without short-selling constraints) and the hierarchical risk parity approach. Our proposed hierarchical nested covariance model shows signatures of complex system interactions. The empirical financial data reproduces stylized portfolio facts observed in the complex and one-factor covariance models. The two-step estimators proposed here improve several financial metrics under the analyzed investment strategies. The results pave the way for new risk management and diversification approaches when the number of assets is of the same order as the number of transaction days in the investment portfolio. ...

December 11, 2024 · 2 min · Research Team
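Among the random-matrix noise-reduction approaches compared in studies like this one, Marchenko-Pastur eigenvalue clipping is a common baseline: eigenvalues inside the noise bulk are flattened to their average while the signal eigenvalues are kept. A simplified sketch (scaling the bulk edge by the mean eigenvalue is a rough assumption, not the study's exact procedure):

```python
import numpy as np

def mp_clip(sample_cov, n_obs):
    """Marchenko-Pastur eigenvalue clipping (simplified sketch).

    Eigenvalues below the MP upper edge (1 + sqrt(p/n))^2, rescaled by the
    mean eigenvalue, are treated as noise and replaced by their average,
    which preserves the trace of the estimator.
    """
    p = sample_cov.shape[0]
    q = p / n_obs
    lam_plus = (1.0 + np.sqrt(q)) ** 2
    vals, vecs = np.linalg.eigh(sample_cov)
    noise = vals < lam_plus * vals.mean()       # bulk edge scaled by average variance
    if noise.any():
        vals[noise] = vals[noise].mean()        # flatten the noise bulk
    return (vecs * vals) @ vecs.T

rng = np.random.default_rng(2)
p, n = 100, 300
X = rng.standard_normal((n, p))                 # true covariance: identity
S = X.T @ X / n
S_clean = mp_clip(S, n)
```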

Block-diagonal idiosyncratic covariance estimation in high-dimensional factor models for financial time series

Block-diagonal idiosyncratic covariance estimation in high-dimensional factor models for financial time series ArXiv ID: 2407.03781 Authors: Unknown Abstract Estimation of high-dimensional covariance matrices in latent factor models is an important topic in many fields, and especially in finance. Since the number of financial assets grows while the estimation window length remains limited, the often-used sample estimator yields noisy estimates that are not even positive definite. Under the assumption of latent factor models, the covariance matrix is decomposed into a common low-rank component and a full-rank idiosyncratic component. In this paper we focus on the estimation of the idiosyncratic component, under the assumption of a grouped structure of the time series, which may arise due to specific factors such as industries, asset classes or countries. We propose a generalized methodology for estimation of the block-diagonal idiosyncratic component by clustering the residual series and applying shrinkage to the obtained blocks in order to ensure positive definiteness. We derive two different estimators based on different clustering methods and test their performance using simulation and historical data. The proposed methods are shown to provide reliable estimates and outperform other state-of-the-art estimators based on thresholding methods. ...

July 4, 2024 · 2 min · Research Team
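A minimal sketch of a block-diagonal idiosyncratic estimator: zero out cross-group covariances and shrink each within-group block toward its diagonal. Here `groups` is supplied directly and `delta` is a placeholder shrinkage intensity; the paper instead obtains the groups by clustering the residual series and chooses the shrinkage to guarantee positive definiteness.

```python
import numpy as np

def block_diag_idio(resid_cov, groups, delta=0.2):
    """Block-diagonal idiosyncratic covariance (illustrative sketch).

    Covariances between different groups are set to zero; each within-group
    block is shrunk toward its own diagonal with intensity delta.
    """
    groups = np.asarray(groups)
    p = resid_cov.shape[0]
    out = np.zeros((p, p))
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        block = resid_cov[np.ix_(idx, idx)]
        out[np.ix_(idx, idx)] = (1 - delta) * block + delta * np.diag(np.diag(block))
    return out

# Residual covariance of four series; first two and last two form groups
S = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.1, 0.2],
              [0.2, 0.1, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])
B = block_diag_idio(S, groups=[0, 0, 1, 1], delta=0.2)
```

Variances are untouched (shrinking toward the diagonal leaves the diagonal fixed), while cross-group entries vanish and within-group covariances are damped.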