
High-Dimensional Spatial Arbitrage Pricing Theory with Heterogeneous Interactions

ArXiv ID: 2511.01271

Authors: Zhaoxing Gao, Sihan Tu, Ruey S. Tsay

Abstract: This paper investigates estimation and inference of a Spatial Arbitrage Pricing Theory (SAPT) model that integrates spatial interactions with multi-factor analysis, accommodating both observable and latent factors. Building on the classical mean-variance analysis, we introduce a class of Spatial Capital Asset Pricing Models (SCAPM) that account for spatial effects in high-dimensional assets, where we define "spatial rho" as a counterpart to market beta in CAPM. We then extend SCAPM to a general SAPT framework under a "complete" market setting by incorporating multiple factors. For SAPT with observable factors, we propose a generalized shrinkage Yule-Walker (SYW) estimation method that integrates ridge regression to estimate spatial and factor coefficients. When factors are latent, we first apply an autocovariance-based eigenanalysis to extract factors, then employ the SYW method using the estimated factors. We establish asymptotic properties for these estimators under high-dimensional settings where both the dimension and sample size diverge. Finally, we use simulated and real data examples to demonstrate the efficacy and usefulness of the proposed model and method. ...
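The latent-factor step mentioned in the abstract, an autocovariance-based eigenanalysis, can be illustrated with a minimal sketch. This is a generic version of the idea (accumulate products of lagged sample autocovariance matrices, then take the leading eigenvectors), not the paper's SYW procedure. The function name `autocov_eigen_factors`, the lag count `k0`, and the toy AR(1) data are assumptions made for illustration only.

```python
import numpy as np

def autocov_eigen_factors(Y, r, k0=2):
    """Estimate an r-dimensional loading space from lagged autocovariances.

    Y is a (T, N) array; r and k0 are user-chosen here (hypothetical defaults).
    """
    T, N = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((N, N))
    for k in range(1, k0 + 1):
        sigma_k = Yc[k:].T @ Yc[:-k] / T   # lag-k sample autocovariance (N x N)
        M += sigma_k @ sigma_k.T           # symmetric, positive semidefinite
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues come back in ascending order
    A_hat = eigvecs[:, ::-1][:, :r]        # top-r eigenvectors span the loading space
    F_hat = Yc @ A_hat                     # estimated factors (up to rotation)
    return A_hat, F_hat

# Toy data (an assumption): two AR(1) factors plus white noise.
rng = np.random.default_rng(0)
T, N, r = 500, 20, 2
A = rng.standard_normal((N, r))
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = 0.8 * F[t - 1] + rng.standard_normal(r)
Y = F @ A.T + 0.5 * rng.standard_normal((T, N))

A_hat, F_hat = autocov_eigen_factors(Y, r)

# Cosines of the principal angles between true and estimated loading spaces.
Q, _ = np.linalg.qr(A)
overlap = np.linalg.svd(Q.T @ A_hat, compute_uv=False)
print("principal-angle cosines:", overlap)
```

Because white noise has no serial correlation, the lagged autocovariances are driven mainly by the factors, which is why the leading eigenvectors of `M` recover the loading space up to rotation.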

November 3, 2025 · 2 min · Research Team

Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus

ArXiv ID: 2509.13923

Authors: Lamia Lamrani, Benoît Collins, Jean-Philippe Bouchaud

Abstract: Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper, we derive the expected Frobenius error of the holdout method, a particular cross-validation procedure that involves a single train and test split, for a generic rotationally invariant multiplicative noise model, thereby extending previous results to non-Gaussian data distributions. Our approach involves using the Weingarten calculus and the Ledoit-Péché formula to derive the oracle eigenvalues in the high-dimensional limit. When the population covariance matrix follows an inverse Wishart distribution, we approximate the expected holdout error, first with a linear shrinkage, then with a quadratic shrinkage to approximate the oracle eigenvalues. Under the linear approximation, we find that the optimal train-test split ratio is proportional to the square root of the matrix dimension. We then run Monte Carlo simulations of the holdout error for different distributions of the norm of the noise, such as the Gaussian, Student, and Laplace distributions, and observe that the quadratic approximation yields a substantial improvement, especially around the optimal train-test split ratio. We also observe that a higher fourth-order moment of the Euclidean norm of the noise vector sharpens the holdout error curve near the optimal split and lowers the ideal train-test ratio, making the choice of the train-test ratio more important when performing the holdout method. ...
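To make the holdout method concrete, here is a minimal sketch of one common reading of it: eigenvectors are taken from the training-split sample covariance, and each eigenvalue is re-estimated by projecting the test-split sample covariance onto the corresponding eigenvector. The function name `holdout_covariance`, the Gaussian toy data with identity covariance, and the 80/20 split are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def holdout_covariance(X, n_train):
    """Holdout covariance estimate from zero-mean data X of shape (T, N)."""
    X_train, X_test = X[:n_train], X[n_train:]
    S_train = X_train.T @ X_train / X_train.shape[0]
    S_test = X_test.T @ X_test / X_test.shape[0]
    _, U = np.linalg.eigh(S_train)           # columns are train eigenvectors
    lam = np.sum(U * (S_test @ U), axis=0)   # lam[i] = u_i^T S_test u_i
    return (U * lam) @ U.T                   # U diag(lam) U^T

# Toy data (an assumption): Gaussian samples with identity population covariance.
rng = np.random.default_rng(1)
N, T = 50, 100
X = rng.standard_normal((T, N))
C_true = np.eye(N)

S_full = X.T @ X / T
C_holdout = holdout_covariance(X, n_train=80)

err_sample = np.linalg.norm(S_full - C_true, "fro")
err_holdout = np.linalg.norm(C_holdout - C_true, "fro")
print(f"sample covariance error:  {err_sample:.3f}")
print(f"holdout estimate error:   {err_holdout:.3f}")
```

The test split is independent of the train eigenvectors, so the projected eigenvalues avoid the in-sample spreading that inflates the largest and deflates the smallest sample eigenvalues.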

September 17, 2025 · 2 min · Research Team

Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation

ArXiv ID: 2503.15186

Authors: Unknown

Abstract: Cross-validation is a statistical tool that can be used to improve large covariance matrix estimation. Although its efficiency is observed in practical applications and a convergence result towards the error of the nonlinear shrinkage is available in the high-dimensional regime, formal proofs that take into account finite-sample-size effects are currently lacking. To make the analysis tractable, we focus on the holdout method, a single iteration of cross-validation, rather than the traditional $k$-fold approach. We derive a closed-form expression for the expected estimation error when the population matrix follows a white inverse Wishart distribution, and we observe that the optimal train-test split scales as the square root of the matrix dimension. For general population matrices, we connect the error to the variance of the eigenvalue distribution, but approximations are necessary. In this framework and in the high-dimensional asymptotic regime, both the holdout and $k$-fold cross-validation methods converge to the optimal estimator when the train-test ratio scales with the square root of the matrix dimension, which is consistent with the existing theory. ...
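The dependence of the holdout error on the split can be explored numerically. The sketch below averages a normalized squared Frobenius error over a few random draws for several training sizes, with the population matrix drawn from an inverse Wishart distribution rescaled to have identity mean. All names and parameter values (`holdout_error`, `white_inverse_wishart`, `N`, `T`, `m`, `train_sizes`) are hypothetical, and the error normalization is an assumption rather than the paper's definition.

```python
import numpy as np

def holdout_error(X, C_true, n_train):
    """Squared Frobenius error (divided by N) of the holdout estimate."""
    X_train, X_test = X[:n_train], X[n_train:]
    S_train = X_train.T @ X_train / len(X_train)
    S_test = X_test.T @ X_test / len(X_test)
    _, U = np.linalg.eigh(S_train)
    lam = np.sum(U * (S_test @ U), axis=0)   # test-projected eigenvalues
    C_hat = (U * lam) @ U.T
    return np.sum((C_hat - C_true) ** 2) / C_true.shape[0]

def white_inverse_wishart(N, m, rng):
    """Inverse Wishart draw with identity mean (requires m > N + 1)."""
    G = rng.standard_normal((m, N))
    C = (m - N - 1) * np.linalg.inv(G.T @ G)
    return (C + C.T) / 2                      # remove numerical asymmetry

rng = np.random.default_rng(2)
N, T, m, n_trials = 30, 120, 90, 20          # illustrative values only
train_sizes = [30, 60, 90, 105, 115]
errors = {n: 0.0 for n in train_sizes}

for _ in range(n_trials):
    C = white_inverse_wishart(N, m, rng)
    L = np.linalg.cholesky(C)
    X = rng.standard_normal((T, N)) @ L.T    # rows have covariance C
    for n in train_sizes:
        errors[n] += holdout_error(X, C, n) / n_trials

for n, e in errors.items():
    print(f"n_train={n:4d}  n_test={T - n:4d}  mean error={e:.4f}")
```

A sweep like this only gives a noisy empirical picture at one dimension; checking a scaling law in the matrix dimension would require repeating it over a range of `N`.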

March 19, 2025 · 2 min · Research Team

When can weak latent factors be statistically inferred?

ArXiv ID: 2407.03616

Authors: Unknown

Abstract: This article establishes a new and comprehensive estimation and inference theory for principal component analysis (PCA) under the weak factor model that allows for cross-sectionally dependent idiosyncratic components under nearly minimal factor strength relative to the noise level (the signal-to-noise ratio). Our theory is applicable regardless of the relative growth rate between the cross-sectional dimension $N$ and temporal dimension $T$. This more realistic assumption and notable result require a completely new technical device, as the commonly used leave-one-out trick is no longer applicable to the case with cross-sectional dependence. Another notable advancement of our theory is on PCA inference: for example, under the regime where $N\asymp T$, we show that the asymptotic normality of the PCA-based estimator holds as long as the signal-to-noise ratio (SNR) grows faster than a polynomial rate of $\log N$. This finding significantly surpasses prior work that required a polynomial rate of $N$. Our theory is entirely non-asymptotic, offering finite-sample characterizations for both the estimation error and the uncertainty level of statistical inference. A notable technical innovation is our closed-form first-order approximation of the PCA-based estimator, which paves the way for various statistical tests. Furthermore, we apply our theories to design easy-to-implement statistics for validating whether given factors fall in the linear spans of unknown latent factors, testing structural breaks in the factor loadings for an individual unit, checking whether two units have the same risk exposures, and constructing confidence intervals for systematic risks. Our empirical studies uncover insightful correlations between our test results and economic cycles. ...
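For readers unfamiliar with the PCA-based estimator that the abstract refers to, the following is a minimal sketch of the standard construction: factors are the scaled leading left singular vectors of the data panel, and loadings follow by least squares. It only illustrates the estimator in an easy strong-factor simulation and does not implement the paper's inference theory or tests. The function name `pca_factors` and the toy dimensions are assumptions.

```python
import numpy as np

def pca_factors(X, r):
    """PCA estimates for a (T, N) panel X with r factors.

    Returns F_hat (T, r), normalized so F_hat' F_hat / T = I, and L_hat (N, r).
    """
    T, _ = X.shape
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    F_hat = np.sqrt(T) * U[:, :r]        # leading left singular vectors, rescaled
    L_hat = X.T @ F_hat / T              # least-squares loadings given F_hat
    return F_hat, L_hat

# Toy strong-factor panel (an assumption): X = F L' + noise.
rng = np.random.default_rng(3)
T, N, r = 200, 40, 2
F = rng.standard_normal((T, r))
L = rng.standard_normal((N, r))
common = F @ L.T
X = common + rng.standard_normal((T, N))

F_hat, L_hat = pca_factors(X, r)
common_hat = F_hat @ L_hat.T             # equals the rank-r truncated SVD of X
rel_err = np.linalg.norm(common_hat - common) / np.linalg.norm(common)
print(f"relative error of the common component: {rel_err:.3f}")
```

Factors and loadings are identified only up to an invertible rotation, so the comparison is made on the common component `F @ L.T`, which is rotation-invariant.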

July 4, 2024 · 2 min · Research Team

FDR-Controlled Portfolio Optimization for Sparse Financial Index Tracking

ArXiv ID: 2401.15139

Authors: Unknown

Abstract: In high-dimensional data analysis, such as financial index tracking or biomedical applications, it is crucial to select the few relevant variables while maintaining control over the false discovery rate (FDR). In these applications, strong dependencies often exist among the variables (e.g., stock returns), which can undermine the FDR control property of existing methods like the model-X knockoff method or the T-Rex selector. To address this issue, we have expanded the T-Rex framework to accommodate overlapping groups of highly correlated variables. This is achieved by integrating a nearest neighbors penalization mechanism into the framework, which provably controls the FDR at the user-defined target level. A real-world example of sparse index tracking demonstrates the proposed method's ability to accurately track the S&P 500 index over the past 20 years based on a small number of stocks. An open-source implementation is provided within the R package TRexSelector on CRAN. ...
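The quantity being controlled can be made concrete with a small helper. The FDR is the expected value of the false discovery proportion (FDP), the share of selected variables that are not truly relevant. The sketch below computes the FDP and the true positive proportion (TPP) for a single selection. It does not implement the T-Rex selector or call the TRexSelector package; the function name `fdp_tpp` and the index sets are made up for illustration.

```python
def fdp_tpp(selected, true_support):
    """Return (FDP, TPP) for one variable selection.

    FDP = false discoveries / max(number selected, 1)
    TPP = true discoveries  / max(number of truly relevant variables, 1)
    """
    selected, true_support = set(selected), set(true_support)
    false_discoveries = len(selected - true_support)
    true_discoveries = len(selected & true_support)
    fdp = false_discoveries / max(len(selected), 1)
    tpp = true_discoveries / max(len(true_support), 1)
    return fdp, tpp

# Hypothetical example: 4 stocks selected, 5 truly relevant.
fdp, tpp = fdp_tpp(selected=[0, 1, 2, 7], true_support=[0, 1, 2, 3, 4])
print(fdp, tpp)  # 0.25 0.6
```

Averaging the FDP over many simulated datasets gives a Monte Carlo estimate of the FDR, which can then be compared against a user-defined target level.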

January 26, 2024 · 2 min · Research Team