Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon in High-Dimensional Time Series

ArXiv ID: 2407.09738

Authors: Unknown

Abstract

This paper introduces a novel sparse latent factor modeling framework using sparse asymptotic Principal Component Analysis (APCA) to analyze the co-movements of high-dimensional panel data over time. Unlike existing methods based on sparse PCA, which assume sparsity in the loading matrices, our approach posits sparsity in the factor processes while allowing non-sparse loadings. This is motivated by the fact that financial returns typically exhibit universal and non-sparse exposure to market factors. Unlike the commonly used $\ell_1$-relaxation in sparse PCA, the proposed sparse APCA employs a truncated power method to estimate the leading sparse factor and a sequential deflation method for multi-factor cases under $\ell_0$-constraints. Furthermore, we develop a data-driven approach to identify the sparsity of risk factors over the time horizon using a novel cross-sectional cross-validation method. We establish the consistency of our estimators under mild conditions as both the dimension $N$ and the sample size $T$ grow. Monte Carlo simulations demonstrate that the proposed method performs well in finite samples. Empirically, we apply our method to daily S&P 500 stock returns (2004–2016) and identify nine risk factors influencing the stock market.

Keywords: Sparse Asymptotic PCA, Factor Modeling, Co-movements, Cross-Sectional Validation, High-Dimensional Panel Data, Equities
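
The estimation idea in the abstract (a truncated power iteration for the leading sparse factor, followed by sequential deflation for further factors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the panel `X` is stored as an N x T array, that asymptotic PCA works on the T x T cross-product matrix `X.T @ X / N`, that deflation is done by projection, and that loadings are recovered by least squares. The function names `truncated_power` and `sparse_apca`, the sparsity level `k`, and the initialization heuristic are all hypothetical choices made for illustration. In the paper the sparsity level is chosen by cross-sectional cross-validation, which this sketch does not attempt.

```python
import numpy as np


def truncated_power(A, k, n_iter=200, tol=1e-10):
    """Approximate a leading unit eigenvector of symmetric PSD A with at most k nonzeros."""
    T = A.shape[0]
    # Assumed initialization: start at the coordinate with the largest diagonal entry.
    x = np.zeros(T)
    x[np.argmax(np.diag(A))] = 1.0
    for _ in range(n_iter):
        y = A @ x
        # l0 truncation: keep the k largest-magnitude entries, zero out the rest.
        keep = np.argsort(np.abs(y))[-k:]
        x_new = np.zeros(T)
        x_new[keep] = y[keep]
        norm = np.linalg.norm(x_new)
        if norm == 0.0:
            break
        x_new /= norm
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x


def sparse_apca(X, n_factors, k):
    """Sketch: sparse factors F (T x r) and dense loadings L (N x r) from an N x T panel."""
    N, T = X.shape
    A = X.T @ X / N  # T x T cross-product matrix over the time dimension
    F = np.zeros((T, n_factors))
    for j in range(n_factors):
        f = truncated_power(A, k)
        F[:, j] = f
        # Assumed projection deflation: remove the direction of f before the next factor.
        P = np.eye(T) - np.outer(f, f)
        A = P @ A @ P
    # Loadings by least squares of each series on the estimated factors (not sparse).
    coef, *_ = np.linalg.lstsq(F, X.T, rcond=None)
    return F, coef.T
```

Calling `sparse_apca(X, n_factors=r, k=k)` returns factor paths that are nonzero on at most `k` time points each, while every series keeps a generally nonzero loading. That mirrors the abstract's contrast with sparse PCA, where the sparsity is placed on the loadings instead.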

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Holy Grail
  • Why: The paper uses advanced mathematics, including ℓ₀-constraints, truncated power methods, and asymptotic theory under dependent data, and supports it with an empirical application to specific datasets and event analysis.
  flowchart TD
    A["Research Goal<br>Sparse Latent Factors in<br>High-Dim Time Series"] --> B["Input Data<br>Panel Data / Time Horizon"]
    B --> C["Methodology<br>Sparse Asymptotic PCA via<br>Truncated Power & Deflation"]
    C --> D{"Cross-Sectional<br>Cross-Validation"}
    D -->|Optimal| E["Computational Process<br>Estimate Sparse Factors &<br>Non-Sparse Loadings"]
    E --> F["Outcomes<br>Consistent Estimators &<br>9 Identified Risk Factors"]