CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

arXiv ID: 2411.16666

Authors: Unknown

Abstract

We introduce CatNet, an algorithm that effectively controls the False Discovery Rate (FDR) and selects significant features in LSTM networks. CatNet employs the derivative of SHAP values to quantify feature importance, and constructs a vector-formed mirror statistic for FDR control with the Gaussian Mirror algorithm. To avoid instability due to nonlinear or temporal correlations among features, we also propose a new kernel-based independence measure. CatNet performs robustly across different model settings on both simulated and real-world data, reducing overfitting and improving the interpretability of the model. Our framework, which introduces SHAP-based feature importance into FDR control algorithms and improves the Gaussian Mirror, extends naturally to other time-series or sequential deep learning models.
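The abstract's core mechanism — perturbing each feature with mirrored Gaussian noise and reading off a signed importance statistic whose null distribution is symmetric about zero — can be sketched on a toy problem. The sketch below is a deliberate simplification, not the paper's method: it uses ordinary least-squares coefficients as the importance measure in place of SHAP derivatives, and a plain linear fit in place of an LSTM; the function names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 features, only the first 5 carry signal.
n, p, k = 500, 20, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0
y = X @ beta + rng.standard_normal(n)

def ols_coef(X, y):
    """Least-squares coefficients: a stand-in for SHAP-derivative importance."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mirror_stats(X, y, c=1.0, rng=rng):
    """One Gaussian-mirror statistic per feature.

    Feature j is replaced by two mirrored copies, x_j + c*z and x_j - c*z
    with z ~ N(0, I). Writing b+ and b- for their fitted coefficients,
    M_j = |b+ + b-| - |b+ - b-| is large and positive for real effects,
    while for null features it is roughly symmetric about zero.
    """
    p = X.shape[1]
    M = np.empty(p)
    for j in range(p):
        z = rng.standard_normal(X.shape[0])
        X_aug = np.column_stack(
            [np.delete(X, j, axis=1), X[:, j] + c * z, X[:, j] - c * z]
        )
        b = ols_coef(X_aug, y)
        b_plus, b_minus = b[-2], b[-1]
        M[j] = abs(b_plus + b_minus) - abs(b_plus - b_minus)
    return M

M = mirror_stats(X, y)
```

On this toy example the five signal features produce mirror statistics near their true coefficient magnitude, while the fifteen null statistics cluster near zero, which is exactly the separation the downstream FDR-control step exploits.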

Keywords: LSTM, False Discovery Rate (FDR), SHAP, Feature Selection, Interpretability

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced statistical concepts (FDR control, Gaussian Mirrors, SHAP derivatives) with significant mathematical formulations, yet also demonstrates strong empirical testing on both simulated and real-world financial data (S&P 500 prediction) with robust backtesting setups.
Pipeline Overview

```mermaid
flowchart TD
  A["Research Goal<br>Control FDR & Improve<br>Interpretability in LSTM"] --> B["Data & Inputs<br>Simulated & Real-World<br>Time-Series Data"]
  B --> C["Methodology: Feature Importance<br>Compute SHAP Derivatives<br>Quantify Temporal Influence"]
  C --> D["Methodology: FDR Control<br>Construct Gaussian Mirror<br>Statistics with Kernel Independence"]
  D --> E["Computational Process<br>Apply Adaptive Thresholding<br>to Select Features"]
  E --> F["Key Findings<br>Reduced Overfitting<br>Robust FDR Control<br>Enhanced Interpretability"]
```
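The adaptive-thresholding step in the pipeline corresponds to the standard mirror-statistic selection rule: choose the smallest cutoff t at which the estimated false discovery proportion, #{j : M_j ≤ −t} / max(#{j : M_j ≥ t}, 1), drops below the target level q, then keep every feature with M_j ≥ t. A minimal sketch under that symmetric-null assumption (the function name and the toy numbers below are illustrative, not taken from the paper):

```python
import numpy as np

def fdr_threshold(M, q=0.1):
    """Smallest t with estimated FDP <= q, using the symmetry of null
    mirror statistics: negatives of size t proxy for false positives."""
    for t in np.sort(np.abs(M)):
        if t > 0 and (M <= -t).sum() / max((M >= t).sum(), 1) <= q:
            return t
    return np.inf  # no cutoff achieves the target: select nothing

# Example: small symmetric null statistics plus three clear positives.
M = np.array([-0.2, 0.1, -0.05, 0.15, -0.1, 3.0, 2.5, 4.0])
t = fdr_threshold(M, q=0.2)
selected = np.where(M >= t)[0]  # indices of the three large statistics
```

Because the rule counts negative statistics as a proxy for false discoveries, it needs no knowledge of the null distribution beyond its symmetry, which is what the mirror construction is designed to guarantee.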