SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection

ArXiv ID: 2505.14420

Authors: Huopu Zhang, Yanguang Liu, Miao Zhang, Zirui He, Mengnan Du

Abstract

Predicting earnings surprises from financial documents, such as earnings conference calls, regulatory filings, and financial news, has become increasingly important in financial economics. However, these financial documents present significant analytical challenges, typically containing over 5,000 words with substantial redundancy and industry-specific terminology that creates obstacles for language models. In this work, we propose the SAE-FiRE (Sparse Autoencoder for Financial Representation Enhancement) framework to address these limitations by extracting key information while eliminating redundancy. SAE-FiRE employs Sparse Autoencoders (SAEs) to decompose dense neural representations from large language models into interpretable sparse components, then applies statistical feature selection methods, including ANOVA F-tests and tree-based importance scoring, to identify the top-k most discriminative dimensions for classification. By systematically filtering out noise that might otherwise lead to overfitting, we enable more robust and generalizable predictions. Experimental results across three financial datasets demonstrate that SAE-FiRE significantly outperforms baseline approaches.
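The sparse decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes a common SAE formulation (a ReLU encoder, a linear decoder, and an L1 sparsity penalty); the summary here does not state which SAE variant the authors use, and the weights below are hand-picked toy values rather than trained parameters. All function and variable names are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def sae_encode(x, W_enc, b_enc):
    """Map a dense vector to non-negative sparse activations via ReLU."""
    pre_activations = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    return [max(0.0, p) for p in pre_activations]


def sae_decode(z, W_dec, b_dec):
    """Reconstruct the dense vector from the sparse activations."""
    return [r + b for r, b in zip(matvec(W_dec, z), b_dec)]


def sae_loss(x, x_hat, z, l1_coeff):
    """Squared reconstruction error plus an L1 penalty that encourages sparsity."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in z)
    return reconstruction + l1_coeff * sparsity


# Toy setup (assumption): a 2-dimensional "LLM embedding" expanded into
# 4 latent features, one per signed direction of each input axis.
W_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

x = [0.5, -2.0]                       # stand-in for a dense LLM representation
z = sae_encode(x, W_enc, b_enc)       # sparse code: only some latents fire
x_hat = sae_decode(z, W_dec, b_dec)   # reconstruction from the sparse code
active = [i for i, a in enumerate(z) if a > 0]
loss = sae_loss(x, x_hat, z, l1_coeff=0.01)
print(z, x_hat, active, loss)
```

In this toy case only two of the four latent features are active for the input, which is the property the downstream feature selection relies on. Real SAEs are far wider than the model's hidden size and are trained rather than hand-specified.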

Keywords: earnings surprise prediction, sparse autoencoder, financial representation enhancement, feature selection, language models, equities

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced neural network architectures and mathematical formulations (e.g., sparse autoencoders, gradient boosting), placing it in the high-math category. Its empirical rigor is solid, with evaluation on three real financial datasets and cross-validation, though it lacks explicit code or live backtesting details.

```mermaid
flowchart TD
  A["Research Goal: Predict Earnings Surprises from Financial Documents<br/>(Conference Calls, Filings, News)"] --> B["Input Data<br/>Textual Datasets"]
  B --> C["Core Methodology: SAE-FiRE Framework"]
  C --> D["1. Representation Extraction<br/>Use LLM to generate dense text embeddings"]
  D --> E["2. Sparse Decomposition<br/>Apply Sparse Autoencoder to get interpretable features"]
  E --> F["3. Feature Selection<br/>ANOVA F-tests & Tree-based importance scoring<br/>Select Top-k discriminative features"]
  F --> G["Outcomes & Findings"]
  G --> H["Robust Predictions<br/>Reduced overfitting via noise filtering"]
  G --> I["Performance Boost<br/>Significantly outperforms baseline approaches"]
```
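
Step 3 of the flowchart (feature selection) can be sketched as follows. This is a dependency-free illustration of one-way ANOVA F-scoring followed by top-k selection, not the authors' implementation; the feature matrix, labels, and function names are made up for the example, and the tree-based importance scoring mentioned in the abstract is not shown.

```python
from statistics import fmean


def anova_f_scores(X, y):
    """One-way ANOVA F statistic for each feature column.

    X is a list of rows (samples x features); y holds one class label per row.
    """
    classes = sorted(set(y))
    n_samples, n_features = len(X), len(X[0])
    df_between = len(classes) - 1
    df_within = n_samples - len(classes)
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        grand_mean = fmean(column)
        ss_between = 0.0  # spread of class means around the grand mean
        ss_within = 0.0   # spread of samples around their own class mean
        for c in classes:
            group = [v for v, label in zip(column, y) if label == c]
            group_mean = fmean(group)
            ss_between += len(group) * (group_mean - grand_mean) ** 2
            ss_within += sum((v - group_mean) ** 2 for v in group)
        if ss_within == 0.0:
            # No within-class variation: either perfectly separating or constant.
            scores.append(float("inf") if ss_between > 0.0 else 0.0)
        else:
            scores.append((ss_between / df_between) / (ss_within / df_within))
    return scores


def top_k_features(X, y, k):
    """Return the (sorted) indices of the k features with the highest F score."""
    scores = anova_f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical SAE activations: 6 documents x 4 latent features.
# Label 1 = positive earnings surprise, 0 = otherwise (toy labels).
X = [
    [0.0, 2.1, 0.0, 0.5],
    [0.0, 1.9, 0.1, 0.0],
    [0.1, 2.0, 0.0, 0.4],
    [1.5, 2.0, 0.0, 0.1],
    [1.7, 2.1, 0.1, 0.6],
    [1.6, 1.9, 0.0, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

selected = top_k_features(X, y, k=2)
X_reduced = [[row[j] for j in selected] for row in X]  # input to a classifier
print(selected)
```

Feature 0 separates the two classes clearly and is ranked first, while features whose class means coincide score near zero. In practice a library routine such as scikit-learn's `SelectKBest` with `f_classif` would typically replace the hand-written scoring; whether the authors used it is not stated in this summary.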