Asset Pricing in Pre-trained Transformer

ArXiv ID: 2505.01575

Authors: Shanyan Lai

Abstract

This paper proposes an innovative Transformer model, the Single-directional representative from Transformer (SERT), for pricing US large-cap stocks. It also innovatively applies pre-trained Transformer models in a stock-pricing and factor-investing context. Both are compared with standard Transformer models and encoder-only Transformer models across three periods spanning the entire COVID-19 pandemic, to examine model adaptivity and suitability under extreme market fluctuations: the pre-COVID-19 period (mild uptrend), the COVID-19 period (sharp uptrend with a deep downside shock), and the first post-COVID-19 year (high-fluctuation sideways movement). The best proposed SERT model achieves the highest out-of-sample R², 11.2% and 10.91% respectively, during the periods of extreme market fluctuation, followed by the pre-trained Transformer models (10.38% and 9.15%). Their performance under a trend-following strategy also demonstrates an excellent capability to hedge downside risk during market shocks: over the pandemic period, the proposed SERT model achieves a Sortino ratio 47% higher than the buy-and-hold benchmark in the equal-weighted portfolio and 28% higher in the value-weighted portfolio. This shows that Transformer models are well suited to capturing patterns in temporally sparse data within asset-pricing factor models, especially under considerable volatility. We also find that the softmax signal filter, a common Transformer configuration in other contexts, merely eliminates differences between models without improving strategy performance; that increasing the number of attention heads improves model performance only insignificantly; and that the 'layer norm first' method does not boost model performance in our case.
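The two headline evaluation metrics, out-of-sample R² and the Sortino ratio, can be sketched in a few lines. The exact conventions below are assumptions, since the summary does not specify them: a zero benchmark for out-of-sample R² (common in the asset-pricing literature, rather than demeaning returns) and a zero target return for the Sortino ratio.

```python
import math

def oos_r2(actual, predicted):
    # Out-of-sample R^2 against a zero benchmark (no demeaning),
    # a common asset-pricing convention; the paper may use another.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum(a ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def sortino_ratio(returns, target=0.0):
    # Mean excess return divided by downside deviation
    # (root-mean-square of below-target returns only).
    excess = [r - target for r in returns]
    downside = [min(e, 0.0) ** 2 for e in excess]
    dd = math.sqrt(sum(downside) / len(returns))
    return (sum(excess) / len(returns)) / dd

# Toy monthly returns, purely illustrative (not from the paper).
actual = [0.02, -0.05, 0.03, 0.01, -0.02]
pred = [0.015, -0.04, 0.025, 0.0, -0.01]
print(round(oos_r2(actual, pred), 3))          # 0.919
strategy = [0.03, -0.01, 0.04, 0.02, -0.02]
print(round(sortino_ratio(strategy), 3))       # 1.2
```

Note that the zero-benchmark convention makes out-of-sample R² stricter than the in-sample version: a model must beat a forecast of zero returns, not merely the historical mean.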

Keywords: Transformer Model, Asset Pricing, Time Series Prediction, Factor Investing, Deep Learning, Equities

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper involves advanced neural network architectures (Transformers, attention mechanisms, pre-training) with mathematical formulations and a novel model (SERT), indicating high math complexity. It is backed by specific empirical evaluation across multiple market regimes (pre-COVID, COVID, post-COVID), out-of-sample metrics (R², Sortino ratios), and portfolio strategy comparisons, demonstrating high empirical rigor.
```mermaid
flowchart TD
  A["Research Goal<br>Asset Pricing with Transformers"] --> B["Data Collection<br>US Large Cap Stocks, Factor Data"]
  B --> C["Methodology<br>Develop & Compare SERT vs Standard Transformers"]
  C --> D["Model Training<br>Pre-training & Fine-tuning over 3 Market Periods"]
  D --> E["Evaluation & Strategy<br>Out-of-Sample R² & Trend-Following Portfolio Performance"]
  E --> F{"Key Outcomes"}
  F --> F1["SERT achieves highest R²<br>(11.2% / 10.91%) in volatile periods"]
  F --> F2["Superior hedging & Sortino Ratio<br>(47% / 28% higher) during COVID"]
  F --> F3["Softmax filter & Layer Norm<br>did not boost performance"]
```
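The "single-directional" idea behind SERT plausibly refers to causal self-attention, where each time step attends only to itself and earlier steps, so forecasts never use future information. The sketch below is an assumption about the mechanism, not the paper's actual architecture; the projection shapes and single attention head are illustrative choices.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    # Single-head self-attention with a lower-triangular (causal) mask:
    # position t attends only to positions <= t, one reading of
    # "single-directional" attention. Illustrative sketch only.
    # x: (T, d) sequence of observations; w_*: (d, d) projections.
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])          # (T, T) similarities
    mask = np.triu(np.ones((T, T), dtype=bool), 1)  # True above diagonal
    scores[mask] = -np.inf                          # block future positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v                              # (T, d) mixed values

rng = np.random.default_rng(0)
T, d = 6, 4
x = rng.standard_normal((T, d))
w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
out = causal_self_attention(x, *w)
print(out.shape)  # (6, 4)
```

Because of the mask, the first output row depends only on the first input row, which is what prevents look-ahead bias in a time-series pricing setting.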