Generating long-horizon stock “buy” signals with a neural language model

ArXiv ID: 2410.18988

Authors: Unknown

Abstract

This paper describes experiments on fine-tuning a small language model to generate forecasts of long-horizon stock price movements. Inputs to the model are narrative text from 10-K reports of large market capitalization companies in the S&P 500 index; the output is a forward-looking buy or sell decision. Price direction is predicted at discrete horizons up to 12 months after the report filing date. The results reported here demonstrate good out-of-sample statistical performance (F1-macro = 0.62) at medium to long investment horizons. In particular, the buy signals generated from 10-K text are found to be most precise at 6 and 9 months in the future. As measured by the F1 score, the buy signal provides an improvement of between 4.8 and 9 percent over a random stock selection model. In contrast, sell signals generated by the models do not perform well. This may be attributed to the highly imbalanced out-of-sample data, or perhaps to management drafting annual reports with a bias toward positive language. Cross-sectional analysis of performance by economic sector suggests that idiosyncratic reporting styles within industries are correlated with varying degrees and time scales of price movement predictability.
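
The abstract scores signals with F1 and compares the buy signal against a random stock selection model. The sketch below shows one way such a comparison could be computed. The function names, the coin-flip definition of the random baseline, and the choice of class labels are illustrative assumptions; the paper's actual baseline and evaluation code may differ.

```python
import random

def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def f1_macro(y_true, y_pred, classes=("buy", "sell")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

def random_baseline_f1(y_true, cls="buy", trials=1000, seed=0):
    """Average F1 for `cls` when each stock is labeled by a fair coin flip.

    Assumption: 'random stock selection' is modeled here as a 50/50 guess;
    the paper may define its baseline differently.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y_rand = [rng.choice(("buy", "sell")) for _ in y_true]
        total += f1_for_class(y_true, y_rand, cls)
    return total / trials
```

Because macro averaging weights both classes equally, a weak sell signal pulls the F1-macro figure down even when buy signals are precise, which is why the per-class buy F1 is the quantity compared against the baseline here.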

Keywords: fine-tuning, language model, 10-K reports, stock price forecasting, F1 score, Equities

Complexity vs Empirical Score

  • Math Complexity: 3.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper employs standard NLP/ML techniques (fine-tuning a transformer model) with relatively simple math, but demonstrates strong empirical rigor through detailed backtesting methodology, out-of-sample cross-validation, and reported performance metrics (F1 scores) on real financial data.

  flowchart TD
    A["Research Goal: Forecast long-horizon stock price moves<br>using NLP on 10-K reports"] --> B["Data Input:<br>10-K Reports from S&P 500"]
    B --> C["Preprocessing & Tokenization"]
    C --> D["Model: Small Language Model<br>Fine-tuned for Buy/Sell Prediction"]
    D --> E["Forecast Horizons: 3, 6, 9, 12 Months"]
    E --> F["Outcomes"]
    F --> G["Buy Signals: Most Precise<br>at 6 & 9 months<br>Overall F1-macro = 0.62"]
    F --> H["Sell Signals: Low Performance<br>Attributed to Data Imbalance<br>and Management Bias"]
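
The forecast-horizon step in the flowchart implies that each 10-K filing is paired with a price-direction label at 3, 6, 9, and 12 months. A minimal sketch of how such labels might be built is shown below. The helper names, the use of the first available price on or after each target date, and the rule "buy if the later price is higher, else sell" are all assumptions made for illustration; the summary does not state how the paper constructs its labels.

```python
import bisect
import calendar
from datetime import date

HORIZONS_MONTHS = (3, 6, 9, 12)  # horizons listed in the flowchart

def add_months(d, months):
    """Shift a date forward by whole months, clamping to the month's last day."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def price_on_or_after(dates, prices, target):
    """Price on the first available date >= target, or None if past the data.

    `dates` must be sorted ascending and aligned with `prices`.
    """
    i = bisect.bisect_left(dates, target)
    return prices[i] if i < len(dates) else None

def direction_labels(dates, prices, filing_date, horizons=HORIZONS_MONTHS):
    """Map each horizon (in months) to 'buy', 'sell', or None if no price exists.

    Assumption: a flat price is labeled 'sell'; the paper may treat ties or
    benchmark-relative returns differently.
    """
    start = price_on_or_after(dates, prices, filing_date)
    labels = {}
    for h in horizons:
        end = price_on_or_after(dates, prices, add_months(filing_date, h))
        if start is None or end is None:
            labels[h] = None
        else:
            labels[h] = "buy" if end > start else "sell"
    return labels
```

Calling `direction_labels(dates, prices, filing_date)` with a sorted list of trading dates and matching closing prices returns one label per horizon, which could then serve as the fine-tuning target for the text of that filing.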