Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns
ArXiv ID: 2509.24254
Authors: Yuntao Wu, Ege Mert Akin, Charles Martineau, Vincent Grégoire, Andreas Veneris
Abstract
We examine how textual features in earnings press releases predict stock returns on earnings announcement days. Using over 138,000 press releases from 2005 to 2023, we compare traditional bag-of-words representations with BERT-based embeddings. We find that press release content (soft information) is as informative as earnings surprise (hard information), with FinBERT yielding the highest predictive power. Combining models enhances explanatory strength and interpretability of the content of press releases. Stock prices fully reflect the content of press releases at market open. Leaked press releases would offer a predictive advantage. Topic analysis reveals self-serving bias in managerial narratives. Our framework supports real-time return prediction through the integration of online learning, provides interpretability, and reveals the nuanced role of language in price formation.
Keywords: natural language processing (NLP), BERT embeddings, earnings announcements, event study, textual analysis, equity
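The bag-of-words baseline and the online-learning idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the toy corpus, the return values, the `OnlineLinearModel` class, and the learning rate are all hypothetical, and a plain SGD linear regression stands in for whatever online learner the paper actually uses. The point is only to show the "forecast first, then update" loop that makes real-time prediction possible.

```python
import math
import re
from collections import Counter

# Hypothetical toy stream: (press-release snippet, announcement-day return).
# Invented for illustration; not taken from the paper's data.
STREAM = [
    ("record revenue and strong growth in the quarter", 0.04),
    ("revenue declined amid weak demand and impairment charges", -0.05),
    ("strong demand drove record earnings", 0.03),
    ("weak results and lower guidance for the year", -0.04),
    ("growth in revenue exceeded guidance", 0.02),
    ("impairment charges and declined margins", -0.03),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bow_features(text):
    """Return L2-normalized bag-of-words counts as a sparse dict."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


class OnlineLinearModel:
    """Linear regression trained by SGD, one observation at a time."""

    def __init__(self, lr=0.5):
        self.lr = lr          # assumed step size, chosen for the toy data
        self.weights = {}     # token -> coefficient
        self.bias = 0.0

    def predict(self, feats):
        return self.bias + sum(
            self.weights.get(tok, 0.0) * val for tok, val in feats.items()
        )

    def update(self, feats, target):
        # One gradient step on the squared error of a single example.
        err = self.predict(feats) - target
        self.bias -= self.lr * err
        for tok, val in feats.items():
            self.weights[tok] = self.weights.get(tok, 0.0) - self.lr * err * val
        return err


def run_stream(stream, epochs=1):
    """Forecast each release before learning from it (test-then-train)."""
    model = OnlineLinearModel()
    predictions = []
    for _ in range(epochs):
        for text, ret in stream:
            feats = bow_features(text)
            predictions.append(model.predict(feats))  # forecast first
            model.update(feats, ret)                  # then learn
    return model, predictions


model, preds = run_stream(STREAM, epochs=20)
positive = model.predict(bow_features("record revenue and strong demand"))
negative = model.predict(bow_features("weak demand and impairment charges"))
print(f"positive-tone forecast: {positive:+.4f}")
print(f"negative-tone forecast: {negative:+.4f}")
```

Replacing `bow_features` with a function that returns an embedding vector (for example from a BERT-style encoder) would leave the streaming loop unchanged, which is why the two representations can be compared on equal footing.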
Complexity vs Empirical Score
- Math Complexity: 4.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Street Traders
- Why: The paper uses standard NLP models (BERT, LDA) and regression without heavy mathematical derivations, but is exceptionally data-intensive with a massive dataset (138k+ press releases, 134k+ observations), extensive preprocessing, and a validated trading strategy showing predictive power.
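To make the regression-based comparison of "hard" and "soft" information concrete, here is a minimal sketch that scores two candidate predictors by the R² of a univariate regression of announcement returns on each one. The summary does not say which statistic the paper reports, so R² is an assumption here, and every number below (returns, `earnings_surprise`, `text_score`) is a hypothetical toy value rather than a result from the paper.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of a univariate OLS regression of y on x with an intercept.

    For a single regressor this equals the squared sample correlation.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data, one entry per earnings announcement.
returns = [0.04, -0.05, 0.03, -0.04, 0.02, -0.03]
earnings_surprise = [0.8, -1.1, 0.2, -0.3, 0.9, -0.6]   # "hard" information
text_score = [0.6, -0.9, 0.5, -0.7, 0.1, -0.4]          # "soft" information

r2_hard = r_squared(earnings_surprise, returns)
r2_soft = r_squared(text_score, returns)
print(f"R^2, earnings surprise: {r2_hard:.3f}")
print(f"R^2, text-based score:  {r2_soft:.3f}")
```

On real data the text-based score would come from a fitted text model evaluated out of sample; comparing in-sample fits, as this toy does, would overstate the text model's explanatory power.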
flowchart TD
A["Research Goal:<br>Predict Earnings Announcement Returns<br>from Press Release Text"] --> B["Data Input:<br>138K+ Press Releases (2005-2023)"]
B --> C["Methodology:<br>Compare BOW vs. BERT/FinBERT Embeddings"]
C --> D["Analysis:<br>Event Study &<br>Online Learning Models"]
D --> E{"Key Findings"}
E --> F1["Press Release Content<br>as informative as Hard Numbers"]
E --> F2["FinBERT yields highest predictive power"]
E --> F3["Prices reflect info at market open<br>Leakage offers advantage"]
E --> F4["Topic Analysis reveals<br>Managerial Self-Serving Bias"]