Multi-Label Topic Model for Financial Textual Data
ArXiv ID: 2311.07598
Authors: Unknown
Abstract
This paper presents a multi-label topic model for financial texts such as ad-hoc announcements, 8-K filings, finance-related news, or annual reports. I train the model on a new financial multi-label database consisting of 3,044 German ad-hoc announcements that were labeled manually using 20 predefined, economically motivated topics. The best model achieves a macro F1 score of more than 85%. Translating the data yields an English version of the model with similar performance. As an application of the model, I investigate differences in stock market reactions across topics. I find evidence of strong positive or negative market reactions for some topics, such as announcements of new Large Scale Projects or Bankruptcy Filings, while I observe no significant price effects for other topics. Furthermore, in contrast to previous studies, the multi-label structure of the model makes it possible to analyze how co-occurring topics affect stock market reactions. In many cases, the reaction to a specific topic depends heavily on its co-occurrence with other topics. For example, if capital raised in a Seasoned Equity Offering (SEO) is used to restructure a company in the course of a Bankruptcy Proceeding, the market reacts positively on average. However, if that capital is used to cover unexpected additional costs from the development of new drugs, the SEO is associated with negative reactions on average.
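The macro F1 score cited above averages the F1 score of each topic, so a rare topic such as a bankruptcy filing weighs as much as a frequent one. The following is a minimal sketch of how a multi-label evaluation of this kind can be computed, assuming the model outputs one probability per topic and that each topic is switched on at a threshold of 0.5. The threshold, the function names, the zero-score convention for empty topics, and the toy data (3 topics instead of 20) are all illustrative assumptions, not the paper's code.

```python
from typing import List, Sequence

def to_labels(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-topic probabilities into a 0/1 indicator matrix.

    Each topic is thresholded independently, so one announcement can carry
    several topics at once (multi-label), unlike an argmax over a softmax.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

def label_f1(y_true, y_pred, k: int) -> float:
    """F1 score for topic k, treated as its own binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
    denom = 2 * tp + fp + fn
    # Assumption: a topic with no true and no predicted positives scores 0.0
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-topic F1: rare topics count as much as frequent ones."""
    n_topics = len(y_true[0])
    return sum(label_f1(y_true, y_pred, k) for k in range(n_topics)) / n_topics

# Toy data (hypothetical): 4 announcements, 3 topics instead of the paper's 20
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
probs = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.1], [0.7, 0.3, 0.2], [0.1, 0.2, 0.6]]
y_pred = to_labels(probs)
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Here the first topic is predicted perfectly (F1 = 1.0), while the other two each miss one true label (F1 = 2/3), giving a macro F1 of 7/9. A micro-averaged score would instead pool all topics' counts and be dominated by the most frequent topics.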
Keywords: topic modeling, natural language processing (NLP), market microstructure, event study, text analysis, Equities (Event-Driven)
Complexity vs Empirical Score
- Math Complexity: 4.5/10
- Empirical Rigor: 6.5/10
- Quadrant: Street Traders
- Why: The paper employs standard machine learning models (BERT) with basic statistical metrics (F1 score, Fleiss’ κ) rather than advanced mathematical derivations, but demonstrates strong empirical implementation through manual data labeling, robust model training, and direct application to financial event studies with specific return calculations.
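Fleiss' κ, mentioned above as an agreement statistic for the manual labeling, compares how often annotators agree with how often they would agree by chance given the overall category frequencies. A minimal sketch follows, assuming that every item is rated by the same number of annotators and, for the multi-label case, that agreement is measured per topic as a binary present/absent decision. That per-topic framing and the toy counts are assumptions for illustration; the summary does not say exactly how the paper applies κ.

```python
from typing import Sequence

def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] = number of raters who put item i into category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    if n_raters < 2:
        raise ValueError("need at least two raters")
    # Observed agreement per item: share of rater pairs that agree
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    n_cats = len(counts[0])
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 3 annotators decide whether one topic is present or absent in 4 announcements
counts = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(counts), 3))  # 0.625
```

In this toy case the observed agreement is 5/6 and the chance agreement is 5/9, so κ = (5/6 − 5/9) / (1 − 5/9) = 0.625.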
flowchart TD
A["Research Goal<br>How do financial topics<br>and co-occurrences affect<br>stock market reactions?"] --> B["Data Collection<br>3,044 German Ad-hoc<br>Announcements"]
B --> C["Labeling & Corpus<br>Manual labeling with 20<br>economic topics &<br>English translation"]
C --> D["Modeling<br>Multi-Label Topic Model<br>Training & Evaluation"]
D --> E{"Model Performance<br>Macro F1 > 85%?"}
E -->|Yes| F["Analysis<br>Event Study on<br>Market Reactions"]
E -->|No| C
F --> G["Key Findings<br>1. Strong +/- reactions for<br>specific topics<br>e.g., Projects vs. Bankruptcy"]
F --> H["Key Findings<br>2. Co-occurring topics alter<br>reactions e.g., SEO funds<br>Bankruptcy = +,<br>SEO funds R&D = -"]
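The event-study step in the flowchart compares market reactions across topics and across combinations of co-occurring topics. The sketch below shows one simple way to do this, assuming market-adjusted abnormal returns (stock return minus market return) summed over a short event window into a cumulative abnormal return (CAR). The paper may use a different benchmark (for example an estimated market model), a different window, and formal significance tests. The topic names, function names, and return figures are made up for illustration and are not data from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, FrozenSet, List, Optional, Tuple

# One announcement: (topic set, daily stock returns, daily market returns) over the event window
Event = Tuple[FrozenSet[str], List[float], List[float]]

def car(stock: List[float], market: List[float]) -> float:
    """Cumulative market-adjusted abnormal return: sum of (stock - market) over the window."""
    if len(stock) != len(market):
        raise ValueError("stock and market returns must cover the same days")
    return sum(r - m for r, m in zip(stock, market))

def mean_car_by_topic(events: List[Event]) -> Dict[str, float]:
    """Average CAR per topic; a multi-label event counts toward every topic it carries."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for topics, stock, market in events:
        c = car(stock, market)
        for topic in topics:
            buckets[topic].append(c)
    return {topic: mean(cars) for topic, cars in buckets.items()}

def mean_car_split(events: List[Event], topic: str,
                   other: str) -> Tuple[Optional[float], Optional[float]]:
    """Mean CAR of `topic` events with and without the co-occurring `other` topic."""
    with_other: List[float] = []
    without_other: List[float] = []
    for topics, stock, market in events:
        if topic in topics:
            bucket = with_other if other in topics else without_other
            bucket.append(car(stock, market))
    return (mean(with_other) if with_other else None,
            mean(without_other) if without_other else None)

# Made-up toy events with hypothetical topic names; not data from the paper
events: List[Event] = [
    (frozenset({"SEO", "Bankruptcy Proceeding"}), [0.02, 0.01, 0.00], [0.00, 0.00, 0.00]),
    (frozenset({"SEO", "Drug Development Costs"}), [-0.03, -0.01, 0.00], [0.00, 0.01, 0.00]),
    (frozenset({"SEO"}), [0.00, -0.01, 0.00], [0.00, 0.00, 0.00]),
]
print(mean_car_by_topic(events))
print(mean_car_split(events, "SEO", "Bankruptcy Proceeding"))
```

Because each announcement can carry several topics, the per-topic average mixes events with different companion topics. Splitting a topic's events by whether a second topic co-occurs is what exposes reversals like the SEO example in the abstract, which a single-label model could not separate.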