Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis

ArXiv ID: 2402.11728

Authors: Unknown

Abstract

In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, treating both as significant quarterly events for publicly traded companies. To support this analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. We also demonstrate the practical utility of the proposed model by constructing a novel measure of optimism, and we observe that earnings surprise and return depend on this measure. Our dataset, models, and code are publicly available on GitHub under a CC BY 4.0 license.

Keywords: Claim Detection, Analyst Reports, Earnings Calls, Weak Supervision, Language Models, Equities
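
The abstract describes a weak-supervision model whose aggregation function encodes SME knowledge, but this summary does not give its exact form. As a rough illustration of the general idea only, the sketch below combines a few rule-based labeling functions with a weighted vote. The labeling functions, keyword lists, weights, and the names `label_sentence` and `aggregate` are all hypothetical assumptions made for this example, not the paper's method.

```python
import re
from typing import Callable, List, Sequence

# Label convention assumed for this sketch.
ABSTAIN, NOT_CLAIM, CLAIM = -1, 0, 1


def lf_forward_looking(sentence: str) -> int:
    """Vote CLAIM when a forward-looking keyword co-occurs with a digit."""
    has_keyword = re.search(
        r"\b(expect|forecast|project|anticipate|guidance)\w*", sentence, re.I
    )
    has_number = re.search(r"\d", sentence)
    return CLAIM if has_keyword and has_number else ABSTAIN


def lf_past_report(sentence: str) -> int:
    """Vote NOT_CLAIM when the sentence reports an already realized figure."""
    if re.search(r"\b(was|were|reported|recorded)\b", sentence, re.I):
        return NOT_CLAIM
    return ABSTAIN


def lf_no_number(sentence: str) -> int:
    """Vote NOT_CLAIM when the sentence contains no digit at all."""
    return NOT_CLAIM if not re.search(r"\d", sentence) else ABSTAIN


def aggregate(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Weighted vote: CLAIM adds its weight, NOT_CLAIM subtracts it.

    Abstentions are ignored; a non-positive score defaults to NOT_CLAIM.
    """
    score = sum(
        w if v == CLAIM else -w
        for v, w in zip(votes, weights)
        if v != ABSTAIN
    )
    return CLAIM if score > 0 else NOT_CLAIM


# Hypothetical weights standing in for expert judgment about rule reliability.
LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_forward_looking,
    lf_past_report,
    lf_no_number,
]
SME_WEIGHTS = [2.0, 1.0, 1.0]


def label_sentence(sentence: str, weights: Sequence[float] = SME_WEIGHTS) -> int:
    """Apply every labeling function and aggregate the votes."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    return aggregate(votes, weights)
```

In this toy setup the weights decide conflicts: a sentence that both reports a past figure and gives a numeric forecast is labeled a claim only because the forward-looking rule is weighted more heavily. With equal weights the vote ties and falls back to the default label.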

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Street Traders
  • Why: The paper centers on NLP and dataset construction, with language-model benchmarking and a public data release, which gives it high empirical rigor. The mathematics involved (statistics for weak-supervision aggregation) is comparatively simple.

flowchart TD
    A["Research Goal: Determine influence of claims<br>on market returns & create claim detection model"] --> B["Create New Financial Dataset"]
    B --> C["Develop Novel Weak-Supervision Model<br>incorporating SME knowledge"]
    C --> D["Benchmark against other<br>Language Models"]
    D --> E["Construct Novel<br>Optimism Measure"]
    E --> F["Analyze Earnings Surprise<br>vs. Optimism"]
    F --> G["Key Findings: Claims impact markets,<br>Proposed model outperforms existing approaches,<br>Earnings surprise and return depend on optimism"]