Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading

ArXiv ID: 2508.07408

Authors: Yueyi Wang, Qiyao Wei

Abstract

In this study, we wish to showcase the unique utility of large language models (LLMs) in financial semantic annotation and alpha signal discovery. Leveraging a corpus of company-related tweets, we use an LLM to automatically assign multi-label event categories to high-sentiment-intensity tweets. We align these labeled sentiment signals with forward returns over 1-to-7-day horizons to evaluate their statistical efficacy and market tradability. Our experiments reveal that certain event labels consistently yield negative alpha, with Sharpe ratios as low as -0.38 and information coefficients exceeding 0.05, all statistically significant at the 95% confidence level. This study establishes the feasibility of transforming unstructured social media text into structured, multi-label event variables. A key contribution of this work is its commitment to transparency and reproducibility; all code and methodologies are made publicly available. Our results provide compelling evidence that social media sentiment is a valuable, albeit noisy, signal in financial forecasting and underscore the potential of open-source frameworks to democratize algorithmic trading research.

Keywords: Large Language Models (LLMs), Alpha Signal Discovery, Sentiment Analysis, Event Classification, Social Media Data, Equities
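The abstract's alignment step (pairing each signal with forward returns over 1-to-7-day horizons and scoring it with an information coefficient) can be illustrated with a minimal sketch. It assumes the IC is the Spearman rank correlation between signal and forward return, which is a common convention but is not confirmed by the text above. The function names, the price series, and the sentiment values are all hypothetical and are not taken from the paper's code or data.

```python
from statistics import mean


def forward_returns(prices, horizon):
    """Simple return from day t to day t + horizon; None where the window runs off the end."""
    n = len(prices)
    return [
        prices[t + horizon] / prices[t] - 1.0 if t + horizon < n else None
        for t in range(n)
    ]


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; NaN when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def rank_ic(signal, fwd):
    """Spearman rank IC between signal and forward returns, skipping missing pairs."""
    pairs = [(s, r) for s, r in zip(signal, fwd) if s is not None and r is not None]
    if len(pairs) < 3:  # too few observations to be meaningful
        return float("nan")
    s, r = zip(*pairs)
    return _pearson(_ranks(list(s)), _ranks(list(r)))


if __name__ == "__main__":
    # Hypothetical daily closes and daily sentiment scores for one ticker.
    prices = [100, 101, 99, 102, 104, 103, 105, 107, 106, 108, 110, 109]
    sentiment = [0.2, -0.5, 0.7, 0.1, -0.3, 0.4, 0.6, -0.2, 0.3, 0.5, -0.1, 0.0]
    for h in range(1, 8):
        ic = rank_ic(sentiment, forward_returns(prices, h))
        print(f"horizon={h}d  rank IC={ic:.3f}")
```

A real evaluation would pool many tickers and dates and test whether the IC differs from zero; this sketch only shows the alignment and the correlation itself.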

Complexity vs Empirical Score

  • Math Complexity: 3.5/10
  • Empirical Rigor: 7.2/10
  • Quadrant: Street Traders
  • Why: The paper focuses on practical implementation of LLM-based feature engineering with extensive backtesting metrics (Sharpe ratios, information coefficients) and transparency, but its mathematical depth is moderate, relying primarily on statistical testing rather than advanced derivations.

```mermaid
flowchart TD
  A["Research Goal:<br/>Extract Event-Aware Sentiment<br/>Alpha from Financial Tweets"] --> B["Input: Company-Related<br/>Twitter Data Corpus"]
  B --> C["LLM-Augmented<br/>Semantic Annotation<br/>Assign Multi-Label Events"]
  C --> D["Signal Alignment:<br/>Link Sentiment to<br/>1-7 Day Forward Returns"]
  D --> E{"Statistical &<br/>Tradability Evaluation"}
  E -->|Quantitative Metrics| F["Key Findings/Outcomes"]
  F --> G["Identified Negative Alpha<br/>Event Labels (Sharpe ~ -0.38)"]
  F --> H["Transparent Framework<br/>Code & Methods Publicly Available"]
```
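The middle of the pipeline above (multi-label event annotation followed by per-label evaluation) might look roughly like the following sketch. It assumes the LLM annotation has already produced a set of event labels per tweet, and it assumes a simple rule of trading in the direction of the tweet's sentiment. The label names (`earnings`, `product`), the record layout, the trading rule, and the `periods_per_year` annualization are illustrative assumptions, not the paper's taxonomy or backtest design.

```python
import math
from statistics import mean, stdev

# Hypothetical annotated tweets: sentiment score, LLM-assigned event labels,
# and the forward return of the mentioned stock over the chosen horizon.
records = [
    {"sentiment": 0.8, "labels": {"earnings"}, "fwd_ret": -0.02},
    {"sentiment": 0.9, "labels": {"earnings", "product"}, "fwd_ret": -0.01},
    {"sentiment": -0.7, "labels": {"earnings"}, "fwd_ret": 0.03},
    {"sentiment": 0.6, "labels": {"product"}, "fwd_ret": 0.01},
]


def multi_hot(rows, vocabulary):
    """Encode each row's label set as a 0/1 vector ordered by `vocabulary`."""
    return [[1 if lab in r["labels"] else 0 for lab in vocabulary] for r in rows]


def label_sharpe(rows, label, periods_per_year=252):
    """Sharpe ratio of trading in the sentiment direction on tweets tagged `label`.

    Scaling by sqrt(periods_per_year) assumes independent one-period returns;
    pass periods_per_year=1 for the raw (unannualized) ratio.
    """
    pnl = [
        math.copysign(1.0, r["sentiment"]) * r["fwd_ret"]
        for r in rows
        if label in r["labels"] and r["sentiment"] != 0
    ]
    if len(pnl) < 2:
        return float("nan")
    sd = stdev(pnl)
    if sd == 0:
        return float("nan")
    return mean(pnl) / sd * math.sqrt(periods_per_year)


if __name__ == "__main__":
    vocab = ["earnings", "product"]
    print(multi_hot(records, vocab))
    for lab in vocab:
        print(lab, round(label_sharpe(records, lab, periods_per_year=1), 3))
```

A negative Sharpe for a label under this rule means that following the sentiment direction lost money on average for tweets carrying that label, which is the sense in which an event label can "yield negative alpha".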