Sentiment trading with large language models

ArXiv ID: 2412.19245

Authors: Unknown

Abstract

We investigate the efficacy of large language models (LLMs) in sentiment analysis of U.S. financial news and their potential in predicting stock market returns. We analyze a dataset of 965,375 news articles spanning January 1, 2010, to June 30, 2023, and compare the performance of several LLMs, including BERT, OPT, and FINBERT, against the traditional Loughran-McDonald dictionary model, long a dominant methodology in the finance literature. The study documents a significant association between LLM scores and subsequent daily stock returns. Specifically, OPT, a GPT-3-class LLM, achieves the highest sentiment-classification accuracy at 74.4%, slightly ahead of BERT (72.5%) and FINBERT (72.2%). In contrast, the Loughran-McDonald dictionary model is considerably less effective, at only 50.1% accuracy. Regression analyses highlight a robust positive impact of OPT model scores on next-day stock returns, with coefficients of 0.274 and 0.254 in different model specifications. BERT and FINBERT also exhibit predictive relevance, though to a lesser extent. Notably, we observe no significant relationship between Loughran-McDonald dictionary scores and stock returns, challenging the efficacy of this traditional method in the current financial context. In portfolio performance, the long-short OPT strategy excels with a Sharpe ratio of 3.05, versus 2.11 for BERT and 2.07 for FINBERT long-short strategies; strategies based on the Loughran-McDonald dictionary yield the lowest Sharpe ratio, 1.23. Our findings underscore the superior performance of advanced LLMs, especially OPT, in financial market prediction and portfolio management, marking a significant shift in the landscape of financial analysis tools, with implications for financial regulation and policy analysis.

Keywords: Sentiment analysis, Large Language Models (LLMs), Financial news, Stock return prediction, BERT/OPT
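To make the dictionary baseline concrete, here is a minimal sketch of a Loughran-McDonald-style scorer of the kind the abstract benchmarks against the LLMs. The word lists below are tiny illustrative stand-ins, not the actual LM dictionary, and the scoring rule (net positive share of matched words) is one common convention, not necessarily the paper's exact implementation.

```python
# Illustrative dictionary-based sentiment scorer (LM-style baseline).
# POSITIVE/NEGATIVE are toy word lists, not the real Loughran-McDonald lexicon.
POSITIVE = {"gain", "growth", "profit", "beat", "strong"}
NEGATIVE = {"loss", "decline", "weak", "miss", "lawsuit"}

def lm_sentiment(text: str) -> float:
    """Score in [-1, 1]: (pos - neg) / (pos + neg); 0.0 if no dictionary hits."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

print(lm_sentiment("strong profit growth beat expectations"))  # 1.0
print(lm_sentiment("lawsuit triggers weak quarter and loss"))  # -1.0
```

Because such a scorer only counts isolated words, it misses negation and context ("failed to beat estimates"), which is consistent with the 50.1% accuracy the paper reports for this approach.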

Complexity vs Empirical Score

  • Math Complexity: 6.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs sophisticated NLP and econometric models (BERT, OPT, linear regressions with fixed effects) and advanced statistical techniques, indicating high mathematical complexity. It demonstrates strong empirical rigor by using a large dataset of nearly a million news articles, reporting precise metrics like accuracy (74.4%), Sharpe ratios (3.05), and transaction costs, and outlining a backtest-ready trading strategy.
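The return-predictability regressions summarized above can be sketched as a simple OLS of next-day returns on the sentiment score. The data below are synthetic with an assumed slope of 0.25, so the recovered coefficient is illustrative only; it is not the paper's 0.274/0.254 estimates, and the paper's specifications additionally include fixed effects.

```python
import numpy as np

# Synthetic version of the abstract's regression: next-day return on sentiment score.
rng = np.random.default_rng(1)
n = 5000
score = rng.uniform(-1, 1, size=n)                   # sentiment score per article-day
ret = 0.25 * score + rng.normal(scale=1.0, size=n)   # next-day return (percent), assumed slope 0.25

X = np.column_stack([np.ones(n), score])             # intercept + score
beta, *_ = np.linalg.lstsq(X, ret, rcond=None)       # OLS via least squares
print(f"estimated slope: {beta[1]:.2f}")
```

With 5,000 observations the estimated slope lands close to the assumed 0.25, mirroring how the paper reads a positive, significant coefficient as evidence of next-day predictability.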
```mermaid
flowchart TD
    A["Research Goal<br>Test LLM efficacy in<br>financial sentiment analysis"] --> B["Data Input<br>965,375 news articles<br>2010-2023"]
    B --> C["Methodology<br>Compare LLMs: BERT, OPT, FINBERT<br>vs Traditional Dictionary"]
    C --> D["Computational Process<br>Sentiment Scoring &<br>Regression Analysis"]
    D --> E["Key Outcomes"]
    E --> F["Optimal Model<br>OPT (GPT-3 based)<br>74.4% accuracy"]
    E --> G["Portfolio Performance<br>OPT Long-Short Strategy<br>Sharpe Ratio: 3.05"]
    E --> H["Traditional Model Failure<br>LM Dictionary<br>50.1% accuracy, No Returns Link"]
```