Linking microblogging sentiments to stock price movement: An application of GPT-4

ArXiv ID: 2308.16771

Authors: Unknown

Abstract

This paper investigates whether GPT-4, a large language model (LLM), improves on BERT for modeling same-day daily stock price movements of Apple and Tesla in 2017, based on sentiment analysis of microblogging messages. We recorded daily adjusted closing prices and translated them into up-down movements. Sentiment for each day was extracted from messages on the Stocktwits platform using both models. We developed a novel method to engineer a comprehensive prompt for contextual sentiment analysis that unlocks the capabilities of modern LLMs, enabling us to carefully retrieve sentiments, perceived advantages or disadvantages, and the relevance of each message to the analyzed company. Logistic regression is used to evaluate whether the extracted message contents reflect stock price movements. GPT-4 exhibited substantial accuracy, outperforming BERT in five out of six months and substantially exceeding a naive buy-and-hold strategy, reaching a peak accuracy of 71.47% in May. The study also highlights the importance of prompt engineering in eliciting desired outputs from GPT-4's contextual abilities. However, the cost of deploying GPT-4 and the need to fine-tune prompts raise practical considerations for its use.
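The abstract's prompt-engineering step (retrieving sentiment, perceived advantages or disadvantages, and relevance from each message) can be illustrated with a minimal sketch. The exact wording, field names, and the `build_sentiment_prompt` helper below are hypothetical reconstructions, not the authors' actual prompt:

```python
def build_sentiment_prompt(company: str, ticker: str, message: str) -> str:
    """Assemble a contextual sentiment-analysis prompt for an LLM.

    The extracted fields (sentiment, aspect, relevance) mirror the
    message contents described in the paper; the wording is illustrative.
    """
    return (
        "You are a financial sentiment analyst.\n"
        f"Company under analysis: {company} ({ticker}).\n"
        f'Stocktwits message:\n"{message}"\n\n'
        "Answer in JSON with the keys:\n"
        "  sentiment: one of bullish / bearish / neutral\n"
        "  aspect: the perceived advantage or disadvantage mentioned\n"
        "  relevance: how relevant the message is to the company (0 to 1)\n"
    )

prompt = build_sentiment_prompt("Apple", "AAPL", "New iPhone sales look strong!")
```

Structuring the request as a fixed JSON schema is one common way to make LLM outputs machine-parseable for a downstream regression.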

Keywords: GPT-4, BERT, sentiment analysis, logistic regression, prompt engineering, Equity (Stock)

Complexity vs Empirical Score

  • Math Complexity: 2.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Street Traders
  • Why: The paper employs a straightforward logistic regression model with minimal advanced mathematics, placing it in the low complexity range. However, it demonstrates high empirical rigor through backtesting on real historical data, comparison against benchmarks, and detailed reporting of accuracy metrics, aligning with the Street Traders quadrant.
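The modeling step is indeed light on mathematics: daily sentiment scores feed a logistic regression that predicts up/down movement. A minimal, self-contained sketch on synthetic data (pure Python gradient descent, not the paper's code) shows the idea:

```python
import math
import random

def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit w, b for P(up) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Classify a day as up (1) or down (0) from its sentiment score."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Synthetic daily sentiment scores in [-1, 1] and up(1)/down(0) moves.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [1 if x > 0 else 0 for x in xs]
w, b = train_logistic(xs, ys)
acc = sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(xs)
```

On this cleanly separable toy data the fitted model recovers the sign of the sentiment score; the paper's accuracy figures come from real, noisy Stocktwits data.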
```mermaid
flowchart TD
    A["Research Goal<br>Assess GPT-4 vs. BERT for predicting<br>daily stock movements using microblogging sentiment"] --> B
    B --> C["Sentence Transformation &<br>LLM Prompt Engineering"]
    C --> D
    D --> E{"Compute Daily<br>Sentiment Scores"}
    E --> F["Logistic Regression Modeling"]
    F --> G["Key Findings & Outcomes"]

    subgraph B ["Data Collection & Preparation"]
        B1["Stock Prices<br>(Apple & Tesla 2017)"]
        B2["Microblogging Messages<br>(Stocktwits)"]
    end

    subgraph D ["Sentiment Extraction"]
        D1["GPT-4 Analysis"]
        D2["BERT Analysis"]
    end

    G --> G1["Outcomes"]
    G1 --> G2["GPT-4 outperformed BERT in 5/6 months"]
    G1 --> G3["Peak accuracy: 71.47% (May 2017)"]
    G1 --> G4["Better than buy-and-hold strategy"]
    G1 --> G5["Prompt engineering is crucial for performance"]
```
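The "Compute Daily Sentiment Scores" step in the flow above can be sketched as mapping per-message labels to +1/0/-1 and averaging them per trading day. This is one plausible aggregation, assumed here for illustration; the paper may weight or aggregate differently:

```python
from collections import defaultdict

# Assumed mapping from LLM sentiment labels to numeric scores.
LABEL_SCORE = {"bullish": 1, "neutral": 0, "bearish": -1}

def daily_scores(messages):
    """messages: iterable of (date, label) pairs -> {date: mean score}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for date, label in messages:
        sums[date] += LABEL_SCORE[label]
        counts[date] += 1
    return {d: sums[d] / counts[d] for d in sums}

scores = daily_scores([
    ("2017-05-01", "bullish"),
    ("2017-05-01", "bearish"),
    ("2017-05-01", "bullish"),
    ("2017-05-02", "neutral"),
])
# scores["2017-05-01"] == (1 - 1 + 1) / 3
```

Each day's mean score then becomes the single feature fed to the logistic regression.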