Innovative Sentiment Analysis and Prediction of Stock Price Using FinBERT, GPT-4 and Logistic Regression: A Data-Driven Approach

ArXiv ID: 2412.06837

Authors: Unknown

Abstract

This study explores the comparative performance of cutting-edge AI models, i.e., Finance Bidirectional Encoder Representations from Transformers (FinBERT), Generative Pre-trained Transformer 4 (GPT-4), and Logistic Regression, for sentiment analysis and stock index prediction using financial news and the NGX All-Share Index data label. By leveraging advanced natural language processing models like GPT-4 and FinBERT, alongside a traditional machine learning model, Logistic Regression, we aim to classify market sentiment, generate sentiment scores, and predict market price movements. This research highlights global AI advancements in stock markets, showcasing how state-of-the-art language models can contribute to understanding complex financial data. The models were assessed using metrics such as accuracy, precision, recall, F1 score, and ROC AUC. Results indicate that Logistic Regression outperformed both the more computationally intensive FinBERT and the predefined approach of the versatile GPT-4, with an accuracy of 81.83% and a ROC AUC of 89.76%. The GPT-4 predefined approach exhibited a lower accuracy of 54.19% but demonstrated strong potential in handling complex data. FinBERT, while offering more sophisticated analysis, was resource-demanding and yielded moderate performance. Hyperparameter optimization using Optuna and cross-validation techniques ensured the robustness of the models. This study highlights the strengths and limitations of the practical applications of AI approaches in stock market prediction and presents Logistic Regression as the most efficient model for this task, with FinBERT and GPT-4 representing emerging tools with potential for future exploration and innovation in AI-driven financial analytics.

Keywords: FinBERT, GPT-4, Sentiment Analysis, Logistic Regression, Optuna (Hyperparameter Optimization), Equities (Stock Index)
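
The abstract names accuracy, precision, recall, F1 score, and ROC AUC as the evaluation metrics. As a minimal sketch of how these are computed for a binary up/down classifier, the pure-Python example below uses made-up labels and scores (not the paper's data); the function names and the 0.5 decision threshold are assumptions chosen for illustration.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = index moved up, 0 = index moved down.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]       # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC AUC is computed from the raw scores and summarises ranking quality across all thresholds. That is why a model can report an ROC AUC noticeably higher than its accuracy, as Logistic Regression does here.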

Complexity vs Empirical Score

  • Math Complexity: 3.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper relies heavily on empirical methods like model training, hyperparameter optimization with Optuna, cross-validation, and standard performance metrics (accuracy, AUC-ROC), but its mathematical depth is limited to basic machine learning concepts and statistical evaluation without advanced derivations or novel theoretical frameworks.

```mermaid
flowchart TD
    A["Research Goal<br>Compare FinBERT, GPT-4, and Logistic Regression<br>for sentiment analysis & stock price prediction"] --> B["Data Input<br>Financial News & NGX All-Share Index"]

    subgraph C["Computational Processing"]
        direction TB
        C1["FinBERT<br>Resource-Demanding<br>Moderate Performance"]
        C2["GPT-4 Predefined<br>Versatile but Lower Accuracy"]
        C3["Logistic Regression<br>Hyperparameter Optimization via Optuna"]
    end

    B --> C
    C1 --> D["Model Evaluation<br>Accuracy, Precision, Recall, F1, ROC AUC"]
    C2 --> D
    C3 --> D

    D --> E["Key Findings & Outcomes"]
    E --> E1["Logistic Regression<br>Best Performance: 81.83% Acc, 89.76% ROC AUC"]
    E --> E2["GPT-4<br>Low Accuracy (54.19%) but Complex Data Potential"]
    E --> E3["FinBERT<br>Sophisticated but Resource-Intensive"]
```
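
The Logistic Regression branch of the flow above (train, tune, cross-validate, evaluate) can be sketched in a few lines of pure Python. This is an illustrative sketch only, not the authors' pipeline. The single "sentiment" feature and the synthetic data are invented for the example, and a small grid search over the L2 penalty stands in for Optuna, whose search space and settings are not given in this summary.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, l2=0.0, lr=0.5, epochs=200):
    """Batch gradient descent on L2-regularised log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Hard 0/1 predictions at a 0.5 probability threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]


def cv_accuracy(X, y, l2, k=5):
    """Mean held-out accuracy over k contiguous folds."""
    n = len(X)
    fold_scores = []
    for f in range(k):
        lo, hi = f * n // k, (f + 1) * n // k
        X_tr, y_tr = X[:lo] + X[hi:], y[:lo] + y[hi:]
        w, b = fit_logreg(X_tr, y_tr, l2=l2)
        preds = predict(w, b, X[lo:hi])
        fold_scores.append(sum(p == t for p, t in zip(preds, y[lo:hi])) / (hi - lo))
    return sum(fold_scores) / k


# Synthetic toy data (assumption): one sentiment score per day in [-1, 1],
# with the up/down label driven by sentiment plus Gaussian noise.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    sentiment = rng.uniform(-1.0, 1.0)
    noise = rng.gauss(0.0, 0.3)
    X.append([sentiment])
    y.append(1 if sentiment + noise > 0 else 0)

# Grid search over the L2 strength, used here in place of Optuna.
candidates = [0.0, 0.01, 0.1, 1.0]
results = {l2: cv_accuracy(X, y, l2) for l2 in candidates}
best_l2 = max(results, key=results.get)
print(best_l2, results[best_l2])
```

Each candidate is scored only on folds it was not trained on, which is what lets cross-validation guard against picking a hyperparameter that merely overfits. For real time-ordered market data, a forward-chaining split that trains only on earlier observations would be a safer choice than the plain k-fold used in this sketch.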