Optimizing Performance: How Compact Models Match or Exceed GPT’s Classification Capabilities through Fine-Tuning

ArXiv ID: 2409.11408

Authors: Unknown

Abstract

In this paper, we demonstrate that non-generative, small-sized models such as FinBERT and FinDRoBERTa, when fine-tuned, can outperform GPT-3.5 and GPT-4 in zero-shot sentiment analysis of financial news. These fine-tuned models achieve results comparable to GPT-3.5 fine-tuned on the task of determining market sentiment from daily financial news summaries sourced from Bloomberg. To fine-tune and compare these models, we created a novel database that assigns a market score to each piece of news without human interpretation bias, systematically identifying the mentioned companies and analyzing whether their stocks went up, down, or remained neutral. Furthermore, the paper shows that the assumptions of Condorcet's Jury Theorem do not hold, suggesting that the fine-tuned small models are not independent of the fine-tuned GPT models and indicating behavioural similarities. Lastly, the resulting fine-tuned models are made publicly available on HuggingFace, providing a resource for further research in financial sentiment analysis and text classification.
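The abstract's labeling procedure can be illustrated with a minimal sketch: a news item is scored by the direction of the mentioned company's stock move over the relevant window. The function name, threshold parameter, and encoding (1/0/-1) are assumptions for illustration; the paper's exact scoring rule may differ.

```python
def market_score(price_before: float, price_after: float,
                 threshold: float = 0.0) -> int:
    """Assign a market-based label to a news item, without human annotation:
    1 if the mentioned company's stock went up, -1 if down, 0 if neutral.

    `threshold` is a hypothetical dead-band for calling a move "neutral";
    the paper's precise rule and window are not reproduced here.
    """
    change = (price_after - price_before) / price_before
    if change > threshold:
        return 1   # stock went up -> positive sentiment label
    if change < -threshold:
        return -1  # stock went down -> negative sentiment label
    return 0       # within the dead-band -> neutral label
```

For example, a summary mentioning a company whose stock moved from 100 to 105 would receive a positive label, replacing a human annotator's subjective judgment with an observable market outcome.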

Keywords: Sentiment Analysis, Fine-tuning, FinBERT, Financial News, Large Language Models (LLMs), Equities / General Financial Markets

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper introduces a novel market-based dataset, fine-tunes multiple models, and makes models publicly available, indicating significant empirical rigor. While it references the Condorcet Jury Theorem for conceptual analysis, the core contribution is methodological and data-driven rather than advancing complex mathematical theory.
```mermaid
flowchart TD
  A["Research Goal: Assess fine-tuned small models vs. GPT in financial sentiment analysis"] --> B["Create Novel Dataset<br>Bloomberg News with<br>objective market scores"]
  B --> C["Methodology: Fine-tuning<br>FinBERT, FinDRoBERTa, GPT-3.5"]
  C --> D["Comparative Analysis<br>Zero-shot vs. Fine-tuned performance"]
  D --> E{"Key Findings & Outcomes"}
  E --> F["Fine-tuned small models outperform<br>zero-shot GPT-3.5/4"]
  E --> G["Small models achieve<br>comparable results to fine-tuned GPT"]
  E --> H["Models publicly available on<br>HuggingFace"]
  E --> I["Condorcet's Jury Theorem assumptions<br>do not hold<br>Models show behavioral similarities"]
```
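For context on the Condorcet finding: the Jury Theorem says that if voters (here, classifiers) are independent and each is correct with probability p > 0.5, majority-vote accuracy rises with the number of voters. A minimal sketch of that majority-vote probability under the independence assumption (which the paper finds does not hold for these models, so ensembling them yields less gain than the theorem would predict):

```python
from math import comb

def majority_accuracy(n: int, p: float) -> float:
    """Probability that a simple majority of n independent classifiers,
    each correct with probability p, is correct (odd n assumed).

    This is the binomial tail P(X >= n//2 + 1) for X ~ Binomial(n, p).
    """
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))
```

With p = 0.6, three independent classifiers already beat one (0.648 vs 0.6). When classifiers are correlated, as the paper reports for the fine-tuned small models and GPT, their errors overlap and this improvement shrinks.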