AI in Investment Analysis: LLMs for Equity Stock Ratings

ArXiv ID: 2411.00856

Authors: Unknown

Abstract

Investment Analysis is a cornerstone of the Financial Services industry. The rapid integration of advanced machine learning techniques, particularly Large Language Models (LLMs), offers opportunities to enhance the equity rating process. This paper explores the application of LLMs to generate multi-horizon stock ratings by ingesting diverse datasets. Traditional stock rating methods rely heavily on the expertise of financial analysts, and face several challenges such as data overload, inconsistencies in filings, and delayed reactions to market events. Our study addresses these issues by leveraging LLMs to improve the accuracy and consistency of stock ratings. Additionally, we assess the efficacy of using different data modalities with LLMs for the financial domain. We utilize varied datasets comprising fundamental financial, market, and news data from January 2022 to June 2024, along with GPT-4-32k (v0613) (with a training cutoff in Sep. 2021 to prevent information leakage). Our results show that our benchmark method outperforms traditional stock rating methods when assessed by forward returns, especially when incorporating financial fundamentals. While integrating news data improves short-term performance, substituting detailed news summaries with sentiment scores reduces token use without loss of performance. In many cases, omitting news data entirely enhances performance by reducing bias. Our research shows that LLMs can be leveraged to effectively utilize large amounts of multimodal financial data, as showcased by their effectiveness at the stock rating prediction task. Our work provides a reproducible and efficient framework for generating accurate stock ratings, serving as a cost-effective alternative to traditional methods. Future work will extend to longer timeframes, incorporate diverse data, and utilize newer models for enhanced insights.
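
The abstract says ratings are assessed by forward returns. As a rough illustration of what that kind of check can look like, here is a minimal Python sketch that averages the forward return observed after each rating label. The function names, the tuple layout of `observations`, and the use of simple (non-log) returns are assumptions made for this example; the summary does not describe the paper's actual evaluation code.

```python
from statistics import mean


def forward_return(prices, start, horizon):
    """Simple return from index `start` to `start + horizon`.

    Returns None when the horizon runs past the end of the price series.
    """
    end = start + horizon
    if start < 0 or end >= len(prices):
        return None
    return prices[end] / prices[start] - 1.0


def mean_forward_return_by_rating(observations, horizon):
    """Average forward return per rating label.

    `observations` is a list of (rating, prices, rating_index) tuples,
    a hypothetical layout chosen for this sketch.
    """
    buckets = {}
    for rating, prices, idx in observations:
        ret = forward_return(prices, idx, horizon)
        if ret is not None:  # skip ratings without enough future data
            buckets.setdefault(rating, []).append(ret)
    return {rating: mean(rets) for rating, rets in buckets.items()}


if __name__ == "__main__":
    # Toy price series, not real market data.
    obs = [
        ("Buy", [100.0, 110.0, 121.0], 0),
        ("Buy", [100.0, 120.0, 90.0], 0),
        ("Sell", [100.0, 90.0], 0),
    ]
    print(mean_forward_return_by_rating(obs, horizon=1))
```

Under this setup, a rating scheme would look useful if labels such as "Buy" were followed by higher average forward returns than labels such as "Sell" at the horizon being tested.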

Keywords: Large Language Models (LLMs), Stock Ratings, Investment Analysis, Fundamental Analysis, GPT-4

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Street Traders
  • Why: The paper focuses on applying off-the-shelf LLMs to financial data with minimal novel mathematical derivations, but includes a detailed empirical study using real market data over multiple years with robust backtesting on forward returns.
  flowchart TD
    A["Research Goal<br/>Apply LLMs for Equity Stock Ratings"] --> B
    
    subgraph B ["Key Methodology"]
        direction LR
        B1["Data Collection<br/>Jan 2022 - Jun 2024"] --> B2["LLM Processing<br/>GPT-4-32k"]
    end

    B --> C["Data Inputs"]
    
    subgraph C ["Multimodal Datasets"]
        C1["Fundamental Data"]
        C2["Market Data"]
        C3["News/Sentiment"]
    end

    C --> D["Computational Process<br/>Multi-Horizon Stock Rating Generation"]

    D --> E["Key Findings & Outcomes"]
    
    subgraph E ["Results"]
        direction LR
        E1["↑ Accuracy vs Traditional Methods<br/>Especially with Fundamentals"]
        E2["News aids short-term<br/>but increases bias"]
        E3["Sentiment scores<br/>reduce token usage"]
        E4["Reproducible &<br/>Cost-effective Framework"]
    end
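
The flow above combines fundamental, market, and news inputs into a single LLM request, and the abstract notes that replacing detailed news summaries with a sentiment score reduces token use. The Python sketch below shows one way such a prompt could be assembled with the news input switched between the two forms or omitted. The prompt wording, the `build_prompt` and `rough_token_count` helpers, the field names, and the whitespace-based token estimate are all assumptions for illustration, not the paper's prompt or tokenizer.

```python
def build_prompt(ticker, fundamentals, market, news_summaries=None, sentiment=None):
    """Assemble a rating prompt from the available data modalities.

    News is optional: pass `news_summaries` (list of strings) for the detailed
    variant, `sentiment` (float) for the compact variant, or neither.
    """
    sections = [f"Rate the stock {ticker} for each horizon."]
    sections.append(
        "Fundamentals:\n" + "\n".join(f"- {k}: {v}" for k, v in fundamentals.items())
    )
    sections.append(
        "Market data:\n" + "\n".join(f"- {k}: {v}" for k, v in market.items())
    )
    if news_summaries:
        sections.append(
            "News summaries:\n" + "\n".join(f"- {s}" for s in news_summaries)
        )
    elif sentiment is not None:
        sections.append(f"News sentiment score: {sentiment:+.2f}")
    return "\n\n".join(sections)


def rough_token_count(text):
    """Crude whitespace-based proxy for token count (not a real tokenizer)."""
    return len(text.split())


if __name__ == "__main__":
    # Made-up inputs for a fictional ticker.
    fundamentals = {"revenue_growth": "8%", "debt_to_equity": 0.4}
    market = {"return_1m": "3.1%", "volatility_1m": "18%"}
    summaries = [
        "Company reported quarterly results above its own guidance.",
        "Management announced a new share buyback program.",
    ]
    detailed = build_prompt("ACME", fundamentals, market, news_summaries=summaries)
    compact = build_prompt("ACME", fundamentals, market, sentiment=0.35)
    print(rough_token_count(detailed), rough_token_count(compact))
```

Keeping the news section optional makes it straightforward to compare the three configurations the abstract discusses: full summaries, sentiment scores only, and no news at all.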