FinSphere: A Real-Time Stock Analysis Agent Powered by Instruction-Tuned LLMs and Domain Tools

ArXiv ID: 2501.12399

Authors: Unknown

Abstract

Current financial large language models (FinLLMs) struggle with two critical limitations: the absence of objective evaluation metrics to assess the quality of stock analysis reports and a lack of depth in stock analysis, which impedes their ability to generate professional-grade insights. To address these challenges, this paper introduces FinSphere, a stock analysis agent, along with three major contributions: (1) AnalyScore, a systematic evaluation framework for assessing stock analysis quality, (2) Stocksis, a dataset curated by industry experts to enhance LLMs’ stock analysis capabilities, and (3) FinSphere, an AI agent that can generate high-quality stock analysis reports in response to user queries. Experiments demonstrate that FinSphere achieves superior performance compared to both general and domain-specific LLMs, as well as existing agent-based systems, even when they are enhanced with real-time data access and few-shot guidance. The integrated framework, which combines real-time data feeds, quantitative tools, and an instruction-tuned LLM, yields substantial improvements in both analytical quality and practical applicability for real-world stock analysis.
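The abstract describes the pipeline at a high level: a user query triggers real-time data retrieval and quantitative tool calls, and the instruction-tuned LLM turns the combined evidence into a report. Below is a minimal Python sketch of that orchestration pattern; every name in it (`fetch_quotes`, `run_quant_tools`, `call_finllm`, `analyze`, `QuantResult`) is a hypothetical stand-in for illustration, not the paper's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of a FinSphere-style pipeline. The stubs below stand in
# for the real components (real-time data feed, quantitative tools, and an
# instruction-tuned FinLLM); none of these names come from the paper itself.

@dataclass
class QuantResult:
    tool: str
    summary: str  # e.g. "RSI(14) = 71: overbought"

def fetch_quotes(ticker: str) -> dict:
    # Stub standing in for a real-time market-data feed.
    return {"ticker": ticker, "last": 0.0, "volume": 0}

def run_quant_tools(ticker: str, quotes: dict) -> list[QuantResult]:
    # Stub standing in for quantitative tools (momentum, valuation, fund flows, ...).
    return [QuantResult("momentum", "placeholder signal")]

def call_finllm(prompt: str) -> str:
    # Stub standing in for an instruction-tuned FinLLM (e.g. tuned on Stocksis-style data).
    return "<analysis report>"

def analyze(ticker: str, user_query: str) -> str:
    """Ground the LLM prompt in real-time quotes and quantitative tool outputs."""
    quotes = fetch_quotes(ticker)
    evidence = "\n".join(
        f"- [{r.tool}] {r.summary}" for r in run_quant_tools(ticker, quotes)
    )
    prompt = (
        f"User question: {user_query}\n"
        f"Real-time quotes: {quotes}\n"
        f"Quantitative evidence:\n{evidence}\n"
        "Write a professional stock analysis report grounded in the data above."
    )
    return call_finllm(prompt)

if __name__ == "__main__":
    print(analyze("ACME", "Is ACME attractive at current levels?"))
```

The design point the abstract emphasizes is that the LLM never analyzes from memory alone: the prompt is grounded in freshly fetched market data and tool outputs before the report is generated.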

Keywords: Financial Large Language Models (FinLLMs), AnalyScore, Stocksis Dataset, AI Agent, Stock Analysis, Equities

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 9.0/10
  • Quadrant: Street Traders
  • Why: The paper’s mathematics is minimal, focusing on evaluation metrics and structured frameworks rather than complex formulas or derivations. In contrast, it demonstrates high empirical rigor by introducing a new dataset (Stocksis), detailed evaluation benchmarks (AnalyScore), real-time data integration, and quantitative tool usage, all backed by comparative experiments against general and domain-specific LLMs.
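The exact AnalyScore rubric is not reproduced in this summary, but an evaluation framework of this kind can be pictured as a weighted aggregate of per-dimension scores assigned to each report. The sketch below illustrates that idea; the dimension names and weights are assumptions chosen for illustration, not the published rubric.

```python
# Illustrative AnalyScore-style aggregation: a weighted average of per-dimension
# scores in [0, 10]. Dimension names and weights are assumptions for this sketch,
# not the framework's published rubric.
WEIGHTS = {
    "data_accuracy": 0.3,
    "analytical_depth": 0.3,
    "logical_coherence": 0.2,
    "actionability": 0.2,
}

def analyscore(dimension_scores: dict[str, float]) -> float:
    """Weighted average; assumes a score is provided for every weighted dimension."""
    return sum(w * dimension_scores[dim] for dim, w in WEIGHTS.items())

print(analyscore({
    "data_accuracy": 8.5,
    "analytical_depth": 7.0,
    "logical_coherence": 9.0,
    "actionability": 6.5,
}))  # ≈ 7.75
```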
  flowchart TD
    A["Research Goal<br>Address FinLLM limitations:<br>evaluation & depth"] --> B["Methodology & Components"]
    
    B --> C{"Dataset & Evaluation"}
    B --> D{"AI Agent System"}
    
    C --> E["Stocksis Dataset<br>Curated by experts"]
    C --> F["AnalyScore Framework<br>Systematic evaluation metrics"]
    
    D --> G["FinSphere Agent<br>Integration of LLM, tools, & data"]
    D --> H["Real-time Data Feeds<br>& Quantitative Tools"]
    
    E --> G
    F --> G
    H --> G
    
    subgraph I["Key Findings & Outcomes: Superior Performance"]
        J["Outperforms general LLMs"]
        K["Outperforms domain-specific LLMs"]
        L["Beats agent-based systems<br>even with real-time data & few-shot"]
        M["Higher analytical quality<br>& practical applicability"]
    end
    
    G --> I