Towards reducing hallucination in extracting information from financial reports using Large Language Models

ArXiv ID: 2310.10760

Authors: Unknown

Abstract

For a financial analyst, the question-and-answer (Q&A) segment of a company's financial report is a crucial source of information for analysis and investment decisions. However, extracting valuable insights from the Q&A section poses considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques struggle to accurately process unstructured transcript text, often missing the subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to extract information from earnings report transcripts quickly and with high accuracy, transforming the extraction process and reducing hallucination by combining a retrieval-augmented generation (RAG) technique with metadata. We evaluate the outputs of various LLMs with and without the proposed approach on objective metrics for Q&A systems, and empirically demonstrate the superiority of our method.

Keywords: Large Language Models, retrieval-augmented generation, financial reporting, Q&A extraction, unstructured data
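The abstract's core idea, retrieval-augmented generation with metadata to keep answers grounded in the transcript, can be sketched as follows. This is an illustrative sketch, not the paper's code: all names (`embed`, `retrieve`, `build_prompt`, the `ticker`/`quarter`/`section` metadata fields) are assumptions, and a toy bag-of-words cosine stands in for a learned embedding model.

```python
# Hypothetical sketch of metadata-filtered RAG over earnings-call chunks.
# A real system would use a learned sentence encoder and an LLM reader;
# here a bag-of-words cosine stands in for the retriever.
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; placeholder for a learned encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, metadata_filter, k=2):
    """Filter candidates by metadata first, then rank semantically."""
    qv = embed(query)
    candidates = [c for c in chunks if metadata_filter(c["meta"])]
    candidates.sort(key=lambda c: cosine(qv, embed(c["text"])), reverse=True)
    return candidates[:k]

def build_prompt(query, retrieved):
    """Ground the LLM: answer only from retrieved context, else abstain."""
    context = "\n".join(
        f"[{c['meta']['ticker']} {c['meta']['quarter']}] {c['text']}"
        for c in retrieved)
    return ("Answer using ONLY the context below; if the answer is not "
            f"present, say 'not found'.\n\nContext:\n{context}\n\nQ: {query}")

# Toy knowledge base of transcript chunks with attached metadata.
chunks = [
    {"text": "Revenue grew 12 percent year over year in Q3.",
     "meta": {"ticker": "ACME", "quarter": "Q3-2023", "section": "qa"}},
    {"text": "Operating margin declined due to higher input costs.",
     "meta": {"ticker": "ACME", "quarter": "Q3-2023", "section": "qa"}},
    {"text": "We opened three new distribution centers.",
     "meta": {"ticker": "OTHER", "quarter": "Q3-2023", "section": "prepared"}},
]

top = retrieve("How much did revenue grow?", chunks,
               metadata_filter=lambda m: m["ticker"] == "ACME"
                                         and m["section"] == "qa")
print(build_prompt("How much did revenue grow?", top))
```

The metadata filter is what makes this more than vanilla RAG: restricting retrieval to the right company, quarter, and section narrows the context the LLM sees, which is one way the combination can reduce hallucinated answers.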

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper focuses on applying established NLP techniques (LLMs, RAG) rather than deriving new complex mathematics, but it includes a detailed empirical study using a real-world dataset of 50 earnings transcripts and quantitative evaluation metrics, making it highly data/implementation-heavy.

flowchart TD
    Start["Research Goal: Reduce hallucination in LLM-based financial Q&A extraction"] --> Method["Proposed Method: RAG + Metadata"]

    subgraph Inputs ["Data & Inputs"]
        Input1["Financial Report Transcripts"]
        Input2["Q&A Question Pairs"]
    end

    Inputs --> Process1["Data Preprocessing & Chunking"]
    Process1 --> Process2["Embedding Generation"]
    Process2 --> Process3["Knowledge Base Construction"]

    subgraph Computation ["LLM Processing"]
        User["User Query"] --> RAG["Retrieval-Augmented Generation"]
        Metadata["Financial Metadata Injection"] --> RAG
        Process3 -.-> RAG
        RAG --> LLM["Large Language Model Analysis"]
    end

    LLM --> Output["Extracted Insights & Responses"]

    Output --> Evaluate["Evaluation Phase"]
    Evaluate --> Metrics["Metrics: Accuracy, Relevance, Hallucination Rate"]
    Metrics --> Compare["Comparison: LLM w/ vs w/o Method"]

    Compare --> Result["Result: Superior Performance & Reduced Hallucination"]
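
The evaluation phase in the flowchart compares runs with and without the proposed method on metrics including a hallucination rate. As a rough illustration of how such a metric can be computed (an assumption, not the paper's actual metric), one simple proxy flags any generated answer containing content words that never appear in the retrieved source context:

```python
# Hypothetical hallucination-rate proxy: an answer is flagged when it
# contains content words unsupported by the source context. Real
# evaluations typically use richer faithfulness metrics or human review.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "by", "was", "is"}

def content_words(text):
    """Lowercased, punctuation-stripped words minus stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def is_hallucinated(answer, context):
    """Flag answers whose content words are not all found in the context."""
    return not content_words(answer) <= content_words(context)

def hallucination_rate(answers, contexts):
    """Fraction of (answer, context) pairs flagged as unsupported."""
    flags = [is_hallucinated(a, c) for a, c in zip(answers, contexts)]
    return sum(flags) / len(flags)

context = "Revenue grew 12 percent year over year in Q3."
grounded = ["Revenue grew 12 percent in Q3."]    # fully supported
ungrounded = ["Revenue grew 25 percent in Q4."]  # invented figures

print(hallucination_rate(grounded, [context]))    # -> 0.0
print(hallucination_rate(ungrounded, [context]))  # -> 1.0
```

Scoring both pipelines with a metric like this over the evaluation set yields the with/without comparison in the final step of the flowchart.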