FinGPT: Open-Source Financial Large Language Models

ArXiv ID: 2306.06031 (https://arxiv.org/abs/2306.06031)

Authors: Hongyang Yang, Xiao-Yang Liu, Christina Dan Wang

Abstract

Large language models (LLMs) have demonstrated the potential to revolutionize natural language processing tasks in diverse domains, sparking great interest in finance. Accessing high-quality financial data is the first challenge for financial LLMs (FinLLMs). While proprietary models like BloombergGPT have taken advantage of their unique data accumulation, such privileged access calls for an open-source alternative to democratize Internet-scale financial data. In this paper, we present an open-source large language model, FinGPT, for the finance sector. Unlike proprietary models, FinGPT takes a data-centric approach, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. We highlight the importance of an automatic data curation pipeline and the lightweight low-rank adaptation technique in building FinGPT. Furthermore, we showcase several potential applications as stepping stones for users, such as robo-advising, algorithmic trading, and low-code development. Through collaborative efforts within the open-source AI4Finance community, FinGPT aims to stimulate innovation, democratize FinLLMs, and unlock new opportunities in open finance. Two associated code repositories are https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP.
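The data-centric pipeline highlighted in the abstract is released in the FinNLP repository; the snippet below is not that repository's actual API but a minimal, hypothetical sketch of the curation step it describes: cleaning raw scraped news items and reshaping them into instruction-tuning records. The field names (`title`, `body`, `sentiment`) and the prompt template are illustrative assumptions.

```python
import html
import re

# Hypothetical raw items as an upstream scraper might emit them; the field
# names ("title", "body", "sentiment") are assumptions for illustration, not
# the FinNLP repository's actual schema.
raw_items = [
    {"title": "Company X beats earnings estimates",
     "body": "<p>Shares rose 4% after the report&hellip;</p>",
     "sentiment": "positive"},
    {"title": "Company X beats earnings estimates",   # duplicate to be removed
     "body": "<p>Shares rose 4% after the report&hellip;</p>",
     "sentiment": "positive"},
]

def clean_text(text: str) -> str:
    """Strip HTML tags, unescape entities, and normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()

def curate(items):
    """Deduplicate by title and format items as instruction-tuning records."""
    seen, records = set(), []
    for item in items:
        title = clean_text(item["title"])
        if title in seen:
            continue
        seen.add(title)
        records.append({
            "instruction": "What is the sentiment of this financial news?",
            "input": f"{title}. {clean_text(item['body'])}",
            "output": item["sentiment"],
        })
    return records

print(curate(raw_items))
```

In an automated pipeline, a curation step like this would run on a schedule so the instruction data tracks newly published financial text.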

Keywords: Large Language Models, Natural Language Processing, Data Curation, Low-Rank Adaptation, Open Source AI, Cross-Asset (General Finance)

Complexity vs Empirical Score

  • Math Complexity: 3.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Street Traders
  • Why: The paper focuses on a practical, end-to-end framework with extensive implementation details, code repositories, and real-world applications such as robo-advising and algorithmic trading, indicating high empirical rigor. However, the mathematical content is relatively light, primarily citing existing LLM architectures such as Transformers and LoRA without deep derivations or novel mathematics; the standard LoRA update is sketched below for reference.
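For context, the low-rank adaptation cited above follows the standard formulation of Hu et al. (2021), which is what makes the fine-tuning lightweight: the pretrained weight matrix stays frozen and only a rank-r update is trained.

```latex
% Standard LoRA update (Hu et al., 2021); shown for reference, not derived in this paper.
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

The flowchart below summarizes the end-to-end pipeline from data curation through LoRA fine-tuning to downstream applications.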
```mermaid
flowchart TD
  A["Research Goal: Democratize FinLLMs with an Open-Source Alternative"] --> B["Data-Centric Approach"]
  B --> C["Automatic Data Curation Pipeline"]
  C --> D["Inputs: Internet-Scale<br>Financial Data"]
  D --> E["Low-Rank Adaptation<br>(LoRA) Technique"]
  E --> F["Computational Process:<br>Lightweight Finetuning"]
  F --> G["Key Outcomes:<br>FinGPT Model"]
  G --> H["Applications:<br>Robo-Advising, Trading, Low-Code"]
```
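As a stepping stone for the "Lightweight Finetuning" step in the flowchart, here is a minimal sketch using the Hugging Face peft library; the base model name, target modules, and hyperparameters are placeholders rather than FinGPT's actual training configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder base model; FinGPT's actual backbone and settings may differ.
base_model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA config: freeze the base weights and train small rank-r adapters only.
lora_config = LoraConfig(
    r=8,                                   # rank of the update matrices B and A
    lora_alpha=32,                         # scaling factor alpha in (alpha / r) * B A
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
# ...then train with any standard trainer on the curated instruction data.
```

Because only the adapter weights are updated, the same base model can be re-adapted cheaply as new financial data arrives, which is the motivation behind the data-centric, lightweight-adaptation design.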