Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

ArXiv ID: 2408.11878

Authors: Unknown

Abstract

Financial LLMs hold promise for advancing financial tasks and domain-specific applications. However, they are limited by scarce corpora, weak multimodal capabilities, and narrow evaluations, making them less suitable for real-world deployment. To address this, we introduce “Open-FinLLMs”, the first open-source multimodal financial LLMs designed to handle diverse tasks across text, tabular, time-series, and chart data, excelling in zero-shot, few-shot, and fine-tuning settings. The suite includes FinLLaMA, pre-trained on a comprehensive 52-billion-token corpus; FinLLaMA-Instruct, fine-tuned with 573K financial instructions; and FinLLaVA, enhanced with 1.43M multimodal tuning pairs for strong cross-modal reasoning. We comprehensively evaluate Open-FinLLMs across 14 financial tasks, 30 datasets, and 4 multimodal tasks in zero-shot, few-shot, and supervised fine-tuning settings, introducing two new multimodal evaluation datasets. Our results show that Open-FinLLMs outperform advanced financial and general LLMs such as GPT-4 across financial NLP, decision-making, and multimodal tasks, highlighting their potential to tackle real-world challenges. To foster innovation and collaboration across academia and industry, we release all code (https://anonymous.4open.science/r/PIXIU2-0D70/B1D7/LICENSE) and models under OSI-approved licenses.
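
Because the checkpoints are released under OSI-approved licenses, a natural entry point is loading them through the standard Hugging Face transformers API. The sketch below is a minimal, hedged example of zero-shot use, assuming the released weights follow the usual causal-LM format; the model ID `open-finllms/FinLLaMA-Instruct` is a hypothetical placeholder, not a path confirmed by the paper.

```python
# Minimal sketch: zero-shot financial sentiment with FinLLaMA-Instruct,
# assuming a standard Hugging Face causal-LM checkpoint.
# "open-finllms/FinLLaMA-Instruct" is a hypothetical hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-finllms/FinLLaMA-Instruct"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "Classify the sentiment of the following financial headline as "
    "positive, negative, or neutral.\n"
    "Headline: The company raised its full-year revenue guidance.\n"
    "Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```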

Keywords: Financial LLMs, Multimodal Learning, Open-FinLLMs, Zero-shot Learning, Text-Chart Integration, General Financial Data Analysis

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Street Traders
  • Why: The paper focuses on engineering a suite of open-source LLMs and quantifying performance across an extensive set of tasks and datasets, with little advanced mathematical derivation. Its strength lies in heavy implementation, large-scale data curation, and comprehensive benchmarking against real-world models such as GPT-4.

Paper Overview (Mermaid flowchart)

flowchart TD
    Goal["Research Goal<br/>Develop open-source, multimodal LLMs for financial tasks"]
    Data["Data & Corpora<br/>• 52B-token text corpus<br/>• 573K financial instructions<br/>• 1.43M multimodal image-text pairs"]
    Pretrain["Methodology: Pre-training<br/>FinLLaMA model on corpus"]
    Instruct["Methodology: Instruction Tuning<br/>FinLLaMA-Instruct on instructions"]
    Multi["Methodology: Multimodal Tuning<br/>FinLLaVA on image-text pairs"]
    Eval["Evaluation<br/>14 tasks, 30 datasets, 4 multimodal tasks<br/>(Zero/Few/Supervised Shot)"]
    Outcome["Key Findings<br/>Outperforms GPT-4 & financial LLMs<br/>Releases: Models & Code (OSI)"]

    Goal --> Data
    Data --> Pretrain
    Data --> Instruct
    Data --> Multi
    Pretrain --> Instruct
    Instruct --> Multi
    Multi --> Eval
    Eval --> Outcome
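
The evaluation stage above spans zero-shot, few-shot, and supervised fine-tuning settings. As a rough illustration of the few-shot setup only (the exemplars and template below are not taken from the paper's evaluation harness), a prompt for a financial classification task can be assembled from a handful of labeled examples followed by the unlabeled query:

```python
# Sketch of few-shot prompt construction for a financial sentiment task.
# Illustrative only; the paper's 14-task, 30-dataset harness likely uses
# task-specific templates and exemplar selection.
from typing import List, Tuple

def build_few_shot_prompt(exemplars: List[Tuple[str, str]], query: str) -> str:
    """Concatenate k labeled examples followed by the unlabeled query."""
    parts = [
        "Classify the sentiment of each financial sentence as "
        "positive, negative, or neutral.\n"
    ]
    for text, label in exemplars:
        parts.append(f"Sentence: {text}\nSentiment: {label}\n")
    parts.append(f"Sentence: {query}\nSentiment:")
    return "\n".join(parts)

shots = [
    ("Quarterly profit beat analyst expectations.", "positive"),
    ("The firm warned of weaker demand next year.", "negative"),
]
print(build_few_shot_prompt(shots, "Shares were flat after the earnings call."))
```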