Uni-FinLLM: A Unified Multimodal Large Language Model with Modular Task Heads for Micro-Level Stock Prediction and Macro-Level Systemic Risk Assessment

arXiv ID: 2601.02677

Authors: Gongao Zhang, Haijiang Zeng, Lu Jiang

Abstract

Financial institutions and regulators require systems that integrate heterogeneous data to assess risks from stock fluctuations to systemic vulnerabilities. Existing approaches often treat these tasks in isolation, failing to capture cross-scale dependencies. We propose Uni-FinLLM, a unified multimodal large language model that uses a shared Transformer backbone and modular task heads to jointly process financial text, numerical time series, fundamentals, and visual data. Through cross-modal attention and multi-task optimization, it learns a coherent representation for micro-, meso-, and macro-level predictions. Evaluated on stock forecasting, credit-risk assessment, and systemic-risk detection, Uni-FinLLM significantly outperforms baselines. It raises stock directional accuracy to 67.4% (from 61.7%), credit-risk accuracy to 84.1% (from 79.6%), and macro early-warning accuracy to 82.3%. Results validate that a unified multimodal LLM can jointly model asset behavior and systemic vulnerabilities, offering a scalable decision-support engine for finance.
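The abstract describes a shared backbone that fuses modalities via cross-modal attention and then feeds a common representation into modular task heads. A minimal NumPy sketch of that idea, where all shapes, weight matrices, and variable names are illustrative assumptions and not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_tokens, context_tokens, d):
    # scaled dot-product attention: one modality's tokens attend
    # over another modality's tokens (hypothetical fusion step)
    scores = query_tokens @ context_tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context_tokens

rng = np.random.default_rng(0)
d = 16
text = rng.standard_normal((8, d))     # text-token embeddings
series = rng.standard_normal((32, d))  # time-series patch embeddings

fused = cross_modal_attention(text, series, d)  # text attends to prices
shared = fused.mean(axis=0)                     # pooled shared representation

# modular task heads on top of the shared representation (illustrative)
W_stock = rng.standard_normal((2, d))   # micro: up/down direction
W_credit = rng.standard_normal((2, d))  # meso: default / no default
W_macro = rng.standard_normal((2, d))   # macro: systemic alarm on/off

stock_p = softmax(W_stock @ shared)
credit_p = softmax(W_credit @ shared)
macro_p = softmax(W_macro @ shared)
```

The key structural point the sketch mirrors is that the heads share one fused representation, so gradients from all three tasks would shape the same backbone.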

Keywords: multimodal large language model, transformer, cross-modal attention, systemic risk detection, time series forecasting, multi-asset

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Holy Grail
  • Why: The paper combines an advanced Transformer architecture with cross-modal fusion, so mathematical complexity is evident in the model design and multi-task optimization; at the same time it demonstrates strong empirical rigor by reporting specific performance metrics across multiple datasets and tasks, suggesting a backtest-ready implementation.
```mermaid
flowchart TD
  A["Research Goal<br>Unified Multimodal Financial Analysis"] --> B{"Methodology"}
  B --> C["Data Integration<br>Text, Time Series, Fundamentals, Visuals"]
  C --> D["Core Model<br>Uni-FinLLM: Shared Transformer + Modular Heads"]
  D --> E["Processing<br>Cross-Modal Attention & Multi-Task Optimization"]
  E --> F{"Outcomes & Improvements"}
  F --> G["Micro-Level: Stock Directional Accuracy 67.4%"]
  F --> H["Macro-Level: Systemic Risk Accuracy 82.3%"]
```
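The multi-task optimization step in the flowchart is typically a weighted sum of per-head losses. A hedged sketch of that objective, where the probabilities, labels, and unit task weights are all illustrative (the summary does not specify the paper's weighting scheme):

```python
import numpy as np

def cross_entropy(probs, label):
    # negative log-likelihood of the true class
    return -np.log(probs[label] + 1e-12)

# hypothetical per-head predicted probabilities and true labels
stock_p = np.array([0.7, 0.3]); stock_y = 0    # micro: direction
credit_p = np.array([0.2, 0.8]); credit_y = 1  # meso: credit risk
macro_p = np.array([0.6, 0.4]); macro_y = 0    # macro: early warning

# multi-task objective: weighted sum of per-task losses
# (equal weights chosen here purely for illustration)
weights = {"stock": 1.0, "credit": 1.0, "macro": 1.0}
loss = (weights["stock"] * cross_entropy(stock_p, stock_y)
        + weights["credit"] * cross_entropy(credit_p, credit_y)
        + weights["macro"] * cross_entropy(macro_p, macro_y))
```

Because all heads backpropagate into one shared backbone, the weight choice controls how much each prediction scale (micro, meso, macro) shapes the joint representation.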