Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism

ArXiv ID: 2406.06594

Authors: Unknown

Abstract

The accurate prediction of stock movements is crucial for investment strategies. Stock prices are subject to the influence of various forms of information, including financial indicators, sentiment analysis, news documents, and relational structures. Predominant analytical approaches, however, tend to address only unimodal or bimodal sources, neglecting the complexity of multimodal data. Further complicating the landscape are the issues of data sparsity and semantic conflicts between these modalities, which are frequently overlooked by current models, leading to unstable performance and limiting practical applicability. To address these shortcomings, this study introduces a novel architecture, named Multimodal Stable Fusion with Gated Cross-Attention (MSGCA), designed to robustly integrate multimodal input for stock movement prediction. The MSGCA framework consists of three integral components: (1) a trimodal encoding module, responsible for processing indicator sequences, dynamic documents, and a relational graph, and standardizing their feature representations; (2) a cross-feature fusion module, where primary and consistent features guide the multimodal fusion of the three modalities via a pair of gated cross-attention networks; and (3) a prediction module, which refines the fused features through temporal and dimensional reduction to execute precise movement forecasting. Empirical evaluations demonstrate that the MSGCA framework exceeds current leading methods, achieving performance gains of 8.1%, 6.1%, 21.7% and 31.6% on four multimodal datasets, respectively, attributed to its enhanced multimodal fusion stability.
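
To make the fusion step concrete, below is a minimal, illustrative gated cross-attention in plain NumPy: the primary modality (the indicator sequence) queries an auxiliary modality (e.g., document or graph embeddings), and a sigmoid gate decides how much of the attended signal is admitted. All weight names, shapes, and the exact gating form here are hypothetical stand-ins, not the paper's parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_attention(primary, auxiliary, Wq, Wk, Wv, Wg):
    """One gated cross-attention step (illustrative sketch).

    primary:   (T, d) features of the guiding modality (indicator sequence)
    auxiliary: (S, d) features of the modality being fused in (docs/graph)
    """
    Q = primary @ Wq                              # queries from the primary modality
    K = auxiliary @ Wk                            # keys from the auxiliary modality
    V = auxiliary @ Wv                            # values from the auxiliary modality
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (T, S) scaled dot-product scores
    attended = softmax(scores, axis=-1) @ V       # (T, d) auxiliary signal per time step
    gate = sigmoid(np.concatenate([primary, attended], axis=-1) @ Wg)  # (T, d) in (0, 1)
    return primary + gate * attended              # gated residual fusion

# Illustrative shapes: 20 trading days, 30 news items, 64-dim features.
rng = np.random.default_rng(0)
d, T, S = 64, 20, 30
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Wg = rng.normal(size=(2 * d, d)) * 0.1
fused = gated_cross_attention(rng.normal(size=(T, d)), rng.normal(size=(S, d)), Wq, Wk, Wv, Wg)
print(fused.shape)  # (20, 64)
```

The gate is the point of the design: when an auxiliary modality is sparse or semantically conflicts with the indicator sequence, a near-zero gate lets the model fall back on the primary features instead of absorbing noisy attended content.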

Keywords: Multimodal Fusion, Cross-Attention Networks, Sentiment Analysis, Relational Graphs, Stock Movement Prediction, Equities

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper involves advanced deep learning architectures (Gated Cross-Attention, GAT, LLMs) and complex multimodal fusion mathematics, while demonstrating strong empirical validation with specific performance metrics (8.1%, 6.1%, etc. gains) across four real multimodal datasets, indicating backtest-ready methodology.

```mermaid
flowchart TD
    A["Research Goal:<br>Predict Stock Movements<br>from Multimodal Data"] --> B["Key Data Inputs"]

    subgraph B [" "]
        B1["Financial Indicators"]
        B2["News & Social Media"]
        B3["Relational Graphs"]
    end

    B --> C["MSGCA Architecture"]

    subgraph C ["Computational Process"]
        C1["Trimodal Encoding<br>Feature Standardization"]
        C2["Cross-Feature Fusion<br>Gated Cross-Attention"]
        C3["Prediction Module<br>Temporal Reduction"]
    end

    C1 --> C2 --> C3

    C3 --> D["Key Findings"]

    subgraph D ["Outcomes"]
        D1["SOTA Performance<br>+8.1% to +31.6% improvement"]
        D2["Enhanced Stability<br>Solves semantic conflicts"]
        D3["Robust Fusion<br>Overcomes data sparsity"]
    end
```
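
As a rough end-to-end illustration of the three-stage pipeline in the flowchart (encode each modality, fuse guided by the indicator features, reduce over time to a movement call), here is a self-contained toy sketch. The encoders and the fusion step are deliberately simplified placeholders (the paper uses sequence, LLM-based text, and GAT graph encoders with gated cross-attention), and all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):                      # toy "encoder": linear map + tanh
    return np.tanh(x @ W)

def fuse(primary, aux, Wg):            # toy gated fusion of a pooled auxiliary signal
    gate = 1.0 / (1.0 + np.exp(-(primary @ Wg)))
    return primary + gate * aux.mean(axis=0)

d, T = 32, 20
W_ind, W_doc, W_rel = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Wg1, Wg2 = (rng.normal(size=(d, d)) * 0.1 for _ in range(2))
w_out = rng.normal(size=(d,)) * 0.1

indicators = encode(rng.normal(size=(T, d)), W_ind)   # price/volume indicator sequence
documents  = encode(rng.normal(size=(15, d)), W_doc)  # news/tweet embeddings
relations  = encode(rng.normal(size=(8, d)), W_rel)   # related-stock (graph) embeddings

fused = fuse(fuse(indicators, documents, Wg1), relations, Wg2)  # indicators guide both fusions
logit = fused.mean(axis=0) @ w_out                     # temporal + dimensional reduction
print("predicted movement:", "up" if logit > 0 else "down")
```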