MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading
ArXiv ID: 2509.05080
Authors: Yang Chen, Yueheng Jiang, Zhaozhao Ma, Yuchen Cao, Jacky Keung, Kun Kuang, Leilei Gan, Yiquan Wu, Fei Wu
Abstract
The inherent non-stationarity of financial markets and the complexity of multimodal information pose significant challenges to existing quantitative trading models. Traditional methods that rely on fixed structures and unimodal data struggle to adapt to market regime shifts, while large language model (LLM)-driven solutions, despite their multimodal comprehension, suffer from static strategies and homogeneous expert designs, lacking dynamic adjustment and fine-grained decision mechanisms. To address these limitations, we propose MM-DREX: a Multimodal-driven, Dynamically-Routed EXpert framework based on large language models. MM-DREX explicitly decouples market state perception from strategy execution to enable adaptive sequential decision-making in non-stationary environments. Specifically, it (1) introduces a vision-language model (VLM)-powered dynamic router that jointly analyzes candlestick chart patterns and long-term temporal features to allocate real-time expert weights; (2) designs four heterogeneous trading experts (trend, reversal, breakout, positioning) that generate specialized, fine-grained sub-strategies; and (3) proposes an SFT-RL hybrid training paradigm to jointly optimize the router's market classification capability and the experts' risk-adjusted decision-making. Extensive experiments on multimodal datasets spanning stocks, futures, and cryptocurrencies demonstrate that MM-DREX significantly outperforms 15 baselines (including state-of-the-art financial LLMs and deep reinforcement learning models) on key metrics: total return, Sharpe ratio, and maximum drawdown, validating its robustness and generalization. Additionally, an interpretability module traces routing logic and expert behavior in real time, providing an audit trail for strategy transparency.
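To make the routing idea concrete, the sketch below is a deliberately simplified, hypothetical illustration (not the paper's implementation): a router maps fused chart and temporal embeddings to softmax weights over the four expert types, and the final position is the weight-blended combination of the experts' sub-strategy signals. All function and variable names here are invented for illustration.

```python
# Minimal, hypothetical sketch of dynamic expert routing (names invented;
# not the authors' code). A router maps a fused market-state embedding to
# softmax weights over four heterogeneous experts, and the trading signal
# is the weight-blended combination of the experts' sub-strategies.
import numpy as np

EXPERTS = ["trend", "reversal", "breakout", "positioning"]

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def route(chart_embedding, temporal_embedding, W):
    """Return expert weights from fused multimodal features.

    chart_embedding:    vector summarizing candlestick-chart patterns (e.g. from a VLM)
    temporal_embedding: vector summarizing long-term time-series features
    W:                  (4, d) routing matrix, one row per expert
    """
    state = np.concatenate([chart_embedding, temporal_embedding])
    return softmax(W @ state)

def blend_signals(weights, expert_signals):
    """Combine per-expert position signals (-1 = short ... +1 = long)."""
    return float(np.dot(weights, expert_signals))

# Toy usage with random features; real inputs would come from the VLM-powered
# router and the experts' fine-grained sub-strategies.
rng = np.random.default_rng(0)
chart_emb, temporal_emb = rng.normal(size=8), rng.normal(size=8)
W = rng.normal(size=(len(EXPERTS), 16))
weights = route(chart_emb, temporal_emb, W)
signals = np.array([0.8, -0.2, 0.5, 0.1])  # hypothetical expert outputs
print(dict(zip(EXPERTS, weights.round(3))), "blended position:", round(blend_signals(weights, signals), 3))
```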
Keywords: Multimodal Large Language Models (LLM), Vision-Language Model (VLM), Dynamic Routing, Reinforcement Learning (RL), Heterogeneous Experts, Multi-Asset (Stocks, Futures, Cryptocurrencies)
Complexity vs Empirical Score
- Math Complexity: 6.5/10
- Empirical Rigor: 7.5/10
- Quadrant: Holy Grail
- Why: The paper introduces advanced concepts such as a POMDP formulation and hybrid SFT-RL training, but the mathematics stays at the level of a high-level framework rather than extensive derivations. It demonstrates strong empirical rigor with a multi-modal dataset, comparisons against 15 baselines, and cross-market validation, indicating a backtest-ready implementation.
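For reference, the key evaluation metrics cited throughout (total return, Sharpe ratio, maximum drawdown) follow standard definitions. The sketch below shows one common way to compute them from a daily equity curve; the 252-trading-days-per-year and zero risk-free-rate conventions are assumed defaults, not details stated in the abstract.

```python
# Standard backtest metrics, assuming a daily equity curve, 252 trading days
# per year, and a zero risk-free rate (common defaults, not the paper's
# exact evaluation setup).
import numpy as np

def total_return(equity):
    return equity[-1] / equity[0] - 1.0

def sharpe_ratio(equity, periods_per_year=252):
    rets = np.diff(equity) / equity[:-1]          # simple periodic returns
    if rets.std(ddof=1) == 0:
        return 0.0
    return np.sqrt(periods_per_year) * rets.mean() / rets.std(ddof=1)

def max_drawdown(equity):
    running_peak = np.maximum.accumulate(equity)
    drawdowns = equity / running_peak - 1.0       # <= 0 at every step
    return drawdowns.min()                        # most negative value

equity = np.array([100.0, 102.0, 101.0, 105.0, 98.0, 107.0])
print(total_return(equity), sharpe_ratio(equity), max_drawdown(equity))
```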
```mermaid
flowchart TD
    A["Research Goal: Overcome market non-stationarity & static LLM limitations in financial trading"] --> B["MM-DREX Framework Architecture"]
    B --> C["Multi-Modal Input Processing<br/>(Candlestick Charts & Temporal Data)"]
    C --> D["VLM-Powered Dynamic Router<br/>(Real-time expert weight allocation)"]
    D --> E["Heterogeneous Expert Pool<br/>1. Trend<br/>2. Reversal<br/>3. Breakout<br/>4. Positioning"]
    E --> F["SFT-RL Hybrid Training<br/>(Optimize router & experts jointly)"]
    F --> G["Real-time Interpretability<br/>(Routing logic audit trail)"]
    F --> H["Key Findings: Outperforms 15 baselines<br/>Higher returns, Sharpe ratio & lower drawdown<br/>across stocks, futures & crypto"]
    subgraph "Computational Process"
        C
        D
        E
        F
        G
    end
```
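The "SFT-RL Hybrid Training" stage in the diagram can be pictured as a two-phase loop: supervised fine-tuning first teaches the router to classify market states, and a reinforcement-learning phase then optimizes the experts against a risk-adjusted reward. The sketch below is a heavily simplified, hypothetical rendering of that idea (a REINFORCE-style update with a Sharpe-like episode reward), not the paper's training procedure; the regime labels and the three-action policy are assumptions made only for the example.

```python
# Highly simplified, hypothetical two-phase training loop (not the authors'
# procedure): (1) SFT trains the router to classify market regimes from fused
# features; (2) an RL phase applies a REINFORCE-style update to an expert
# policy using a risk-adjusted (Sharpe-like) episode reward.
import torch
import torch.nn as nn

router = nn.Linear(16, 4)            # 4 regimes: trend/reversal/breakout/range (assumed)
expert_policy = nn.Linear(16, 3)     # actions: short / flat / long (assumed)
opt_r = torch.optim.Adam(router.parameters(), lr=1e-3)
opt_e = torch.optim.Adam(expert_policy.parameters(), lr=1e-3)

# ---- Phase 1: supervised fine-tuning of the router on labeled regimes ----
x = torch.randn(64, 16)                       # toy fused (chart + temporal) features
regime_labels = torch.randint(0, 4, (64,))    # toy regime labels
sft_loss = nn.functional.cross_entropy(router(x), regime_labels)
opt_r.zero_grad(); sft_loss.backward(); opt_r.step()

# ---- Phase 2: REINFORCE-style update of an expert with a risk-adjusted reward ----
states = torch.randn(32, 16)                                  # toy episode of states
dist = torch.distributions.Categorical(logits=expert_policy(states))
actions = dist.sample()                                       # 0=short, 1=flat, 2=long
positions = actions.float() - 1.0                             # map to {-1, 0, +1}
price_moves = torch.randn(32) * 0.01                          # toy per-step returns
step_pnl = positions * price_moves
reward = step_pnl.mean() / (step_pnl.std() + 1e-8)            # Sharpe-like episode reward
rl_loss = -(dist.log_prob(actions).sum() * reward.detach())   # policy-gradient surrogate
opt_e.zero_grad(); rl_loss.backward(); opt_e.step()
print(float(sft_loss), float(reward))
```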