MountainLion: A Multi-Modal LLM-Based Agent System for Interpretable and Adaptive Financial Trading

ArXiv ID: 2507.20474

Authors: Siyi Wu, Junqiao Wang, Zhaoyang Guan, Leyi Zhao, Xinyuan Song, Xinyu Ying, Dexu Yu, Jinhao Wang, Hanlin Zhang, Michele Pak, Yangfan He, Yi Xin, Jianhui Wang, Tianyu Shi

Abstract

Cryptocurrency trading is a challenging task requiring the integration of heterogeneous data from multiple modalities. Traditional deep learning and reinforcement learning approaches typically demand large training datasets and encode diverse inputs into numerical representations, often at the cost of interpretability. Recent progress in large language model (LLM)-based agents has demonstrated the capacity to process multi-modal data and support complex investment decision-making. Building on these advances, we present **MountainLion**, a multi-modal, multi-agent system for financial trading that coordinates specialized LLM-based agents to interpret financial data and generate investment strategies. MountainLion processes textual news, candlestick charts, and trading signal charts to produce high-quality financial reports, while also enabling modification of reports and investment recommendations through data-driven user interaction and question answering. A central reflection module analyzes historical trading signals and outcomes to continuously refine decision processes, and the system is capable of real-time report analysis, summarization, and dynamic adjustment of investment strategies. Empirical results confirm that MountainLion systematically enriches technical price triggers with contextual macroeconomic and capital flow signals, providing a more interpretable, robust, and actionable investment framework that improves returns and strengthens investor confidence.

Keywords: Large Language Models (LLMs), Multi-modal agents, Reinforcement learning, Candlestick analysis, Macro-economic signals

Complexity vs Empirical Score

  • Math Complexity: 3.0/10
  • Empirical Rigor: 2.5/10
  • Quadrant: Philosophers
  • Why: The paper focuses on an applied LLM-agent system architecture and multi-modal data integration rather than deep mathematical theory, resulting in low math complexity. While it describes system components and ablation studies, it lacks detailed empirical data, backtest performance metrics, or reproducible code/datasets, giving it low empirical rigor.
```mermaid
flowchart TD
    A["Research Goal:<br/>Interpretable & Adaptive Financial Trading"] --> B["Multi-Modal Data Inputs<br/>News / Candlesticks / Signals"]
    B --> C["MountainLion Agent System<br/>Coordinated LLM Agents"]
    C --> D["Central Reflection Module<br/>Historical Analysis & Refinement"]
    D --> C
    C --> E{"Dynamic Adjustment &<br/>User Interaction"}
    E --> C
    C --> F["Investment Strategies &<br/>Financial Reports"]
    F --> G["Key Outcomes:<br/>Improved Returns &<br/>Investor Confidence"]
```
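The coordination-and-reflection loop in the diagram can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation (no code is published): the `Signal`, `ReflectionModule`, agent names, and weighting rule are all assumptions. Each specialized agent emits a directional signal; the reflection module aggregates them with per-agent weights and updates those weights from realized outcomes.

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    source: str        # hypothetical agent name, e.g. "news", "candlestick"
    direction: int     # +1 long, -1 short, 0 hold
    confidence: float  # in [0, 1]

@dataclass
class ReflectionModule:
    # Per-agent weights, refined as trading outcomes are observed.
    weights: dict = field(default_factory=dict)

    def aggregate(self, signals: list) -> int:
        # Weighted vote across agents; thresholds are illustrative.
        score = sum(self.weights.get(s.source, 1.0) * s.direction * s.confidence
                    for s in signals)
        return 1 if score > 0.2 else (-1 if score < -0.2 else 0)

    def adjust(self, source: str, realized_return: float) -> None:
        # Reward agents whose past signals preceded gains, penalize losses;
        # floor keeps every agent's voice from vanishing entirely.
        w = self.weights.get(source, 1.0)
        self.weights[source] = max(0.1, w * (1.0 + 0.1 * realized_return))

reflector = ReflectionModule()
signals = [Signal("news", +1, 0.8), Signal("candlestick", -1, 0.3)]
decision = reflector.aggregate(signals)          # net long: 0.8 - 0.3 = 0.5
reflector.adjust("news", realized_return=0.05)   # reflect on the outcome
```

The key design choice mirrored here is the feedback edge `D --> C` in the flowchart: decisions are not static, because the reflection step continually re-weights how much each modality's agent is trusted.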