FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

ArXiv ID: 2502.11433

Authors: Unknown

Abstract

Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.

Keywords: Large Language Models (LLMs), Reinforcement Learning (RL), policy gradient optimization, trading agent, Financial Markets

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematical concepts such as Markov Decision Processes, policy gradients, and parameter-efficient fine-tuning, but also presents extensive empirical evidence, including backtested trading performance metrics and comparisons with baseline strategies.
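
For readers unfamiliar with the policy-gradient machinery mentioned above, the textbook REINFORCE identity is a useful reference point. The notation here (policy \pi_\theta, trajectory \tau, discount \gamma, per-step reward r_k) is generic and assumed for illustration; it is not taken from the paper, whose exact objective may differ.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t
    \right],
\qquad
G_t = \sum_{k=t}^{T} \gamma^{\,k-t}\, r_k
```

In a trading setting, s_t would be the market state presented to the policy, a_t the trading action, and r_k a trading reward, so actions followed by high return-to-go G_t become more likely after each update.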

  flowchart TD
    A["Research Goal: Improve LLM Decision-Making for Multi-Step Financial Trading"] --> B["Data Input: Multimodal Financial Data"]
    B --> C["Methodology: FLAG-Trader Architecture"]
    C --> D["LLM as Policy Network<br/>Parameter-Efficient Fine-Tuning"]
    C --> E["Gradient-Based RL<br/>Policy Gradient Optimization"]
    D --> F["Computational Process:<br/>Trading Reward Signal"]
    E --> F
    F --> G["Key Findings/Outcomes:<br/>Enhanced Trading Performance<br/>Improved General Financial Reasoning"]
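
The loop in the flowchart (a policy picks trading actions, a reward signal comes back, and gradients update the policy) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's implementation: three state-independent logits stand in for the LLM policy network, plain REINFORCE stands in for whatever policy-gradient variant the authors use, and the reward is simply position times price change. All names (`ACTIONS`, `reinforce_step`, `train`) are hypothetical.

```python
import math
import random

ACTIONS = ("sell", "hold", "buy")  # mapped to positions -1, 0, +1


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every step t."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    out.reverse()
    return out


def reinforce_step(theta, prices, rng, gamma=0.99, lr=0.1):
    """Sample one episode over `prices` and apply one REINFORCE update.

    `theta` holds one logit per action (assumption: a stand-in for the
    LLM policy; a real policy would condition on the market state).
    """
    probs = softmax(theta)
    actions, rewards = [], []
    for t in range(len(prices) - 1):
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        position = a - 1  # -1 short, 0 flat, +1 long
        rewards.append(position * (prices[t + 1] - prices[t]))
        actions.append(a)

    returns = discounted_returns(rewards, gamma)

    # For a softmax policy, d/d theta_k log pi(a) = 1[k == a] - pi(k).
    grad = [0.0] * len(ACTIONS)
    for a, g in zip(actions, returns):
        for k in range(len(ACTIONS)):
            indicator = 1.0 if k == a else 0.0
            grad[k] += (indicator - probs[k]) * g

    steps = len(actions)
    return [th + lr * gk / steps for th, gk in zip(theta, grad)]


def train(prices, updates=300, seed=0):
    """Run repeated REINFORCE updates and return final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(updates):
        theta = reinforce_step(theta, prices, rng)
    return softmax(theta)
```

Calling `train([100.0 + i for i in range(21)])` on this steadily rising toy series should shift probability mass toward `buy`, since long positions are the ones that earn positive reward. Swapping the logits for a parameter-efficiently fine-tuned LLM head would keep the same update structure, only with gradients flowing into the trainable LLM parameters.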