Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning

arXiv ID: 2509.11420

Authors: Yijia Xiao, Edward Sun, Tong Chen, Fang Wu, Di Luo, Wei Wang

Abstract

Developing professional, structured reasoning on par with human financial analysts and traders remains a central challenge in AI for finance, where markets demand interpretability and trust. Traditional time-series models lack explainability, while LLMs face challenges in turning natural-language analysis into disciplined, executable trades. Although reasoning LLMs have advanced in step-by-step planning and verification, their application to risk-sensitive financial decisions is underexplored. We present Trading-R1, a financially-aware model that incorporates strategic thinking and planning for comprehensive thesis composition, facts-grounded analysis, and volatility-adjusted decision making. Trading-R1 aligns reasoning with trading principles through supervised fine-tuning and reinforcement learning with a three-stage easy-to-hard curriculum. Training uses Tauric-TR1-DB, a 100k-sample corpus spanning 18 months, 14 equities, and five heterogeneous financial data sources. Evaluated on six major equities and ETFs, Trading-R1 demonstrates improved risk-adjusted returns and lower drawdowns compared to both open-source and proprietary instruction-following models as well as reasoning models. The system generates structured, evidence-based investment theses that support disciplined and interpretable trading decisions. Trading-R1 Terminal will be released at https://github.com/TauricResearch/Trading-R1.
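The abstract reports improved risk-adjusted returns and lower drawdowns, but does not spell out the metrics here. As a hedged illustration (not the paper's evaluation code), the two standard measures these claims usually refer to, the annualized Sharpe ratio and maximum drawdown, can be computed from a daily-return series:

```python
# Sketch of standard risk-adjusted metrics; the paper's exact evaluation
# procedure is not given here, so these are the conventional definitions.
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of periodic returns."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))

def max_drawdown(returns):
    """Maximum peak-to-trough decline of the cumulative equity curve."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peaks = np.maximum.accumulate(equity)
    return float(np.min(equity / peaks - 1.0))  # negative, e.g. -0.12 = -12%

# Toy usage on a short daily-return series (illustrative numbers only).
daily = np.array([0.01, -0.02, 0.015, 0.003, -0.01, 0.02])
print(f"Sharpe: {sharpe_ratio(daily):.2f}, MaxDD: {max_drawdown(daily):.2%}")
```

"Lower drawdown" in the abstract means `max_drawdown` closer to zero; "improved risk-adjusted returns" means a higher Sharpe-style ratio, not necessarily higher raw returns.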

Keywords: Algorithmic Trading, Large Language Models (LLMs), Reinforcement Learning, Financial Reasoning, Risk-Adjusted Returns

Complexity vs Empirical Score

  • Math Complexity: 4.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper focuses on LLM fine-tuning and RL techniques, which involve advanced ML methods but little heavy mathematical derivation and no novel theoretical finance model. It demonstrates empirical rigor through a curated 100k-sample dataset, multi-source heterogeneous data, and evaluation on six equities/ETFs with reported risk-adjusted returns and drawdowns, though the excerpt provides neither backtest code nor detailed statistical metrics.
```mermaid
flowchart TD
    A["Research Goal: Develop Financially-Aware Reasoning LLM for Trading"] --> B["Data: Tauric-TR1-DB<br>100k samples, 14 equities, 18 months"]
    B --> C["Methodology: 3-Stage RL Curriculum<br>Easy to Hard Reasoning Tasks"]
    C --> D["Training: Supervised Fine-tuning &<br>Reinforcement Learning Alignment"]
    D --> E["Computational Process: Thesis Composition &<br>Volatility-Adjusted Decision Making"]
    E --> F["Outcome: Improved Risk-Adjusted Returns<br>vs. Baselines"]
    F --> G["Result: Structured, Evidence-Based<br>Investment Theses"]
    G --> H["Release: Trading-R1 Terminal"]
```
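The three-stage easy-to-hard curriculum in the flowchart can be sketched as a scheduler that partitions training samples into difficulty buckets and serves them easiest first. The bucketing helper and difficulty proxy below are illustrative assumptions, not the paper's actual training API:

```python
# Hypothetical sketch of an easy-to-hard curriculum schedule. The paper
# describes a three-stage curriculum but not this interface; the difficulty
# function and bucket sizes here are assumptions for illustration.
def curriculum_schedule(samples, difficulty, n_stages=3):
    """Partition samples into n_stages buckets, easiest first."""
    ranked = sorted(samples, key=difficulty)
    size = -(-len(ranked) // n_stages)  # ceiling division
    return [ranked[i * size:(i + 1) * size] for i in range(n_stages)]

# Toy usage: difficulty proxied by an integer tag attached to each sample.
samples = [("s1", 3), ("s2", 1), ("s3", 2), ("s4", 5), ("s5", 4), ("s6", 6)]
stages = curriculum_schedule(samples, difficulty=lambda s: s[1])
print([[name for name, _ in stage] for stage in stages])
# → [['s2', 's3'], ['s1', 's5'], ['s4', 's6']]
```

Training would then run SFT/RL over `stages[0]`, `stages[1]`, `stages[2]` in order, which is the generic shape of an easy-to-hard curriculum; how Trading-R1 defines stage difficulty is specified in the paper, not here.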