Language Model Guided Reinforcement Learning in Quantitative Trading
arXiv ID: 2508.02366
Authors: Adam Darmanin, Vince Vella
Abstract
Algorithmic trading requires short-term tactical decisions consistent with long-term financial objectives. Reinforcement Learning (RL) has been applied to such problems, but adoption is limited by myopic behaviour and opaque policies. Large Language Models (LLMs) offer complementary strategic reasoning and multi-modal signal interpretation when guided by well-structured prompts. This paper proposes a hybrid framework in which LLMs generate high-level trading strategies to guide RL agents. We evaluate (i) the economic rationale of LLM-generated strategies through expert review, and (ii) the performance of LLM-guided agents against unguided RL baselines using Sharpe Ratio (SR) and Maximum Drawdown (MDD). Empirical results indicate that LLM guidance improves both return and risk metrics relative to standard RL.
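The two evaluation metrics named in the abstract are standard. As a point of reference, here is a minimal Python sketch of both, assuming a series of per-period (e.g. daily) strategy returns; the annualization factor of 252 trading days is a common convention, not something the paper specifies:

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe Ratio of per-period strategy returns."""
    excess = returns - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def max_drawdown(returns: np.ndarray) -> float:
    """Maximum peak-to-trough decline of the cumulative equity curve."""
    equity = np.cumprod(1.0 + returns)           # equity curve from returns
    running_peak = np.maximum.accumulate(equity)  # best value seen so far
    drawdowns = equity / running_peak - 1.0       # <= 0 at every step
    return drawdowns.min()                        # e.g. -0.25 for a 25% drawdown

# Example on simulated daily returns:
rng = np.random.default_rng(0)
daily = rng.normal(5e-4, 0.01, size=252)
print(f"SR  = {sharpe_ratio(daily):.2f}")
print(f"MDD = {max_drawdown(daily):.2%}")
```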
Keywords: Algorithmic Trading, Reinforcement Learning (RL), Large Language Models (LLMs), Hybrid AI, Trading Strategy
Complexity vs Empirical Score
- Math Complexity: 6.5/10
- Empirical Rigor: 6.0/10
- Quadrant: Holy Grail
- Why: The paper combines advanced reinforcement learning algorithms, a hierarchical prompting framework, and regret-minimization heuristics for prompt tuning (one standard such scheme is sketched below), indicating significant mathematical complexity. It also provides empirical backtesting on real data using standardized financial metrics (Sharpe Ratio, Maximum Drawdown), with a structured benchmark environment and a reproducible methodology.
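The summary mentions regret-minimization heuristics for prompt tuning without specifying the scheme. A minimal sketch of one standard choice, full-information Hedge (multiplicative weights) over candidate prompts, under the assumption that each prompt can be scored in [0, 1], e.g. by a normalized backtest metric of the strategy it produced; `score_fn` and the candidate list are hypothetical:

```python
import numpy as np

def hedge_select_prompt(prompts, score_fn, rounds=50, eta=0.5):
    """Full-information Hedge over candidate prompts.

    score_fn(prompt) -> float in [0, 1] is a hypothetical scorer.
    Regret versus the best fixed prompt grows only as O(sqrt(rounds)).
    """
    weights = np.ones(len(prompts))
    for _ in range(rounds):
        scores = np.array([score_fn(p) for p in prompts])  # evaluate all candidates
        weights *= np.exp(eta * scores)                     # multiplicative update
    return prompts[int(np.argmax(weights))]

# Usage with a toy scorer:
candidates = ["trend-follow", "mean-revert with stop-loss", "momentum + vol filter"]
best = hedge_select_prompt(candidates, score_fn=lambda p: len(p) / 30.0)
print(best)
```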
```mermaid
flowchart TD
    A["Research Goal: Address myopic, opaque RL policies in quantitative trading using LLM guidance"] --> B["Methodology: Hybrid Framework LLM generates high-level strategy guiding RL agent"]
    B --> C["Data/Inputs: Multi-modal market signals, financial objectives, expert review criteria"]
    C --> D["Computational Process 1: LLM prompt analysis & strategy generation"]
    D --> E["Computational Process 2: RL agent learning guided by LLM strategy"]
    E --> F["Computational Process 3: Performance evaluation vs. baseline RL"]
    F --> G["Key Findings/Outcomes: Expert validation of economic rationale; Improved SR & MDD vs. unguided RL"]
```
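This summary does not spell out the guidance mechanism in steps D and E. One plausible reading is reward shaping: the LLM's high-level strategy modifies the RL agent's per-step reward. A minimal sketch under that assumption; the interface names and strategy fields ("bias", "max_position") are illustrative stand-ins, not the paper's API:

```python
import numpy as np

def get_llm_strategy(market_summary: str) -> dict:
    """Stand-in for an LLM call that turns a market summary into a
    high-level strategy: a directional bias and a position limit."""
    return {"bias": +1, "max_position": 1.0}

def shaped_reward(pnl: float, position: float, strategy: dict) -> float:
    """Step reward for the RL agent: raw P&L plus a shaping term that
    rewards alignment with the LLM's bias and penalizes limit breaches."""
    alignment = 0.1 * strategy["bias"] * float(np.sign(position))
    breach = -1.0 if abs(position) > strategy["max_position"] else 0.0
    return pnl + alignment + breach

# Plugged into any standard RL loop in place of the raw P&L reward:
strategy = get_llm_strategy("SPY: uptrend, falling volatility")
print(shaped_reward(pnl=0.002, position=0.5, strategy=strategy))
```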