Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?
ArXiv ID: 2505.07078
Authors: Weixian Waylon Li, Hyeonjun Kim, Mihai Cucuringu, Tiejun Ma
Abstract
Large Language Models (LLMs) have recently been leveraged for asset pricing tasks and stock trading applications, enabling AI agents to generate investment decisions from unstructured financial data. However, most evaluations of LLM timing-based investing strategies are conducted on narrow timeframes and limited stock universes, overstating effectiveness due to survivorship and data-snooping biases. We critically assess their generalizability and robustness by proposing FINSABER, a backtesting framework evaluating timing-based strategies across longer periods and a larger universe of symbols. Systematic backtests over two decades and 100+ symbols reveal that previously reported LLM advantages deteriorate significantly under a broader cross-section and over longer-term evaluation. Our market regime analysis further demonstrates that LLM strategies are overly conservative in bull markets, underperforming passive benchmarks, and overly aggressive in bear markets, incurring heavy losses. These findings highlight the need to develop LLM strategies that are able to prioritise trend detection and regime-aware risk controls over mere scaling of framework complexity.
Keywords: Large Language Models (LLMs), Asset Pricing, Survivorship Bias, Market Regime Analysis, Backtesting Framework, Equities
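To make the survivorship-bias point concrete, here is a toy sketch (not FINSABER's code). It compares the average buy-and-hold return of a universe picked with hindsight (only symbols still listed at the end) against the point-in-time universe that also contains names later delisted. The tickers, returns, and helper names (`Symbol`, `mean_return`, `survivorship_gap`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    ticker: str
    total_return: float  # hypothetical buy-and-hold return over the whole test period
    delisted: bool       # True if the symbol dropped out before the end date


def mean_return(universe: list[Symbol]) -> float:
    """Equal-weighted average buy-and-hold return of a universe."""
    return sum(s.total_return for s in universe) / len(universe)


def survivorship_gap(point_in_time_universe: list[Symbol]) -> float:
    """Survivor-only mean minus point-in-time mean.

    A positive gap means that selecting symbols by hindsight (only those
    still listed today) inflates the apparent average return.
    """
    survivors = [s for s in point_in_time_universe if not s.delisted]
    return mean_return(survivors) - mean_return(point_in_time_universe)


# Hypothetical universe as it existed on the backtest start date.
universe = [
    Symbol("AAA", 0.80, delisted=False),
    Symbol("BBB", 0.20, delisted=False),
    Symbol("CCC", -0.90, delisted=True),
    Symbol("DDD", -0.60, delisted=True),
]

survivors = [s for s in universe if not s.delisted]
print(f"survivor-only mean: {mean_return(survivors):.3f}")
print(f"point-in-time mean: {mean_return(universe):.3f}")
print(f"survivorship gap:   {survivorship_gap(universe):.3f}")
```

With these made-up numbers the survivor-only universe looks strongly profitable while the point-in-time universe loses money on average, which is the kind of overstatement a wider, point-in-time symbol universe is meant to avoid.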
Complexity vs Empirical Score
- Math Complexity: 4.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Street Traders
- Why: The paper presents a rigorous empirical evaluation framework (FINSABER) with extensive backtesting over two decades and 100+ symbols, including bias mitigation and regime analysis, but uses relatively standard statistical and financial metrics without advanced mathematical derivations.
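As a hedged illustration of the "standard financial metrics" mentioned above (the paper's exact metric set and conventions are not reproduced here), the sketch below computes three common ones from simple daily returns: geometric annualized return, annualized Sharpe ratio, and maximum drawdown. The 252-trading-day year and the zero default risk-free rate are assumptions.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: daily data, 252 trading days per year


def annualized_return(daily_returns: list[float]) -> float:
    """Geometric annualized return from simple daily returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0) -> float:
    """Annualized Sharpe ratio using the sample std. dev. of daily excess returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")  # undefined for a constant return series
    return statistics.mean(excess) / sd * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns: list[float]) -> float:
    """Largest peak-to-trough decline of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Hypothetical daily returns, for illustration only.
sample = [0.01, -0.02, 0.015, -0.005, 0.012]
print(f"Sharpe: {sharpe_ratio(sample):.2f}, max drawdown: {max_drawdown(sample):.2%}")
```

Sharpe ratio and drawdown together capture the two failure modes the abstract describes: weak upside capture in rising markets and large losses in falling ones.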
```mermaid
flowchart TD
A["Research Goal:<br/>Can LLM-based Investing<br/>Strategies Outperform<br/>Market in Long Run?"] --> B["Data Input:<br/>Long-term Historical<br/>Market Data<br/>20 Years / 100+ Symbols"]
B --> C["Methodology:<br/>FINSABER Backtesting<br/>Framework<br/>Systematic Evaluation"]
C --> D["Computational Process:<br/>Regime Analysis &<br/>Strategies across<br/>Bull/Bear Markets"]
D --> E{"Key Findings & Outcomes"}
E --> F["LLM Strategies Deteriorate<br/>under broader/longer evaluation"]
E --> G["Conservative in Bull Markets<br/>Underperforms Passive Benchmark"]
E --> H["Aggressive in Bear Markets<br/>Incurs Heavy Losses"]
H --> I["Conclusion:<br/>Need for Trend Detection &<br/>Regime-Aware Risk Controls"]
```
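The regime analysis step can be sketched roughly as follows: label each period as bull or bear from the benchmark's return, then average the strategy's excess return over the benchmark within each regime. The labeling rule (sign of the benchmark's per-period return) and the function names `label_regime` and `excess_return_by_regime` are hypothetical; this is not necessarily how FINSABER defines regimes.

```python
from collections import defaultdict


def label_regime(benchmark_return: float) -> str:
    # Hypothetical rule: a period is "bull" if the benchmark rose, else "bear".
    return "bull" if benchmark_return >= 0 else "bear"


def excess_return_by_regime(
    strategy_returns: list[float], benchmark_returns: list[float]
) -> dict[str, float]:
    """Average strategy-minus-benchmark return per regime.

    Both inputs are per-period returns (e.g. yearly) aligned by index.
    """
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must be aligned")
    buckets: dict[str, list[float]] = defaultdict(list)
    for s, b in zip(strategy_returns, benchmark_returns):
        buckets[label_regime(b)].append(s - b)
    return {regime: sum(xs) / len(xs) for regime, xs in buckets.items()}


# Made-up yearly returns that only mirror the qualitative pattern in the abstract:
# lagging in up years, losing more than the benchmark in down years.
strategy = [0.05, 0.08, -0.30]
benchmark = [0.20, 0.25, -0.15]
print(excess_return_by_regime(strategy, benchmark))
```

Splitting excess returns by regime, rather than reporting one full-period average, is what exposes an asymmetry like "too conservative in bull markets, too aggressive in bear markets" that a single aggregate number can hide.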