Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning
ArXiv ID: 2512.23515
Authors: Zuoyou Jiang, Li Zhao, Rui Sun, Ruohan Sun, Zhongjian Li, Jing Li, Daxin Jiang, Zuo Bai, Cheng Hua
Abstract
Signal decay and regime shifts pose recurring challenges for data-driven investment strategies in non-stationary markets. Conventional time-series and machine learning approaches, which rely primarily on historical correlations, often struggle to generalize when the economic environment changes. While large language models (LLMs) offer strong capabilities for processing unstructured information, their potential to support quantitative factor screening through explicit economic reasoning remains underexplored. Existing factor-based methods typically reduce alphas to numerical time series, overlooking the semantic rationale that determines when a factor is economically relevant. We propose Alpha-R1, an 8B-parameter reasoning model trained via reinforcement learning for context-aware alpha screening. Alpha-R1 reasons over factor logic and real-time news to evaluate alpha relevance under changing market conditions, selectively activating or deactivating factors based on contextual consistency. Empirical results across multiple asset pools show that Alpha-R1 consistently outperforms benchmark strategies and exhibits improved robustness to alpha decay. The full implementation and resources are available at https://github.com/FinStep-AI/Alpha-R1.
Keywords: Reinforcement Learning, Large Language Models (LLMs), Alpha Decay, Factor Screening, Regime Shifts, Equities
Complexity vs Empirical Score
- Math Complexity: 6.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Holy Grail
- Why: The paper combines reinforcement learning with LLM reasoning and draws on statistical concepts for non-stationary markets and RL optimization. Empirically, it backtests across multiple asset pools, provides a public GitHub repository with the implementation, and reports robustness to alpha decay, which suggests high practical applicability.
flowchart TD
A["Research Goal<br>Address Signal Decay & Regime Shifts"] --> B["Data Inputs<br>Market Data, Unstructured News, Factor Series"]
B --> C["Methodology<br>Alpha-R1 8B Model with RL Training"]
C --> D["Process<br>Reasoning: Factor Logic + News Context"]
D --> E["Outcome<br>Context-Aware Alpha Screening<br>(Activate/Deactivate Factors)"]
E --> F["Key Findings<br>Outperforms Benchmarks, Robust to Decay"]
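
The activate/deactivate step in the flowchart can be illustrated with a minimal Python sketch. It is not the authors' implementation: the factor names, the scores, the hand-written `active` mask (standing in for the model's decisions), and the equal-weight averaging are all assumptions made for illustration. The helper names `combine_active_factors` and `top_k` are hypothetical.

```python
from typing import Dict, List


def combine_active_factors(
    factor_scores: Dict[str, Dict[str, float]],
    active: Dict[str, bool],
) -> Dict[str, float]:
    """Average the scores of activated factors for each asset.

    factor_scores maps factor name -> {asset: score}.
    active maps factor name -> True (activated) or False (deactivated).
    Factors missing from `active` are treated as deactivated.
    Equal weighting is an assumption of this sketch.
    """
    selected = [name for name in factor_scores if active.get(name, False)]
    if not selected:
        return {}

    assets = set()
    for name in selected:
        assets.update(factor_scores[name])

    combined: Dict[str, float] = {}
    for asset in sorted(assets):
        values = [
            factor_scores[name][asset]
            for name in selected
            if asset in factor_scores[name]
        ]
        combined[asset] = sum(values) / len(values)
    return combined


def top_k(combined: Dict[str, float], k: int) -> List[str]:
    """Return the k assets with the highest combined score (ties broken by name)."""
    return sorted(combined, key=lambda a: (-combined[a], a))[:k]


# Hypothetical cross-sectional factor scores for three placeholder tickers.
factor_scores = {
    "momentum": {"AAA": 1.0, "BBB": -0.5, "CCC": 0.2},
    "value": {"AAA": -1.0, "BBB": 0.8, "CCC": 0.1},
    "low_vol": {"AAA": 0.0, "BBB": 0.4, "CCC": -0.3},
}

# Placeholder screening decision; in Alpha-R1 this would come from the
# reasoning model's reading of factor logic and news, not from a fixed dict.
active = {"momentum": False, "value": True, "low_vol": True}

combined = combine_active_factors(factor_scores, active)
print(top_k(combined, 2))
```

Keeping the screening decision as a separate boolean mask means the downstream portfolio logic does not depend on how the mask was produced, so a rule-based baseline and a model-driven screen can be compared on the same combination step.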