AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
ArXiv ID: 2509.25055 “View on arXiv”
Authors: Binqi Chen, Hongjun Ding, Ning Shen, Jinsheng Huang, Taian Guo, Luchen Liu, Ming Zhang
Abstract
The automated mining of predictive signals, or alphas, is a central challenge in quantitative finance. While Reinforcement Learning (RL) has emerged as a promising paradigm for generating formulaic alphas, existing frameworks are fundamentally hampered by a triad of interconnected issues. First, they suffer from reward sparsity, where meaningful feedback is only available upon the completion of a full formula, leading to inefficient and unstable exploration. Second, they rely on semantically inadequate sequential representations of mathematical expressions, failing to capture the structure that determine an alpha’s behavior. Third, the standard RL objective of maximizing expected returns inherently drives policies towards a single optimal mode, directly contradicting the practical need for a diverse portfolio of non-correlated alphas. To overcome these challenges, we introduce AlphaSAGE (Structure-Aware Alpha Mining via Generative Flow Networks for Robust Exploration), a novel framework is built upon three cornerstone innovations: (1) a structure-aware encoder based on Relational Graph Convolutional Network (RGCN); (2) a new framework with Generative Flow Networks (GFlowNets); and (3) a dense, multi-faceted reward structure. Empirical results demonstrate that AlphaSAGE outperforms existing baselines in mining a more diverse, novel, and highly predictive portfolio of alphas, thereby proposing a new paradigm for automated alpha mining. Our code is available at https://github.com/BerkinChen/AlphaSAGE.
Keywords: generative flow networks (GFlowNets), reinforcement learning (RL), relational graph convolutional network (RGCN), alpha mining, algorithmic trading, Equity
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Holy Grail
- Why: The paper introduces advanced concepts like Relational Graph Convolutional Networks (RGCNs) and Generative Flow Networks (GFlowNets), requiring deep mathematical understanding. Empirical results are backed by real-world data from Chinese and U.S. stock markets, and the code is publicly available on GitHub, indicating high implementation readiness.
flowchart TD
A["Research Goal: Automate mining of predictive,<br>diverse 'alpha' signals in finance"] --> B["Data & Inputs:<br>Equity market data &<br>pre-defined operations"]
B --> C["Problem: Traditional RL fails<br>due to reward sparsity,<br>poor structure, & lack of diversity"]
C --> D["Solution: AlphaSAGE Framework<br>Integrates: RGCN, GFlowNets, &<br>Multi-faceted Rewards"]
D --> E["Process: Generate candidate formulas<br>using Structure-Aware GFlowNets<br>for robust exploration"]
E --> F["Outcome: Robust Alpha Portfolio<br>High diversity, novelty, & predictive<br>power (vs. baselines)"]