AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

ArXiv ID: 2508.13174

Authors: Hongjun Ding, Binqi Chen, Jinsheng Huang, Taian Guo, Zhengyang Mao, Guoyi Shao, Lutong Zou, Luchen Liu, Ming Zhang

Abstract

Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches, such as genetic programming, reinforcement learning, and large language models, have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation metrics predominantly include backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing more comprehensive insights and higher efficiency. Furthermore, AlphaEval effectively identifies superior alphas compared to traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.

Keywords: alpha mining, quantitative investment, backtesting, evaluation framework, predictive signals, Equities

Complexity vs Empirical Score

Math Complexity: 4.0/10
Empirical Rigor: 8.5/10
Quadrant: Street Traders
Why: The paper presents a pragmatic framework for alpha evaluation with modest mathematical complexity, but demonstrates high empirical rigor through open-source code, large-scale benchmarks, and specific performance metrics against traditional backtesting.

  flowchart TD
    A["Research Goal: Create AlphaEval<br>to address evaluation challenges in alpha mining"] --> B["Key Methodology: 5-Dimensional<br>Backtest-Free Evaluation Framework"]
    B --> C["Computational Process:<br>Parallelizable Assessment Pipeline"]
    C --> D["Data/Input: Representative Alpha Mining<br>Algorithms & Diverse Datasets"]
    D --> E{"Key Findings & Outcomes"}
    E --> F["✅ Evaluation Consistency<br>~ Backtesting"]
    E --> G["✅ Higher Efficiency &<br>Comprehensive Insights"]
    E --> H["✅ Superior Alpha Identification<br>vs. Single-Metric Screening"]
