EFS: Evolutionary Factor Searching for Sparse Portfolio Optimization Using Large Language Models
ArXiv ID: 2507.17211
Authors: Haochen Luo, Yuan Zhang, Chen Liu
Abstract
Sparse portfolio optimization is a fundamental yet challenging problem in quantitative finance, since traditional approaches heavily relying on historical return statistics and static objectives can hardly adapt to dynamic market regimes. To address this issue, we propose Evolutionary Factor Search (EFS), a novel framework that leverages large language models (LLMs) to automate the generation and evolution of alpha factors for sparse portfolio construction. By reformulating the asset selection problem as a top-m ranking task guided by LLM-generated factors, EFS incorporates an evolutionary feedback loop to iteratively refine the factor pool based on performance. Extensive experiments on five Fama-French benchmark datasets and three real-market datasets (US50, HSI45 and CSI300) demonstrate that EFS significantly outperforms both statistical-based and optimization-based baselines, especially in larger asset universes and volatile conditions. Comprehensive ablation studies validate the importance of prompt composition, factor diversity, and LLM backend choice. Our results highlight the promise of language-guided evolution as a robust and interpretable paradigm for portfolio optimization under structural constraints.
Keywords: Large Language Models (LLMs), Evolutionary Factor Search, Sparse Portfolio Optimization, Alpha Factor Generation, Top-m Ranking, Equity (Multi-Asset)
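The abstract's reformulation of asset selection as a top-m ranking task can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the `momentum_factor` scoring rule stands in for an LLM-generated factor, the tickers and prices are invented, and the equal weighting of the selected assets is simply the easiest choice to show.

```python
from typing import Dict, List


def momentum_factor(prices: List[float], lookback: int = 3) -> float:
    """Hypothetical stand-in for an LLM-generated factor: trailing return."""
    return prices[-1] / prices[-1 - lookback] - 1.0


def top_m_portfolio(scores: Dict[str, float], m: int) -> Dict[str, float]:
    """Keep the m highest-scoring assets and weight them equally (assumed)."""
    if not 0 < m <= len(scores):
        raise ValueError("m must be between 1 and the number of assets")
    # Sort by descending score; break ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda asset: (-scores[asset], asset))
    return {asset: 1.0 / m for asset in ranked[:m]}


# Invented price histories for four hypothetical tickers.
prices = {
    "AAA": [10.0, 11.0, 12.0, 13.0],
    "BBB": [20.0, 19.0, 18.0, 17.0],
    "CCC": [5.0, 5.0, 6.0, 8.0],
    "DDD": [30.0, 30.0, 31.0, 30.0],
}
scores = {asset: momentum_factor(series) for asset, series in prices.items()}
weights = top_m_portfolio(scores, m=2)
```

The sparsity constraint is enforced by construction: only m assets ever receive a non-zero weight, so the factor's sole job is to order the universe.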
Complexity vs Empirical Score
- Math Complexity: 6.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper introduces a novel evolutionary framework combining LLM generation with rigorous backtesting on multiple real-market datasets, though the mathematical depth is moderate (combining ranking theory with evolutionary algorithms) rather than exceptionally advanced.
flowchart TD
A["Research Goal: Enhance Sparse Portfolio Optimization with LLM-Generated Factors"] --> B["Data Collection & Setup"]
B --> C["Core Methodology: Evolutionary Factor Search EFS"]
subgraph C ["EFS Framework"]
C1["LLM Prompt Engineering<br>Initial Factor Generation"]
C2["Factor Evaluation<br>Top-m Ranking Performance"]
C3["Evolutionary Loop<br>Selection & Mutation"]
end
B --> D["Baselines & Benchmarks<br>Fama-French + Market Datasets"]
C --> E["Computational Process<br>Iterative Refinement"]
E --> F["Key Findings & Outcomes"]
subgraph F ["Results"]
F1["Outperforms Statistical & Optimization Baselines"]
F2["Scales Well with Large Asset Universes"]
F3["Robust in Volatile Market Conditions"]
F4["Validated via Ablation Studies"]
end
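The evaluate, select, and mutate cycle in the flowchart can be sketched as a toy loop. This is not the authors' implementation: factors are reduced to a single momentum lookback, `propose_variants` is a random perturbation standing in for the LLM call, `fitness` is an in-sample average return chosen for brevity, and `toy_prices` generates synthetic data. All of these names and choices are hypothetical.

```python
import random
from typing import Dict, List

Prices = Dict[str, List[float]]


def toy_prices(n_assets: int = 6, n_steps: int = 40, seed: int = 1) -> Prices:
    """Synthetic random-walk prices; illustrative data only."""
    rng = random.Random(seed)
    prices = {}
    for i in range(n_assets):
        series = [100.0]
        for _ in range(n_steps - 1):
            series.append(series[-1] * (1.0 + rng.gauss(0.0, 0.01)))
        prices[f"A{i}"] = series
    return prices


def fitness(lookback: int, prices: Prices, m: int, max_lookback: int) -> float:
    """Average next-step return of an equal-weight top-m portfolio (assumed metric)."""
    n_steps = len(next(iter(prices.values())))
    total, count = 0.0, 0
    for t in range(max_lookback, n_steps - 1):
        scores = {a: p[t] / p[t - lookback] - 1.0 for a, p in prices.items()}
        picks = sorted(scores, key=lambda a: (-scores[a], a))[:m]
        total += sum(prices[a][t + 1] / prices[a][t] - 1.0 for a in picks) / m
        count += 1
    return total / count


def propose_variants(parents: List[int], rng: random.Random, max_lookback: int) -> List[int]:
    """Placeholder for the LLM step: nudge each parent lookback by +/-1."""
    return [min(max_lookback, max(1, p + rng.choice([-1, 1]))) for p in parents]


def evolve(prices: Prices, m: int = 2, generations: int = 5,
           pool_size: int = 4, max_lookback: int = 5, seed: int = 0) -> List[int]:
    """Keep the best-performing factors each generation and mutate them."""
    rng = random.Random(seed)
    pool = [rng.randint(1, max_lookback) for _ in range(pool_size)]
    for _ in range(generations):
        # Parents stay in the candidate set, so the best factor is never lost.
        candidates = set(pool) | set(propose_variants(pool, rng, max_lookback))
        ranked = sorted(candidates,
                        key=lambda lb: (-fitness(lb, prices, m, max_lookback), lb))
        pool = ranked[:pool_size]
    return pool


prices = toy_prices()
pool = evolve(prices)  # lookbacks ordered from best to worst fitness
```

In a realistic setting the perturbation step would be replaced by a prompt that shows the model the current pool and its scores, and fitness would be measured out of sample; the loop structure is the only part this sketch is meant to convey.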