Integrating Stock Features and Global Information via Large Language Models for Enhanced Stock Return Prediction
ArXiv ID: 2310.05627
Authors: Unknown
Abstract
The remarkable achievements and rapid advancements of Large Language Models (LLMs) such as ChatGPT and GPT-4 have showcased their immense potential in quantitative investment. Traders can effectively leverage these LLMs to analyze financial news and predict stock returns accurately. However, integrating LLMs into existing quantitative models presents two primary challenges: the insufficient utilization of semantic information embedded within LLMs and the difficulty of aligning the latent information within LLMs with pre-existing quantitative stock features. We propose a novel framework consisting of two components to surmount these challenges. The first component, the Local-Global (LG) model, introduces three distinct strategies for modeling global information. These approaches are grounded, respectively, in stock features, the capabilities of LLMs, and a hybrid method combining the two paradigms. The second component, Self-Correlated Reinforcement Learning (SCRL), focuses on aligning the embeddings of financial news generated by LLMs with stock features within the same semantic space. By implementing our framework, we have demonstrated superior performance in Rank Information Coefficient and returns, particularly compared to models relying only on stock features in the China A-share market.
Keywords: Large Language Models (LLMs), Self-Correlated Reinforcement Learning (SCRL), Local-Global (LG) model, Stock prediction, Semantic embeddings, Equities
Complexity vs Empirical Score
- Math Complexity: 7.0/10
- Empirical Rigor: 5.0/10
- Quadrant: Holy Grail
- Why: The paper introduces advanced mathematical concepts like attention mechanisms, reinforcement learning, and multi-component decomposition within a deep learning framework, scoring high in math complexity. It presents empirical results on the China A-share market with performance metrics like Rank Information Coefficient, indicating backtest-ready methodology, but relies on LLM embeddings without explicit code or dataset sharing, placing it at moderate empirical rigor.
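The Rank Information Coefficient (Rank IC) cited as the paper's headline metric is conventionally the cross-sectional Spearman rank correlation between predicted and realized stock returns, averaged across trading days. The paper does not spell out its computation, so the following stdlib-only sketch uses that standard definition, with illustrative variable names and simple ordinal ranks (ties ignored for brevity):

```python
def ranks(xs):
    """Ordinal ranks of a cross-section (ties not averaged, for simplicity)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def rank_ic(preds_by_day, rets_by_day):
    """Average daily cross-sectional Spearman correlation (standard Rank IC)."""
    daily = [spearman(p, r) for p, r in zip(preds_by_day, rets_by_day)]
    return sum(daily) / len(daily)
```

A perfectly monotone prediction yields a Rank IC of 1.0; in practice, equity models are evaluated on much smaller positive values.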
```mermaid
flowchart TD
    A["Research Goal<br>Integrate LLMs with Stock Features<br>for Enhanced Return Prediction"] --> B["Data & Inputs<br>China A-share Market<br>Stock Features & Financial News"]
    B --> C["Methodology: Local-Global Model<br>1. Stock Feature Strategy<br>2. LLM-based Strategy<br>3. Hybrid Strategy"]
    C --> D["Methodology: Self-Correlated Reinforcement Learning<br>Aligns LLM Embeddings with Stock Features<br>in Semantic Space"]
    D --> E["Computational Process<br>Framework Training & Validation"]
    E --> F["Key Findings/Outcomes<br>Superior Rank IC & Returns<br>vs. Stock Feature-only Models"]
```
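The pipeline above hinges on two operations the summary only names: projecting LLM news embeddings into the stock-feature space (the SCRL alignment goal) and fusing local per-stock features with global information (the LG model). The paper's actual architecture and training procedure are not given here, so this is only a hedged structural sketch; the linear projection, the concatenation-based fusion, and all parameter names are illustrative assumptions, not the authors' method:

```python
def linear(x, W, b):
    """y = W x + b, with W as a list of rows (a stand-in for a learned layer)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def predict_return(local_feats, news_emb, global_vec, params):
    """Illustrative fusion: align the LLM news embedding into the
    stock-feature space, then score the concatenation of local features,
    aligned news, and a global market vector. Shapes and params are
    hypothetical; the paper's LG/SCRL components are richer than this."""
    aligned = linear(news_emb, params["W_align"], params["b_align"])
    fused = local_feats + aligned + global_vec  # concatenation, not addition
    return sum(w * f for w, f in zip(params["w_out"], fused))
```

In the paper's framing, `W_align` would be trained by SCRL so that news embeddings and stock features live in the same semantic space, while the global vector would come from one of the three LG strategies (feature-based, LLM-based, or hybrid).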