Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach
ArXiv ID: 2408.06634
Authors: Unknown
Abstract
Accurate stock market predictions following earnings reports are crucial for investors. Traditional methods, particularly classical machine learning models, struggle with these predictions because they cannot effectively process and interpret the extensive textual data contained in earnings reports and often overlook nuances that influence market movements. This paper introduces an advanced approach employing Large Language Models (LLMs) fine-tuned with a novel combination of instruction-based techniques and quantized low-rank adaptation (QLoRA) compression. Our methodology integrates 'base factors', such as financial metric growth and earnings transcripts, with 'external factors', including recent market index performance and analyst grades, to create a rich, supervised dataset. This comprehensive dataset enables our models to achieve superior predictive performance in terms of accuracy, weighted F1, and Matthews correlation coefficient (MCC), especially evident in comparison with benchmarks such as GPT-4. We specifically highlight the efficacy of the llama-3-8b-Instruct-4bit model, which shows significant improvements over baseline models. The paper also discusses the potential of expanding the output capabilities to include a 'Hold' option and extending the prediction horizon, aiming to accommodate various investment styles and time frames. This study not only demonstrates the power of integrating cutting-edge AI with fine-tuned financial data but also paves the way for future research in enhancing AI-driven financial analysis tools.
Keywords: Large Language Models (LLMs), Instruction Fine-tuning, QLoRA, Earnings Reports, Predictive Performance
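The data-integration step described in the abstract, combining 'base factors' (financial metric growth, transcript excerpts) with 'external factors' (index performance, analyst grades) into supervised instruction examples, can be sketched as follows. All field names and the prompt template are illustrative assumptions, not the paper's exact schema:

```python
# Sketch of textualizing base + external factors into one instruction-tuning
# record. Field names and the prompt template are illustrative assumptions.

def build_example(base: dict, external: dict, label: str) -> dict:
    """Assemble one supervised (instruction, input, output) record."""
    base_text = (
        f"Revenue growth: {base['revenue_growth']:+.1%}. "
        f"EPS growth: {base['eps_growth']:+.1%}. "
        f"Transcript excerpt: {base['transcript_excerpt']}"
    )
    external_text = (
        f"S&P 500 5-day return: {external['spx_5d_return']:+.1%}. "
        f"Consensus analyst grade: {external['analyst_grade']}."
    )
    return {
        "instruction": (
            "Given the earnings report summary and market context, "
            "predict the post-earnings stock move. Answer UP or DOWN."
        ),
        "input": base_text + "\n" + external_text,
        "output": label,  # ground-truth direction, e.g. "UP"
    }

example = build_example(
    base={"revenue_growth": 0.12, "eps_growth": 0.08,
          "transcript_excerpt": "We raised full-year guidance."},
    external={"spx_5d_return": -0.015, "analyst_grade": "Buy"},
    label="UP",
)
```

A corpus of such records, one per earnings event, is what an instruction-tuned model would be trained on; extending the label set with 'Hold', as the paper proposes, only changes the answer vocabulary in the instruction.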
Complexity vs Empirical Score
- Math Complexity: 4.0/10
- Empirical Rigor: 8.0/10
- Quadrant: Street Traders
- Why: The paper demonstrates high empirical rigor through detailed data collection, preprocessing, textualization, and comparative evaluation against benchmarks using standard metrics. However, its mathematical complexity is relatively low, focusing on implementing and adapting existing LLM techniques (QLoRA, instruction tuning) rather than deriving novel theoretical models.
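Since the paper adapts existing QLoRA tooling rather than deriving new theory, its training setup can be approximated with standard Hugging Face components. The sketch below shows a 4-bit QLoRA configuration for Llama-3-8B-Instruct; the hyperparameters (rank, alpha, target modules) are illustrative assumptions, not the paper's reported values:

```python
# Sketch of a 4-bit QLoRA fine-tuning setup for Llama-3-8B-Instruct using
# Hugging Face transformers + peft + bitsandbytes. Hyperparameters below are
# illustrative assumptions, not the paper's reported configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",               # NF4 quantization scheme
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed rank/scaling/dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only LoRA adapters are trainable
```

The frozen 4-bit base plus small trainable adapters is what keeps the empirical workload tractable on a single GPU while the textualized earnings dataset supplies the supervision.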
```mermaid
flowchart TD
A["Research Goal: Accurate Stock Prediction<br/>from Earnings Reports"] --> B["Data Collection & Integration"]
B --> B1["Base Factors<br/>Financial Metrics, Transcripts"]
B --> B2["External Factors<br/>Market Indices, Analyst Grades"]
B1 & B2 --> C["Model Architecture<br/>Llama-3-8B-Instruct-4bit"]
C --> D["Fine-tuning Strategy<br/>QLoRA + Instruction Tuning"]
D --> E["Model Evaluation"]
E --> F{"Outcomes & Findings"}
F --> F1["Superior Metrics<br/>Accuracy, F1, MCC"]
F --> F2["SOTA Comparison<br/>Outperforms GPT-4"]
F --> F3["Future Directions<br/>Hold Option, Horizon Extension"]
```