Context-Aware Language Models for Forecasting Market Impact from Sequences of Financial News
ArXiv ID: 2509.12519
Authors: Ross Koval, Nicholas Andrews, Xifeng Yan
Abstract
Financial news plays a critical role in the information diffusion process in financial markets and is a known driver of stock prices. However, the information in each news article is not necessarily self-contained, often requiring a broader understanding of the historical news coverage for accurate interpretation. Further, identifying and incorporating the most relevant contextual information presents significant challenges. In this work, we explore the value of historical context in the ability of large language models to understand the market impact of financial news. We find that historical context provides a consistent and significant improvement in performance across methods and time horizons. To this end, we propose an efficient and effective contextualization method that uses a large LM to process the main article, while a small LM encodes the historical context into concise summary embeddings that are then aligned with the large model’s representation space. We explore the behavior of the model through multiple qualitative and quantitative interpretability tests and reveal insights into the value of contextualization. Finally, we demonstrate that the value of historical context in model predictions has real-world applications, translating to substantial improvements in simulated investment performance.
Keywords: large language models, financial news, contextual embeddings, historical context, market impact prediction, equities
Complexity vs Empirical Score
- Math Complexity: 6.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Holy Grail
- Why: The paper employs advanced language model architectures and alignment techniques with mathematical foundations, while demonstrating strong empirical rigor through backtested investment simulations, dataset curation, and quantifiable performance metrics like AUC.
flowchart TD
A["Research Goal: Assess Value of Historical Context<br/>in LMs for Financial News Market Impact Prediction"] --> B["Data & Inputs: Financial News Articles &<br/>Stock Price Data"]
B --> C["Methodology: Propose Novel Contextualization Model<br/>Small LM encodes history into embeddings<br/>Aligns with Large LM's representation space"]
C --> D["Computational Process: Train & Evaluate Model<br/>Across Multiple Time Horizons"]
D --> E["Key Findings & Outcomes:<br/>1. Historical context yields consistent, significant performance gains<br/>2. Interpretability tests reveal contextual insights<br/>3. Simulated investment performance improves substantially<br/>4. Validated real-world applicability"]
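The contextualization step described in the abstract (a small LM encodes each historical article into a summary embedding, which is then aligned with the large LM's representation space) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the mean pooling, the single linear projection, the dimensions, and the function names `mean_pool`, `project`, and `build_contextualized_input` are hypothetical and are not taken from the paper, which may use a different summarization and alignment mechanism.

```python
import random

def mean_pool(token_vectors):
    """Average equal-length token vectors into one summary vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def project(vector, weights, bias):
    """Linear map from the small-LM space (d_small) to the large-LM space (d_large).

    weights has d_large rows of length d_small; bias has length d_large.
    """
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def build_contextualized_input(history_encodings, article_embeddings, weights, bias):
    """Prepend one aligned summary vector per historical article
    to the main article's token embeddings."""
    context = [project(mean_pool(h), weights, bias) for h in history_encodings]
    return context + article_embeddings

# Toy data standing in for real model outputs (hypothetical sizes).
rng = random.Random(0)
d_small, d_large = 4, 6
weights = [[rng.uniform(-1, 1) for _ in range(d_small)] for _ in range(d_large)]
bias = [0.0] * d_large

# Three historical articles, each with five small-LM token encodings.
history = [[[rng.gauss(0, 1) for _ in range(d_small)] for _ in range(5)]
           for _ in range(3)]
# Main article: seven token embeddings already in the large-LM space.
article = [[rng.gauss(0, 1) for _ in range(d_large)] for _ in range(7)]

inputs = build_contextualized_input(history, article, weights, bias)
print(len(inputs), len(inputs[0]))
```

In this sketch the large model would receive 3 context vectors followed by the 7 article token embeddings, so the history adds only one position per past article rather than its full token length. That is the efficiency argument the abstract makes for concise summary embeddings; in practice the projection would be learned jointly with the prediction objective rather than sampled at random.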