The Memorization Problem: Can We Trust LLMs’ Economic Forecasts?
ArXiv ID: 2504.14765 (https://arxiv.org/abs/2504.14765)
Authors: Unknown
Abstract
Large language models (LLMs) cannot be trusted to produce economic forecasts for periods covered by their training data. Counterfactual forecasting ability is non-identified when the model has seen the realized values: any observed output is consistent with both genuine skill and memorization. Any evidence of memorization, in turn, is only a lower bound on encoded knowledge, since a model need not surface everything it has stored. We demonstrate that LLMs have memorized economic and financial data, recalling exact values from before their knowledge cutoff. Instructing models to respect historical boundaries fails to prevent recall-level accuracy, and masking fails because LLMs reconstruct entities and dates from minimal context. Post-cutoff, we observe no recall. The memorization extends to embeddings.
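To make the recall test concrete, the sketch below probes a single pre-cutoff value. The `query_llm()` helper, series name, date, and tolerance are all illustrative assumptions, not the paper's actual setup; the prompt embeds the kind of look-ahead restriction the abstract says is ineffective, so a near-exact answer signals recall rather than skill.

```python
# Minimal sketch of a pre-cutoff recall probe. query_llm() is a
# hypothetical stand-in for any chat-completions API; the series name,
# date, and realized value are illustrative placeholders, not paper data.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM provider."""
    raise NotImplementedError("wire this to your model provider")

def recall_probe(series: str, date: str, realized: float,
                 tol: float = 1e-3) -> bool:
    """Ask the model to 'forecast' a value whose realization predates
    its knowledge cutoff, then check for recall-level accuracy."""
    prompt = (
        f"It is the day before {date}. Using only information available "
        f"up to that day, forecast {series} for {date}. "
        "Reply with a single number."
    )
    try:
        predicted = float(query_llm(prompt).strip())
    except ValueError:
        return False  # unparseable reply: no exact recall detected
    # Near-exact agreement with the realized value is the recall-level
    # accuracy the paper reports; genuine skill and recall are
    # observationally equivalent here, which is the non-identification problem.
    return abs(predicted - realized) <= tol

# Example call with placeholder arguments:
# recall_probe("US headline CPI, month-over-month %", "2019-06-01", 0.1)
```

Running the same probe on post-cutoff dates gives the control the paper uses: with no realized value in the training data, recall-level accuracy should disappear.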
Keywords: Large Language Models, Memorization, Counterfactual Forecasting, Economic Data, Data Privacy, Macroeconomics / General
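The abstract's final claim, that memorization extends to embeddings, can be probed with a nearest-neighbor check: embed a masked series description and see whether its closest candidate entity is the true one. The `embed()` helper and the candidate setup below are assumptions for illustration, not the paper's procedure.

```python
# Sketch of an embedding probe. embed() is a hypothetical stand-in for
# any text-embedding API; the masked description and candidate entities
# are illustrative placeholders, not the paper's materials.

import math

def embed(text: str) -> list[float]:
    """Placeholder for a call to a text-embedding endpoint."""
    raise NotImplementedError("wire this to your embedding provider")

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def nearest_entity(masked_description: str, candidates: list[str]) -> str:
    """If the masked description's embedding sits closest to the true
    entity's embedding, identity information survives the masking."""
    d = embed(masked_description)
    return max(candidates, key=lambda c: cosine(d, embed(c)))
```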
Complexity vs Empirical Score
- Math Complexity: 3.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Street Traders
- Why: The theoretical component is primarily conceptual, a non-identification argument of moderate complexity, while the empirical side is heavily grounded in systematic testing: extensive data analysis, explicit evaluation metrics, and specific findings on recall accuracy and masking failures.
```mermaid
flowchart TD
    Start["Research Goal: Can LLMs be trusted for economic forecasts?"] --> Method["Methodology: Test recall vs. genuine skill"]
    Method --> Data["Data: Historical economic/financial series"]
    Data --> Proc["Computational Process: Prompt recall & counterfactual tests"]
    Proc --> Out1["Finding 1: Models memorize exact values pre-cutoff"]
    Proc --> Out2["Finding 2: Instructions/masking fail to prevent recall"]
    Proc --> Out3["Finding 3: Post-cutoff forecasts show no recall"]
    Out1 --> Result["Outcome: Forecast trust is impossible pre-cutoff"]
    Out2 --> Result
    Out3 --> Result
```
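A similar sketch illustrates the masking failure in Finding 2: describe a series with its entity and date masked, then check whether the model reconstructs them. The prompt and the `query_llm()` stub are again illustrative assumptions rather than the paper's materials.

```python
# Sketch of the masking test. query_llm() is the same hypothetical stub
# as in the recall probe; the masked prompt below is an invented
# placeholder, not one of the paper's test items.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM provider."""
    raise NotImplementedError("wire this to your model provider")

def masking_test(masked_prompt: str, entity: str, year: str) -> bool:
    """Return True if the model re-identifies masked identifiers.

    Reconstruction defeats the purpose of masking: once the model knows
    which entity and date the series describes, it can fall back on
    memorized values exactly as in the unmasked case."""
    answer = query_llm(masked_prompt).lower()
    return entity.lower() in answer and year in answer

# Example with a placeholder item:
# masking_test(
#     "An equity index [INDEX] fell sharply over a few weeks in [YEAR], "
#     "then recovered within months. Name the index and the year.",
#     entity="S&P 500", year="2020",
# )
```

If the model names the entity and year, the masked prompt never left its memorized territory, so any subsequent "forecast" on the masked series can still be recall.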