Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification

ArXiv ID: 2511.02469

Authors: Kaito Takano, Masanori Hirano, Kei Nakagawa

Abstract

Accurately forecasting central bank policy decisions, particularly those of the Federal Open Market Committee (FOMC), has become increasingly important amid heightened economic uncertainty. While prior studies have used monetary policy texts to predict rate changes, most rely on static classification models that overlook the deliberative nature of policymaking. This study proposes a novel framework that structurally imitates the FOMC’s collective decision-making process by modeling multiple large language models (LLMs) as interacting agents. Each agent begins with a distinct initial belief and produces a prediction based on both qualitative policy texts and quantitative macroeconomic indicators. Through iterative rounds, agents revise their predictions by observing the outputs of others, simulating deliberation and consensus formation. To enhance interpretability, we introduce a latent variable representing each agent’s underlying belief (e.g., hawkish or dovish), and we theoretically demonstrate how this belief mediates the perception of input information and interaction dynamics. Empirical results show that this debate-based approach significantly outperforms standard LLM-based baselines in prediction accuracy. Furthermore, the explicit modeling of beliefs provides insights into how individual perspectives and social influence shape collective policy forecasts.

Keywords: Central bank policy, Large language models, Multi-agent systems, Forecasting, Macroeconomics, Rates / Macro

Complexity vs Empirical Score

  • Math Complexity: 3.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper employs complex LLM-based multi-agent architectures and latent variable modeling, but the mathematical framework is primarily conceptual and probabilistic rather than dense with advanced derivations. Empirical evaluation is robust, involving backtested prediction accuracy against baselines and ablation studies on real FOMC/Beige Book data, making it highly implementation-ready.

  flowchart TD
    A["Research Goal<br>Predict FOMC Monetary Policy<br>via Simulated Debate"] --> B["Inputs<br>Economic Reports & Policy Texts"]
    B --> C["Agent Initialization<br>Assign Hawkish/Dovish Latent Beliefs"]
    C --> D{"Multi-Agent Debate Loop<br>Iterative Prediction & Consensus"}
    D -- Rounds of Interaction --> D
    D --> E["Decision Classification<br>Final Rate Change Prediction"]
    E --> F["Outcomes<br>Higher Accuracy<br>Interpretable Belief Dynamics"]
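
The debate loop in the flowchart can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it replaces each LLM call with a simple scoring rule, and the label set, the belief encoding in [-1, 1], the `peer_weight` parameter, the 0.5 decision thresholds, and the majority-vote aggregation are all assumptions made for illustration. It only shows the control flow described in the abstract: agents start from distinct beliefs, predict independently, then revise after observing the other agents' outputs.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical label set; the paper's exact classes may differ.
LABELS = ("cut", "hold", "hike")


@dataclass
class Agent:
    name: str
    belief: float  # assumed encoding: -1.0 = dovish, +1.0 = hawkish

    def predict(self, signal: float, peer_votes: list[str],
                peer_weight: float = 0.5) -> str:
        """Stand-in for an LLM call: mix the input signal, the agent's
        latent belief, and the average direction of peers' votes."""
        peer_score = 0.0
        if peer_votes:
            # Map cut/hold/hike to -1/0/+1 and average over peers.
            peer_score = sum(LABELS.index(v) - 1 for v in peer_votes) / len(peer_votes)
        score = signal + self.belief + peer_weight * peer_score
        if score > 0.5:
            return "hike"
        if score < -0.5:
            return "cut"
        return "hold"


def debate(agents: list[Agent], signal: float, rounds: int = 3):
    """Run an initial independent round, then up to `rounds` revision rounds."""
    # Round 0: each agent predicts without seeing anyone else.
    votes = [a.predict(signal, []) for a in agents]
    history = [votes]
    for _ in range(rounds):
        # Each agent sees every vote from the previous round except its own.
        new_votes = [
            a.predict(signal, [v for j, v in enumerate(votes) if j != i])
            for i, a in enumerate(agents)
        ]
        history.append(new_votes)
        if new_votes == votes:  # no agent changed its mind; stop early
            break
        votes = new_votes
    # Assumed aggregation rule: simple majority over the last round.
    final = Counter(votes).most_common(1)[0][0]
    return final, history


agents = [Agent("hawk", 0.6), Agent("neutral", 0.0), Agent("dove", -0.6)]
# `signal` is a made-up scalar summary of texts and macro indicators.
final, history = debate(agents, signal=0.3)
for round_index, round_votes in enumerate(history):
    print(round_index, round_votes)
print("final:", final)
```

In this toy run the neutral agent shifts toward the hawkish agent's view after the first round, which is the kind of belief-mediated social influence the framework is meant to expose. In a real system, `predict` would be an LLM prompt conditioned on the policy texts, the indicators, the agent's assigned belief, and the other agents' previous answers.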