Beyond the Black Box: Interpretability of LLMs in Finance
ArXiv ID: 2505.24650 (https://arxiv.org/abs/2505.24650)
Authors: Hariom Tatsat, Ariye Shater
Abstract
Large Language Models (LLMs) exhibit remarkable capabilities across a spectrum of tasks in financial services, including report generation, chatbots, sentiment analysis, regulatory compliance, investment advisory, financial knowledge retrieval, and summarization. However, their intrinsic complexity and lack of transparency pose significant challenges, especially in the highly regulated financial sector, where interpretability, fairness, and accountability are critical. To the best of our knowledge, this paper presents the first application of mechanistic interpretability in the finance domain to understand and utilize the inner workings of LLMs, addressing the pressing need for transparency and control in AI systems. Mechanistic interpretability is the most intuitive and transparent way to understand LLM behavior, as it reverse-engineers the models' internal workings. By dissecting the activations and circuits within these models, it provides insights into how specific features or components influence predictions, making it possible not only to observe but also to modify model behavior. In this paper, we explore the theoretical aspects of mechanistic interpretability and demonstrate its practical relevance through a range of financial use cases and experiments, including applications in trading strategy analysis, sentiment analysis, bias detection, and hallucination detection. While not yet widely adopted, mechanistic interpretability is expected to become increasingly vital as LLM adoption grows. Advanced interpretability tools can help ensure that AI systems remain ethical, transparent, and aligned with evolving financial regulations. We place special emphasis on how these techniques can help meet interpretability requirements for regulatory and compliance purposes, addressing both current needs and anticipating future expectations from financial regulators globally.
Keywords: Mechanistic Interpretability, Large Language Models (LLMs), AI Transparency, Regulatory Compliance, Trading Strategy Analysis, Multi-Asset
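The abstract describes dissecting activations and circuits and then modifying model behavior. The sketch below is a minimal, hypothetical illustration of that idea (activation steering), not the authors' code: it assumes the open-source TransformerLens library, GPT-2 small as a stand-in model, two illustrative contrastive prompts about earnings, and an arbitrary steering coefficient.

```python
from transformer_lens import HookedTransformer
from transformer_lens.utils import get_act_name

# Hypothetical sketch, not the paper's code: steer a small model's "sentiment"
# by adding a direction to the residual stream and watching next-token logits.
model = HookedTransformer.from_pretrained("gpt2")

LAYER = 8
ACT = get_act_name("resid_post", LAYER)  # residual stream after block 8

# Illustrative contrastive prompts used to estimate a sentiment direction.
pos_prompt = "The company beat earnings expectations and raised full-year guidance."
neg_prompt = "The company missed earnings expectations and cut full-year guidance."

_, pos_cache = model.run_with_cache(pos_prompt)
_, neg_cache = model.run_with_cache(neg_prompt)

# Direction = difference of mean residual-stream activations (a common
# activation-steering heuristic; not a claim about the paper's exact method).
direction = (pos_cache[ACT].mean(dim=1) - neg_cache[ACT].mean(dim=1)).squeeze(0)
direction = direction / direction.norm()

prompt = "Overall, the sentiment of this earnings report is"
POS_TOK = model.to_single_token(" positive")
NEG_TOK = model.to_single_token(" negative")

def logit_gap(logits):
    # Gap between " positive" and " negative" logits at the final position.
    return (logits[0, -1, POS_TOK] - logits[0, -1, NEG_TOK]).item()

def steer(value, hook, alpha=8.0):
    # Add the sentiment direction at every token position (alpha is arbitrary).
    return value + alpha * direction

baseline = logit_gap(model(prompt))
steered = logit_gap(model.run_with_hooks(prompt, fwd_hooks=[(ACT, steer)]))
print(f"logit(positive) - logit(negative): baseline={baseline:.2f}, steered={steered:.2f}")
```

Comparing the gap before and after the hook gives a crude but direct read on how much the injected direction shifts the model toward a positive completion, which is the "observe and modify" loop the abstract refers to.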
Complexity vs Empirical Score
- Math Complexity: 7.0/10
- Empirical Rigor: 4.0/10
- Quadrant: Lab Rats
- Why: The paper introduces advanced machine learning concepts such as mechanistic interpretability, sparse autoencoders, and circuit analysis, which rest on dense mathematical foundations. However, it focuses on theoretical exploration and conceptual financial use cases (e.g., sentiment analysis, bias detection) without presenting code, specific backtests, datasets, or statistical validation metrics, indicating limited empirical implementation depth. (A toy sparse-autoencoder sketch follows this list as an illustration.)
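The assessment above cites sparse autoencoders as one of the paper's core tools. The sketch below is a toy illustration only, not the paper's implementation: it assumes activations of width 768 (e.g., GPT-2 small's residual stream), an overcomplete hidden layer, an L1 sparsity penalty, and random tensors standing in for real activations cached on financial text; all dimensions and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy sparse autoencoder (SAE) sketch; dimensions and hyperparameters are
# illustrative, and the random tensor stands in for real cached LLM activations.
D_MODEL, D_HIDDEN, L1_COEFF = 768, 4096, 1e-3

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)          # reconstruction of the input
        return recon, features

sae = SparseAutoencoder(D_MODEL, D_HIDDEN)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# Stand-in for residual-stream activations cached on financial sentences.
acts = torch.randn(10_000, D_MODEL)

for step in range(1_000):
    batch = acts[torch.randint(0, acts.shape[0], (256,))]
    recon, feats = sae(batch)
    # Reconstruction error plus an L1 penalty that pushes features toward sparsity.
    loss = (recon - batch).pow(2).mean() + L1_COEFF * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# On real activations, individual decoder columns can then be inspected as
# candidate interpretable directions (e.g., features tied to earnings beats).
```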
```mermaid
flowchart TD
    A["Research Goal<br>Application of Mechanistic<br>Interpretability in Finance"] --> B["Methodology<br>Theoretical Framework + Experiments"]
    B --> C["Computational Process<br>Reverse-engineering LLM<br>Activations & Circuits"]
    C --> D{"Key Financial Use Cases"}
    D --> E["Trading Strategy Analysis"]
    D --> F["Sentiment & Bias Detection"]
    D --> G["Hallucination Detection"]
    E & F & G --> H["Key Outcomes<br>Enhanced Transparency &<br>Regulatory Compliance"]
```
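Among the use cases in the diagram, hallucination detection is the one most directly tied to compliance review. One common pattern in the interpretability literature, sketched hypothetically below, is to train a linear probe on cached hidden-state activations from answers labeled as faithful or hallucinated; the activation width, random data, and labels are placeholders rather than the paper's experiments.

```python
import torch
import torch.nn as nn

# Hypothetical linear-probe sketch for hallucination detection; the activation
# width, random activations, and labels are placeholders, not the paper's data.
D_MODEL = 768
probe = nn.Linear(D_MODEL, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Stand-ins for cached last-token activations of model answers and for labels
# (0 = faithful to the source documents, 1 = hallucinated).
acts = torch.randn(2_000, D_MODEL)
labels = torch.randint(0, 2, (2_000, 1)).float()

for epoch in range(50):
    loss = loss_fn(probe(acts), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference time the probe score on a new answer's activations can act as a
# review flag, e.g., probabilities above 0.5 routed to a human checker.
new_act = torch.randn(1, D_MODEL)
p_hallucination = torch.sigmoid(probe(new_act)).item()
print(f"illustrative hallucination probability: {p_hallucination:.2f}")
```

A linear probe is deliberately simple: if a single direction in activation space separates faithful from hallucinated answers, that direction itself becomes an auditable artifact, which is the kind of transparency and regulatory-compliance outcome the flowchart points to.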