Inferring Latent Market Forces: Evaluating LLM Detection of Gamma Exposure Patterns via Obfuscation Testing

ArXiv ID: 2512.17923

Authors: Christopher Regan, Ying Xie

Abstract

We introduce obfuscation testing, a novel methodology for validating whether large language models detect structural market patterns through causal reasoning rather than temporal association. Testing three dealer hedging constraint patterns (gamma positioning, stock pinning, 0DTE hedging) on 242 trading days (95.6% coverage) of S&P 500 options data, we find LLMs achieve a 71.5% detection rate using unbiased prompts that provide only raw gamma exposure values without regime labels or temporal context. The WHO-WHOM-WHAT causal framework forces models to identify the economic actors (dealers), affected parties (directional traders), and structural mechanisms (forced hedging) underlying observed market dynamics. Critically, detection accuracy (91.2%) remains stable even as economic profitability varies quarterly, demonstrating that models identify structural constraints rather than profitable patterns. When prompted with regime labels, detection increases to 100%, but the 71.5% unbiased rate validates genuine pattern recognition. Our findings suggest LLMs possess emergent capabilities for detecting complex financial mechanisms through pure structural reasoning, with implications for systematic strategy development, risk management, and our understanding of how transformer architectures process financial market dynamics.
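To make the idea of an unbiased prompt concrete, the following is a minimal Python sketch of how one day's record might be stripped of its date and regime label before being shown to a model. It is an illustration only, not the authors' implementation: the `DayRecord` fields, the prompt wording, the function names, and the sample values are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayRecord:
    """One trading day of options data (hypothetical schema, not from the paper)."""
    date: str          # temporal context, withheld from the unbiased prompt
    regime_label: str  # analyst-assigned regime label, withheld as well
    net_gex: float     # net gamma exposure, arbitrary units
    call_gex: float    # gamma exposure from calls
    put_gex: float     # gamma exposure from puts


# WHO-WHOM-WHAT questions, phrased so that they do not reveal the expected answers.
QUESTIONS = (
    "If these values imply a structural constraint, answer:\n"
    "WHO is the constrained actor?\n"
    "WHOM does the constraint affect?\n"
    "WHAT mechanism links them?\n"
    "If no constraint is evident, answer NONE."
)


def build_unbiased_prompt(rec: DayRecord) -> str:
    """Expose only raw gamma exposure values: no date, no regime label."""
    values = (
        f"net_gamma_exposure: {rec.net_gex:.1f}\n"
        f"call_gamma_exposure: {rec.call_gex:.1f}\n"
        f"put_gamma_exposure: {rec.put_gex:.1f}\n"
    )
    return values + QUESTIONS


def build_labeled_prompt(rec: DayRecord) -> str:
    """Comparison condition: the same prompt with the regime label prepended."""
    return f"regime: {rec.regime_label}\n" + build_unbiased_prompt(rec)


# Made-up values for illustration only.
example = DayRecord("2024-03-15", "negative_gamma", -1250.0, 800.0, -2050.0)
print(build_unbiased_prompt(example))
```

Keeping the labeled variant as a thin wrapper around the unbiased one means the two conditions differ only in the presence of the regime label, which is the comparison the abstract describes.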

Keywords: Large Language Models (LLMs), Causal Reasoning, Gamma Exposure, S&P 500 Options, Market Structure, Equities (Options)

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper’s mathematical complexity is low: it relies on causal logic (WHO→WHOM→WHAT) and gamma exposure metrics, with no advanced stochastic calculus or complex formulas. Empirical rigor is high, supported by a dataset of 242 trading days (95.6% coverage), obfuscation testing with unbiased prompts, and measurable performance metrics (71.5% unbiased detection rate, 91.2% detection accuracy).

flowchart TD
    A["Research Goal: Validate if LLMs detect structural market patterns<br>via causal reasoning vs. temporal association"] --> B["Method: Obfuscation Testing"]
    B --> C["Data: S&P 500 Options<br>242 Trading Days, 95.6% Coverage"]
    C --> D["Computational Process: WHO-WHOM-WHAT Framework<br>Unbiased Prompts (Raw Gamma Values Only)"]
    D --> E{"Evaluation"}
    E --> F["Key Findings"]
    F --> F1["LLM Detection Rate: 71.5%<br>(Unbiased context)"]
    F --> F2["Detection Accuracy: 91.2%<br>(Stable across economic regimes)"]
    F --> F3["Context Effect: 100%<br>(With regime labels)"]
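
The detection rates in the findings above are shares of observations on which the model flagged a pattern. As a rough sketch of that arithmetic, the Python below tallies an overall rate and a per-quarter breakdown, which is one way the stability of detection across quarters could be checked. The function names and the toy data are assumptions made for illustration; they are not the paper's results or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def detection_rate(flags: Iterable[bool]) -> float:
    """Fraction of observations on which a pattern was detected."""
    flags = list(flags)
    if not flags:
        raise ValueError("no observations")
    return sum(flags) / len(flags)


def rate_by_quarter(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (quarter, detected) pairs and compute a rate for each quarter."""
    buckets: Dict[str, List[bool]] = defaultdict(list)
    for quarter, detected in results:
        buckets[quarter].append(detected)
    return {quarter: detection_rate(flags) for quarter, flags in buckets.items()}


# Toy data (hypothetical): eight observations across two quarters.
toy_results = [
    ("Q1", True), ("Q1", False), ("Q1", True), ("Q1", True),
    ("Q2", True), ("Q2", True), ("Q2", False), ("Q2", True),
]

overall = detection_rate(detected for _, detected in toy_results)
print(f"overall: {overall:.1%}")
for quarter, rate in rate_by_quarter(toy_results).items():
    print(f"{quarter}: {rate:.1%}")
```

Similar per-quarter rates alongside varying profitability would be consistent with the claim that detection tracks structure rather than profit, though the toy numbers here say nothing about the actual data.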