Law-Strength Frontiers and a No-Free-Lunch Result for Law-Seeking Reinforcement Learning on Volatility Law Manifolds

ArXiv ID: 2511.17304 · View on arXiv
Authors: Jian’an Zhang

Abstract: We study reinforcement learning (RL) on volatility surfaces through the lens of Scientific AI. We ask whether axiomatic no-arbitrage laws, imposed as soft penalties on a learned world model, can reliably align high-capacity RL agents, or mainly create Goodhart-style incentives to exploit model errors. From classical static no-arbitrage conditions we build a finite-dimensional convex volatility law manifold of admissible total-variance surfaces, together with a metric law-penalty functional and a Graceful Failure Index (GFI) that normalizes law degradation under shocks. A synthetic generator produces law-consistent trajectories, while a recurrent neural world model trained without law regularization exhibits structured off-manifold errors. On this testbed we define a Goodhart decomposition \(r = r^{\mathcal{M}} + r^\perp\), where \(r^\perp\) is ghost arbitrage from off-manifold prediction error. We prove a ghost-arbitrage incentive theorem for PPO-type agents, a law-strength trade-off theorem showing that stronger penalties eventually worsen P&L, and a no-free-lunch theorem: under a law-consistent world model and law-aligned strategy class, unconstrained law-seeking RL cannot Pareto-dominate structural baselines on P&L, penalties, and GFI. In experiments on an SPX/VIX-like world model, simple structural strategies form the empirical law-strength frontier, while all law-seeking RL variants underperform and move into high-penalty, high-GFI regions. Volatility thus provides a concrete case where reward shaping with verifiable penalties is insufficient for robust law alignment. ...
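The Goodhart decomposition is the paper's central bookkeeping device, so a toy numerical version may help fix ideas. Below is a minimal sketch, assuming a discretized total-variance grid and a deliberately crude partial projection; the function names (law_penalty, project_calendar, goodhart_decompose), the grid, and the toy reward are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def law_penalty(w):
    """Soft static no-arbitrage penalty on a total-variance grid
    w[maturity, strike]: calendar spreads require w nondecreasing in
    maturity; a butterfly proxy requires convexity in strike.
    Zero exactly when w lies on the discretized law manifold."""
    cal = np.clip(w[:-1, :] - w[1:, :], 0.0, None).sum()
    fly = np.clip(-(w[:, :-2] - 2.0 * w[:, 1:-1] + w[:, 2:]), 0.0, None).sum()
    return cal + fly

def project_calendar(w):
    """Cheap partial projection onto the manifold: a running max along
    maturities removes calendar violations (a full projection onto the
    convex manifold would also restore strike convexity, e.g. via a QP)."""
    return np.maximum.accumulate(w, axis=0)

def goodhart_decompose(reward_fn, w_pred):
    """r = r_M + r_perp: r_M is the reward on the law-consistent
    projection; r_perp is the 'ghost arbitrage' component earned only
    from the world model's off-manifold prediction error."""
    w_on = project_calendar(w_pred)
    r_m = reward_fn(w_on)
    return r_m, reward_fn(w_pred) - r_m

# Toy example: a law-consistent surface plus one off-manifold error.
t = np.arange(1, 7)[:, None] * 0.25            # maturities (years)
k = np.linspace(-0.4, 0.4, 9)[None, :]         # log-moneyness grid
w_true = t * (0.04 + 0.1 * k**2)               # increasing in t, convex in k
w_pred = w_true.copy()
w_pred[3, 4] -= 0.02                           # world-model error off the manifold
reward = lambda s: s.mean() - law_penalty(s)   # toy shaped reward
r_m, r_perp = goodhart_decompose(reward, w_pred)
```

In this toy setup, r_perp is nonzero precisely because the predicted surface sits off the manifold; an agent that can steer the world model into such states is paid for the violation itself, which is the incentive the ghost-arbitrage theorem formalizes.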

November 21, 2025 · 2 min · Research Team

Deep Learning Option Pricing with Market Implied Volatility Surfaces

ArXiv ID: 2509.05911 · View on arXiv
Authors: Lijie Ding, Egang Lu, Kin Cheung

Abstract: We present a deep learning framework for pricing options based on market-implied volatility surfaces. Using end-of-day S&P 500 index options quotes from 2018-2023, we construct arbitrage-free volatility surfaces and generate training data for American puts and arithmetic Asian options using QuantLib. To address the high dimensionality of volatility surfaces, we employ a variational autoencoder (VAE) that compresses volatility surfaces across maturities and strikes into a 10-dimensional latent representation. We feed these latent variables, combined with option-specific inputs such as strike and maturity, into a multilayer perceptron to predict option prices. Our model is trained in stages: first the VAE for volatility-surface compression and reconstruction, then the option-pricing mapping, and finally end-to-end fine-tuning of the entire network. The trained pricer achieves high accuracy across American and Asian options, with prediction errors concentrated primarily near long maturities and at-the-money strikes, where absolute bid-ask price differences are known to be large. Our method offers an efficient and scalable approach that requires only a single neural-network forward pass and naturally improves with additional data. By bridging volatility surface modeling and option pricing in a unified framework, it provides a fast and flexible alternative to traditional numerical approaches for exotic options. ...
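To make the staged architecture concrete, here is a minimal PyTorch sketch of a VAE that compresses a flattened surface grid into a 10-dimensional latent code and an MLP that prices from (latent code, strike, maturity). The grid shape, layer widths, and staging comments are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

GRID = 11 * 13  # flattened surface grid, e.g. 11 maturities x 13 strikes (assumed)

class SurfaceVAE(nn.Module):
    """Compresses a flattened volatility surface into a 10-d latent code."""
    def __init__(self, latent_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(GRID, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, GRID))

    def forward(self, surf):
        h = self.enc(surf)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

class LatentPricer(nn.Module):
    """MLP mapping (latent surface code, option features) -> price.
    The option features here are strike and maturity, as in the abstract."""
    def __init__(self, latent_dim=10, n_features=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + n_features, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, z, feats):
        return self.net(torch.cat([z, feats], dim=-1))

# Staged training, as described in the abstract:
#   1. train SurfaceVAE on reconstruction loss + KL divergence;
#   2. freeze the encoder, train LatentPricer on (encoded surface, features);
#   3. unfreeze everything and fine-tune end-to-end on pricing error.
```

The single-forward-pass speedup comes from the last stage: once trained, pricing a new contract needs only one encoder pass per surface (shared across all contracts on that surface) and one pricer pass per contract, rather than a lattice or Monte Carlo run.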

September 7, 2025 · 2 min · Research Team

Enhancing Deep Hedging of Options with Implied Volatility Surface Feedback Information

ArXiv ID: 2407.21138 · View on arXiv
Authors: Unknown

Abstract: We present a dynamic hedging scheme for S&P 500 options, where rebalancing decisions are enhanced by integrating information about the implied volatility surface dynamics. The optimal hedging strategy is obtained through a deep policy-gradient-type reinforcement learning algorithm. Including the forward-looking information embedded in the volatility surface allows our procedure to outperform several conventional benchmarks, such as practitioner and smile-implied delta hedging, both in simulation and in backtesting experiments. The outperformance is more pronounced in the presence of transaction costs. ...
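As a rough illustration of how volatility-surface information can enter the hedging state, the following sketch defines a policy network over (price, time to maturity, current position, IV-surface features) and a transaction-cost-aware P&L rollout. The feature set, network sizes, and cost model are assumptions; the paper itself trains with a policy-gradient algorithm against a risk measure of the hedging error.

```python
import torch
import torch.nn as nn

class HedgingPolicy(nn.Module):
    """Policy network whose state is augmented with IV-surface features
    (e.g. ATM level, skew, term-structure slope). Outputs a hedge ratio
    in [0, 1], e.g. shares held against a short call."""
    def __init__(self, n_iv_features=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + n_iv_features, 64), nn.ReLU(),  # [price, tau, position] + IV feats
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, state):
        return self.net(state)

def hedging_pnl(policy, paths, iv_feats, payoff_fn, maturity=1.0, tc=5e-4):
    """Terminal hedging P&L with proportional transaction costs tc.
    paths: (batch, T+1) underlying prices; iv_feats: (batch, T, n_iv).
    A policy-gradient method would minimize a risk measure (e.g. CVaR)
    of the negative of this P&L."""
    batch, steps = paths.shape[0], paths.shape[1] - 1
    pos = torch.zeros(batch, 1)
    cash = torch.zeros(batch, 1)
    for t in range(steps):
        tau = torch.full((batch, 1), maturity * (1.0 - t / steps))
        state = torch.cat([paths[:, t:t + 1], tau, pos, iv_feats[:, t]], dim=-1)
        new_pos = policy(state)
        trade = new_pos - pos
        cash -= trade * paths[:, t:t + 1] + tc * trade.abs() * paths[:, t:t + 1]
        pos = new_pos
    return cash + pos * paths[:, -1:] - payoff_fn(paths[:, -1:])
```

Because the trading cost term grows with every rebalance, a policy that reads the surface's forward-looking features can trade less often and more selectively, which is consistent with the abstract's observation that the outperformance widens under transaction costs.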

July 30, 2024 · 2 min · Research Team