Risk-Sensitive Option Market Making with Arbitrage-Free eSSVI Surfaces: A Constrained RL and Stochastic Control Bridge
ArXiv ID: 2510.04569
Authors: Jian’an Zhang
Abstract
We formulate option market making as a constrained, risk-sensitive control problem that unifies execution, hedging, and arbitrage-free implied-volatility surfaces inside a single learning loop. A fully differentiable eSSVI layer enforces static no-arbitrage conditions (butterfly and calendar) while the policy controls half-spreads, hedge intensity, and structured surface deformations (state-dependent rho-shift and psi-scale). Executions are intensity-driven and respond monotonically to spreads and relative mispricing; tail risk is shaped with a differentiable CVaR objective via the Rockafellar–Uryasev program. We provide theory for (i) grid-consistency and rates for butterfly/calendar surrogates, (ii) a primal–dual grounding of a learnable dual action acting as a state-dependent Lagrange multiplier, (iii) differentiable CVaR estimators with mixed pathwise and likelihood-ratio gradients and epi-convergence to the nonsmooth objective, (iv) an eSSVI wing-growth bound aligned with Lee’s moment constraints, and (v) policy-gradient validity under smooth surrogates. In simulation (Heston fallback; ABIDES-ready), the agent attains positive adjusted P&L on most intraday segments while keeping calendar violations at numerical zero and butterfly violations at the numerical floor; ex-post tails remain realistic and can be tuned through the CVaR weight. The five control heads admit clear economic semantics and analytic sensitivities, yielding a white-box learner that unifies pricing consistency and execution control in a reproducible pipeline.
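The abstract mentions static no-arbitrage checks. A minimal NumPy sketch can illustrate them. It assumes the standard SSVI/eSSVI slice form with per-maturity parameters (θ, ρ, ψ = θφ). It also uses two textbook grid tests as stand-ins for the paper's surrogates:

- Durrleman's density function g(k) for butterfly arbitrage.
- Non-decreasing total variance across maturities for calendar arbitrage.

It additionally computes the asymptotic wing slope that Lee's moment formula caps at 2. All function names and parameter values are hypothetical. The paper's rho-shift and psi-scale policy heads, which would perturb ρ and ψ before such checks, are not modeled.

```python
import numpy as np


def essvi_slice(k, theta, rho, psi):
    """One SSVI/eSSVI slice in log-moneyness k, parameterized by psi = theta * phi:
    w(k) = theta/2 * (1 + rho*phi*k + sqrt((phi*k + rho)**2 + 1 - rho**2)).
    Returns total implied variance w and its analytic derivatives w', w''."""
    phi = psi / theta
    z = phi * k + rho
    root = np.sqrt(z**2 + 1.0 - rho**2)
    w = 0.5 * theta * (1.0 + rho * phi * k + root)
    dw = 0.5 * theta * (rho * phi + phi * z / root)
    d2w = 0.5 * theta * phi**2 * (1.0 - rho**2) / root**3
    return w, dw, d2w


def durrleman_g(k, w, dw, d2w):
    """Durrleman's function: the implied density is nonnegative (no butterfly) where g(k) >= 0."""
    return (1.0 - k * dw / (2.0 * w)) ** 2 - 0.25 * dw**2 * (1.0 / w + 0.25) + 0.5 * d2w


def arbitrage_penalties(k_grid, thetas, rhos, psis):
    """Hypothetical grid surrogates, slices ordered by increasing maturity:
    butterfly = sum of max(0, -g(k)); calendar = sum of max(0, w(k, T_i) - w(k, T_{i+1}))."""
    butterfly, calendar, prev_w = 0.0, 0.0, None
    for theta, rho, psi in zip(thetas, rhos, psis):
        w, dw, d2w = essvi_slice(k_grid, theta, rho, psi)
        butterfly += float(np.maximum(0.0, -durrleman_g(k_grid, w, dw, d2w)).sum())
        if prev_w is not None:
            calendar += float(np.maximum(0.0, prev_w - w).sum())
        prev_w = w
    return butterfly, calendar


def max_wing_slope(rho, psi):
    """Asymptotic slope of w in |k| for one slice: psi * (1 + |rho|) / 2.
    Lee's moment formula requires this to be at most 2."""
    return 0.5 * psi * (1.0 + abs(rho))


# Illustrative (hypothetical) parameters: ATM total variance theta increases with maturity.
k = np.linspace(-1.5, 1.5, 61)
thetas = np.array([0.02, 0.04, 0.07])
rhos = np.full(3, -0.4)
psis = np.array([0.10, 0.14, 0.18])
bfly, cal = arbitrage_penalties(k, thetas, rhos, psis)
```

With these parameters both penalties evaluate to zero. Passing the slices in reverse maturity order (`thetas[::-1]`, `psis[::-1]`) produces a positive calendar penalty. The value of `max_wing_slope(-0.4, 0.18)` is far below Lee's bound. Using analytic derivatives of w, rather than finite differences, keeps g free of discretization noise on the grid.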
Keywords: Option market making, eSSVI, Risk-sensitive control, CVaR optimization, Implied volatility surface, Options
Complexity vs Empirical Score
- Math Complexity: 9.2/10
- Empirical Rigor: 7.8/10
- Quadrant: Holy Grail
- Why: The paper presents a highly advanced mathematical framework. It integrates stochastic control, constrained reinforcement learning, and a differentiable, arbitrage-free eSSVI surface layer. The framework is backed by theoretical results, including:
  - grid-consistency rates for the butterfly/calendar surrogates;
  - a primal–dual grounding of the learned multiplier;
  - CVaR gradient estimators with epi-convergence;
  - an eSSVI wing-growth bound.

  Empirically, it evaluates the agent in a Heston-based simulator; the pipeline is described as ABIDES-ready rather than run on ABIDES. It reports concrete metrics such as adjusted P&L, arbitrage violations, and CVaR-tunable tail risk, indicating substantial implementation effort.
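The Rockafellar–Uryasev program behind the CVaR objective can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the paper's estimator. Smoothing (x)_+ with a softplus of temperature `beta` is our assumption. Only the pathwise part of the gradient (per-sample loss weights) is shown, not the likelihood-ratio term. All function names are hypothetical.

```python
import numpy as np


def ru_objective(eta, losses, alpha):
    """Rockafellar-Uryasev: F(eta) = eta + E[(L - eta)_+] / (1 - alpha); CVaR_alpha = min over eta."""
    return eta + np.maximum(losses - eta, 0.0).mean() / (1.0 - alpha)


def smoothed_ru_objective(eta, losses, alpha, beta):
    """F with (x)_+ replaced by softplus_beta(x) = log(1 + exp(beta*x)) / beta (assumed smoothing).
    Overestimates F by at most log(2) / (beta * (1 - alpha)), so it recovers F as beta -> inf."""
    softplus = np.logaddexp(0.0, beta * (losses - eta)) / beta
    return eta + softplus.mean() / (1.0 - alpha)


def smoothed_loss_weights(eta, losses, alpha, beta):
    """Derivative of the smoothed objective w.r.t. each sampled loss L_i:
    sigmoid(beta * (L_i - eta)) / (n * (1 - alpha)). Chaining these weights with
    dL_i/dparams gives a pathwise gradient estimate."""
    sigmoid = 0.5 * (1.0 + np.tanh(0.5 * beta * (losses - eta)))  # numerically stable sigmoid
    return sigmoid / (losses.size * (1.0 - alpha))


def empirical_cvar(losses, alpha):
    """Plug an empirical alpha-quantile (a minimizer of the empirical F) into the RU objective."""
    eta = np.quantile(losses, alpha, method="inverted_cdf")
    return ru_objective(eta, losses, alpha), eta


losses = np.arange(1.0, 101.0)  # toy loss sample: 1, 2, ..., 100
cvar, value_at_risk = empirical_cvar(losses, alpha=0.9)
```

For this toy sample at α = 0.9, `cvar` equals the average of the ten largest losses, 95.5. The `inverted_cdf` quantile always lands in the minimizing set of the empirical objective, whereas interpolated quantiles can miss it for some sample sizes and levels. Raising `beta` tightens the smoothed objective toward the nonsmooth one. This loosely mirrors the epi-convergence result stated in the abstract.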
```mermaid
flowchart TD
A["Research Goal: Unify execution, hedging, & arbitrage-free surfaces in risk-sensitive market making"] --> B{"Key Methodology: Constrained RL & Stochastic Control Bridge"}
B --> C["Policy Control Heads<br/>Half-spreads, Hedge intensity, Surface deformations"]
B --> D["eSSVI Layer<br/>Enforces no-arbitrage constraints<br/>(Butterfly & Calendar)"]
B --> E["Objective Function<br/>Differentiable CVaR via Rockafellar-Uryasev"]
C --> F["Computational Process<br/>Constrained Policy Optimization<br/>Primal-Dual & Gradient Flow"]
D --> F
E --> F
F --> G["Simulations: Heston Fallback<br/>(ABIDES-ready)"]
G --> H["Key Outcomes:<br/>Positive Adjusted P&L on Most Segments, Calendar Violations at Numerical Zero, Tunable Tail Risk"]
```
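The primal–dual step in the flowchart can be illustrated on a one-dimensional toy problem. The sketch runs projected gradient descent-ascent on a Lagrangian f(x) + λ g(x) with λ ≥ 0. This is a generic textbook scheme, not the paper's algorithm. In the paper, the multiplier is a learned, state-dependent dual action, and f and g would come from the risk-sensitive objective and the no-arbitrage constraints. Here λ is a single scalar, and `primal_dual`, f, and g are hypothetical.

```python
def primal_dual(grad_f, g, grad_g, x0, lr_x=0.1, lr_lam=0.1, steps=2000):
    """Projected gradient descent-ascent on L(x, lam) = f(x) + lam * g(x) with lam >= 0.

    The primal step descends in x. The dual step ascends in lam: it raises the penalty
    while the constraint g(x) <= 0 is violated and relaxes it (down to 0) once satisfied."""
    x, lam = float(x0), 0.0
    for _ in range(steps):
        # Simultaneous update: both steps use the current (x, lam).
        x_next = x - lr_x * (grad_f(x) + lam * grad_g(x))
        lam = max(0.0, lam + lr_lam * g(x))  # projection onto lam >= 0
        x = x_next
    return x, lam


# Hypothetical toy problem: minimize (x - 3)^2 subject to x - 1 <= 0.
# KKT: 2 * (x - 3) + lam = 0 at the active constraint x = 1, so x* = 1 and lam* = 4.
x_star, lam_star = primal_dual(
    grad_f=lambda x: 2.0 * (x - 3.0),
    g=lambda x: x - 1.0,
    grad_g=lambda x: 1.0,
    x0=0.0,
)
```

The iterates converge to x ≈ 1 and λ ≈ 4, the KKT point. The multiplier settles at the shadow price of the constraint, that is, the marginal objective gain from loosening x ≤ 1. A state-dependent multiplier would carry this same economic reading separately in each market state.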