Deep Hedging with Reinforcement Learning: A Practical Framework for Option Risk Management
ArXiv ID: 2512.12420
Authors: Travon Lucius, Christian Koch, Jacob Starling, Julia Zhu, Miguel Urena, Carrie Hu
Abstract
We present a reinforcement-learning (RL) framework for dynamic hedging of equity index option exposures under realistic transaction costs and position limits. We hedge a normalized option-implied equity exposure (one unit of underlying delta, offset via SPY) by trading the underlying index ETF, using the option surface and macro variables only as state information rather than as a direct pricing engine. Building on the “deep hedging” paradigm of Buehler et al. (2019), we design a leak-free environment, a cost-aware reward function, and a lightweight stochastic actor-critic agent trained on daily end-of-day panel data constructed from the SPX/SPY implied volatility term structure, skew, realized volatility, and macro rate context. On a fixed train/validation/test split, the learned policy improves risk-adjusted performance versus no-hedge, momentum, and volatility-targeting baselines (higher point-estimate Sharpe); only the GAE policy’s test-sample Sharpe is statistically distinguishable from zero, and because confidence intervals overlap with a long-SPY benchmark we stop short of claiming formal dominance. Turnover remains controlled, and the policy is robust to doubled transaction costs. The modular codebase, comprising a data pipeline, simulator, and training scripts, is engineered for extensibility to multi-asset overlays, alternative objectives (e.g., drawdown or CVaR), and intraday data. From a portfolio management perspective, the learned overlay is designed to sit on top of an existing SPX or SPY allocation, improving the portfolio’s mean-variance trade-off with controlled turnover and drawdowns. We discuss practical implications for portfolio overlays and outline avenues for future work.
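As a rough illustration of the cost-aware reward function the abstract describes, the sketch below combines per-step hedged P&L with a proportional transaction cost on the rebalancing trade and a simple mean-variance style penalty. The function name, cost rate, and penalty form are illustrative assumptions, not the authors' exact specification.

```python
def hedging_reward(option_pnl: float,
                   hedge_pnl: float,
                   trade_size: float,
                   price: float,
                   cost_rate: float = 5e-4,     # assumed proportional cost per unit notional
                   risk_aversion: float = 0.1) -> float:
    """Per-step reward for a cost-aware hedging agent (illustrative sketch).

    Combines the overlay's hedged P&L with a proportional transaction cost
    on the rebalancing trade and a mean-variance style penalty on the result.
    """
    hedged_pnl = option_pnl + hedge_pnl           # P&L of option exposure plus hedge position
    cost = cost_rate * abs(trade_size) * price    # cost of today's rebalancing trade
    net = hedged_pnl - cost                       # cost-adjusted step P&L
    return net - risk_aversion * net ** 2         # penalize large swings (risk aversion)
```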
Keywords: Deep Hedging, Reinforcement Learning (RL), Option Delta Hedging, Stochastic Actor-Critic, Transaction Costs
Complexity vs Empirical Score
- Math Complexity: 7.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Holy Grail
- Why: The paper employs advanced stochastic optimization and neural network-based reinforcement learning (high math), while providing a fully implemented, deterministic pipeline with extensive backtesting, robustness checks, and public code/data for reproducibility (high rigor).
flowchart TD
A["Research Goal: RL-based dynamic hedging<br>for options under realistic constraints"] --> B["Data & State Construction"]
B --> C["Simulator & Environment Design"]
C --> D["Stochastic Actor-Critic Agent"]
D --> E["Training & Validation"]
E --> F["Testing & Baseline Comparison"]
subgraph Data_Inputs ["Inputs"]
B1["SPX/SPY Implied Vol Surface"]
B2["Macro Rate Context"]
B3["Realized Volatility"]
end
subgraph Processes ["Computational Processes"]
C1["Leak-free environment"]
C2["Cost-aware reward function"]
C3["Transactional constraints"]
end
subgraph Outcomes ["Key Findings"]
F1["Improved Sharpe vs Baselines"]
F2["Robust to 2x transaction costs"]
F3["Controlled turnover & drawdowns"]
end
B --- Data_Inputs
C --- Processes
F --- Outcomes
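The abstract singles out the GAE policy as the only one whose test-sample Sharpe is statistically distinguishable from zero. For readers unfamiliar with the term, the sketch below shows standard Generalized Advantage Estimation as it would feed a stochastic actor-critic update; the discount and lambda defaults are illustrative, not the paper's reported hyperparameters.

```python
import numpy as np

def gae_advantages(rewards: np.ndarray,
                   values: np.ndarray,   # length T+1: V(s_0), ..., V(s_T) incl. bootstrap value
                   gamma: float = 0.99,
                   lam: float = 0.95) -> np.ndarray:
    """Generalized Advantage Estimation (Schulman et al., 2016) for one episode.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = sum_{l >= 0} (gamma * lam)^l * delta_{t+l}
    """
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):          # accumulate discounted TD errors backwards in time
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

In an actor-critic trainer, these advantages would weight the policy-gradient term for the stochastic actor while the critic regresses toward `advantages + values[:-1]`.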