Empirical Models of the Time Evolution of SPX Option Prices

ArXiv ID: 2506.17511

Authors: Alessio Brini, David A. Hsieh, Patrick Kuiper, Sean Moushegian, David Ye

Abstract

The key objective of this paper is to develop an empirical model for pricing SPX options that can be simulated over future paths of the SPX. To accomplish this, we formulate and rigorously evaluate several statistical models, including a neural network, a random forest, and a linear regression. These models use the observed characteristics of the options as inputs (their price, moneyness, and time-to-maturity) as well as a small set of external inputs, such as the SPX and its past history, the dividend yield, and the risk-free rate. Model evaluation is performed on historical options data spanning 30 years of daily observations. Significant effort is given to understanding the data and ensuring explainability for the neural network. A neural network model with two hidden layers and four neurons per layer, trained with minimal hyperparameter tuning, performs well against the theoretical Black-Scholes-Merton model for European options, as well as against two other empirical models based on the random forest and the linear regression. It delivers arbitrage-free option prices without requiring no-arbitrage conditions to be imposed.
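For orientation, the Black-Scholes-Merton benchmark named in the abstract has a well-known closed form for European options on an underlying with a continuous dividend yield. The following is a minimal sketch of that textbook formula, not the paper's code: the function name `bsm_price`, its parameter names, and the sample inputs are illustrative assumptions, and the abstract does not say how the volatility input is chosen.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_price(spot: float, strike: float, t: float, r: float,
              q: float, sigma: float, is_call: bool = True) -> float:
    """Textbook Black-Scholes-Merton price of a European option.

    spot   : current index level
    strike : option strike
    t      : time to maturity in years (must be > 0)
    r      : continuously compounded risk-free rate
    q      : continuous dividend yield
    sigma  : annualized volatility (must be > 0)
    """
    if t <= 0.0 or sigma <= 0.0:
        raise ValueError("t and sigma must be positive")
    vol_sqrt_t = sigma * sqrt(t)
    d1 = (log(spot / strike) + (r - q + 0.5 * sigma ** 2) * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    disc_spot = spot * exp(-q * t)      # spot discounted at the dividend yield
    disc_strike = strike * exp(-r * t)  # strike discounted at the risk-free rate
    if is_call:
        return disc_spot * norm_cdf(d1) - disc_strike * norm_cdf(d2)
    return disc_strike * norm_cdf(-d2) - disc_spot * norm_cdf(-d1)


# Hypothetical inputs, chosen only to exercise the function.
call = bsm_price(100.0, 100.0, 1.0, 0.05, 0.0, 0.2, is_call=True)
put = bsm_price(100.0, 100.0, 1.0, 0.05, 0.0, 0.2, is_call=False)
```

With these sample inputs the call is worth roughly 10.45, and the call and put satisfy put-call parity, which is a quick sanity check on any implementation of the formula.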

Keywords: SPX Options Pricing, Neural Networks, Random Forest, Arbitrage-Free Pricing, Empirical Models, Derivatives

Complexity vs Empirical Score

  • Math Complexity: 5.5/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Holy Grail
  • Why: The paper combines advanced statistical methods like GARCH and neural networks with a rigorous evaluation on 30 years of historical options data, showing practical backtest readiness. While the math includes stochastic modeling and machine learning, the heavy empirical focus on real-world data validation places it in the Holy Grail quadrant.
  flowchart TD
    A["Research Goal: Develop empirical model<br>for SPX options pricing"] --> B["Methodology: Train ML models<br>NN, Random Forest, Linear Regression"]
    B --> C["Data & Inputs:<br>SPX options price, moneyness,<br>time-to-maturity, SPX history,<br>dividend yield, risk-free rate"]
    C --> D["Computational Process:<br>Train models on 30 years<br>of historical daily data"]
    D --> E["Key Outcomes:<br>NN model (2 hidden layers, 4 neurons each)<br>delivers arbitrage-free prices and performs well<br>against Black-Scholes-Merton, Random Forest,<br>and Linear Regression"]
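
To give a sense of how small the network in the key outcomes is, the sketch below builds a feed-forward model with two hidden layers of four neurons each in plain Python and counts its parameters. Everything beyond the layer sizes is an assumption made for illustration: the number of inputs (`N_INPUTS = 6`), the `tanh` activation, the random initialization, and all function names are hypothetical and are not taken from the paper. The weights are untrained, so the output is not a price.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with random weights."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one dense layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def build_network(n_inputs, hidden=(4, 4), seed=0):
    """Two hidden layers of four neurons by default, then one linear output."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict(network, x):
    """Forward pass: tanh on hidden layers (assumed), linear output."""
    for layer in network[:-1]:
        x = dense(x, layer, math.tanh)
    return dense(x, network[-1])[0]


def count_parameters(network):
    """Total number of weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)


N_INPUTS = 6  # hypothetical input count, not stated in the summary
net = build_network(N_INPUTS)
n_params = count_parameters(net)
# Hypothetical feature vector, used only to run the forward pass.
sample_output = predict(net, [1.0, 0.5, 0.1, 0.02, 0.03, 0.2])
```

Under the assumed six inputs the model has 53 trainable parameters (28 in the first hidden layer, 20 in the second, 5 in the output layer), which illustrates why such a network can be trained with little hyperparameter tuning.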