
Deep learning interpretability for rough volatility

ArXiv ID: 2411.19317 · View on arXiv · Authors: Unknown

Abstract: Deep learning methods have become a widespread toolbox for the pricing and calibration of financial models. While they often provide new directions and research results, their "black box" nature also results in a lack of interpretability. We provide a detailed interpretability analysis of these methods in the context of rough volatility, a new class of volatility models for equity and FX markets. Our work sheds light on the inverse map learned by the neural network between the rough volatility model parameters, seen as mathematical model inputs and network outputs, and the resulting implied volatility across strikes and maturities, seen as mathematical model outputs and network inputs. This contributes to building a solid framework for a safer use of neural networks in this context, and in quantitative finance more generally. ...
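
As a rough illustration of such an inverse map (a minimal sketch, not the paper's architecture; the grid sizes and the rough Bergomi-style parameter set are assumptions):

```python
# Minimal sketch: an MLP approximating the inverse map from an implied-vol
# grid (model outputs, network inputs) to rough-volatility parameters
# (model inputs, network outputs). Sizes and parameterization are assumed.
import torch
import torch.nn as nn

N_STRIKES, N_MATURITIES = 11, 8   # hypothetical IV surface grid
N_PARAMS = 4                      # e.g. (H, eta, rho, xi0) in rough Bergomi

inverse_map = nn.Sequential(
    nn.Flatten(),                 # IV surface grid -> flat feature vector
    nn.Linear(N_STRIKES * N_MATURITIES, 64),
    nn.ELU(),
    nn.Linear(64, 64),
    nn.ELU(),
    nn.Linear(64, N_PARAMS),      # predicted model parameters
)

# Interpretability probe: input gradients show which (strike, maturity)
# cells of the surface drive each recovered parameter.
surface = torch.randn(1, N_MATURITIES, N_STRIKES, requires_grad=True)
params = inverse_map(surface)
params[0, 0].backward()                    # sensitivity of first parameter
saliency = surface.grad.abs().squeeze(0)   # (maturity, strike) saliency map
```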

November 28, 2024 · 2 min · Research Team

Double Descent in Portfolio Optimization: Dance between Theoretical Sharpe Ratio and Estimation Accuracy

ArXiv ID: 2411.18830 · View on arXiv · Authors: Unknown

Abstract: We study the relationship between model complexity and out-of-sample performance in the context of mean-variance portfolio optimization. Representing model complexity by the number of assets, we find that the performance of low-dimensional models initially improves with complexity but then declines due to overfitting. As model complexity becomes sufficiently high, performance improves with complexity again, resulting in a double-ascent Sharpe ratio curve analogous to the double descent phenomenon observed in artificial intelligence. The underlying mechanisms involve an intricate interaction between the theoretical Sharpe ratio and estimation accuracy. In high-dimensional models, the theoretical Sharpe ratio approaches its upper limit, and the overfitting problem is reduced because there are more parameters than data restrictions, which allows us to choose well-behaved parameters based on inductive bias. ...
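
A minimal sketch of the mechanism, under an assumed toy data-generating process with i.i.d. identical assets; the minimum-norm pseudo-inverse plays the role of the inductive bias in the overparameterized regime:

```python
# Out-of-sample Sharpe of plug-in mean-variance weights as the number of
# assets p grows past the number of observations n (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
n = 120                                    # in-sample observations
mu_true, sigma = 0.05, 0.2                 # identical independent assets

for p in (10, 60, 110, 240, 480):
    R_in = rng.normal(mu_true, sigma, (n, p))
    R_out = rng.normal(mu_true, sigma, (50 * n, p))
    mu_hat = R_in.mean(axis=0)
    cov_hat = np.cov(R_in, rowvar=False)
    w = np.linalg.pinv(cov_hat) @ mu_hat   # minimum-norm plug-in weights
    pnl = R_out @ w
    print(p, pnl.mean() / pnl.std())       # out-of-sample Sharpe ratio
```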

November 28, 2024 · 2 min · Research Team

GRU-PFG: Extract Inter-Stock Correlation from Stock Factors with Graph Neural Network

ArXiv ID: 2411.18997 · View on arXiv · Authors: Unknown

Abstract: The complexity of stocks and industries presents challenges for stock prediction. Current stock prediction models fall into two categories. One category, represented by GRU and ALSTM, relies solely on stock factors for prediction, with limited effectiveness. The other category, represented by HIST and TRA, incorporates not only stock factors but also industry information, industry financial reports, public sentiment, and other inputs. Models in the second category can capture correlations between stocks by introducing additional information, but the extra data is difficult to standardize and generalize. Considering the current state and limitations of these two types of models, this paper proposes the GRU-PFG (Project Factors into Graph) model. This model takes only stock factors as input and extracts inter-stock correlations using graph neural networks. It achieves prediction results that not only outperform other models relying solely on stock factors, but are also comparable to those of the second category. Experimental results show that on the CSI300 dataset, the IC of GRU-PFG is 0.134, outperforming HIST's 0.131 and significantly surpassing GRU and Transformer, with results better than the second-category models. Moreover, as a model that relies solely on stock factors, it has greater potential for generalization. ...
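
A minimal sketch of the idea (not the paper's exact architecture; dimensions and the similarity-graph construction are assumptions):

```python
# Encode per-stock factor histories with a GRU, project the embeddings into
# a similarity graph, and mix them with one round of message passing.
import torch
import torch.nn as nn

N_STOCKS, SEQ_LEN, N_FACTORS, HID = 300, 60, 6, 32

gru = nn.GRU(N_FACTORS, HID, batch_first=True)
head = nn.Linear(HID, 1)

factors = torch.randn(N_STOCKS, SEQ_LEN, N_FACTORS)  # stock factor histories
_, h = gru(factors)
h = h.squeeze(0)                                     # (N_STOCKS, HID)

# Inter-stock correlation inferred from the factors themselves:
adj = torch.softmax(h @ h.T / HID**0.5, dim=-1)      # dense soft adjacency
h_mixed = adj @ h                                    # one message-passing step
scores = head(h + h_mixed).squeeze(-1)               # per-stock return score
```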

November 28, 2024 · 2 min · Research Team

On the relative performance of some parametric and nonparametric estimators of option prices

ArXiv ID: 2412.00135 · View on arXiv · Authors: Unknown

Abstract: We examine the empirical performance of some parametric and nonparametric estimators of prices of options with a fixed time to maturity, focusing on variance-gamma and Heston models on one side, and on expansions in Hermite functions on the other. The latter class of estimators can be seen as perturbations of the classical Black-Scholes model. The comparison between parametric and Hermite-based models having the same "degrees of freedom" is emphasized. The main criterion is the out-of-sample relative pricing error on a dataset of historical option prices on the S&P500 index. Prior to the main empirical study, the approximation of variance-gamma and Heston densities by series of Hermite functions is studied, providing explicit expressions for the coefficients of the expansion in the former case, and integral expressions involving the explicit characteristic function in the latter. Moreover, these approximations are investigated numerically on a few test cases, indicating that expansions in Hermite functions with few terms achieve competitive accuracy in the estimation of Heston densities and the pricing of (European) options, but perform less effectively with variance-gamma densities. On the other hand, the main large-scale empirical study shows that parsimonious Hermite estimators can even outperform the Heston model in terms of pricing errors. These results underscore the trade-offs inherent in model selection and calibration, and their empirical fit in practical applications. ...
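
To illustrate the Hermite-function machinery (a generic numerical-projection sketch; the paper instead derives explicit and integral expressions for the coefficients):

```python
# Project a density onto the orthonormal Hermite functions
# h_n(x) = H_n(x) exp(-x^2/2) / sqrt(2^n n! sqrt(pi)) and rebuild it
# from a few coefficients. The target density here is a stand-in.
import numpy as np
from math import factorial, pi
from numpy.polynomial.hermite import hermval
from scipy.integrate import trapezoid
from scipy.stats import norm

def hermite_fn(n, x):
    """Orthonormal Hermite function h_n evaluated at x."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    norm_const = np.sqrt(2.0**n * factorial(n) * np.sqrt(pi))
    return hermval(x, coef) * np.exp(-x**2 / 2) / norm_const

x = np.linspace(-6.0, 6.0, 2001)
target = norm.pdf(x, loc=0.3, scale=1.2)   # stand-in for a model density

# Coefficients by numerical projection against the orthonormal basis.
coeffs = [trapezoid(target * hermite_fn(n, x), x) for n in range(8)]
approx = sum(c * hermite_fn(n, x) for n, c in enumerate(coeffs))
print("L1 error:", trapezoid(np.abs(target - approx), x))
```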

November 28, 2024 · 2 min · Research Team

Limit Order Book Event Stream Prediction with Diffusion Model

ArXiv ID: 2412.09631 · View on arXiv · Authors: Unknown

Abstract: The limit order book (LOB) is a dynamic, event-driven system that records real-time market demand and supply for a financial asset as a stream of events. Event stream prediction in the LOB refers to forecasting both the timing and the type of events. The challenge lies in modeling the time-event distribution to capture the interdependence between time and event type, which has traditionally relied on stochastic point processes. However, modeling complex market dynamics with stochastic processes, e.g., the Hawkes process, can be simplistic and can struggle to capture the evolution of market dynamics. In this study, we present LOBDIF (LOB event stream prediction with diffusion model), which offers a new paradigm for event stream prediction within the LOB system. LOBDIF learns the complex time-event distribution by leveraging a diffusion model, which decomposes the time-event distribution into sequential steps, with each step represented by a Gaussian distribution. Additionally, we propose a denoising network and a skip-step sampling strategy: the former facilitates effective learning of the time-event interdependence, while the latter accelerates sampling during inference. By introducing a diffusion model, our approach breaks away from traditional modeling paradigms, offering novel insights and an effective, efficient solution for learning the time-event distribution of order streams within the LOB system. Extensive experiments using real-world data from the limit order books of three widely traded assets confirm that LOBDIF significantly outperforms current state-of-the-art methods. ...
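
A minimal sketch of one DDPM-style training step on a two-dimensional event representation (an illustration of the general diffusion recipe, not LOBDIF's denoising network or skip-step sampler):

```python
# One denoising-diffusion training step on a 2-d event representation,
# e.g. (log inter-arrival time, event-type embedding); the MLP denoiser
# predicts the injected Gaussian noise.
import torch
import torch.nn as nn

T = 100                                       # diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1 - betas, dim=0)

denoiser = nn.Sequential(nn.Linear(2 + 1, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

x0 = torch.randn(256, 2)                      # stand-in event batch
t = torch.randint(0, T, (256,))
noise = torch.randn_like(x0)
a = alpha_bar[t].unsqueeze(-1)
xt = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward noising
pred = denoiser(torch.cat([xt, t.unsqueeze(-1) / T], dim=-1))
loss = ((pred - noise) ** 2).mean()           # noise-prediction objective
loss.backward()
opt.step()
```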

November 27, 2024 · 2 min · Research Team

Optimal payoff under Bregman-Wasserstein divergence constraints

ArXiv ID: 2411.18397 · View on arXiv · Authors: Unknown

Abstract: We study optimal payoff choice for an expected utility maximizer under the constraint that their payoff is not allowed to deviate "too much" from a given benchmark. We solve this problem when the deviation is assessed via a Bregman-Wasserstein (BW) divergence generated by a convex function $φ$. Unlike the Wasserstein distance (i.e., the case $φ(x)=x^2$), the inherent asymmetry of the BW divergence makes it possible to penalize positive deviations differently from negative ones. As a main contribution, we provide the optimal payoff in this setting. Numerical examples illustrate that the choice of $φ$ allows the payoff choice to be better aligned with the objectives of investors. ...
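
For reference, a standard Bregman-Wasserstein formulation (the paper's precise conventions may differ):

```latex
% The Bregman divergence generated by a convex function \varphi is used as
% the transport cost in an optimal-transport problem over couplings.
\[
  D_{\varphi}(x, y) = \varphi(x) - \varphi(y) - \varphi'(y)(x - y),
  \qquad
  \mathrm{BW}_{\varphi}(\mu, \nu)
    = \inf_{\pi \in \Pi(\mu, \nu)} \int D_{\varphi}(x, y)\, \mathrm{d}\pi(x, y).
\]
% For \varphi(x) = x^2, D_\varphi(x, y) = (x - y)^2 and BW reduces to the
% squared 2-Wasserstein distance; for asymmetric \varphi, deviations above
% and below the benchmark are penalized differently.
```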

November 27, 2024 · 2 min · Research Team

Joint Combinatorial Node Selection and Resource Allocations in the Lightning Network using Attention-based Reinforcement Learning

ArXiv ID: 2411.17353 · View on arXiv · Authors: Unknown

Abstract: The Lightning Network (LN) has emerged as a second-layer solution to Bitcoin's scalability challenges. The rise of Payment Channel Networks (PCNs) and their specific mechanisms incentivize individuals to join the network for profit-making opportunities. According to the latest statistics, the total value locked within the Lightning Network is approximately $500 million. At the same time, joining the LN for profit presents several obstacles, as it involves solving a complex combinatorial problem that encompasses both discrete and continuous control variables, related to node selection and resource allocation respectively. Current research inadequately captures the critical role of resource allocation and lacks realistic simulations of the LN routing mechanism. In this paper, we propose a Deep Reinforcement Learning (DRL) framework, enhanced by the power of transformers, to address the Joint Combinatorial Node Selection and Resource Allocation (JCNSRA) problem. We improve upon an existing environment by introducing modules that enhance its routing mechanism, narrowing the gap with the actual LN routing system and ensuring compatibility with the JCNSRA problem. We compare our model against several baselines and heuristics, demonstrating its superior performance across various settings. Additionally, we address concerns regarding centralization in the LN by deploying our agent within the network and monitoring the centrality measures of the evolved graph. Our findings suggest not only an absence of conflict between LN's decentralization goals and individuals' revenue-maximization incentives, but also a positive association between the two. ...
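
A minimal sketch of the kind of hybrid discrete/continuous policy head such a problem calls for (entirely hypothetical; not the authors' architecture):

```python
# Attention encoder over candidate nodes, a categorical head for discrete
# node selection, and a softmax head splitting a capacity budget across
# the selected channels. All sizes and features are assumptions.
import torch
import torch.nn as nn

N_CANDIDATES, N_CHANNELS, EMB = 500, 8, 64
BUDGET_SATS = 5_000_000                            # total capacity to deploy

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(EMB, nhead=4, batch_first=True), num_layers=2)
select_head = nn.Linear(EMB, 1)                    # per-node selection logit
alloc_head = nn.Linear(EMB, 1)                     # per-node allocation logit

node_feats = torch.randn(1, N_CANDIDATES, EMB)     # graph-derived node features
h = encoder(node_feats)

logits = select_head(h).squeeze(-1)                # (1, N_CANDIDATES)
chosen = logits.topk(N_CHANNELS, dim=-1).indices   # discrete: pick channel peers
alloc_logits = alloc_head(h).squeeze(-1).gather(-1, chosen)
allocation = torch.softmax(alloc_logits, dim=-1) * BUDGET_SATS  # continuous split
```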

November 26, 2024 · 2 min · Research Team

Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading

ArXiv ID: 2411.17900 · View on arXiv · Authors: Unknown

Abstract: Developing effective quantitative trading strategies using reinforcement learning (RL) is challenging due to the high risks associated with online interaction with live financial markets. Consequently, offline RL, which leverages historical market data without additional exploration, becomes essential. However, existing offline RL methods often struggle to capture the complex temporal dependencies inherent in financial time series and may overfit to historical patterns. To address these challenges, we introduce a Decision Transformer (DT) initialized with pre-trained GPT-2 weights and fine-tuned using Low-Rank Adaptation (LoRA). This architecture leverages the generalization capabilities of pre-trained language models and the efficiency of LoRA to learn effective trading policies from expert trajectories derived solely from historical data. Our model performs competitively with established offline RL algorithms, including Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Behavior Cloning (BC), as well as a baseline Decision Transformer with randomly initialized GPT-2 weights and LoRA. Empirical results demonstrate that our approach effectively learns from expert trajectories and secures superior rewards in certain trading scenarios, highlighting the effectiveness of integrating pre-trained language models and parameter-efficient fine-tuning in offline RL for quantitative trading. Replication code for our experiments is publicly available at https://github.com/syyunn/finrl-dt ...
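
A minimal sketch of the backbone construction, using the Hugging Face transformers and peft libraries (not the authors' exact code; see their repository for the real implementation):

```python
# GPT-2 as a Decision Transformer backbone with LoRA adapters, consuming
# interleaved (return-to-go, state, action) embeddings via inputs_embeds.
import torch
import torch.nn as nn
from transformers import GPT2Model
from peft import LoraConfig, get_peft_model

STATE_DIM, ACT_DIM, H = 10, 3, 768               # H matches GPT-2 hidden size

backbone = GPT2Model.from_pretrained("gpt2")
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["c_attn"]) # attention projections only
backbone = get_peft_model(backbone, lora_cfg)    # few trainable parameters

embed_rtg = nn.Linear(1, H)
embed_state = nn.Linear(STATE_DIM, H)
embed_action = nn.Linear(ACT_DIM, H)
predict_action = nn.Linear(H, ACT_DIM)

B, K = 4, 20                                     # batch size, context length
rtg = torch.randn(B, K, 1)
s = torch.randn(B, K, STATE_DIM)
a = torch.randn(B, K, ACT_DIM)
tokens = torch.stack([embed_rtg(rtg), embed_state(s), embed_action(a)], dim=2)
tokens = tokens.reshape(B, 3 * K, H)             # (R_1, s_1, a_1, R_2, ...)
hidden = backbone(inputs_embeds=tokens).last_hidden_state
action_pred = predict_action(hidden[:, 1::3])    # predict a_t from s_t token
```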

November 26, 2024 · 2 min · Research Team

AD-HOC: A C++ Expression Template package for high-order derivatives backpropagation

ArXiv ID: 2412.05300 · View on arXiv · Authors: Unknown

Abstract: This document presents a new C++ Automatic Differentiation (AD) tool, AD-HOC (Automatic Differentiation for High-Order Calculations). This tool aims to have the following features:

- Calculation of user-specified derivatives of arbitrary order
- The ability to run at speeds similar to handwritten code
- All derivative calculations computed in a single backpropagation tree pass
- No source code generation, relying heavily on the C++ compiler to statically build the computation tree before runtime
- A simple interface
- The ability to be used "in conjunction" with other established, general-purpose dynamic AD tools
- Header-only library, with no external dependencies
- Open source, with a business-friendly license

...
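
As a generic illustration of high-order derivatives via repeated backpropagation (in Python rather than C++; AD-HOC's own API is not shown in the abstract and is not reproduced here):

```python
# Obtain a user-specified high-order derivative by differentiating the
# backpropagation graph repeatedly: here d^3/dx^3 of f(x) = exp(sin(x)).
import torch

x = torch.tensor(0.7, requires_grad=True)
d = torch.exp(torch.sin(x))
for _ in range(3):                 # differentiate three times
    (d,) = torch.autograd.grad(d, x, create_graph=True)
print(d.item())                    # third derivative at x = 0.7
```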

November 25, 2024 · 2 min · Research Team

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

ArXiv ID: 2411.16666 · View on arXiv · Authors: Unknown

Abstract: We introduce CatNet, an algorithm that effectively controls the False Discovery Rate (FDR) and selects significant features in LSTMs. CatNet employs the derivative of SHAP values to quantify feature importance, and constructs a vector-formed mirror statistic for FDR control with the Gaussian Mirror algorithm. To avoid instability due to nonlinear or temporal correlations among features, we also propose a new kernel-based independence measure. CatNet performs robustly across different model settings on both simulated and real-world data, reducing overfitting and improving the interpretability of the model. Our framework, which introduces SHAP-based feature importance into FDR control algorithms and improves the Gaussian Mirror, extends naturally to other time-series or sequential deep learning models. ...
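
A minimal sketch of mirror-statistic feature selection with a data-driven FDR threshold (a generic data-splitting variant, not CatNet's Gaussian Mirror construction):

```python
# Given signed importance scores from two independent halves of the data,
# form mirror statistics (symmetric about zero for null features) and pick
# the smallest threshold whose estimated FDP is below the target level q.
import numpy as np

def mirror_select(w1, w2, q=0.1):
    """w1, w2: per-feature signed importances from independent halves."""
    m = np.sign(w1 * w2) * (np.abs(w1) + np.abs(w2))    # mirror statistic
    for t in np.sort(np.abs(m)):
        fdp = (m <= -t).sum() / max((m >= t).sum(), 1)  # estimated FDP at t
        if fdp <= q:
            return np.where(m >= t)[0]                  # selected features
    return np.array([], dtype=int)

rng = np.random.default_rng(1)
true = rng.normal(2.0, 0.3, 20)                         # 20 real signals
w1 = np.concatenate([true + rng.normal(0, .5, 20), rng.normal(0, .5, 180)])
w2 = np.concatenate([true + rng.normal(0, .5, 20), rng.normal(0, .5, 180)])
print(mirror_select(w1, w2, q=0.1))
```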

November 25, 2024 · 2 min · Research Team