
Continuous-time reinforcement learning for optimal switching over multiple regimes

ArXiv ID: 2512.04697 · Authors: Yijie Huang, Mengge Li, Xiang Yu, Zhou Zhou

Abstract: This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regularization where the agent randomizes both the timing of switches and the selection of regimes through the generator matrix of an associated continuous-time finite-state Markov chain. We establish the well-posedness of the associated system of Hamilton-Jacobi-Bellman (HJB) equations and provide a characterization of the optimal policy. The policy improvement and the convergence of the policy iterations are rigorously established by analyzing the system of equations. We also show the convergence of the value function in the exploratory formulation towards the value function in the classical formulation as the temperature parameter vanishes. Finally, a reinforcement learning algorithm is devised and implemented by invoking the policy evaluation based on the martingale characterization. Our numerical examples with the aid of neural networks illustrate the effectiveness of the proposed RL algorithm. ...

December 4, 2025 · 2 min · Research Team

Optimal Investment and Consumption in a Stochastic Factor Model

ArXiv ID: 2509.09452 · Authors: Florian Gutekunst, Martin Herdegen, David Hobson

Abstract: In this article, we study optimal investment and consumption in an incomplete stochastic factor model for a power utility investor on the infinite horizon. When the state space of the stochastic factor is finite, we give a complete characterisation of the well-posedness of the problem, and provide an efficient numerical algorithm for computing the value function. When the state space is a (possibly infinite) open interval and the stochastic factor is represented by an Itô diffusion, we develop a general theory of sub- and supersolutions for second-order ordinary differential equations on open domains without boundary values to prove existence of the solution to the Hamilton-Jacobi-Bellman (HJB) equation along with explicit bounds for the solution. By characterising the asymptotic behaviour of the solution, we are also able to provide rigorous verification arguments for various models, including – for the first time – the Heston model. Finally, we link the discrete and continuous setting and show that the value function in the diffusion setting can be approximated very efficiently through a fast discretisation scheme. ...

September 11, 2025 · 2 min · Research Team

Optimal Exit Time for Liquidity Providers in Automated Market Makers

ArXiv ID: 2509.06510 · Authors: Philippe Bergault, Sébastien Bieber, Leandro Sánchez-Betancourt

Abstract: We study the problem of optimal liquidity withdrawal for a representative liquidity provider (LP) in an automated market maker (AMM). LPs earn fees from trading activity but are exposed to impermanent loss (IL) due to price fluctuations. While existing work has focused on static provision and exogenous exit strategies, we characterise the optimal exit time as the solution to a stochastic control problem with an endogenous stopping time. Mathematically, the LP’s value function is shown to satisfy a Hamilton-Jacobi-Bellman quasi-variational inequality, for which we establish uniqueness in the viscosity sense. To solve the problem numerically, we develop two complementary approaches: an Euler scheme based on operator splitting and a Longstaff-Schwartz regression method. Calibrated simulations highlight how the LP’s optimal exit strategy depends on the oracle price volatility, fee levels, and the behaviour of arbitrageurs and noise traders. Our results show that while arbitrage generates both fees and IL, the LP’s optimal decision balances these opposing effects based on the pool state variables and price misalignments. Lastly, we find the optimal fee level for the representative LP when they play the exit strategy we derived. This work contributes to a deeper understanding of dynamic liquidity provision in AMMs and provides insights into the sustainability of passive LP strategies under different market regimes. ...

September 8, 2025 · 2 min · Research Team

Optimal Trading under Instantaneous and Persistent Price Impact, Predictable Returns and Multiscale Stochastic Volatility

ArXiv ID: 2507.17162 · Authors: Patrick Chan, Ronnie Sircar, Iosif Zimbidis

Abstract: We consider a dynamic portfolio optimization problem that incorporates predictable returns, instantaneous transaction costs, price impact, and stochastic volatility, extending the classical results of Garleanu and Pedersen (2013), which assume constant volatility. Constructing the optimal portfolio strategy in this general setting is challenging due to the nonlinear nature of the resulting Hamilton-Jacobi-Bellman (HJB) equations. To address this, we propose a multi-scale volatility expansion that captures stochastic volatility dynamics across different time scales. Specifically, the analysis involves a singular perturbation for the fast mean-reverting volatility factor and a regular perturbation for the slow-moving factor. We also introduce an approximation for small price impact and demonstrate its numerical accuracy. We formally derive asymptotic approximations up to second order and use Monte Carlo simulations to show how incorporating these corrections improves the Profit and Loss (PnL) of the resulting portfolio strategy. ...

July 23, 2025 · 2 min · Research Team

On Quantum BSDE Solver for High-Dimensional Parabolic PDEs

ArXiv ID: 2506.14612 · Authors: Howard Su, Huan-Hsin Tseng

Abstract: We propose a quantum machine learning framework for approximating solutions to high-dimensional parabolic partial differential equations (PDEs) that can be reformulated as backward stochastic differential equations (BSDEs). In contrast to popular quantum-classical network hybrid approaches, this study employs the pure Variational Quantum Circuit (VQC) as the core solver without trainable classical neural networks. The quantum BSDE solver performs pathwise approximation via temporal discretization and Monte Carlo simulation, framed as model-based reinforcement learning. We benchmark VQC-based and classical deep neural network (DNN) solvers on two canonical PDEs as representatives: the Black-Scholes and nonlinear Hamilton-Jacobi-Bellman (HJB) equations. The VQC achieves lower variance and improved accuracy in most cases, particularly in highly nonlinear regimes and for out-of-the-money options, demonstrating greater robustness than DNNs. These results, obtained via quantum circuit simulation, highlight the potential of VQCs as scalable and stable solvers for high-dimensional stochastic control problems. ...

June 17, 2025 · 2 min · Research Team

Deep Learning for Continuous-time Stochastic Control with Jumps

ArXiv ID: 2505.15602 · Authors: Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut

Abstract: In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks. ...

May 21, 2025 · 2 min · Research Team

Portfolio Optimization with Feedback Strategies Based on Artificial Neural Networks

ArXiv ID: 2411.09899 · Authors: Unknown

Abstract: With the recent advancements in machine learning (ML), artificial neural networks (ANN) are starting to play an increasingly important role in quantitative finance. Dynamic portfolio optimization is among many problems that have significantly benefited from a wider adoption of deep learning (DL). While most existing research has primarily focused on how DL can alleviate the curse of dimensionality when solving the Hamilton-Jacobi-Bellman (HJB) equation, some very recent developments propose to forego derivation and solution of HJB in favor of empirical utility maximization over dynamic allocation strategies expressed through ANN. In addition to being simple and transparent, this approach is universally applicable, as it is essentially agnostic about market dynamics. To showcase the method, we apply it to optimal portfolio allocation between a cash account and the S&P 500 index modeled using geometric Brownian motion or the Heston model. In both cases, the results are demonstrated to be on par with those under the theoretical optimal weights assuming isoelastic utility and real-time rebalancing. A set of R codes for a broad class of stochastic volatility models are provided as a supplement. ...

November 15, 2024 · 2 min · Research Team

Logarithmic regret in the ergodic Avellaneda-Stoikov market making model

ArXiv ID: 2409.02025 · Authors: Unknown

Abstract: We analyse the regret arising from learning the price sensitivity parameter $κ$ of liquidity takers in the ergodic version of the Avellaneda-Stoikov market making model. We show that a learning algorithm based on a maximum-likelihood estimator for the parameter achieves the regret upper bound of order $\ln^2 T$ in expectation. To obtain the result we need two key ingredients. The first is the twice differentiability of the ergodic constant under the misspecified parameter in the Hamilton-Jacobi-Bellman (HJB) equation with respect to $κ$, which leads to a second-order performance gap. The second is the learning rate of the regularised maximum-likelihood estimator, which is obtained from concentration inequalities for Bernoulli signals. Numerical experiments confirm the convergence and the robustness of the proposed algorithm. ...

September 3, 2024 · 2 min · Research Team

A monotone piecewise constant control integration approach for the two-factor uncertain volatility model

ArXiv ID: 2402.06840 · Authors: Unknown

Abstract: Option contracts on two underlying assets within uncertain volatility models have their worst-case and best-case prices determined by a two-dimensional (2D) Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) with cross-derivative terms. This paper introduces a novel “decompose and integrate, then optimize” approach to tackle this HJB PDE. Within each timestep, our method applies piecewise constant control, yielding a set of independent linear 2D PDEs, each corresponding to a discretized control value. Leveraging closed-form Green’s functions, these PDEs are efficiently solved via 2D convolution integrals using a monotone numerical integration method. The value function and optimal control are then obtained by synthesizing the solutions of the individual PDEs. For enhanced efficiency, we implement the integration via Fast Fourier Transforms, exploiting the Toeplitz matrix structure. The proposed method is $\ell_{\infty}$-stable, consistent in the viscosity sense, and converges to the viscosity solution of the HJB equation. Numerical results show excellent agreement with benchmark solutions obtained by finite differences, tree methods, and Monte Carlo simulation, highlighting its robustness and effectiveness. ...

February 9, 2024 · 2 min · Research Team

Residual U-net with Self-Attention to Solve Multi-Agent Time-Consistent Optimal Trade Execution

ArXiv ID: 2312.09353 · Authors: Unknown

Abstract: In this paper, we explore the use of a deep residual U-net with self-attention to solve the continuous-time time-consistent mean-variance optimal trade execution problem for multiple agents and assets. Given a finite horizon, we formulate the time-consistent mean-variance optimal trade execution problem following the Almgren-Chriss model as a Hamilton-Jacobi-Bellman (HJB) equation. The HJB formulation is known to have a viscosity solution to the unknown value function. We reformulate the HJB to a backward stochastic differential equation (BSDE) to extend the problem to multiple agents and assets. We utilize a residual U-net with self-attention to numerically approximate the value function for multiple agents and assets which can be used to determine the time-consistent optimal control. In this paper, we show that the proposed neural network approach overcomes the limitations of finite difference methods. We validate our results and study parameter sensitivity. With our framework we study how an agent with significant price impact interacts with an agent without any price impact and the optimal strategies used by both types of agents. We also study the performance of multiple sellers and buyers and how they compare to a holding strategy under different economic conditions. ...

December 14, 2023 · 2 min · Research Team