
Mean-Field Price Formation on Trees with Multi-Population and Non-Rational Agents

ArXiv ID: 2510.11261 Authors: Masaaki Fujii Abstract This work solves the equilibrium price formation problem for the risky stock by combining mean-field game theory with the binomial tree framework, adapting the classic approach of Cox, Ross & Rubinstein. For agents with exponential and recursive utilities of exponential-type, we prove the existence of a unique mean-field market-clearing equilibrium and derive an explicit analytic formula for equilibrium transition probabilities of the stock price on the binomial lattice. The agents face stochastic terminal liabilities and incremental endowments that depend on unhedgeable common and idiosyncratic factors, in addition to the stock price path. We also incorporate an external order flow. Furthermore, the analytic tractability of the proposed approach allows us to extend the framework in two important directions: First, we incorporate multi-population heterogeneity, allowing agents to differ in functional forms for their liabilities, endowments, and risk coefficients. Second, we relax the rational expectations hypothesis by modeling agents operating under subjective probability measures which induce stochastically biased views on the stock transition probabilities. Our numerical examples illustrate the qualitative effects of these components on the equilibrium price distribution. ...
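For readers unfamiliar with the Cox-Ross-Rubinstein lattice the abstract builds on, here is a minimal sketch of the classic CRR calibration; the paper's equilibrium transition probabilities generalize the single-step risk-neutral probability computed below (illustrative parameters, not from the paper):

```python
import math

def crr_params(sigma, r, dt):
    """Classic Cox-Ross-Rubinstein lattice parameters: up/down factors
    set by volatility, with the risk-neutral up-probability fixed by
    no-arbitrage over a single step."""
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor (recombining tree)
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up-probability
    return u, d, p

u, d, p = crr_params(sigma=0.2, r=0.05, dt=1.0 / 12.0)
```

In the mean-field setting the transition probabilities are instead determined endogenously by market clearing, but the lattice geometry is the same.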

October 13, 2025 · 2 min · Research Team

Joint Stochastic Optimal Control and Stopping in Aquaculture: Finite-Difference and PINN-Based Approaches

ArXiv ID: 2510.02910 Authors: Kevin Kamm Abstract This paper studies a joint stochastic optimal control and stopping (JCtrlOS) problem motivated by aquaculture operations, where the objective is to maximize farm profit through an optimal feeding strategy and harvesting time under stochastic price dynamics. We introduce a simplified aquaculture model capturing essential biological and economic features, distinguishing between biologically optimal and economically optimal feeding strategies. The problem is formulated as a Hamilton-Jacobi-Bellman variational inequality and corresponding free boundary problem. We develop two numerical solution approaches: First, a finite difference scheme that serves as a benchmark, and second, a Physics-Informed Neural Network (PINN)-based method, combined with a deep optimal stopping (DeepOS) algorithm to improve stopping time accuracy. Numerical experiments demonstrate that while finite differences perform well in medium-dimensional settings, the PINN approach achieves comparable accuracy and is more scalable to higher dimensions where grid-based methods become infeasible. The results confirm that jointly optimizing feeding and harvesting decisions outperforms strategies that neglect either control or stopping. ...
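As a rough illustration of the finite-difference benchmark for a free boundary problem of this type, here is a sketch of an explicit scheme for a textbook American put (a standard optimal stopping problem, not the paper's aquaculture model): the variational inequality shows up as a max with the exercise payoff at every time step.

```python
import numpy as np

def american_put_fd(K=1.0, r=0.05, sigma=0.2, T=1.0, S_max=3.0, M=300, N=5000):
    """Explicit finite differences for an optimal stopping free boundary
    problem: at each backward step the value is the larger of the
    continuation value and the immediate exercise payoff."""
    dt = T / N
    S = np.linspace(0.0, S_max, M + 1)
    payoff = np.maximum(K - S, 0.0)
    V = payoff.copy()
    i = np.arange(1, M)
    # standard explicit Black-Scholes coefficients on a uniform S-grid
    a = 0.5 * dt * (sigma**2 * i**2 - r * i)
    b = 1.0 - dt * (sigma**2 * i**2 + r)
    c = 0.5 * dt * (sigma**2 * i**2 + r * i)
    for _ in range(N):
        V[1:M] = a * V[0:M - 1] + b * V[1:M] + c * V[2:M + 1]
        V[0] = K            # deep in the money: exercise immediately
        V[M] = 0.0          # far out of the money
        V = np.maximum(V, payoff)   # variational inequality / early exercise
    return S, V

S, V = american_put_fd()
```

Note the stability constraint of the explicit scheme (`dt` small relative to `1 / (sigma**2 * M**2)`), which is one reason grid-based methods stop scaling in higher dimensions.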

October 3, 2025 · 2 min · Research Team

Generative Neural Operators of Log-Complexity Can Simultaneously Solve Infinitely Many Convex Programs

ArXiv ID: 2508.14995 Authors: Anastasis Kratsios, Ariel Neufeld, Philipp Schmocker Abstract Neural operators (NOs) are a class of deep learning models designed to simultaneously solve infinitely many related problems by casting them into an infinite-dimensional space, whereon these NOs operate. A significant gap remains between theory and practice: worst-case parameter bounds from universal approximation theorems suggest that NOs may require an unrealistically large number of parameters to solve most operator learning problems, which stands in direct opposition to a slew of experimental evidence. This paper closes that gap for a specific class of NOs, generative equilibrium operators (GEOs), using (realistic) finite-dimensional deep equilibrium layers, when solving families of convex optimization problems over a separable Hilbert space $X$. Here, the inputs are smooth, convex loss functions on $X$, and outputs are the associated (approximate) solutions to the optimization problem defined by each input loss. We show that when the input losses lie in suitable infinite-dimensional compact sets, our GEO can uniformly approximate the corresponding solutions to arbitrary precision, with rank, depth, and width growing only logarithmically in the reciprocal of the approximation error. We then validate both our theoretical results and the trainability of GEOs on three applications: (1) nonlinear PDEs, (2) stochastic optimal control problems, and (3) hedging problems in mathematical finance under liquidity constraints. ...
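The fixed-point idea behind deep equilibrium layers fits in a few lines. The following toy layer (plain NumPy, not the paper's GEO architecture) iterates z ← tanh(Wz + Ux + b) to its equilibrium, which converges whenever W is a contraction:

```python
import numpy as np

rng = np.random.default_rng(0)

def equilibrium_layer(x, W, U, b, n_iter=200):
    """Toy deep-equilibrium layer: iterate z <- tanh(W z + U x + b) to a
    fixed point z*. The iteration converges when ||W|| < 1 (contraction),
    since tanh is 1-Lipschitz."""
    z = np.zeros(W.shape[0])
    for _ in range(n_iter):
        z = np.tanh(W @ z + U @ x + b)
    return z

d = 8
W = 0.2 * rng.standard_normal((d, d)) / np.sqrt(d)   # small spectral norm
U = rng.standard_normal((d, 4))
b = rng.standard_normal(d)
x = rng.standard_normal(4)

z_star = equilibrium_layer(x, W, U, b)
residual = float(np.max(np.abs(z_star - np.tanh(W @ z_star + U @ x + b))))
```

The "depth" of such a layer is implicit in the fixed-point solve, which is what lets parameter counts stay small relative to an unrolled network.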

August 20, 2025 · 2 min · Research Team

Gaining efficiency in deep policy gradient method for continuous-time optimal control problems

ArXiv ID: 2502.14141 Authors: Unknown Abstract In this paper, we propose an efficient implementation of the deep policy gradient method (PGM) for optimal control problems in continuous time. The proposed method can manage the allocation of computational resources, the number of trajectories, and the complexity of the neural network architecture. This is particularly important for continuous-time problems that require a fine time discretization. Each step of this method focuses on a different time scale and learns a policy, modeled by a neural network, for a discretized optimal control problem. The first step has the coarsest time discretization. As we proceed to subsequent steps, the time discretization becomes finer. The optimal trained policy in each step is also used to provide data for the next step. We accompany the multi-scale deep PGM with a theoretical result on the allocation of computational resources to achieve a targeted efficiency, and we test our methods on the linear-quadratic stochastic optimal control problem. ...
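A minimal sketch of the coarse-to-fine idea on a toy deterministic linear-quadratic problem (scalar dynamics x' = u, chosen purely for illustration; the paper uses neural-network policies rather than the open-loop controls below): optimize on a coarse time grid, then refine the grid and warm-start from the coarse solution.

```python
import numpy as np

def cost(u, dt, x0=1.0):
    """Cost of a piecewise-constant control on an Euler-discretized
    scalar system x' = u with running cost x^2 + u^2 on [0, 1]."""
    x, J = x0, 0.0
    for uk in u:
        J += dt * (x * x + uk * uk)
        x += dt * uk
    return J

def grad(u, dt, eps=1e-6):
    """Forward-difference gradient of the cost w.r.t. the controls."""
    J0 = cost(u, dt)
    g = np.zeros_like(u)
    for k in range(len(u)):
        up = u.copy()
        up[k] += eps
        g[k] = (cost(up, dt) - J0) / eps
    return g

# Coarse-to-fine: optimize on a coarse time grid, then refine the grid
# and warm-start the next level from the coarse solution.
u = np.zeros(4)                      # coarsest grid: 4 steps on [0, 1]
for level in range(3):
    dt = 1.0 / len(u)
    for _ in range(200):             # plain gradient descent at this level
        u = u - 0.5 * grad(u, dt)
    if level < 2:
        u = np.repeat(u, 2)          # each coarse control seeds two fine ones

J = cost(u, 1.0 / len(u))            # continuous-time optimum is tanh(1) ~ 0.762
```

Most of the optimization work happens on the cheap coarse grids; the fine grid only polishes an already-good warm start, which is the efficiency the abstract describes.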

February 19, 2025 · 2 min · Research Team

Minimal Shortfall Strategies for Liquidation of a Basket of Stocks using Reinforcement Learning

ArXiv ID: 2502.07868 Authors: Unknown Abstract This paper studies the ubiquitous problem of liquidating large quantities of highly correlated stocks, a task frequently encountered by institutional investors and proprietary trading firms. Traditional methods in this setting suffer from the curse of dimensionality, making them impractical for high-dimensional problems. In this work, we propose a novel method based on stochastic optimal control to optimally tackle this complex multidimensional problem. The proposed method minimizes the overall execution shortfall of highly correlated stocks using a reinforcement learning approach. We rigorously establish the convergence of our optimal trading strategy and present an implementation of our algorithm using intra-day market data. ...
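For context, the classical closed-form benchmark for execution shortfall is the Almgren-Chriss trajectory; a sketch for a single asset (illustrative parameters, not from the paper — the RL approach above is what replaces this closed form in the correlated multi-asset case):

```python
import numpy as np

def almgren_chriss_schedule(X, T, N, sigma, eta, lam):
    """Classical Almgren-Chriss liquidation trajectory for one asset,
    a common benchmark in execution-shortfall problems: holdings decay
    as sinh(kappa * (T - t)) / sinh(kappa * T)."""
    kappa = np.sqrt(lam * sigma**2 / eta)   # urgency: risk aversion vs. impact
    t = np.linspace(0.0, T, N + 1)
    return X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

# liquidate 1M shares over one day in 10 slices (hypothetical parameters)
x = almgren_chriss_schedule(X=1e6, T=1.0, N=10, sigma=0.3, eta=1e-6, lam=1e-5)
```

Higher risk aversion (larger `lam`) front-loads the selling; `lam -> 0` recovers a straight-line (TWAP) schedule.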

February 11, 2025 · 2 min · Research Team

Stochastic Optimal Control of Iron Condor Portfolios for Profitability and Risk Management

ArXiv ID: 2501.12397 Authors: Unknown Abstract Previous research on option strategies has primarily focused on their behavior near expiration, with limited attention to the transient value process of the portfolio. In this paper, we formulate Iron Condor portfolio optimization as a stochastic optimal control problem, examining the impact of the control process $u(k_i, \tau)$ on the portfolio's potential profitability and risk. By assuming the underlying price process is a bounded martingale within $[K_1, K_2]$, we prove that the portfolio with a strike structure of $k_1 < k_2 = K_2 < S_t < k_3 = K_3 < k_4$ has a submartingale value process, which results in the optimal stopping time aligning with the expiration date $\tau = T$. Moreover, we construct a data generator based on the Rough Heston model to investigate general scenarios through simulation. The results show that asymmetric, left-biased Iron Condor portfolios with $\tau = T$ are optimal in SPX markets, balancing profitability and risk management. Deep out-of-the-money strategies improve profitability and success rates at the cost of introducing extreme losses, which can be alleviated by using an optimal stopping strategy. Except for the left-biased portfolios, $\tau$ generally falls within the range of $[50\%, 75\%]$ of the total duration. In addition, we validate these findings through case studies on the actual SPX market, covering bullish, sideways, and bearish market conditions. ...
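The payoff structure referenced in the abstract is easy to make concrete; a minimal sketch of an iron condor's expiration payoff with hypothetical strikes and credit (not the paper's calibrated portfolios):

```python
def iron_condor_payoff(S, k1, k2, k3, k4, credit):
    """Expiration payoff of a short iron condor with strikes
    k1 < k2 < k3 < k4: long put at k1, short put at k2, short call
    at k3, long call at k4, plus the net premium received."""
    long_put   = max(k1 - S, 0.0)
    short_put  = -max(k2 - S, 0.0)
    short_call = -max(S - k3, 0.0)
    long_call  = max(S - k4, 0.0)
    return long_put + short_put + short_call + long_call + credit
```

Inside $[k_2, k_3]$ the position keeps the full credit; outside the wings the loss is capped at the wing width minus the credit, which is the profitability/risk trade-off the paper's stopping rule manages.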

January 6, 2025 · 2 min · Research Team

Reinforcement Learning Methods for the Stochastic Optimal Control of an Industrial Power-to-Heat System

ArXiv ID: 2411.02211 Authors: Unknown Abstract The optimal control of sustainable energy supply systems, including renewable energies and energy storage, takes a central role in the decarbonization of industrial systems. However, the use of fluctuating renewable energies leads to fluctuations in energy generation and requires a suitable control strategy for the complex systems in order to ensure energy supply. In this paper, we consider an electrified power-to-heat system which is designed to supply heat in the form of superheated steam for industrial processes. The system consists of a high-temperature heat pump for heat supply, a wind turbine for power generation, a sensible thermal energy storage for storing excess heat, and a steam generator for providing steam. If the system's energy demand cannot be covered by electricity from the wind turbine, additional electricity must be purchased from the power grid. For this system, we investigate the cost-optimal operation aiming to minimize the electricity cost from the grid by a suitable system control depending on the available wind power and the amount of stored thermal energy. This is a decision-making problem under uncertainties about the future prices for electricity from the grid and the future generation of wind power. The resulting stochastic optimal control problem is treated as a finite-horizon Markov decision process for a multi-dimensional controlled state process. We first consider the classical backward recursion technique for solving the associated dynamic programming equation for the value function and compute the optimal decision rule. Since that approach suffers from the curse of dimensionality, we also apply reinforcement learning techniques, namely Q-learning, that are able to provide a good approximate solution to the optimization problem within reasonable time. ...
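A minimal tabular Q-learning sketch on a toy stand-in MDP (five states with a reward in the rightmost one — not the paper's power-to-heat model) showing the update rule the abstract refers to:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy finite MDP standing in for a discretized storage-control problem:
# states 0..4, actions 0 (stay) and 1 (move right), reward 1 in state 4.
n_states, n_actions, gamma = 5, 2, 0.9

def step(s, a):
    s2 = min(s + a, n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

Q = np.zeros((n_states, n_actions))
alpha, eps = 0.1, 0.2
for _ in range(500):                       # episodes
    s = 0
    for _ in range(20):                    # steps per episode
        if rng.random() < eps:             # epsilon-greedy exploration
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: bootstrap with the greedy next-state value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
```

Unlike backward recursion, this never enumerates the full state space per time step, which is what makes it attractive once the controlled state becomes multi-dimensional.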

November 4, 2024 · 2 min · Research Team

Reinforcement Learning for Corporate Bond Trading: A Sell Side Perspective

ArXiv ID: 2406.12983 Authors: Unknown Abstract A corporate bond trader in a typical sell side institution such as a bank provides liquidity to the market participants by buying/selling securities and maintaining an inventory. Upon receiving a request for a buy/sell price quote (RFQ), the trader provides a quote by adding a spread over a \textit{prevalent market price}. For illiquid bonds, the market price is harder to observe, and traders often resort to available benchmark bond prices (such as MarketAxess, Bloomberg, etc.). In \cite{Bergault2023ModelingLI}, the concept of a \textit{Fair Transfer Price} for an illiquid corporate bond was introduced, which is derived from an infinite-horizon stochastic optimal control problem (maximizing the trader's expected P&L, regularized by the quadratic variation). In this paper, we consider the same optimization objective; however, we approach the estimation of an optimal bid-ask spread quoting strategy in a data-driven manner and show that it can be learned using Reinforcement Learning. Furthermore, we perform extensive outcome analysis to examine the reasonableness of the trained agent's behavior. ...
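To make the spread-quoting trade-off concrete: under a hypothetical exponential fill model, P(fill) = exp(-k · spread), the expected P&L per RFQ is maximized at a spread of 1/k. This is a toy one-shot illustration of the tension the agent must learn, not the paper's objective or strategy.

```python
import numpy as np

def expected_pnl(spread, k=2.0):
    """Expected P&L of quoting a given spread on one RFQ, assuming a
    hypothetical exponential fill probability exp(-k * spread): wider
    spreads earn more per fill but fill less often."""
    return spread * np.exp(-k * spread)

spreads = np.linspace(0.0, 2.0, 2001)
best = float(spreads[np.argmax(expected_pnl(spreads))])  # analytic optimum: 1/k
```

In the RL setting, the fill model is unknown and inventory matters, so the agent learns this trade-off from quote/fill data instead of a formula.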

June 18, 2024 · 2 min · Research Team

Market Making in Spot Precious Metals

ArXiv ID: 2404.15478 Authors: Unknown Abstract The primary challenge of market making in spot precious metals is navigating the liquidity that is mainly provided by futures contracts. The Exchange for Physical (EFP) spread, which is the price difference between futures and spot, plays a pivotal role and exhibits multiple modes of relaxation corresponding to the diverse trading horizons of market participants. In this paper, we model the EFP spread using a nested Ornstein-Uhlenbeck process, in the spirit of the two-factor Hull-White model for interest rates. We demonstrate the suitability of the framework for maximizing the expected P&L of a market maker while minimizing inventory risk across both spot and futures. Using a computationally efficient technique to approximate the solution of the Hamilton-Jacobi-Bellman equation associated with the corresponding stochastic optimal control problem, our methodology facilitates strategy optimization on demand in near real-time, paving the way for advanced algorithmic market making that capitalizes on the co-integration properties intrinsic to the precious metals sector. ...
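A nested Ornstein-Uhlenbeck pair can be simulated in a few lines; a minimal Euler sketch with illustrative parameters (not calibrated to EFP data): the fast factor mean-reverts to the slow factor, which itself mean-reverts to a long-run level, giving the two relaxation speeds the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_nested_ou(n_steps=10_000, dt=1e-3,
                       kx=5.0, ky=0.5, theta=0.0,
                       sx=0.2, sy=0.1):
    """Euler scheme for a nested Ornstein-Uhlenbeck pair:
        dy = ky * (theta - y) dt + sy dW2   (slow factor)
        dx = kx * (y - x) dt     + sx dW1   (fast factor, reverts to y)
    Illustrative parameters only."""
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = 0.5, 0.2
    sq = np.sqrt(dt)
    for k in range(n_steps):
        y[k + 1] = y[k] + ky * (theta - y[k]) * dt + sy * sq * rng.standard_normal()
        x[k + 1] = x[k] + kx * (y[k] - x[k]) * dt + sx * sq * rng.standard_normal()
    return x, y

x, y = simulate_nested_ou()
```

The separation kx >> ky is what produces the "multiple modes of relaxation": short-horizon shocks decay at rate kx, while the level x reverts toward drifts at rate ky.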

April 23, 2024 · 2 min · Research Team

Rank-Dependent Predictable Forward Performance Processes

ArXiv ID: 2403.16228 Authors: Unknown Abstract Predictable forward performance processes (PFPPs) are stochastic optimal control frameworks for an agent who controls a randomly evolving system but can only prescribe the system dynamics for a short period ahead. This is a common scenario in which a controlling agent frequently re-calibrates her model. We introduce a new class of PFPPs based on rank-dependent utility, generalizing existing models that are based on expected utility theory (EUT). We establish existence of rank-dependent PFPPs under a conditionally complete market and exogenous probability distortion functions which are updated periodically. We show that their construction reduces to solving an integral equation that generalizes the integral equation obtained under EUT in previous studies. We then propose a new approach for solving the integral equation via the theory of Volterra equations. We illustrate our result in the special case of the conditionally complete Black-Scholes model. ...
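The integral-equation step can be illustrated on a second-kind Volterra equation with a known solution; a minimal trapezoidal solver sketch (generic, not the paper's equation):

```python
import numpy as np

def solve_volterra(g, K, T=1.0, n=1000):
    """Trapezoidal time-stepping for a second-kind Volterra equation
        f(t) = g(t) + \int_0^t K(t, s) f(s) ds   on [0, T].
    At each step, the integral over known values is formed and the
    implicit endpoint term is solved for algebraically."""
    t = np.linspace(0.0, T, n + 1)
    h = T / n
    f = np.empty(n + 1)
    f[0] = g(t[0])
    for i in range(1, n + 1):
        # trapezoid over the already-computed values f[0..i-1]
        s = 0.5 * K(t[i], t[0]) * f[0] + np.sum(K(t[i], t[1:i]) * f[1:i])
        # move the 0.5*h*K(t_i,t_i)*f_i endpoint term to the left-hand side
        f[i] = (g(t[i]) + h * s) / (1.0 - 0.5 * h * K(t[i], t[i]))
    return t, f

# Sanity check on a case with a known solution:
# K = 1, g = 1  gives  f(t) = 1 + int_0^t f(s) ds,  i.e.  f(t) = exp(t).
t, f = solve_volterra(lambda s: 1.0, lambda ti, s: np.ones_like(s))
```

The same forward-substitution structure applies whenever the kernel is known on the lower triangle, which is what makes the Volterra reformulation computationally attractive.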

March 24, 2024 · 2 min · Research Team