Reinforcement Learning Methods for the Stochastic Optimal Control of an Industrial Power-to-Heat System

Reinforcement Learning Methods for the Stochastic Optimal Control of an Industrial Power-to-Heat System ArXiv ID: 2411.02211 “View on arXiv” Authors: Unknown Abstract The optimal control of sustainable energy supply systems, including renewable energies and energy storage, takes a central role in the decarbonization of industrial systems. However, the use of fluctuating renewable energies leads to fluctuations in energy generation and requires a suitable control strategy for the complex systems in order to ensure energy supply. In this paper, we consider an electrified power-to-heat system which is designed to supply heat in the form of superheated steam for industrial processes. The system consists of a high-temperature heat pump for heat supply, a wind turbine for power generation, a sensible thermal energy storage for storing excess heat and a steam generator for providing steam. If the system’s energy demand cannot be covered by electricity from the wind turbine, additional electricity must be purchased from the power grid. For this system, we investigate the cost-optimal operation aiming to minimize the electricity cost from the grid by a suitable system control depending on the available wind power and the amount of stored thermal energy. This is a decision-making problem under uncertainty about the future prices for electricity from the grid and the future generation of wind power. The resulting stochastic optimal control problem is treated as a finite-horizon Markov decision process for a multi-dimensional controlled state process. We first consider the classical backward recursion technique for solving the associated dynamic programming equation for the value function and compute the optimal decision rule. Since that approach suffers from the curse of dimensionality, we also apply reinforcement learning techniques, namely Q-learning, which are able to provide a good approximate solution to the optimization problem within reasonable time. ...
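
To make the solution approach concrete, here is a minimal sketch of finite-horizon tabular Q-learning for a storage-control problem of this kind; the state discretization, the price and wind distributions, and the cost model are illustrative assumptions rather than the system model from the paper.

```python
# Minimal sketch of finite-horizon tabular Q-learning for a storage-control MDP.
# The state discretization, price/wind sampling and cost model below are
# illustrative assumptions, not the system model from the paper.
import numpy as np

rng = np.random.default_rng(0)

T, n_storage, n_actions = 24, 11, 3          # horizon, storage levels, {discharge, idle, charge}

def step(s, a):
    """Toy transition: the action shifts the storage level; grid cost depends on a random price/wind draw."""
    price = rng.uniform(20.0, 80.0)           # electricity price (EUR/MWh), assumed distribution
    wind = rng.uniform(0.0, 1.0)              # available wind power (normalized), assumed distribution
    s_next = int(np.clip(s + (a - 1), 0, n_storage - 1))
    grid_power = max(0.0, 1.0 + 0.5 * (a - 1) - wind)   # demand plus charging minus wind
    return s_next, -price * grid_power        # reward = negative purchase cost

Q = np.zeros((T + 1, n_storage, n_actions))  # Q[T] stays 0 (terminal)
alpha, eps = 0.1, 0.1

for episode in range(10_000):
    s = n_storage // 2
    for t in range(T):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[t, s]))
        s_next, r = step(s, a)
        target = r + np.max(Q[t + 1, s_next])            # time-indexed Q makes the target backward-consistent
        Q[t, s, a] += alpha * (target - Q[t, s, a])
        s = s_next

policy = Q[:-1].argmax(axis=2)               # greedy decision rule per (time, storage level)
print(policy[0])                              # actions at t=0 for each storage level
```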

November 4, 2024 · 2 min · Research Team

Dynamic Investment-Driven Insurance Pricing and Optimal Regulation

Dynamic Investment-Driven Insurance Pricing and Optimal Regulation ArXiv ID: 2410.18432 “View on arXiv” Authors: Unknown Abstract This paper analyzes the equilibrium of the insurance market in a dynamic setting, focusing on the interaction between insurers’ underwriting and investment strategies. Three possible equilibrium outcomes are identified: a positive insurance market, a zero insurance market, and market failure. Our findings reveal why insurers may rationally accept underwriting losses by setting a negative safety loading while relying on investment profits, particularly when there is a negative correlation between insurance gains and financial returns. Additionally, we explore the impact of regulatory frictions, showing that while imposing a cost on investment can enhance social welfare under certain conditions, it may not always be necessary. ...
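
As a back-of-the-envelope illustration of why a negative safety loading can be rational, the following sketch (with assumed, made-up numbers that are not from the paper) shows that an underwriting position with a negative expected gain can still improve the insurer's overall risk-return profile when underwriting gains are negatively correlated with investment returns.

```python
# Illustrative numbers only (not taken from the paper): adding an underwriting position
# with a negative safety loading (negative expected gain) that is negatively correlated
# with investment returns can improve the mean/std ratio of the insurer's total profit.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Assumed joint distribution of investment return and underwriting gain per unit premium.
mean = [0.06, -0.01]                               # investment mean 6%, expected underwriting loss 1%
cov = [[0.15**2, -0.8 * 0.15 * 0.05],
       [-0.8 * 0.15 * 0.05, 0.05**2]]
inv, und = rng.multivariate_normal(mean, cov, size=n).T

for name, p in [("investment only", inv), ("investment + underwriting", inv + und)]:
    print(f"{name}: mean={p.mean():.4f}, std={p.std():.4f}, mean/std={p.mean() / p.std():.3f}")
```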

October 24, 2024 · 2 min · Research Team

Vector-valued robust stochastic control

Vector-valued robust stochastic control ArXiv ID: 2407.00266 “View on arXiv” Authors: Unknown Abstract We study a dynamic stochastic control problem subject to Knightian uncertainty with multi-objective (vector-valued) criteria. Assuming the preferences across expected multi-loss vectors are represented by a given, yet general, preorder, we address the model uncertainty by adopting a robust or minimax perspective, minimizing expected loss across the worst-case model. For loss functions taking real (or scalar) values, there is no ambiguity in interpreting supremum and infimum. In contrast to the scalar case, major challenges for multi-loss control problems include properly defining and interpreting the notions of supremum and infimum, and addressing the non-uniqueness of these suprema and infima. To deal with these, we employ the notion of an ideal point vector-valued supremum for the robust part of the problem, while we view the control part as a multi-objective (or vector) optimization problem. Using a set-valued framework, we derive both a weak and strong version of the dynamic programming principle (DPP) or Bellman equations by taking the value function as the collection of all worst expected losses across all feasible actions. The weak version of Bellman’s principle is proved under minimal assumptions. To establish a stronger version of DPP, we introduce the rectangularity property with respect to a general preorder. We also further study a particular, but important, case of component-wise partial order of vectors, for which we additionally derive DPP under a different set-valued notion for the value function, the so-called upper image of the multi-objective problem. Finally, we provide illustrative examples motivated by financial problems. These results will serve as a foundation for addressing time-inconsistent problems subject to model uncertainty through the lens of a set-valued framework, as well as for studying multi-portfolio allocation problems under model uncertainty. ...
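
A schematic rendering of the two notions involved, in assumed notation that may differ from the paper's: the ideal-point supremum takes the worst case componentwise over the model set, and the (set-valued) value collects the resulting worst-case expected loss vectors over all feasible actions.

```latex
% Schematic notation only (assumed, not the paper's): \mathcal{P} is the set of models,
% a ranges over feasible actions \mathcal{A}_t, and L(a) \in \mathbb{R}^m is the loss vector.
\[
  \operatorname*{sup^{ideal}}_{\mathbb{Q}\in\mathcal{P}}
     \mathbb{E}^{\mathbb{Q}}\!\left[L(a)\right]
  \;:=\;
  \Bigl(\,\sup_{\mathbb{Q}\in\mathcal{P}} \mathbb{E}^{\mathbb{Q}}[L_1(a)],\;
          \dots,\;
          \sup_{\mathbb{Q}\in\mathcal{P}} \mathbb{E}^{\mathbb{Q}}[L_m(a)]\Bigr),
  \qquad
  V_t \;:=\; \Bigl\{\, \operatorname*{sup^{ideal}}_{\mathbb{Q}\in\mathcal{P}}
     \mathbb{E}^{\mathbb{Q}}\!\left[L(a)\,\middle|\,\mathcal{F}_t\right]
     \;:\; a \in \mathcal{A}_t \Bigr\}.
\]
```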

June 29, 2024 · 2 min · Research Team

Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning

Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning ArXiv ID: 2405.13609 “View on arXiv” Authors: Unknown Abstract Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing and robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision process. However, a large class of problems does not fit straightforwardly into this framework: Non-cumulative Markov decision processes (NCMDPs), where instead of the expected sum of rewards, the expected value of an arbitrary function of the rewards is maximized. Example functions include the maximum of the rewards or their mean divided by their standard deviation. In this work, we introduce a general mapping of NCMDPs to standard MDPs. This allows all techniques developed to find optimal policies for MDPs, such as reinforcement learning or dynamic programming, to be directly applied to the larger class of NCMDPs. Focusing on reinforcement learning, we show applications in a diverse set of tasks, including classical control, portfolio optimization in finance, and discrete optimization problems. With our approach, we can improve both final performance and training time compared to relying on standard MDPs. ...
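
For the maximum-of-rewards example, the mapping can be pictured as a wrapper that augments the state with a running statistic and emits its increments as rewards, so that the cumulative sum telescopes to the original non-cumulative objective; the minimal environment interface below is an illustrative assumption, not the paper's implementation.

```python
# Sketch of mapping a non-cumulative objective (here: the maximum of the step rewards)
# onto a standard cumulative MDP by augmenting the state with a running maximum.
# The reset/step interface of the wrapped environment is a plain illustration.
import math

class MaxRewardWrapper:
    """Wraps an environment so that summing the emitted rewards equals the max of the original rewards."""

    def __init__(self, env):
        self.env = env
        self.running_max = -math.inf

    def reset(self):
        self.running_max = -math.inf
        return (self.env.reset(), self.running_max)      # augmented state: (observation, running max)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        new_max = max(self.running_max, reward)
        shaped = new_max if self.running_max == -math.inf else new_max - self.running_max
        self.running_max = new_max
        return (obs, self.running_max), shaped, done

# Tiny demo with a dummy 3-step environment whose original rewards are 1.0, 5.0, 2.0.
class DummyEnv:
    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, [1.0, 5.0, 2.0][self.t - 1], self.t == 3

env = MaxRewardWrapper(DummyEnv())
state, total, done = env.reset(), 0.0, False
while not done:
    state, r, done = env.step(0)
    total += r
print(total)   # 5.0 == max of the original rewards
```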

May 22, 2024 · 2 min · Research Team

On Risk-Sensitive Decision Making Under Uncertainty

On Risk-Sensitive Decision Making Under Uncertainty ArXiv ID: 2404.13371 “View on arXiv” Authors: Unknown Abstract This paper studies a risk-sensitive decision-making problem under uncertainty. It considers a decision-making process that unfolds over a fixed number of stages, in which a decision-maker chooses among multiple alternatives, some of which are deterministic and others are stochastic. The decision-maker’s cumulative value is updated at each stage, reflecting the outcomes of the chosen alternatives. After formulating this as a stochastic control problem, we delineate the necessary optimality conditions for it. Two illustrative examples from optimal betting and inventory management are provided to support our theory. ...
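
As a toy illustration of the setting (not the paper's formulation), the sketch below runs a backward recursion over a few stages in which the decision-maker picks between a deterministic and a stochastic alternative, scored by an exponential utility of the terminal cumulative value; the payoffs and the risk-aversion parameter are assumptions.

```python
# Illustrative only: a tiny backward recursion for a multi-stage choice between a
# deterministic and a stochastic alternative, evaluated by an exponential
# (risk-sensitive) utility of the terminal cumulative value.
import numpy as np

gamma = 0.5                                    # risk-aversion parameter (assumed)
stages = 3
safe_gain = 1.0                                # deterministic alternative
risky_outcomes, risky_probs = np.array([3.0, -1.0]), np.array([0.5, 0.5])

def value(t, wealth):
    """Expected exponential utility of terminal wealth under the optimal remaining choices."""
    if t == stages:
        return -np.exp(-gamma * wealth)
    v_safe = value(t + 1, wealth + safe_gain)
    v_risky = sum(p * value(t + 1, wealth + x) for p, x in zip(risky_probs, risky_outcomes))
    return max(v_safe, v_risky)

print(value(0, 0.0))                           # optimal risk-sensitive value from zero initial wealth
```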

April 20, 2024 · 1 min · Research Team

EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading

EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading ArXiv ID: 2309.12891 “View on arXiv” Authors: Unknown Abstract High-frequency trading (HFT) uses computer algorithms to make trading decisions in short time scales (e.g., second-level), which is widely used in the Cryptocurrency (Crypto) market (e.g., Bitcoin). Reinforcement learning (RL) in financial research has shown stellar performance on many quantitative trading tasks. However, most methods focus on low-frequency trading, e.g., day-level, which cannot be directly applied to HFT because of two challenges. First, RL for HFT involves dealing with extremely long trajectories (e.g., 2.4 million steps per month), which is hard to optimize and evaluate. Second, the dramatic price fluctuations and market trend changes of Crypto make existing algorithms fail to maintain satisfactory performance. To tackle these challenges, we propose an Efficient hieArchical Reinforcement learNing method for High Frequency Trading (EarnHFT), a novel three-stage hierarchical RL framework for HFT. In stage I, we compute a Q-teacher, i.e., the optimal action value based on dynamic programming, for enhancing the performance and training efficiency of second-level RL agents. In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates, where hundreds of RL agents are trained with different preferences of return rates and only a tiny fraction of them are selected into the pool based on their profitability. In stage III, we train a minute-level router which dynamically picks a second-level agent from the pool to achieve stable performance across different markets. Through extensive experiments in various market trends on Crypto markets in a high-fidelity simulation trading environment, we demonstrate that EarnHFT significantly outperforms six state-of-the-art baselines on six popular financial criteria, exceeding the runner-up by 30% in profitability. ...
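
The stage-I idea of a Q-teacher can be sketched as a backward dynamic program over a known historical price path; the position/action layout and the fee model below are illustrative assumptions rather than EarnHFT's exact specification.

```python
# Sketch of a "Q-teacher": optimal action values computed by backward dynamic programming
# over a known (historical) price path, here for positions {0: flat, 1: long} with a
# proportional fee. The state/action layout and fee are illustrative assumptions.
import numpy as np

def q_teacher(prices, fee=1e-4):
    """Return Q[t, position, action] for the deterministic price path."""
    T = len(prices) - 1
    Q = np.zeros((T + 1, 2, 2))
    V = np.zeros((T + 1, 2))                    # value of holding each position at time t
    for t in range(T - 1, -1, -1):
        for pos in (0, 1):
            for act in (0, 1):                  # action = target position over (t, t+1]
                pnl = act * (prices[t + 1] - prices[t])
                cost = fee * prices[t] * abs(act - pos)
                Q[t, pos, act] = pnl - cost + V[t + 1, act]
            V[t, pos] = Q[t, pos].max()
    return Q

prices = np.array([100.0, 100.2, 99.9, 100.5, 100.4])
Q = q_teacher(prices)
print(Q[0, 0])                                  # optimal action values when starting flat
```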

September 22, 2023 · 3 min · Research Team

On Sparse Grid Interpolation for American Option Pricing with Multiple Underlying Assets

On Sparse Grid Interpolation for American Option Pricing with Multiple Underlying Assets ArXiv ID: 2309.08287 “View on arXiv” Authors: Unknown Abstract In this work, we develop a novel efficient quadrature and sparse grid based polynomial interpolation method to price American options with multiple underlying assets. The approach is based on first formulating the pricing of American options using dynamic programming, and then employing static sparse grids to interpolate the continuation value function at each time step. To achieve high efficiency, we first transform the domain from $\mathbb{R}^d$ to $(-1,1)^d$ via a scaled tanh map, and then remove the boundary singularity of the resulting multivariate function over $(-1,1)^d$ by a bubble function, which simultaneously reduces the number of interpolation points significantly. We rigorously establish that with a proper choice of the bubble function, the resulting function has bounded mixed derivatives up to a certain order, which provides theoretical underpinnings for the use of sparse grids. Numerical experiments for American arithmetic and geometric basket put options with the number of underlying assets up to 16 are presented to validate the effectiveness of the approach. ...
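
The preprocessing pipeline can be sketched as follows; the tanh scaling constant and the bubble exponent are illustrative assumptions, not the choices analyzed in the paper.

```python
# Sketch of the domain transform applied before sparse-grid interpolation: map R^d to (-1,1)^d
# with a scaled tanh, then damp the boundary behaviour with a bubble function.
# The scaling constant and bubble exponent below are illustrative assumptions.
import numpy as np

def to_unit_cube(x, scale=1.0):
    """Map points in R^d to (-1, 1)^d via a scaled tanh."""
    return np.tanh(scale * x)

def from_unit_cube(y, scale=1.0):
    """Inverse map from (-1, 1)^d back to R^d."""
    return np.arctanh(y) / scale

def bubble(y, p=2):
    """Bubble function vanishing at the boundary of (-1, 1)^d; multiplies the target function."""
    return np.prod((1.0 - y**2) ** p, axis=-1)

# Idea: interpolate g(y) = bubble(y) * f(from_unit_cube(y)) on a sparse grid over (-1, 1)^d,
# then recover f at an interior query point x as g(to_unit_cube(x)) / bubble(to_unit_cube(x)).
x = np.array([[0.3, -1.2, 0.7]])
y = to_unit_cube(x)
print(y, bubble(y))
```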

September 15, 2023 · 2 min · Research Team

Reinforcement Learning for Financial Index Tracking

Reinforcement Learning for Financial Index Tracking ArXiv ID: 2308.02820 “View on arXiv” Authors: Unknown Abstract We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, allowing effective use of data over a long time period, etc. The formulation also allows novel decision variables of cash injection or withdrawal. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows the transaction costs, specified in practice as nonlinear functions of trading volumes, to be calculated accurately. We propose an extension of deep reinforcement learning (RL) methods to solve the dynamic formulation. Our RL method uses a novel training scheme to resolve the data limitation that arises from having only a single sample path of financial data. A comprehensive empirical study based on a 17-year-long testing set demonstrates that the proposed method outperforms a benchmark method in terms of tracking accuracy and has the potential for earning extra profit through a cash withdrawal strategy. ...
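
The rebalancing fixed point can be sketched as follows: post-trade wealth determines the trades, the trades determine the (nonlinear) transaction cost, and the cost feeds back into post-trade wealth, so the equation is solved by simple iteration; the cost function and target weights below are assumptions, not the paper's specification.

```python
# Sketch of solving a portfolio-rebalancing equation by fixed-point (Banach) iteration when
# transaction costs are a nonlinear function of trading volume. Cost model and weights assumed.
import numpy as np

def transaction_cost(trade_value):
    """Assumed nonlinear cost: a proportional fee plus a convex impact term."""
    return 0.001 * np.abs(trade_value) + 1e-6 * trade_value**2

def rebalance(holdings_value, target_weights, tol=1e-10, max_iter=100):
    """Solve W = gross_value - cost(trades(W)) for the post-trade wealth W by fixed-point iteration."""
    gross = holdings_value.sum()
    W = gross                                         # initial guess: ignore costs
    for _ in range(max_iter):
        trades = target_weights * W - holdings_value  # trades needed to hit target weights at wealth W
        W_new = gross - transaction_cost(trades).sum()
        if abs(W_new - W) < tol:
            break
        W = W_new
    return W_new, target_weights * W_new

holdings = np.array([6_000.0, 2_500.0, 1_500.0])
target = np.array([0.5, 0.3, 0.2])
wealth, positions = rebalance(holdings, target)
print(wealth, positions)
```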

August 5, 2023 · 2 min · Research Team

Portfolio Optimization in a Market with Hidden Gaussian Drift and Randomly Arriving Expert Opinions: Modeling and Theoretical Results

Portfolio Optimization in a Market with Hidden Gaussian Drift and Randomly Arriving Expert Opinions: Modeling and Theoretical Results ArXiv ID: 2308.02049 “View on arXiv” Authors: Unknown Abstract This paper investigates the optimal selection of portfolios for power-utility-maximizing investors in a financial market where stock returns depend on a hidden Gaussian mean reverting drift process. Information on the drift is obtained from returns and expert opinions in the form of noisy signals about the current state of the drift arriving randomly over time. The arrival dates are modeled as the jump times of a homogeneous Poisson process. Applying Kalman filter techniques, we derive estimates of the hidden drift which are described by the conditional mean and covariance of the drift given the observations. The utility maximization problem is solved with dynamic programming methods. We derive the associated dynamic programming equation and study regularization arguments for a rigorous mathematical justification. ...
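
A discretized sketch of the filtering step, with all parameter values and the Euler discretization assumed for illustration: between arrivals the conditional mean and variance follow Kalman-Bucy dynamics driven by the return observations, and at each Poisson arrival a standard discrete Kalman update incorporates the expert signal.

```python
# Illustrative discretized Kalman filter for a hidden Ornstein-Uhlenbeck drift observed
# through returns, with noisy expert signals arriving at the jump times of a Poisson process.
# All parameter values and the Euler discretization are assumptions for the sketch.
import numpy as np

rng = np.random.default_rng(2)

dt, T = 1 / 250, 2.0
kappa, mu_bar, sigma_mu = 2.0, 0.05, 0.2        # OU drift dynamics (assumed)
sigma_r = 0.2                                    # return volatility (assumed)
lam, sigma_expert = 10.0, 0.1                    # expert arrival intensity and noise (assumed)

mu_true, m, gamma = 0.0, 0.0, 0.04               # hidden drift, filter mean, filter variance

for k in range(int(T / dt)):
    # True drift and observed return over (t, t+dt].
    mu_true += kappa * (mu_bar - mu_true) * dt + sigma_mu * np.sqrt(dt) * rng.standard_normal()
    dR = mu_true * dt + sigma_r * np.sqrt(dt) * rng.standard_normal()

    # Prediction step for the conditional mean and variance.
    m += kappa * (mu_bar - m) * dt
    gamma += (sigma_mu**2 - 2 * kappa * gamma) * dt

    # Continuous return observation (Kalman-Bucy correction in Euler form).
    K = gamma / sigma_r**2
    m += K * (dR - m * dt)
    gamma -= K**2 * sigma_r**2 * dt

    # Expert opinion arrives with probability lam*dt; standard discrete Kalman update.
    if rng.random() < lam * dt:
        z = mu_true + sigma_expert * rng.standard_normal()
        K_e = gamma / (gamma + sigma_expert**2)
        m += K_e * (z - m)
        gamma *= (1 - K_e)

print(m, gamma)                                  # conditional mean and variance of the drift at time T
```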

August 3, 2023 · 2 min · Research Team

A Common Shock Model for multidimensional electricity intraday price modelling with application to battery valuation

A Common Shock Model for multidimensional electricity intraday price modelling with application to battery valuation ArXiv ID: 2307.16619 “View on arXiv” Authors: Unknown Abstract In this paper, we propose a multidimensional statistical model of intraday electricity prices at the scale of the trading session, which allows all products to be simulated simultaneously. This model, based on Poisson measures and inspired by the Common Shock Poisson Model, reproduces the Samuelson effect (intensity and volatility increase as time to maturity decreases). It also reproduces the price correlation structure, highlighted here in the data, which decreases as two maturities move apart. This model has only three parameters that can be estimated using a moment method that we propose here. We demonstrate the usefulness of the model on a storage valuation case solved by dynamic programming over a trading session. ...
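
A rough simulation in the spirit of such a common-shock construction (the sharing structure and parameter values below are assumptions, not the paper's three-parameter model): each product receives idiosyncratic Poisson shocks plus shocks shared with neighbouring maturities, with intensity and jump size growing as delivery approaches.

```python
# Rough simulation sketch of a common-shock style intraday price model: idiosyncratic Poisson
# shocks per delivery product plus shocks shared by adjacent maturities, with intensity and
# jump size increasing as delivery approaches (Samuelson effect). All parameters assumed.
import numpy as np

rng = np.random.default_rng(3)

n_products, dt, session_len = 6, 1 / 60, 3.0          # hourly products, 1-minute steps, 3h window
maturities = 4.0 + np.arange(n_products)              # time to delivery at the session start (hours)
n_steps = int(session_len / dt)

prices = np.full(n_products, 50.0)
paths = np.zeros((n_steps, n_products))

for k in range(n_steps):
    t = k * dt
    ttm = np.maximum(maturities - t, dt)              # time to maturity, floored away from zero
    intensity = 2.0 / ttm                              # jumps become more frequent near delivery
    jump_scale = 0.5 / np.sqrt(ttm)                    # and larger near delivery

    # Idiosyncratic shocks per product.
    jumps = rng.poisson(intensity * dt) * jump_scale * rng.standard_normal(n_products)
    # Common shocks shared by adjacent maturities -> correlation decays with maturity distance.
    for i in range(n_products - 1):
        common = rng.poisson(2.0 * dt) * 0.3 * rng.standard_normal()
        jumps[i] += common
        jumps[i + 1] += common

    prices = prices + jumps
    paths[k] = prices

print(np.corrcoef(np.diff(paths, axis=0).T).round(2))  # correlation falls as maturities move apart
```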

July 31, 2023 · 2 min · Research Team