
On Risk-Sensitive Decision Making Under Uncertainty

ArXiv ID: 2404.13371 · View on arXiv
Authors: Unknown

Abstract: This paper studies a risk-sensitive decision-making problem under uncertainty. It considers a decision-making process that unfolds over a fixed number of stages, in which a decision-maker chooses among multiple alternatives, some deterministic and others stochastic. The decision-maker's cumulative value is updated at each stage to reflect the outcomes of the chosen alternatives. After formulating this as a stochastic control problem, we delineate the necessary optimality conditions for it. Two illustrative examples, from optimal betting and inventory management, are provided to support our theory. ...
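
The stage-wise choice between a deterministic and a stochastic alternative is easy to make concrete. Below is a minimal sketch assuming an exponential-utility (risk-sensitive) criterion and illustrative payoff numbers of my own choosing; the paper's actual model and optimality conditions may differ.

```python
import math

# A minimal sketch of one stage of a risk-sensitive choice under the
# exponential-utility criterion U(w) = -exp(-theta * w). All numbers
# below are illustrative assumptions, not taken from the paper.

theta = 0.5              # absolute risk aversion (risk sensitivity)
r = 0.2                  # deterministic alternative: sure payoff per stage
g, l, p = 1.0, 0.8, 0.6  # stochastic alternative: +g w.p. p, -l w.p. 1-p

def certainty_equivalent(theta, g, l, p):
    """Certainty equivalent of the stochastic payoff under exponential utility."""
    mgf = p * math.exp(-theta * g) + (1 - p) * math.exp(theta * l)
    return -math.log(mgf) / theta

ce = certainty_equivalent(theta, g, l, p)
mean = p * g - (1 - p) * l  # risk-neutral value of the bet (0.28 here)
choice = "stochastic" if ce > r else "deterministic"
print(f"bet: mean {mean:.2f}, certainty equivalent {ce:.3f}, sure payoff {r}")
print(f"risk-sensitive decision-maker picks the {choice} alternative")
```

With these numbers a risk-neutral decision-maker would take the bet (mean 0.28 exceeds the sure 0.2), while the risk-sensitive certainty equivalent (about 0.08) falls below it. Note that with additive wealth and exponential utility the stages decouple, so one certainty-equivalent comparison fixes the policy at every stage; the general problem in the paper need not separate this way.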

April 20, 2024 · 1 min · Research Team

Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

ArXiv ID: 2404.12598 · View on arXiv
Authors: Unknown

Abstract: This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either from the agent's risk attitude or from a distributionally robust approach against model uncertainty. Owing to the martingale perspective of Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, which captures the variability of the value-to-go along the trajectory. This characterization allows existing RL algorithms developed for non-risk-sensitive settings to be adapted straightforwardly to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems because of the nonlinear nature of quadratic variation; q-learning, however, offers a solution and extends to infinite-horizon settings. Finally, I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure. I also conduct simulation experiments demonstrating how risk-sensitive RL improves finite-sample performance in the linear-quadratic control problem. ...
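
The abstract's central recipe, adding the realized variance of the value process to an otherwise standard martingale/TD-style objective, can be sketched in a few lines. The following toy illustration assumes a linear value approximation and a simulated mean-reverting state; the dynamics, features, and weight names (e.g. gamma_risk) are my assumptions, not the paper's construction.

```python
import numpy as np

# Toy sketch: risk sensitivity enters as a quadratic-variation penalty,
# i.e. the realized variance of the value process, added to a standard
# TD-style objective. The linear features, dynamics, and weights below
# are illustrative assumptions, not the paper's construction.

rng = np.random.default_rng(0)

def value(w, s):
    return w[0] + w[1] * s  # linear value-function approximation

def risk_sensitive_loss(w, states, rewards, gamma_risk):
    """Mean-squared TD residual plus a quadratic-variation penalty.

    The penalty sums the squared increments of the value process along
    the trajectory, a discrete proxy for its quadratic variation."""
    v = np.array([value(w, s) for s in states])
    dv = v[1:] - v[:-1]   # increments of the value process
    td = rewards + dv     # discretized TD residuals
    qv = np.sum(dv ** 2)  # realized quadratic variation
    return np.mean(td ** 2) + 0.5 * gamma_risk * qv

# Simulate a mean-reverting state with a running quadratic reward.
T, dt = 200, 0.05
states = np.zeros(T + 1)
for t in range(T):
    states[t + 1] = (states[t] - 0.5 * states[t] * dt
                     + 0.3 * np.sqrt(dt) * rng.standard_normal())
rewards = -states[:-1] ** 2 * dt

w = np.array([0.0, 0.1])
print(risk_sensitive_loss(w, states, rewards, gamma_risk=1.0))
```

Setting gamma_risk = 0 recovers the non-risk-sensitive objective, which is exactly the adaptation the abstract describes: existing algorithms carry over once the realized variance of the value process is added to the loss.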

April 19, 2024 · 2 min · Research Team