Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning
arXiv ID: 2405.13609
Authors: Unknown
Abstract
Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing and robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision process. However, a large class of problems does not fit straightforwardly into this framework: non-cumulative Markov decision processes (NCMDPs), in which the expected value of an arbitrary function of the rewards is maximized instead of their expected sum. Example functions include the maximum of the rewards or their mean divided by their standard deviation. In this work, we introduce a general mapping of NCMDPs to standard MDPs. This allows all techniques developed to find optimal policies for MDPs, such as reinforcement learning or dynamic programming, to be applied directly to the larger class of NCMDPs. Focusing on reinforcement learning, we show applications in a diverse set of tasks, including classical control, portfolio optimization in finance, and discrete optimization problems. With our approach, we improve both final performance and training time compared to relying on standard MDP formulations.
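The core idea behind the mapping is to carry auxiliary statistics of the rewards seen so far in the state and to reshape the per-step reward so that ordinary return maximization recovers the non-cumulative objective. Below is a minimal sketch for the max-of-rewards example; the wrapper class, method signatures, and first-step handling are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class MaxRewardWrapper:
    """Sketch: wrap an episodic environment so that the SUM of emitted rewards
    equals the MAXIMUM of the underlying rewards. The running maximum is the
    auxiliary state variable; the emitted reward is its increment (telescoping)."""

    def __init__(self, env):
        self.env = env            # Gym-style object with reset() and step()
        self.running_max = None   # auxiliary statistic, part of the augmented state

    def reset(self):
        obs = self.env.reset()
        self.running_max = None
        return self._augment(obs)

    def step(self, action):
        obs, r, done, info = self.env.step(action)
        if self.running_max is None:
            shaped_r = r                           # first step: emit r_1
            self.running_max = r
        else:
            new_max = max(self.running_max, r)
            shaped_r = new_max - self.running_max  # later steps: emit the increase
            self.running_max = new_max
        # Over a full episode, the shaped rewards sum to max(r_1, ..., r_T).
        return self._augment(obs), shaped_r, done, info

    def _augment(self, obs):
        # Append the auxiliary statistic so the policy can condition on it.
        stat = 0.0 if self.running_max is None else self.running_max
        return np.append(obs, stat)
```

Because the shaped rewards telescope, any standard RL algorithm run on the wrapped environment is, in effect, optimizing the non-cumulative maximum objective.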
Keywords: Markov decision processes, non-cumulative MDPs, reinforcement learning, dynamic programming, portfolio optimization
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 4.0/10
- Quadrant: Lab Rats
- Why: The paper presents a sophisticated theoretical framework that maps NCMDPs to standard MDPs via auxiliary state variables and reward transformations, supported by extensive mathematical notation and proofs. While it includes numerical experiments on portfolio optimization in finance and on control tasks, the summary emphasizes the theoretical mapping and lacks the specific implementation details, backtest results, or statistical metrics needed for high empirical rigor.
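The same auxiliary-statistics pattern covers the mean-over-standard-deviation objective named in the abstract: the augmented state only needs the running count, sum, and sum of squares, and the shaped reward is the change in the objective after folding in the new reward. The sketch below is a hedged illustration; the helper name, the treatment of episodes shorter than two steps, and the numerical guard are assumptions rather than details from the paper.

```python
import numpy as np

def sharpe_increment(count, total, total_sq, new_r):
    """Return the change in mean/std of the underlying rewards after observing
    new_r, plus the updated running statistics. Summing these increments over
    an episode yields the final mean-over-std of the original rewards."""
    def ratio(n, s, sq):
        if n < 2:
            return 0.0                          # mean/std taken as 0 before 2 samples
        mean = s / n
        var = sq / n - mean ** 2
        std = np.sqrt(max(var, 1e-12))          # guard against zero variance
        return mean / std

    before = ratio(count, total, total_sq)
    count, total, total_sq = count + 1, total + new_r, total_sq + new_r ** 2
    after = ratio(count, total, total_sq)
    return after - before, count, total, total_sq
```

Wrapping an environment with this transformation, analogous to the max-of-rewards sketch above, lets a standard RL agent optimize a Sharpe-ratio-like objective without any change to the learning algorithm itself.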
flowchart TD
Goal["Research Goal: Solve Non-Cumulative Markov Decision Processes (NCMDPs)"] --> Methodology["Methodology: Map NCMDPs to Standard MDPs"]
Methodology --> Inputs["Inputs: Diverse Domains<br>(Control, Finance, Optimization)"]
Inputs --> Process["Computational Process:<br>Reinforcement Learning on Mapped MDPs"]
Process --> Outcome1["Outcome: Improved Final Performance"]
Process --> Outcome2["Outcome: Faster Training Time"]
Process --> Outcome3["Outcome: Unified Framework for NCMDPs"]