Distributionally Robust Deep Q-Learning

ArXiv ID: 2505.19058

Authors: Chung I Lu, Julian Sester, Aijia Zhang

Abstract

We propose a novel distributionally robust $Q$-learning algorithm for the non-tabular case with continuous state spaces, where the state transition of the underlying Markov decision process is subject to model uncertainty. The uncertainty is taken into account by considering the worst-case transition from a ball around a reference probability measure. To determine the optimal policy under the worst-case state transition, we solve the associated non-linear Bellman equation by dualising and regularising the Bellman operator with the Sinkhorn distance, which is then parameterised with deep neural networks. This approach allows us to modify the Deep Q-Network algorithm to optimise for the worst-case state transition. We illustrate the tractability and effectiveness of our approach through several applications, including a portfolio optimisation task based on S&P 500 data.
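
The following is a stylised sketch, based only on the abstract, of the worst-case Bellman equation and a Sinkhorn-regularised dual consistent with standard Sinkhorn-DRO duality; the paper's exact cost function, radius convention, and sign choices may differ, and the symbols $\delta$, $\varepsilon$, $c$, and $\nu$ are assumed notation rather than taken from the paper.

```latex
% Worst-case Bellman equation over a Sinkhorn ball of radius \delta
% around the reference transition kernel P_ref(. | s, a):
Q(s,a) \;=\; \inf_{P:\; \mathcal{S}_{\varepsilon}\left(P_{\mathrm{ref}}(\cdot\mid s,a),\, P\right)\le \delta}
\; \mathbb{E}_{s'\sim P}\!\left[\, r(s,a,s') + \gamma \max_{a'} Q(s',a') \,\right].

% A dual / regularised form (sketch), with f(z) := r(s,a,z) + \gamma \max_{a'} Q(z,a'),
% c a transport cost and \nu a sampling measure:
Q(s,a) \;=\; \sup_{\lambda \ge 0}\;
\Big\{ -\lambda\delta \;-\; \lambda\varepsilon\,
\mathbb{E}_{s'\sim P_{\mathrm{ref}}}\!\Big[ \log \mathbb{E}_{z\sim\nu}\!\big[ \exp\!\big( -\tfrac{f(z) + \lambda\, c(s',z)}{\lambda\varepsilon} \big) \big] \Big] \Big\}.
```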

Keywords: Q-Learning, Distributional Robustness, Deep Q-Network (DQN), Model Uncertainty, Portfolio Optimization, Multi-Asset / Portfolio

Complexity vs Empirical Score

  • Math Complexity: 9.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematics including dualization of the Bellman operator and Sinkhorn distances for robust optimization, and it demonstrates effectiveness through a portfolio optimization task on S&P 500 data, indicating substantial empirical rigor.

```mermaid
flowchart TD
  A["Research Goal"] -->|Develop DRL algorithm robust to model uncertainty| B["Methodology: Distributionally Robust Q-Learning"]
  B --> C["Modeling Uncertainty"]
  C --> D["Solve Non-linear Bellman Equation"]
  D --> E["Optimization: Sinkhorn Distance + Dualization"]
  E --> F["Deep Q-Network Implementation"]
  F --> G["Experimental Application"]
  G --> H["Findings: Robustness & Performance"]
  H --> I["S&P 500 Portfolio Optimization"]
```
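
As a concrete, hypothetical illustration of the "Sinkhorn Distance + Dualization" and "Deep Q-Network Implementation" steps above, the sketch below computes a robust TD target by grid-searching the dual variable λ in the regularised dual written after the abstract. All names and choices here (robust_td_target, the candidate next states z_j, the squared-Euclidean cost, the λ grid) are assumptions for illustration; the paper's actual algorithm, network parameterisation, and optimisation of the dual may differ.

```python
import numpy as np


def _log_mean_exp(a):
    """Numerically stable log of the mean of exp(a)."""
    a_max = np.max(a)
    return a_max + np.log(np.mean(np.exp(a - a_max)))


def robust_td_target(f_vals, costs, delta, eps, lam_grid):
    """Sketch of a distributionally robust TD target via a Sinkhorn-style dual.

    f_vals   : array (m,) of f(z_j) = r(s, a, z_j) + gamma * max_a' Q(z_j, a')
               evaluated at candidate next states z_j sampled from an assumed
               reference measure nu around the observed next state s'.
    costs    : array (m,) of transport costs c(s', z_j).
    delta    : radius of the Sinkhorn ball around the reference transition.
    eps      : entropic regularisation parameter of the Sinkhorn distance.
    lam_grid : positive candidate values for the dual variable lambda.
    """
    best = -np.inf
    for lam in lam_grid:
        # Dual objective: -lam*delta - lam*eps * log E_z[ exp(-(f + lam*c) / (lam*eps)) ]
        exponent = -(f_vals + lam * costs) / (lam * eps)
        dual_val = -lam * delta - lam * eps * _log_mean_exp(exponent)
        best = max(best, dual_val)
    return best


# Toy usage on a single observed transition (s, a, r, s'); everything below is synthetic.
rng = np.random.default_rng(0)
s_next = np.array([1.0, -0.5])
z = s_next + 0.1 * rng.standard_normal((32, 2))    # candidate next states from the assumed measure nu
costs = np.sum((z - s_next) ** 2, axis=1)          # squared-Euclidean transport cost (assumption)
q_max = -0.5 * np.sum(z ** 2, axis=1)              # stand-in for max_a' Q(z_j, a') from a target network
f_vals = 0.2 + 0.99 * q_max                        # r + gamma * max_a' Q(z_j, a')
target = robust_td_target(f_vals, costs, delta=0.05, eps=0.1,
                          lam_grid=np.geomspace(1e-2, 1e2, 50))
print(target)
```

In a full training loop this robust target would replace the standard max-Q target in the DQN squared-error loss; the paper's description suggests the dual is handled with neural-network parameterisation and gradient-based optimisation rather than the simple grid search used in this sketch.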