Adaptive and Regime-Aware RL for Portfolio Optimization

ArXiv ID: 2509.14385

Author: Gabriel Nixon Raj

Abstract

This study proposes a regime-aware reinforcement learning framework for long-horizon portfolio optimization. Moving beyond traditional feedforward and GARCH-based models, we design realistic environments where agents dynamically reallocate capital in response to latent macroeconomic regime shifts. Agents receive hybrid observations and are trained using constrained reward functions that incorporate volatility penalties, capital resets, and tail-risk shocks. We benchmark multiple architectures, including PPO, LSTM-based PPO, and Transformer PPO, against classical baselines such as equal-weight and Sharpe-optimized portfolios. Our agents demonstrate robust performance under financial stress. While Transformer PPO achieves the highest risk-adjusted returns, LSTM variants offer a favorable trade-off between interpretability and training cost. The framework promotes regime-adaptive, explainable reinforcement learning for dynamic asset allocation.
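The constrained reward described above (return minus a volatility penalty, with an extra hit for tail-risk shocks) can be sketched in a few lines. This is a minimal illustration under assumed parameter names and values (`vol_penalty`, `tail_threshold`, `tail_penalty`), not the paper's actual reward function:

```python
import statistics

def regime_aware_reward(portfolio_returns, window=20,
                        vol_penalty=0.5, tail_threshold=-0.05,
                        tail_penalty=2.0):
    """Illustrative constrained reward: recent mean return minus a
    volatility penalty, with an additional penalty when the latest
    return breaches a tail-risk threshold. All parameter names and
    values are assumptions for illustration."""
    recent = portfolio_returns[-window:]
    mean_ret = statistics.fmean(recent)
    # Population std dev over the window as a simple volatility proxy
    vol = statistics.pstdev(recent) if len(recent) > 1 else 0.0
    reward = mean_ret - vol_penalty * vol
    # Extra penalty if the most recent step was a tail-risk shock
    if recent[-1] <= tail_threshold:
        reward -= tail_penalty * abs(recent[-1])
    return reward
```

In a full RL environment this quantity would be emitted per step, so the agent is rewarded for return but penalized for realized volatility and drawdown-like shocks.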

Keywords: reinforcement learning, portfolio optimization, regime switching, Transformer, macroeconomic regime, multi-asset

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematics in regime detection (HMM, GMM), Monte Carlo simulations, and RL architectures (Transformer, LSTM), while demonstrating empirical rigor through multi-decade financial datasets, stress-test alignment with historical crises, and comparative benchmarking against classical baselines.
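The regime-detection step (HMM/GMM in the paper) can be approximated by a much simpler rolling-volatility classifier. A hedged sketch, where the window size, threshold, and label names are assumptions rather than the paper's method:

```python
import statistics

def label_volatility_regimes(returns, window=20, threshold=0.02):
    """Illustrative stand-in for HMM/GMM regime detection: tag each
    time step 'high_vol' or 'low_vol' by rolling standard deviation.
    Window, threshold, and labels are assumptions for illustration."""
    labels = []
    for t in range(len(returns)):
        lo = max(0, t - window + 1)
        win = returns[lo:t + 1]
        vol = statistics.pstdev(win) if len(win) > 1 else 0.0
        labels.append("high_vol" if vol > threshold else "low_vol")
    return labels
```

The resulting labels could be appended to the agent's observation vector, giving it the "hybrid observations" (market features plus a latent regime signal) that the abstract describes.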
```mermaid
flowchart TD
  A["Research Goal: Adaptive Portfolio Optimization<br>under Macroeconomic Regime Shifts"] --> B["Data & Inputs<br>Multi-Asset Market Data"]
  B --> C["Methodology<br>Regime-Aware RL Environment Construction"]
  C --> D["Computational Process<br>Agent Training: PPO, LSTM-PPO, Transformer-PPO"]
  D --> E["Key Findings & Outcomes<br>1. Transformer-PPO: Highest Risk-Adjusted Returns<br>2. LSTM-PPO: Best Interpretability/Cost Trade-off<br>3. Robust Performance under Stress"]
```