FR-LUX: Friction-Aware, Regime-Conditioned Policy Optimization for Implementable Portfolio Management

ArXiv ID: 2510.02986

Authors: Jian’an Zhang

Abstract

Transaction costs and regime shifts are major reasons why paper portfolios fail in live trading. We introduce FR-LUX (Friction-aware, Regime-conditioned Learning under eXecution costs), a reinforcement learning framework that learns after-cost trading policies and remains robust across volatility-liquidity regimes. FR-LUX integrates three ingredients: (i) a microstructure-consistent execution model combining proportional and impact costs, directly embedded in the reward; (ii) a trade-space trust region that constrains changes in inventory flow rather than logits, yielding stable low-turnover updates; and (iii) explicit regime conditioning so the policy specializes to LL/LH/HL/HH states without fragmenting the data. On a 4 x 5 grid of regimes and cost levels with multiple random seeds, FR-LUX achieves the top average Sharpe ratio with narrow bootstrap confidence intervals, maintains a flatter cost-performance slope than strong baselines, and attains superior risk-return efficiency for a given turnover budget. Pairwise scenario-level improvements are strictly positive and remain statistically significant after multiple-testing corrections. We provide formal guarantees on optimality under convex frictions, monotonic improvement under a KL trust region, long-run turnover bounds and induced inaction bands due to proportional costs, positive value advantage for regime-conditioned policies, and robustness to cost misspecification. The methodology is implementable: costs are calibrated from standard liquidity proxies, scenario-level inference avoids pseudo-replication, and all figures and tables are reproducible from released artifacts.
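As a rough illustration of ingredient (i), an after-cost reward can be sketched as the gross portfolio return minus a proportional cost term and a convex impact term. The abstract does not give the paper's exact cost specification, so everything concrete below is an assumption made for illustration: the function name `after_cost_reward`, the parameters `kappa` (proportional cost rate) and `eta` (impact coefficient), their default values, and the power-law impact form with exponent 1.5.

```python
from typing import Sequence


def after_cost_reward(
    prev_weights: Sequence[float],
    new_weights: Sequence[float],
    asset_returns: Sequence[float],
    kappa: float = 0.0005,          # hypothetical proportional cost per unit traded
    eta: float = 0.01,              # hypothetical impact coefficient
    impact_exponent: float = 1.5,   # assumed convex impact (exponent > 1)
) -> float:
    """One-period reward net of proportional and impact costs (illustrative only)."""
    if not (len(prev_weights) == len(new_weights) == len(asset_returns)):
        raise ValueError("all inputs must have the same length")

    # Gross return earned by the post-trade portfolio.
    gross = sum(w * r for w, r in zip(new_weights, asset_returns))

    # Absolute trade size per asset, i.e. the inventory flow.
    trades = [abs(w_new - w_old) for w_new, w_old in zip(new_weights, prev_weights)]

    # Proportional cost is linear in traded volume.
    proportional_cost = kappa * sum(trades)

    # Impact cost is convex in trade size, so large trades are penalized more.
    impact_cost = eta * sum(t ** impact_exponent for t in trades)

    return gross - proportional_cost - impact_cost


if __name__ == "__main__":
    hold = after_cost_reward([0.5, 0.5], [0.5, 0.5], [0.01, 0.02])
    trade = after_cost_reward([0.5, 0.5], [0.6, 0.4], [0.01, 0.02])
    print(f"reward when holding: {hold:.6f}")
    print(f"reward when trading: {trade:.6f}")
```

Because the cost terms sit inside the reward itself, a policy trained on it is charged for every rebalance. Under this assumed form, a trade is only worthwhile when the expected gain exceeds the linear cost, which is the intuition behind the inaction bands mentioned in the abstract.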

Keywords: Reinforcement Learning, Transaction Costs, Market Regimes, Execution Optimization, Robust Trading, General Financial Markets

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper presents advanced mathematical formulations including MDP modeling, convex optimization theory, and formal proofs for policy improvement, scoring high on math complexity. It also demonstrates strong empirical rigor with a comprehensive backtest framework on 20 scenarios, robust statistical inference (Romano-Wolf corrections), and reproducible artifacts.
  flowchart TD
    A["Research Goal<br>How to build robust, implementable trading policies<br>that account for execution costs and market regimes?"] --> B["Methodology: FR-LUX Framework<br>(Friction-aware, Regime-conditioned RL)"]
    B --> C["Key Components<br>1. Microstructure Cost Model (Proportional + Impact)<br>2. Trade-Space Trust Region (Inventory Flow Constraints)<br>3. Explicit Regime Conditioning (LL/LH/HL/HH)"]
    C --> D["Data & Inputs<br>• Multi-regime scenarios (4x5 grid)<br>• Calibrated transaction costs<br>• Liquidity proxies"]
    D --> E["Computational Process<br>Reinforcement Learning Optimization<br>with Cost-Aware Rewards & Trust Region"]
    E --> F["Key Findings & Outcomes<br>• Top Sharpe with tight confidence intervals<br>• Flatter cost-performance slope vs. baselines<br>• Positive, significant scenario improvements<br>• Theoretical guarantees (optimality, robustness)"]
    F --> G["Outcome: Implementable, Robust Portfolio Policies<br>Reproducible artifacts released"]
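
The regime labels and the 4 x 5 scenario grid in the diagram above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the first letter is taken to denote volatility and the second liquidity, observations are split at the sample median, and cost levels are represented only by their index because the source does not list their values. The names `label_regimes` and `scenario_grid` are hypothetical.

```python
from itertools import product
from statistics import median
from typing import List, Sequence, Tuple

REGIMES = ("LL", "LH", "HL", "HH")


def label_regimes(volatility: Sequence[float], liquidity: Sequence[float]) -> List[str]:
    """Assign a two-letter regime label to each observation.

    Assumption: first letter = volatility, second letter = liquidity,
    each marked 'H' when strictly above its sample median, else 'L'.
    """
    if len(volatility) != len(liquidity):
        raise ValueError("volatility and liquidity must have the same length")
    vol_cut = median(volatility)
    liq_cut = median(liquidity)
    return [
        ("H" if v > vol_cut else "L") + ("H" if q > liq_cut else "L")
        for v, q in zip(volatility, liquidity)
    ]


def scenario_grid(n_cost_levels: int = 5) -> List[Tuple[str, int]]:
    """Enumerate (regime, cost-level index) pairs: 4 regimes x n cost levels."""
    return list(product(REGIMES, range(n_cost_levels)))


if __name__ == "__main__":
    labels = label_regimes([0.1, 0.3, 0.2, 0.4], [5.0, 1.0, 4.0, 2.0])
    print(labels)
    print(len(scenario_grid()), "scenarios")
```

Passing the label to the policy as an input feature, rather than training one model per regime, is one way to read "without fragmenting the data": all four states share a single pool of training samples.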