Comparing Normalization Methods for Portfolio Optimization with Reinforcement Learning

arXiv ID: 2508.03910

Authors: Caio de Souza Barbosa Costa, Anna Helena Reali Costa

Abstract

Recently, reinforcement learning has achieved remarkable results in various domains, including robotics, games, natural language processing, and finance. In the financial domain, this approach has been applied to tasks such as portfolio optimization, where an agent continuously adjusts the allocation of assets within a financial portfolio to maximize profit. Numerous studies have introduced new simulation environments, neural network architectures, and training algorithms for this purpose. Among these, a domain-specific policy gradient algorithm has gained significant attention in the research community for being lightweight and fast and for outperforming other approaches. However, recent studies have shown that this algorithm can yield inconsistent results and underperform, especially when the portfolio does not consist of cryptocurrencies. One possible explanation for this issue is that the commonly used state normalization method may cause the agent to lose critical information about the true value of the assets being traded. This paper explores this hypothesis by evaluating two of the most widely used normalization methods across three different markets (IBOVESPA, NYSE, and cryptocurrencies) and comparing them with the standard practice of normalizing the data before training. The results indicate that, in this specific domain, state normalization can indeed degrade the agent's performance.
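
The abstract contrasts per-observation state normalization with normalizing the data once before training. The sketch below illustrates how those two approaches often look in this line of work; the function names, array shapes, and specific choices (dividing each price window by its latest closing price, and converting the full price series to simple returns) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def normalize_by_last_close(window: np.ndarray) -> np.ndarray:
    """Per-observation state normalization: divide each asset's price window
    by its most recent closing price, so every observation ends at 1.0.
    (Assumed shape: [n_assets, window_len]; illustrative only.)
    The absolute price level is discarded, which is the kind of information
    loss the paper hypothesizes may hurt the agent."""
    return window / window[:, -1:]

def prenormalize_returns(prices: np.ndarray) -> np.ndarray:
    """Pre-normalization applied once before training: convert the full price
    series to simple returns, keeping dynamics comparable across assets
    without rescaling each observation window independently."""
    return prices[:, 1:] / prices[:, :-1] - 1.0

# Toy example with two assets observed over five time steps.
prices = np.array([[10.0, 10.5, 10.2, 10.8, 11.0],
                   [200.0, 198.0, 205.0, 210.0, 208.0]])
print(normalize_by_last_close(prices))   # every row ends at 1.0
print(prenormalize_returns(prices))      # per-step returns
```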

Keywords: Reinforcement Learning, Portfolio Optimization, State Normalization, Policy Gradient, Asset Allocation, Equities

Complexity vs Empirical Score

  • Math Complexity: 5.0/10
  • Empirical Rigor: 6.0/10
  • Quadrant: Holy Grail
  • Why: The paper presents moderate mathematical formulation in its reinforcement learning setup and portfolio equations, while demonstrating strong empirical rigor by testing multiple normalization methods across three distinct financial markets (IBOVESPA, NYSE, cryptocurrencies) and measuring performance impact.

```mermaid
flowchart TD
  A["Research Goal: Evaluate<br>State Normalization Methods<br>for RL Portfolio Optimization"] --> B["Data Collection:<br>3 Markets (IBOVESPA, NYSE, Crypto)"]
  B --> C["Methodology: Compare<br>2 Normalization Methods<br>vs. Standard Pre-normalization"]
  C --> D["Computational Process:<br>Train RL Agent<br>using Policy Gradient Algorithm"]
  D --> E{"Evaluate Agent<br>Performance across Markets"}
  E --> F["Key Finding 1:<br>State Normalization Degrades<br>Agent Performance"]
  E --> G["Key Finding 2:<br>Standard Pre-normalization<br>is Superior in this Domain"]
```
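
The "Train RL Agent using Policy Gradient Algorithm" step is not spelled out on this page. As a rough illustration, the sketch below assumes the formulation commonly used for portfolio optimization in this literature, where the objective is the mean log portfolio return over sampled batches and is maximized directly by gradient ascent. The policy architecture, tensor shapes, and names are assumptions for illustration, not the paper's implementation.

```python
import torch

def policy_gradient_step(policy, optimizer, states, price_relatives):
    """One illustrative policy-gradient update: maximize the average log
    return of the portfolio over a batch of sampled observations.
    `policy` maps states to portfolio weights (rows sum to 1, e.g. via a
    softmax output); `price_relatives` holds p_t / p_{t-1} per asset.
    All names and shapes are assumptions, not the paper's exact algorithm."""
    weights = policy(states)                               # [batch, n_assets]
    step_returns = (weights * price_relatives).sum(dim=1)  # portfolio growth per step
    loss = -torch.log(step_returns).mean()                 # maximize mean log return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return -loss.item()

# Minimal usage: a linear-softmax policy over flattened state windows (toy data).
n_assets, window_len, batch = 4, 50, 32
policy = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(n_assets * window_len, n_assets),
    torch.nn.Softmax(dim=1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.rand(batch, n_assets, window_len)
price_relatives = 1.0 + 0.01 * torch.randn(batch, n_assets)
print(policy_gradient_step(policy, optimizer, states, price_relatives))
```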