HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization

ArXiv ID: 2507.18560

Authors: Benjamin Coriat, Eric Benhamou

Abstract

This paper presents a novel hierarchical framework for portfolio optimization, integrating lightweight Large Language Models (LLMs) with Deep Reinforcement Learning (DRL) to combine sentiment signals from financial news with traditional market indicators. Our three-tier architecture employs base RL agents to process hybrid data, meta-agents to aggregate their decisions, and a super-agent to merge decisions based on market data and sentiment analysis. Evaluated on data from 2018 to 2024, after training on 2000-2017, the framework achieves a 26% annualized return and a Sharpe ratio of 1.2, outperforming equal-weighted and S&P 500 benchmarks. Key contributions include scalable cross-modal integration, a hierarchical RL structure for enhanced stability, and open-source reproducibility.
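
The three-tier flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the meta-agents and the super-agent are learned, whereas here each tier is replaced by a fixed convex blend of portfolio weight vectors. The helper names (`normalize`, `blend`), the grouping of base agents by modality, the mixing coefficients, and the example weights are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Clip negatives to zero and rescale so the weights sum to 1 (long-only)."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        # Fall back to equal weights when every entry is zero or negative.
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]


def blend(proposals: Sequence[Sequence[float]], mix: Sequence[float]) -> List[float]:
    """Convex combination of several weight vectors, renormalized.

    Stand-in for a learned aggregator (ASSUMPTION: fixed mixing coefficients).
    """
    mix = normalize(mix)
    n_assets = len(proposals[0])
    combined = [
        sum(m * p[i] for m, p in zip(mix, proposals)) for i in range(n_assets)
    ]
    return normalize(combined)


# Tier 1: hypothetical base-agent allocations over three assets.
market_agents = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
sentiment_agents = [[0.2, 0.2, 0.6], [0.1, 0.3, 0.6]]

# Tier 2: one meta-level blend per group of base agents (assumed grouping).
meta_market = blend(market_agents, [0.5, 0.5])
meta_sentiment = blend(sentiment_agents, [0.5, 0.5])

# Tier 3: super-level blend merging the two meta allocations.
final_weights = blend([meta_market, meta_sentiment], [0.6, 0.4])
print(final_weights)
```

The point of the sketch is the data flow: each tier consumes allocations from the tier below and emits a single valid allocation, so the final output is always a non-negative weight vector that sums to one.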

Keywords: Deep Reinforcement Learning (DRL), Hierarchical RL Agents, Sentiment Analysis Integration, Cross-Modal Data Fusion, Portfolio Optimization, Equity

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematical concepts in deep reinforcement learning, hierarchical architectures, and Markowitz portfolio theory, while also presenting a backtest-ready framework with specific performance metrics (26% annualized return, Sharpe ratio of 1.2), detailed data pipelines, and open-source reproducibility.
  flowchart TD
    A["Research Goal:<br>Integrate LLM Sentiment & DRL<br>for Portfolio Optimization"] --> B["Data Inputs:<br>Financial News & Market Data<br>(2000-2024)"]
    B --> C["HARLF Architecture"]
    C --> D["1. Base RL Agents:<br>Process hybrid data"]
    D --> E["2. Meta-Agents:<br>Aggregate decisions"]
    E --> F["3. Super-Agent:<br>Final sentiment-weighted action"]
    F --> G["Backtesting:<br>2018-2024 Period"]
    G --> H["Outcomes:<br>26% Annualized Return<br>Sharpe Ratio 1.2"]
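
For readers who want to check the two headline figures against their own backtest, the following is a minimal sketch of how an annualized return and a Sharpe ratio are commonly computed from a series of periodic returns. The paper's exact conventions (return frequency, risk-free rate, compounding) are not stated in this summary, so the monthly frequency, the zero risk-free rate, and the sample returns below are assumptions.

```python
import math
import statistics
from typing import Sequence

PERIODS_PER_YEAR = 12  # ASSUMPTION: monthly returns


def annualized_return(returns: Sequence[float], periods_per_year: int) -> float:
    """Geometric (compounded) return scaled to one year."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    years = len(returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(
    returns: Sequence[float],
    periods_per_year: int,
    risk_free_rate: float = 0.0,  # ASSUMPTION: annual rate, zero by default
) -> float:
    """Mean excess return over its sample standard deviation, annualized."""
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    return (
        statistics.mean(excess) / statistics.stdev(excess)
        * math.sqrt(periods_per_year)
    )


# Hypothetical monthly returns, not data from the paper.
sample_returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.02]
print(annualized_return(sample_returns, PERIODS_PER_YEAR))
print(sharpe_ratio(sample_returns, PERIODS_PER_YEAR))
```

Different conventions (for example daily returns with 252 periods per year, or a non-zero risk-free rate) shift both numbers, so a comparison with the reported 26% and 1.2 is only meaningful once the conventions match.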