Multimodal Deep Reinforcement Learning for Portfolio Optimization

ArXiv ID: 2412.17293 (https://arxiv.org/abs/2412.17293)

Authors: Unknown

Abstract

We propose a reinforcement learning (RL) framework that leverages multimodal data, including historical stock prices, sentiment scores, and topic embeddings from news articles, to optimize trading strategies for S&P 100 stocks. Building upon recent advances in financial reinforcement learning, we enhance the state-space representation by integrating financial sentiment data from SEC filings and news headlines, and refine the reward function to better align with portfolio performance metrics. Our methodology applies deep reinforcement learning to state tensors comprising price data, sentiment scores, and news embeddings, processed through feature extraction models such as CNNs and RNNs. Benchmarking against traditional portfolio optimization techniques and advanced strategies, we demonstrate the efficacy of our approach in delivering superior portfolio performance. Empirical results show that our agent can outperform standard benchmarks, especially when combining data sources under profit-based reward functions.

Keywords: Reinforcement Learning, Multimodal Data, Sentiment Analysis, CNN, RNN, Equities
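The abstract's state representation combines three modalities per asset and time step. Below is a minimal sketch of how such a state tensor might be assembled, assuming a fixed lookback window and per-day sentiment and embedding features; the array names, shapes, and the choice of log returns are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

LOOKBACK = 30      # assumed lookback window in trading days
N_ASSETS = 100     # S&P 100 constituents
EMB_DIM = 16       # assumed dimension of the news topic embeddings

def build_state(prices, sentiment, news_emb, t):
    """Stack price returns, sentiment scores, and news embeddings for
    the LOOKBACK days ending at t into a single state tensor.

    prices:    (T, N_ASSETS) close prices
    sentiment: (T, N_ASSETS) daily sentiment scores in [-1, 1]
    news_emb:  (T, N_ASSETS, EMB_DIM) topic embeddings per asset/day
    Returns a (LOOKBACK, N_ASSETS, 2 + EMB_DIM) float32 tensor.
    """
    window = slice(t - LOOKBACK, t)
    # Log returns are a common, scale-free choice for the price channel.
    rets = np.diff(np.log(prices[t - LOOKBACK - 1:t]), axis=0)
    state = np.concatenate(
        [rets[..., None], sentiment[window][..., None], news_emb[window]],
        axis=-1,
    )
    return state.astype(np.float32)
```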

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematical concepts including Markov Decision Processes, deep neural networks, and tensor-based feature extraction, while also detailing comprehensive data sourcing, preprocessing pipelines, and empirical benchmarking against traditional methods.
Method Flowchart

```mermaid
flowchart TD
    A["Research Goal:<br>Optimize S&P 100 Trading via RL<br>with Multimodal Data"] --> B
    subgraph B ["Data Inputs & Processing"]
        B1["Price History"]
        B2["News Sentiment"]
        B3["SEC Filings"]
    end
    B --> C["Deep RL Agent Construction"]
    C --> D["Feature Extraction:<br>CNN/RNN on State Tensors"]
    D --> E["Policy Learning:<br>Profit-based Reward Function"]
    E --> F["Execution & Backtesting"]
    F --> G["Key Findings:<br>Outperforms Benchmarks<br>Synergy of Data Sources"]
```
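The "Feature Extraction: CNN/RNN on State Tensors" step in the flowchart suggests convolving over the time axis and summarizing the window recurrently. The PyTorch sketch below shows one plausible encoder over state tensors of the shape assumed above; the layer sizes and the Conv1d-plus-GRU arrangement are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Encode a (batch, LOOKBACK, N_ASSETS, F) state tensor into a
    fixed-size feature vector for the RL policy."""

    def __init__(self, n_assets=100, n_features=18, hidden=64):
        super().__init__()
        in_ch = n_assets * n_features
        # 1-D convolution over the time axis captures short-term patterns.
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # A GRU summarizes the whole lookback window into one vector.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, state):
        b, t, n, f = state.shape
        x = state.reshape(b, t, n * f).transpose(1, 2)  # (b, channels, t)
        x = self.conv(x).transpose(1, 2)                # (b, t, hidden)
        _, h = self.rnn(x)
        return h.squeeze(0)                             # (b, hidden)
```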
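The "Policy Learning: Profit-based Reward Function" step implies rewarding the agent with realized portfolio returns. One common formulation is shown below as an assumption; the paper may net out costs differently or use a risk-adjusted variant.

```python
import numpy as np

def profit_reward(weights, prev_weights, returns, cost_rate=1e-3):
    """One-step profit-based reward: portfolio return over the step,
    minus proportional transaction costs for rebalancing.

    weights:      (N_ASSETS,) portfolio weights chosen by the agent
    prev_weights: (N_ASSETS,) weights held before rebalancing
    returns:      (N_ASSETS,) simple asset returns over the step
    """
    gross = float(weights @ returns)                        # portfolio return
    turnover = float(np.abs(weights - prev_weights).sum())  # rebalancing volume
    return gross - cost_rate * turnover
```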