News-Aware Direct Reinforcement Trading for Financial Markets

ArXiv ID: 2510.19173

Authors: Qing-Yu Lan, Zhan-He Wang, Jun-Qian Jiang, Yu-Tong Wang, Yun-Song Piao

Abstract

The financial market is known to be highly sensitive to news. Therefore, effectively incorporating news data into quantitative trading remains an important challenge. Existing approaches typically rely on manually designed rules and/or handcrafted features. In this work, we directly use the news sentiment scores derived from large language models, together with raw price and volume data, as observable inputs for reinforcement learning. These inputs are processed by sequence models such as recurrent neural networks or Transformers to make end-to-end trading decisions. We conduct experiments using the cryptocurrency market as an example and evaluate two representative reinforcement learning algorithms, namely Double Deep Q-Network (DDQN) and Group Relative Policy Optimization (GRPO). The results demonstrate that our news-aware approach, which does not depend on handcrafted features or manually designed rules, can achieve performance superior to market benchmarks. We further highlight the critical role of time-series information in this process.
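As a concrete illustration of the input construction described in the abstract, the sketch below assembles rolling observation windows from raw prices, volumes, and per-step LLM sentiment scores, suitable for feeding into a sequence model (RNN/Transformer). The window length, normalization, and function name are illustrative assumptions, not the paper's exact preprocessing.

```python
import numpy as np

def build_observations(prices, volumes, sentiments, window=8):
    """Stack log-returns, normalized volume, and LLM sentiment scores
    into rolling windows for a sequence model.

    prices, volumes, sentiments: 1-D arrays of equal length T.
    Returns an array of shape (T - window, window, 3).
    Note: window length and normalization are illustrative choices.
    """
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    sentiments = np.asarray(sentiments, dtype=float)

    log_ret = np.diff(np.log(prices))                # length T - 1
    vol = volumes[1:] / (volumes[1:].mean() + 1e-8)  # crude scale normalization
    sent = sentiments[1:]                            # align with returns

    feats = np.stack([log_ret, vol, sent], axis=-1)  # (T - 1, 3)
    obs = np.stack([feats[i:i + window]
                    for i in range(len(feats) - window + 1)])
    return obs
```

No handcrafted indicators appear here: the agent sees only raw-derived returns, volume, and the model-produced sentiment score, matching the paper's stated design goal.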

Keywords: reinforcement learning, Transformers, news sentiment, Double Deep Q-Network, sequence models, cryptocurrency

Complexity vs Empirical Score

  • Math Complexity: 6.5/10
  • Empirical Rigor: 4.0/10
  • Quadrant: Lab Rats
  • Why: The paper employs advanced sequence models (RNNs, Transformers) and specific RL algorithms (DDQN, GRPO) for time-series processing, which constitutes moderate-to-high mathematical complexity. However, the methodology is described conceptually, without code, specific dataset details, or statistical performance metrics, suggesting a theoretical framework rather than a backtest-ready implementation.
```mermaid
flowchart TD
  A["Research Goal:<br>Incorporate news into<br>trading via RL"] --> B{"Data Collection"}
  B --> C["Raw Market Data<br>(Price & Volume)"]
  B --> D["News Data"]
  D --> E["LLM Sentiment Scoring"]
  E --> F["News-Aware Inputs"]
  C --> F
  F --> G{"Reinforcement Learning<br>Process"}
  G --> H["Algorithm: DDQN & GRPO<br>using Transformers/RNNs"]
  H --> I["End-to-End Trading<br>Decisions"]
  I --> J["Key Outcomes"]
  J --> K["Superior Performance<br>vs Benchmarks"]
  J --> L["No Handcrafted<br>Features Required"]
  J --> M["Time-Series<br>Critical"]
```
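The two RL algorithms named in the flowchart differ mainly in how they form learning signals. Under simplified tabular assumptions (plain arrays standing in for network outputs), the sketch below shows the Double DQN target, where the online network selects the next action and the target network evaluates it, and the group-relative advantage normalization at the core of GRPO. All function names and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def ddqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """Double DQN target: the online net picks the argmax action at the
    next state; the target net evaluates that action. Decoupling selection
    from evaluation reduces the overestimation bias of vanilla DQN."""
    best_actions = np.argmax(q_online_next, axis=1)
    next_q = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_q

def grpo_advantages(group_rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within a sampled group,
    replacing a learned value baseline with the group statistics."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

A usage check: with `gamma=0.5`, rewards `[1, 0]`, and the second transition terminal, the DDQN target for row 0 bootstraps from the target net's value of the online net's argmax action, while row 1 reduces to its immediate reward; GRPO advantages always sum to zero within a group.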