
ContestTrade: A Multi-Agent Trading System Based on Internal Contest Mechanism

ContestTrade: A Multi-Agent Trading System Based on Internal Contest Mechanism ArXiv ID: 2508.00554 “View on arXiv” Authors: Li Zhao, Rui Sun, Zuoyou Jiang, Bo Yang, Yuxiao Bai, Mengting Chen, Xinyang Wang, Jing Li, Zuo Bai Abstract In financial trading, large language model (LLM)-based agents demonstrate significant potential. However, their high sensitivity to market noise undermines the performance of LLM-based trading systems. To address this limitation, we propose a novel multi-agent system featuring an internal competitive mechanism inspired by modern corporate management structures. The system consists of two specialized teams: (1) Data Team - responsible for processing and condensing massive market data into diversified text factors, ensuring they fit the model’s constrained context. (2) Research Team - tasked with making parallelized multipath trading decisions based on deep research methods. The core innovation lies in implementing a real-time evaluation and ranking mechanism within each team, driven by authentic market feedback. Each agent’s performance undergoes continuous scoring and ranking, with only outputs from top-performing agents being adopted. This design enables the system to adaptively adjust to dynamic environments, enhances robustness against market noise, and ultimately delivers superior trading performance. Experimental results demonstrate that our proposed system significantly outperforms prevailing multi-agent systems and traditional quantitative investment methods across diverse evaluation metrics. ContestTrade is open-sourced on GitHub at https://github.com/FinStep-AI/ContestTrade. ...
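
A minimal sketch of the contest idea described above, assuming a simple exponentially weighted scoring rule and top-k selection; agent names and the decay parameter are illustrative, not ContestTrade's actual implementation:

```python
# Agents are scored on realized market feedback; only the top-ranked agents'
# outputs are adopted each step. Scoring rule and names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    scores: list = field(default_factory=list)  # rolling record of realized feedback

    def score(self, decay: float = 0.9) -> float:
        """Exponentially weighted average of past feedback (recent weighs more)."""
        s, w, total_w = 0.0, 1.0, 0.0
        for r in reversed(self.scores):
            s += w * r
            total_w += w
            w *= decay
        return s / total_w if total_w else 0.0

def contest_select(agents, proposals, top_k=2):
    """Rank agents by score and adopt only the top-k proposals."""
    ranked = sorted(agents, key=lambda a: a.score(), reverse=True)
    return [proposals[a.name] for a in ranked[:top_k]]

# Usage: after each trading period, push realized PnL back into agent.scores,
# so the ranking adapts to the current market regime.
agents = [Agent("macro", [0.01, -0.02]), Agent("news", [0.03, 0.02]), Agent("flow", [0.00, 0.01])]
proposals = {"macro": "hold", "news": "buy AAPL", "flow": "trim risk"}
print(contest_select(agents, proposals, top_k=2))
```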

August 1, 2025 · 2 min · Research Team

Technical Indicator Networks (TINs): An Interpretable Neural Architecture Modernizing Classical Technical Analysis for Adaptive Algorithmic Trading

Technical Indicator Networks (TINs): An Interpretable Neural Architecture Modernizing Classical Technical Analysis for Adaptive Algorithmic Trading ArXiv ID: 2507.20202 “View on arXiv” Authors: Longfei Lu Abstract Deep neural networks (DNNs) have transformed fields such as computer vision and natural language processing by employing architectures aligned with domain-specific structural patterns. In algorithmic trading, however, there remains a lack of architectures that directly incorporate the logic of traditional technical indicators. This study introduces Technical Indicator Networks (TINs), a structured neural design that reformulates rule-based financial heuristics into trainable and interpretable modules. The architecture preserves the core mathematical definitions of conventional indicators while extending them to multidimensional data and supporting optimization through diverse learning paradigms, including reinforcement learning. Analytical transformations such as averaging, clipping, and ratio computation are expressed as vectorized layer operators, enabling transparent network construction and principled initialization. This formulation retains the clarity and interpretability of classical strategies while allowing adaptive adjustment and data-driven refinement. As a proof of concept, the framework is validated on the Dow Jones Industrial Average constituents using a Moving Average Convergence Divergence (MACD) TIN. Empirical results demonstrate improved risk-adjusted performance relative to traditional indicator-based strategies. Overall, the findings suggest that TINs provide a generalizable foundation for interpretable, adaptive, and extensible learning architectures in structured decision-making domains and indicate substantial commercial potential for upgrading trading platforms with cross-market visibility and enhanced decision-support capabilities. ...
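
As a rough illustration of the TIN idea, the sketch below rebuilds MACD as a small trainable module whose EMA decay rates are parameters initialized at the classical 12/26/9 periods; it is an assumption-laden PyTorch reconstruction, not the author's architecture:

```python
# MACD's EMAs become layers with trainable decay rates, so the classical
# indicator is preserved at initialization but can be refined by gradient descent.
import torch
import torch.nn as nn

def ema(x: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Exponential moving average along the time dimension (x: [T])."""
    out = [x[0]]
    for t in range(1, x.shape[0]):
        out.append(alpha * x[t] + (1 - alpha) * out[-1])
    return torch.stack(out)

class MACDTIN(nn.Module):
    def __init__(self, fast=12, slow=26, signal=9):
        super().__init__()
        # alpha = 2 / (n + 1), stored as trainable logits so it stays in (0, 1)
        to_logit = lambda n: torch.logit(torch.tensor(2.0 / (n + 1)))
        self.a_fast = nn.Parameter(to_logit(fast))
        self.a_slow = nn.Parameter(to_logit(slow))
        self.a_sig = nn.Parameter(to_logit(signal))

    def forward(self, price: torch.Tensor) -> torch.Tensor:
        macd = ema(price, torch.sigmoid(self.a_fast)) - ema(price, torch.sigmoid(self.a_slow))
        signal = ema(macd, torch.sigmoid(self.a_sig))
        return macd - signal  # histogram: the classical trigger, now differentiable

prices = torch.cumsum(torch.randn(200), dim=0) + 100.0  # synthetic price path
print(MACDTIN()(prices)[-5:])
```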

July 27, 2025 · 2 min · Research Team

MountainLion: A Multi-Modal LLM-Based Agent System for Interpretable and Adaptive Financial Trading

MountainLion: A Multi-Modal LLM-Based Agent System for Interpretable and Adaptive Financial Trading ArXiv ID: 2507.20474 “View on arXiv” Authors: Siyi Wu, Junqiao Wang, Zhaoyang Guan, Leyi Zhao, Xinyuan Song, Xinyu Ying, Dexu Yu, Jinhao Wang, Hanlin Zhang, Michele Pak, Yangfan He, Yi Xin, Jianhui Wang, Tianyu Shi Abstract Cryptocurrency trading is a challenging task requiring the integration of heterogeneous data from multiple modalities. Traditional deep learning and reinforcement learning approaches typically demand large training datasets and encode diverse inputs into numerical representations, often at the cost of interpretability. Recent progress in large language model (LLM)-based agents has demonstrated the capacity to process multi-modal data and support complex investment decision-making. Building on these advances, we present MountainLion, a multi-modal, multi-agent system for financial trading that coordinates specialized LLM-based agents to interpret financial data and generate investment strategies. MountainLion processes textual news, candlestick charts, and trading signal charts to produce high-quality financial reports, while also enabling modification of reports and investment recommendations through data-driven user interaction and question answering. A central reflection module analyzes historical trading signals and outcomes to continuously refine decision processes, and the system is capable of real-time report analysis, summarization, and dynamic adjustment of investment strategies. Empirical results confirm that MountainLion systematically enriches technical price triggers with contextual macroeconomic and capital flow signals, providing a more interpretable, robust, and actionable investment framework that improves returns and strengthens investor confidence. ...
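
A hedged sketch of the reflection loop the abstract describes: past signals and realized outcomes are summarized and fed back into the next decision prompt. The `call_llm` callable is a hypothetical placeholder for any chat-completion client, not MountainLion's API:

```python
# Coordinator + reflection step: review past signals/outcomes, then decide.
# `call_llm(prompt) -> str` is a hypothetical placeholder, not the system's API.
from typing import Callable

def reflect(call_llm: Callable[[str], str], history: list[dict]) -> str:
    """Summarize what past signals got right or wrong to condition the next decision."""
    lessons = "\n".join(
        f"signal={h['signal']} outcome={h['realized_return']:+.2%}" for h in history
    )
    return call_llm("Review these past trading signals and outcomes:\n"
                    f"{lessons}\nList the main mistakes to avoid next time.")

def decide(call_llm: Callable[[str], str], news: str, chart_summary: str, history: list[dict]) -> str:
    """Combine news, chart summary, and reflection into one recommendation prompt."""
    lessons = reflect(call_llm, history) if history else "none yet"
    prompt = (f"News: {news}\nChart: {chart_summary}\nLessons: {lessons}\n"
              "Recommend one of: buy / hold / sell, with a one-line rationale.")
    return call_llm(prompt)

# Usage with any chat client wrapped as `call_llm(prompt) -> str`:
# action = decide(call_llm, news_digest, candlestick_summary, trade_history)
```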

July 13, 2025 · 2 min · Research Team

Reinforcement Learning for Trade Execution with Market Impact

Reinforcement Learning for Trade Execution with Market Impact ArXiv ID: 2507.06345 “View on arXiv” Authors: Patrick Cheridito, Moritz Weiss Abstract In this paper, we introduce a novel reinforcement learning framework for optimal trade execution in a limit order book. We formulate the trade execution problem as a dynamic allocation task whose objective is the optimal placement of market and limit orders to maximize expected revenue. By employing multivariate logistic-normal distributions to model random allocations, the framework enables efficient training of the reinforcement learning algorithm. Numerical experiments show that the proposed method outperforms traditional benchmark strategies in simulated limit order book environments featuring noise traders submitting random orders, tactical traders responding to order book imbalances, and a strategic trader seeking to acquire or liquidate an asset position. ...
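
The allocation device is easy to illustrate: a logistic-normal sample is a Gaussian draw pushed through a softmax, which yields non-negative weights that sum to one. The sketch below shows one such draw splitting an order slice across a market order and several limit-order levels; parameter values are illustrative, not the paper's:

```python
# Random allocation on the simplex via a softmax of a Gaussian draw
# (logistic-normal style). Names and numbers are illustrative.
import numpy as np

def logistic_normal_allocation(mu: np.ndarray, cov: np.ndarray, rng=None) -> np.ndarray:
    """Sample simplex weights: softmax of a multivariate Gaussian draw."""
    rng = rng or np.random.default_rng(0)
    z = rng.multivariate_normal(mu, cov)
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

# Allocate one 10,000-share slice across [market, limit@best, limit@best+1, limit@best+2].
mu = np.array([0.5, 0.2, 0.0, -0.3])     # e.g., the mean output of an actor network
cov = 0.1 * np.eye(4)
weights = logistic_normal_allocation(mu, cov)
print(np.round(weights * 10_000).astype(int))
```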

July 8, 2025 · 2 min · Research Team

Accelerated Portfolio Optimization and Option Pricing with Reinforcement Learning

Accelerated Portfolio Optimization and Option Pricing with Reinforcement Learning ArXiv ID: 2507.01972 “View on arXiv” Authors: Hadi Keramati, Samaneh Jazayeri Abstract We present a reinforcement learning (RL)-driven framework for optimizing block-preconditioner sizes in iterative solvers used in portfolio optimization and option pricing. The covariance matrix in portfolio optimization or the discretization of differential operators in option pricing models leads to large linear systems of the form $\mathbf{A}\mathbf{x}=\mathbf{b}$. Direct inversion for high-dimensional portfolios or fine-grid option pricing discretizations incurs a significant computational cost. Therefore, iterative methods are usually used for portfolios in real-world situations. Ill-conditioned systems, however, suffer from slow convergence. Traditional preconditioning techniques often require problem-specific parameter tuning. To overcome this limitation, we rely on RL to dynamically adjust the block-preconditioner sizes and accelerate iterative solver convergence. Evaluations on a suite of real-world portfolio optimization matrices demonstrate that our RL framework can be used to adjust preconditioning and significantly accelerate convergence and reduce computational cost. The proposed accelerated solver supports faster decision-making in dynamic portfolio allocation and real-time option pricing. ...
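
To make the tuning target concrete, the sketch below builds a block-Jacobi preconditioner whose block size would be the RL agent's action, with the iteration count of a preconditioned conjugate-gradient solve serving as a natural (negative) reward. It assumes SciPy and a synthetic SPD matrix, not the paper's solver or data:

```python
# Block-Jacobi preconditioner with a tunable block size, applied inside CG.
# The block size is the knob an RL policy would set; reward ~ -iteration count.
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def block_jacobi(A: np.ndarray, block: int) -> LinearOperator:
    """Preconditioner M^{-1} built from inverted diagonal blocks of size `block`."""
    n = A.shape[0]
    inv_blocks = [np.linalg.inv(A[i:min(i + block, n), i:min(i + block, n)])
                  for i in range(0, n, block)]
    def apply(x):
        y = np.empty_like(x)
        for k, i in enumerate(range(0, n, block)):
            j = min(i + block, n)
            y[i:j] = inv_blocks[k] @ x[i:j]
        return y
    return LinearOperator((n, n), matvec=apply)

rng = np.random.default_rng(0)
F = rng.standard_normal((200, 200))
A = F @ F.T + 200 * np.eye(200)          # SPD stand-in for a covariance matrix
b = rng.standard_normal(200)

iters = 0
def count(_):
    global iters
    iters += 1

x, _ = cg(A, b, M=block_jacobi(A, block=20), callback=count)
print("CG iterations with block size 20:", iters)
```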

June 23, 2025 · 2 min · Research Team

Model-Free Deep Hedging with Transaction Costs and Light Data Requirements

Model-Free Deep Hedging with Transaction Costs and Light Data Requirements ArXiv ID: 2505.22836 “View on arXiv” Authors: Pierre Brugière, Gabriel Turinici Abstract Option pricing theory, such as the Black and Scholes (1973) model, provides an explicit solution to construct a strategy that perfectly hedges an option in a continuous-time setting. In practice, however, trading occurs in discrete time and often involves transaction costs, making the direct application of continuous-time solutions potentially suboptimal. Previous studies, such as those by Buehler et al. (2018), Buehler et al. (2019) and Cao et al. (2019), have shown that deep learning or reinforcement learning can be used to derive better hedging strategies than those based on continuous-time models. However, these approaches typically rely on a large number of trajectories (of the order of $10^5$ or $10^6$) to train the model. In this work, we show that using as few as 256 trajectories is sufficient to train a neural network that significantly outperforms, in the Geometric Brownian Motion framework, both the classical Black & Scholes formula and the Leland model, which is arguably one of the most effective explicit alternatives for incorporating transaction costs. The ability to train neural networks with such a small number of trajectories suggests the potential for simpler, more practical implementations on real-time financial series. ...
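
A compact sketch of that small-data setup, under stated assumptions: geometric Brownian motion, a short call, proportional transaction costs, a mean-minus-std objective, and illustrative hyperparameters rather than the paper's:

```python
# 256 GBM paths; a small network maps (time, price) to a hedge ratio and is
# trained to minimize a variance-penalized hedging error with transaction costs.
import torch
import torch.nn as nn

torch.manual_seed(0)
S0, K, sigma, T, steps, n_paths, cost = 100.0, 100.0, 0.2, 1.0, 30, 256, 0.001
dt = T / steps
z = torch.randn(n_paths, steps)
S = S0 * torch.exp(torch.cumsum((-0.5 * sigma**2) * dt + sigma * dt**0.5 * z, dim=1))
S = torch.cat([torch.full((n_paths, 1), S0), S], dim=1)          # [n_paths, steps + 1]

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for epoch in range(200):
    pnl = -torch.clamp(S[:, -1] - K, min=0.0)                    # short call payoff
    delta_prev = torch.zeros(n_paths)
    for t in range(steps):
        x = torch.stack([torch.full((n_paths,), t * dt), S[:, t] / S0], dim=1)
        delta = net(x).squeeze(1)
        pnl = pnl + delta * (S[:, t + 1] - S[:, t]) - cost * S[:, t] * (delta - delta_prev).abs()
        delta_prev = delta
    loss = -pnl.mean() + 1.0 * pnl.std()                         # mean-variance style objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final objective:", loss.item())
```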

May 28, 2025 · 2 min · Research Team

Hedging with Sparse Reward Reinforcement Learning

Hedging with Sparse Reward Reinforcement Learning ArXiv ID: 2503.04218 “View on arXiv” Authors: Unknown Abstract Derivatives, as a critical class of financial instruments, isolate and trade the price attributes of risk assets such as stocks, commodities, and indices, aiding risk management and enhancing market efficiency. However, traditional hedging models, constrained by assumptions such as continuous trading and zero transaction costs, fail to satisfy risk control requirements in complex and uncertain real-world markets. With advances in computing technology and deep learning, data-driven trading strategies are becoming increasingly prevalent. This thesis proposes a derivatives hedging framework integrating deep learning and reinforcement learning. The framework comprises a probabilistic forecasting model and a hedging agent, enabling market probability prediction, derivative pricing, and hedging. Specifically, we design a spatiotemporal attention-based probabilistic financial time series forecasting Transformer to address the scarcity of derivatives hedging data. A low-rank attention mechanism compresses high-dimensional assets into a low-dimensional latent space, capturing nonlinear asset relationships. The Transformer models sequential dependencies within this latent space, improving market probability forecasts and constructing an online training environment for downstream hedging tasks. Additionally, we incorporate generalized geometric Brownian motion to develop a risk-neutral pricing approach for derivatives. We model derivatives hedging as a reinforcement learning problem with sparse rewards and propose a behavior cloning-based recurrent proximal policy optimization (BC-RPPO) algorithm. This pretraining-finetuning framework significantly enhances the hedging agent’s performance. Numerical experiments in the U.S. and Chinese financial markets demonstrate our method’s superiority over traditional approaches. ...
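
Two of the ingredients above are easy to sketch in isolation: a sparse reward that pays only at expiry, and a behavior-cloning pretraining stage that regresses the policy onto a classical Black-Scholes delta before any RL fine-tuning. The code below is an illustrative stand-in, not the BC-RPPO implementation:

```python
# Sparse terminal reward + behavior-cloning pretraining toward a classical delta.
# Illustrative only; the paper's pretraining-finetuning pipeline is more involved.
import math
import torch
import torch.nn as nn

def bs_delta(S, K, sigma, tau, r=0.0):
    """Black-Scholes call delta, used here as the behavior-cloning target."""
    d1 = (torch.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * torch.sqrt(tau))
    return 0.5 * (1 + torch.erf(d1 / math.sqrt(2)))

def sparse_reward(pnl_path: torch.Tensor) -> torch.Tensor:
    """Zero reward at every step except the last (terminal hedging P&L)."""
    r = torch.zeros_like(pnl_path)
    r[..., -1] = pnl_path[..., -1]
    return r

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Behavior-cloning pretraining: regress the policy onto the classical delta.
S = torch.rand(512, 1) * 60 + 70          # spot in [70, 130]
tau = torch.rand(512, 1) * 0.9 + 0.1      # time to expiry in [0.1, 1.0]
target = bs_delta(S, K=100.0, sigma=0.2, tau=tau)
for _ in range(500):
    pred = policy(torch.cat([S / 100.0, tau], dim=1))
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print("BC loss:", loss.item())
```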

March 6, 2025 · 2 min · Research Team

To Hedge or Not to Hedge: Optimal Strategies for Stochastic Trade Flow Management

To Hedge or Not to Hedge: Optimal Strategies for Stochastic Trade Flow Management ArXiv ID: 2503.02496 “View on arXiv” Authors: Unknown Abstract This paper addresses the trade-off between internalisation and externalisation in the management of stochastic trade flows. We consider agents who must absorb flows and manage risk by deciding whether to warehouse it or hedge in the market, thereby incurring transaction costs and market impact. Unlike market makers, these agents cannot skew their quotes to attract offsetting flows and deter risk-increasing ones, leading to a fundamentally different problem. Within the Almgren-Chriss framework, we derive almost-closed-form solutions in the case of quadratic execution costs, while more general cases require numerical methods. In particular, we discuss the challenges posed by artificial boundary conditions when using classical grid-based numerical PDE techniques and propose reinforcement learning methods as an alternative. ...
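
A small numerical sketch of the warehouse-versus-hedge trade-off: inventory absorbs a random flow each step, hedging a fraction of it pays a quadratic execution cost, and unhedged inventory accrues a risk penalty. The parameters and the fixed-fraction policy are illustrative Almgren-Chriss-style bookkeeping, not the paper's optimal strategy:

```python
# Compare fixed hedge fractions: 0.0 = pure warehousing (risk only),
# 1.0 = full externalisation (execution cost only). Numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
T, dt, sigma, eta, phi = 1.0, 1 / 250, 0.3, 0.1, 5.0   # phi: risk-aversion weight

def run(hedge_fraction: float) -> float:
    """Average cost when, each step, we hedge `hedge_fraction` of current inventory."""
    costs = []
    for _ in range(500):
        q, cost = 0.0, 0.0
        for _ in range(int(T / dt)):
            q += rng.normal(0.0, 1.0)              # incoming client flow
            v = hedge_fraction * q / dt            # hedging rate
            q -= v * dt
            cost += eta * v**2 * dt + phi * sigma**2 * q**2 * dt
        costs.append(cost)
    return float(np.mean(costs))

for f in (0.0, 0.1, 0.3, 0.6, 1.0):
    print(f"hedge fraction {f:.1f}: avg cost {run(f):10.2f}")
```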

March 4, 2025 · 2 min · Research Team

Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents

Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents ArXiv ID: 2502.17967 “View on arXiv” Authors: Unknown Abstract Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks, yet their performance in dynamic, real-world financial environments remains underexplored. Existing approaches are limited to historical backtesting, where trading actions cannot influence market prices and agents train only on static data. To address this limitation, we present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading and directly impact price dynamics. By simulating realistic bid-ask interactions, our platform enables training in scenarios that closely mirror live markets, thereby narrowing the gap between training and evaluation. Experiments reveal that LLMs struggle with numerical reasoning when given plain-text data, often overfitting to local patterns and recent values. In contrast, chart-based visualizations significantly enhance both numerical reasoning and trading performance. Furthermore, incorporating a reflection module yields additional improvements, especially with visual inputs. Evaluations on NASDAQ and CSI datasets demonstrate the superiority of our method, particularly under high volatility. All code and data are available at https://github.com/wekjsdvnm/Agent-Trading-Arena. ...
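
The bid-ask interaction at the heart of such an arena can be sketched with a simple call-auction matcher: each agent submits a limit order, crossing orders trade at the midpoint of the matched quotes, and one agent's gain is another's loss. Agent names and order fields below are illustrative, not the platform's actual matching engine:

```python
# Minimal call-auction matcher for a zero-sum multi-agent market step.
from dataclasses import dataclass

@dataclass
class Order:
    agent: str
    side: str      # "buy" or "sell"
    price: float
    qty: int

def match(orders: list[Order]) -> list[tuple[str, str, float, int]]:
    """Cross buys (highest price first) against sells (lowest price first)."""
    buys = sorted([o for o in orders if o.side == "buy"], key=lambda o: -o.price)
    sells = sorted([o for o in orders if o.side == "sell"], key=lambda o: o.price)
    trades, b, s = [], 0, 0
    while b < len(buys) and s < len(sells) and buys[b].price >= sells[s].price:
        qty = min(buys[b].qty, sells[s].qty)
        px = 0.5 * (buys[b].price + sells[s].price)   # clear at the midpoint
        trades.append((buys[b].agent, sells[s].agent, px, qty))
        buys[b].qty -= qty
        sells[s].qty -= qty
        if buys[b].qty == 0:
            b += 1
        if sells[s].qty == 0:
            s += 1
    return trades

print(match([Order("llm_a", "buy", 101.0, 5), Order("llm_b", "sell", 100.0, 3),
             Order("llm_c", "sell", 100.5, 4)]))
```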

February 25, 2025 · 2 min · Research Team

FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents

FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents ArXiv ID: 2502.07393 “View on arXiv” Authors: Unknown Abstract This paper presents a novel risk-sensitive trading agent combining reinforcement learning and large language models (LLMs). We extend the Conditional Value-at-Risk Proximal Policy Optimization (CPPO) algorithm by adding risk assessment and trading recommendation signals generated by an LLM from financial news. Our approach is backtested on the Nasdaq-100 index benchmark, using financial news data from the FNSPID dataset and the DeepSeek V3, Qwen 2.5 and Llama 3.3 language models. The code, data, and trading agents are available at: https://github.com/benstaf/FinRL_DeepSeek ...
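
Two pieces of that pipeline are simple to illustrate: estimating CVaR over a batch of returns (the risk-sensitive part of CPPO) and shrinking exposure as the LLM's news-based risk score rises. The scaling rule below is an assumed example, not the paper's exact formulation:

```python
# CVaR over a return batch + a toy LLM-risk-score position scaler (illustrative).
import numpy as np

def cvar(returns: np.ndarray, alpha: float = 0.05) -> float:
    """Conditional Value-at-Risk: mean of the worst alpha-fraction of returns."""
    cutoff = np.quantile(returns, alpha)
    return float(returns[returns <= cutoff].mean())

def scale_position(base_weight: float, llm_risk_score: int) -> float:
    """Shrink exposure as the news-based risk score rises (1 = low risk, 5 = high)."""
    return base_weight * (1.0 - 0.2 * (llm_risk_score - 1))

rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.02, size=2500)
print("CVaR(5%):", round(cvar(daily_returns), 4))
print("weight at risk score 4:", scale_position(0.10, 4))
```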

February 11, 2025 · 1 min · Research Team