Deep Reinforcement Learning for Portfolio Allocation

Deep Reinforcement Learning for Portfolio Allocation ArXiv ID: ssrn-3886804 “View on arXiv” Authors: Unknown Abstract In 2013, a paper by Google DeepMind kicked off an explosion in Deep Reinforcement Learning (DRL), for games. In this talk, we show that DRL can also be applied

Keywords: Deep Reinforcement Learning, Algorithmic Trading, Artificial Intelligence, Financial Markets

Complexity vs Empirical Score Math Complexity: 6.0/10 Empirical Rigor: 8.0/10 Quadrant: Holy Grail Why: The paper employs advanced mathematics (reinforcement learning, optimization, Shapley values) and demonstrates strong empirical rigor with detailed backtesting methodology, specific datasets, performance metrics, and sensitivity analysis for real-world implementation.

flowchart TD
    Goal["Research Goal: Apply DRL to Portfolio Allocation"] --> Method["Methodology: Deep Q-Network (DQN) Algorithm"]
    Method --> Input["Data Inputs: Historical Price Data & Market Indicators"]
    Input --> Proc["Computational Process: Training Agent on Simulated Market"]
    Proc --> Find1["Outcome 1: Dynamic Asset Weighting"]
    Proc --> Find2["Outcome 2: Risk-Adjusted Return Optimization"]
    Find1 --> End["Conclusion: DRL Viable for Financial Markets"]
    Find2 --> End

January 25, 2026 · 1 min · Research Team

Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals

Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals ArXiv ID: 2512.12924 “View on arXiv” Authors: Gagan Deep, Akash Deep, William Lamptey Abstract We develop a rigorous walk-forward validation framework for algorithmic trading designed to mitigate overfitting and lookahead bias. Our methodology combines interpretable hypothesis-driven signal generation with reinforcement learning and strict out-of-sample testing. The framework enforces strict information set discipline, employs rolling window validation across 34 independent test periods, maintains complete interpretability through natural language hypothesis explanations, and incorporates realistic transaction costs and position constraints. Validating five market microstructure patterns across 100 US equities from 2015 to 2024, the system yields modest annualized returns (0.55%, Sharpe ratio 0.33) with exceptional downside protection (maximum drawdown -2.76%) and market-neutral characteristics (beta = 0.058). Performance exhibits strong regime dependence, generating positive returns during high-volatility periods (0.60% quarterly, 2020-2024) while underperforming in stable markets (-0.16%, 2015-2019). We report statistically insignificant aggregate results (p-value 0.34) to demonstrate a reproducible, honest validation protocol that prioritizes interpretability and extends naturally to advanced hypothesis generators, including large language models. The key empirical finding reveals that daily OHLCV-based microstructure signals require elevated information arrival and trading activity to function effectively. The framework provides complete mathematical specifications and open-source implementation, establishing a template for rigorous trading system evaluation that addresses the reproducibility crisis in quantitative finance research.
For researchers, practitioners, and regulators, this work demonstrates that interpretable algorithmic trading strategies can be rigorously validated without sacrificing transparency or regulatory compliance. ...
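
The rolling-window, strictly out-of-sample scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `walk_forward_splits` and the window sizes are hypothetical, chosen only to show how each test window lies strictly after its training window.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for rolling walk-forward validation.

    Hypothetical helper: every test window starts right after its training
    window ends, so no future observation leaks into training.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        # Roll forward by one test window so test periods never overlap.
        start += test_size


# Toy example: 10 observations, 4 for training, 2 for testing per fold.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(f"train {train.start}..{train.stop - 1} -> test {test.start}..{test.stop - 1}")
```

Stepping by `test_size` keeps the test periods disjoint, which is one common way to obtain independent out-of-sample periods; the paper's exact window lengths are not stated in this excerpt.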

December 15, 2025 · 2 min · Research Team

Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies

Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies ArXiv ID: 2512.10913 “View on arXiv” Authors: Mohammad Rezoanul Hoque, Md Meftahul Ferdaus, M. Kabir Hassan Abstract Reinforcement learning (RL) is an innovative approach to financial decision making, offering specialized solutions to complex investment problems where traditional methods fail. This review analyzes 167 articles from 2017–2025, focusing on market making, portfolio optimization, and algorithmic trading. It identifies key performance issues and challenges in RL for finance. Generally, RL offers advantages over traditional methods, particularly in market making. This study proposes a unified framework to address common concerns such as explainability, robustness, and deployment feasibility. Empirical evidence with synthetic data suggests that implementation quality and domain knowledge often outweigh algorithmic complexity. The study highlights the need for interpretable RL architectures for regulatory compliance, enhanced robustness in nonstationary environments, and standardized benchmarking protocols. Organizations should focus less on algorithm sophistication and more on market microstructure, regulatory constraints, and risk management in decision-making. ...

December 11, 2025 · 2 min · Research Team

Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions

Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions ArXiv ID: 2512.02036 “View on arXiv” Authors: Juan C. King, Jose M. Amigo Abstract The aim of this paper is the analysis and selection of stock trading systems that combine different models with data of different nature, such as financial and microeconomic information. Specifically, based on previous work by the authors and applying advanced techniques of Machine Learning and Deep Learning, our objective is to formulate trading algorithms for the stock market with empirically tested statistical advantages, thus improving results published in the literature. Our approach integrates Long Short-Term Memory (LSTM) networks with algorithms based on decision trees, such as Random Forest and Gradient Boosting. While the former analyze price patterns of financial assets, the latter are fed with economic data of companies. Numerical simulations of algorithmic trading with data from international companies and 10-weekday predictions confirm that an approach based on both fundamental and technical variables can outperform the usual approaches, which do not combine those two types of variables. In doing so, Random Forest turned out to be the best performer among the decision trees. We also discuss how the prediction performance of such a hybrid approach can be boosted by selecting the technical variables. ...

November 20, 2025 · 2 min · Research Team

3S-Trader: A Multi-LLM Framework for Adaptive Stock Scoring, Strategy, and Selection in Portfolio Optimization

3S-Trader: A Multi-LLM Framework for Adaptive Stock Scoring, Strategy, and Selection in Portfolio Optimization ArXiv ID: 2510.17393 “View on arXiv” Authors: Kefan Chen, Hussain Ahmad, Diksha Goel, Claudia Szabo Abstract Large Language Models (LLMs) have recently gained popularity in stock trading for their ability to process multimodal financial data. However, most existing methods focus on single-stock trading and lack the capacity to reason over multiple candidates for portfolio construction. Moreover, they typically lack the flexibility to revise their strategies in response to market shifts, limiting their adaptability in real-world trading. To address these challenges, we propose 3S-Trader, a training-free framework that incorporates scoring, strategy, and selection modules for stock portfolio construction. The scoring module summarizes each stock’s recent signals into a concise report covering multiple scoring dimensions, enabling efficient comparison across candidates. The strategy module analyzes historical strategies and overall market conditions to iteratively generate an optimized selection strategy. Based on this strategy, the selection module identifies and assembles a portfolio by choosing stocks with higher scores in relevant dimensions. We evaluate our framework across four distinct stock universes, including the Dow Jones Industrial Average (DJIA) constituents and three sector-specific stock sets. Compared with existing multi-LLM frameworks and time-series-based baselines, 3S-Trader achieves the highest accumulated return of 131.83% on DJIA constituents with a Sharpe ratio of 0.31 and Calmar ratio of 11.84, while also delivering consistently strong results across other sectors. ...
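
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how Sharpe ratio, maximum drawdown, and Calmar ratio are commonly computed from a series of periodic returns. The function names, the 252-period annualization, and the zero risk-free rate are assumptions for illustration; the paper may use different conventions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def sharpe_ratio(
    returns: Sequence[float], periods_per_year: int = 252, risk_free: float = 0.0
) -> float:
    """Annualized mean excess return divided by annualized volatility."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def calmar_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized compounded return divided by the absolute maximum drawdown."""
    total = math.prod(1.0 + r for r in returns)
    annualized = total ** (periods_per_year / len(returns)) - 1.0
    drawdown = abs(max_drawdown(returns))
    # Assumed convention: no drawdown means an unbounded ratio.
    return math.inf if drawdown == 0.0 else annualized / drawdown


# Toy returns, not data from the paper.
toy_returns = [0.10, -0.50, 0.20]
print(max_drawdown(toy_returns))
```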

October 20, 2025 · 2 min · Research Team

AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration

AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration ArXiv ID: 2509.25055 “View on arXiv” Authors: Binqi Chen, Hongjun Ding, Ning Shen, Jinsheng Huang, Taian Guo, Luchen Liu, Ming Zhang Abstract The automated mining of predictive signals, or alphas, is a central challenge in quantitative finance. While Reinforcement Learning (RL) has emerged as a promising paradigm for generating formulaic alphas, existing frameworks are fundamentally hampered by a triad of interconnected issues. First, they suffer from reward sparsity, where meaningful feedback is only available upon the completion of a full formula, leading to inefficient and unstable exploration. Second, they rely on semantically inadequate sequential representations of mathematical expressions, failing to capture the structure that determines an alpha’s behavior. Third, the standard RL objective of maximizing expected returns inherently drives policies towards a single optimal mode, directly contradicting the practical need for a diverse portfolio of non-correlated alphas. To overcome these challenges, we introduce AlphaSAGE (Structure-Aware Alpha Mining via Generative Flow Networks for Robust Exploration), a novel framework built upon three cornerstone innovations: (1) a structure-aware encoder based on Relational Graph Convolutional Network (RGCN); (2) a new framework with Generative Flow Networks (GFlowNets); and (3) a dense, multi-faceted reward structure. Empirical results demonstrate that AlphaSAGE outperforms existing baselines in mining a more diverse, novel, and highly predictive portfolio of alphas, thereby proposing a new paradigm for automated alpha mining. Our code is available at https://github.com/BerkinChen/AlphaSAGE. ...

September 29, 2025 · 2 min · Research Team

Enhanced fill probability estimates in institutional algorithmic bond trading using statistical learning algorithms with quantum computers

Enhanced fill probability estimates in institutional algorithmic bond trading using statistical learning algorithms with quantum computers ArXiv ID: 2509.17715 “View on arXiv” Authors: Axel Ciceri, Austin Cottrell, Joshua Freeland, Daniel Fry, Hirotoshi Hirai, Philip Intallura, Hwajung Kang, Chee-Kong Lee, Abhijit Mitra, Kentaro Ohno, Das Pemmaraju, Manuel Proissl, Brian Quanz, Del Rajan, Noriaki Shimada, Kavitha Yograj Abstract The estimation of fill probabilities for trade orders represents a key ingredient in the optimization of algorithmic trading strategies. It is bound by the complex dynamics of financial markets with inherent uncertainties, and the limitations of models aiming to learn from multivariate financial time series that often exhibit stochastic properties with hidden temporal patterns. In this paper, we focus on algorithmic responses to trade inquiries in the corporate bond market and investigate fill probability estimation errors of common machine learning models when given real production-scale intraday trade event data, transformed by a quantum algorithm running on IBM Heron processors, as well as on noiseless quantum simulators for comparison. We introduce a framework to embed these quantum-generated data transforms as a decoupled offline component that can be selectively queried by models in low-latency institutional trade optimization settings. A trade execution backtesting method is employed to evaluate the fill prediction performance of these models in relation to their input data. We observe a relative gain of up to ~ 34% in out-of-sample test scores for those models with access to quantum hardware-transformed data over those using the original trading data or transforms by noiseless quantum simulation. These empirical results suggest that the inherent noise in current quantum hardware contributes to this effect and motivates further studies. 
Our work demonstrates the emerging potential of quantum computing as a complementary explorative tool in quantitative finance and encourages applied industry research towards practical applications in trading. ...

September 22, 2025 · 3 min · Research Team

Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning

Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning ArXiv ID: 2509.11420 “View on arXiv” Authors: Yijia Xiao, Edward Sun, Tong Chen, Fang Wu, Di Luo, Wei Wang Abstract Developing professional, structured reasoning on par with human financial analysts and traders remains a central challenge in AI for finance, where markets demand interpretability and trust. Traditional time-series models lack explainability, while LLMs face challenges in turning natural-language analysis into disciplined, executable trades. Although reasoning LLMs have advanced in step-by-step planning and verification, their application to risk-sensitive financial decisions is underexplored. We present Trading-R1, a financially-aware model that incorporates strategic thinking and planning for comprehensive thesis composition, facts-grounded analysis, and volatility-adjusted decision making. Trading-R1 aligns reasoning with trading principles through supervised fine-tuning and reinforcement learning with a three-stage easy-to-hard curriculum. Training uses Tauric-TR1-DB, a 100k-sample corpus spanning 18 months, 14 equities, and five heterogeneous financial data sources. Evaluated on six major equities and ETFs, Trading-R1 demonstrates improved risk-adjusted returns and lower drawdowns compared to both open-source and proprietary instruction-following models as well as reasoning models. The system generates structured, evidence-based investment theses that support disciplined and interpretable trading decisions. Trading-R1 Terminal will be released at https://github.com/TauricResearch/Trading-R1. ...

September 14, 2025 · 2 min · Research Team

Ultrafast Extreme Events: Empirical Analysis of Mechanisms and Recovery in a Historical Perspective

Ultrafast Extreme Events: Empirical Analysis of Mechanisms and Recovery in a Historical Perspective ArXiv ID: 2509.10376 “View on arXiv” Authors: Luca Henrichs, Anton J. Heckens, Thomas Guhr Abstract To understand the emergence of Ultrafast Extreme Events (UEEs), the influence of algorithmic trading or high-frequency traders is of major interest as they make it extremely difficult to intervene and to stabilize financial markets. In an empirical analysis, we compare various characteristics of UEEs over different years for the US stock market to assess the possible non-stationarity of the effects. We show that liquidity plays a dominant role in the emergence of UEEs and find a general pattern in their dynamics. We also empirically investigate the after-effects in view of the recovery rate. We find common patterns for different years. We explain changes in the recovery rate by varying market sentiments for the different years. ...

September 12, 2025 · 2 min · Research Team

Adaptive Alpha Weighting with PPO: Enhancing Prompt-Based LLM-Generated Alphas in Quant Trading

Adaptive Alpha Weighting with PPO: Enhancing Prompt-Based LLM-Generated Alphas in Quant Trading ArXiv ID: 2509.01393 “View on arXiv” Authors: Qizhao Chen, Hiroaki Kawashima Abstract This paper proposes a reinforcement learning framework that employs Proximal Policy Optimization (PPO) to dynamically optimize the weights of multiple large language model (LLM)-generated formulaic alphas for stock trading strategies. Formulaic alphas are mathematically defined trading signals derived from price, volume, sentiment, and other data. Although recent studies have shown that LLMs can generate diverse and effective alphas, a critical challenge lies in how to adaptively integrate them under varying market conditions. To address this gap, we leverage the deepseek-r1-distill-llama-70b model to generate fifty alphas for five major stocks: Apple, HSBC, Pepsi, Toyota, and Tencent, and then use PPO to adjust their weights in real time. Experimental results demonstrate that the PPO-optimized strategy achieves strong returns and high Sharpe ratios across most stocks, outperforming both an equal-weighted alpha portfolio and traditional benchmarks such as the Nikkei 225, S&P 500, and Hang Seng Index. The findings highlight the importance of reinforcement learning in the allocation of alpha weights and show the potential of combining LLM-generated signals with adaptive optimization for robust financial forecasting and trading. ...
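
The weighting step in the abstract, where several alpha signals are blended into one trading signal, can be sketched as follows. This shows only the combination step, not PPO itself: the raw scores stand in for whatever a trained policy would output, and the softmax normalization and the names `softmax` and `combine_alphas` are assumptions for illustration rather than the authors' method.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map raw scores to positive weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_alphas(alpha_values: Sequence[float], scores: Sequence[float]) -> float:
    """Blend per-alpha signal values using weights derived from raw scores.

    `scores` is a placeholder for a policy's output (hypothetical here).
    """
    if len(alpha_values) != len(scores):
        raise ValueError("alpha_values and scores must have the same length")
    weights = softmax(scores)
    return sum(w * a for w, a in zip(weights, alpha_values))


# Toy example: two opposing alphas with equal scores cancel out.
print(combine_alphas([1.0, -1.0], [0.0, 0.0]))
```

Equal scores reduce to the equal-weighted alpha portfolio that the abstract uses as a baseline; an adaptive policy would shift the scores as market conditions change.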

September 1, 2025 · 2 min · Research Team