
Deep Reinforcement Learning for Portfolio Allocation

Deep Reinforcement Learning for Portfolio Allocation ArXiv ID: ssrn-3886804 “View on arXiv” Authors: Unknown Abstract In 2013, a paper by Google DeepMind kicked off an explosion in Deep Reinforcement Learning (DRL) for games. In this talk, we show that DRL can also be applied ... Keywords: Deep Reinforcement Learning, Algorithmic Trading, Artificial Intelligence, Financial Markets

Complexity vs Empirical Score: Math Complexity 6.0/10 · Empirical Rigor 8.0/10 · Quadrant: Holy Grail. Why: The paper employs advanced mathematics (reinforcement learning, optimization, Shapley values) and demonstrates strong empirical rigor with detailed backtesting methodology, specific datasets, performance metrics, and sensitivity analysis for real-world implementation.

```mermaid
flowchart TD
    Goal["Research Goal: Apply DRL to Portfolio Allocation"] --> Method["Methodology: Deep Q-Network (DQN) Algorithm"]
    Method --> Input["Data Inputs: Historical Price Data & Market Indicators"]
    Input --> Proc["Computational Process: Training Agent on Simulated Market"]
    Proc --> Find1["Outcome 1: Dynamic Asset Weighting"]
    Proc --> Find2["Outcome 2: Risk-Adjusted Return Optimization"]
    Find1 --> End["Conclusion: DRL Viable for Financial Markets"]
    Find2 --> End
```
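
The flowchart summarizes a standard DQN decision loop. As a rough illustration only (not the talk's actual implementation), a minimal epsilon-greedy allocation step might look like the sketch below; the universe size, lookback window, action set, and layer widths are all hypothetical:

```python
# Minimal sketch of a DQN-style allocation step (illustrative only; the
# talk's actual architecture and hyperparameters are not specified here).
import numpy as np
import torch
import torch.nn as nn

N_ASSETS = 5    # hypothetical universe size
N_ACTIONS = 3   # e.g. shift weight down / hold / shift weight up
LOOKBACK = 30   # days of price history fed to the network

class QNetwork(nn.Module):
    """Maps a window of market features to Q-values over discrete actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(N_ASSETS * LOOKBACK, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

def select_action(q_net, state, epsilon=0.05):
    """Epsilon-greedy action selection, as in standard DQN."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())
```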

January 25, 2026 · 1 min · Research Team

The Red Queen's Trap: Limits of Deep Evolution in High-Frequency Trading

The Red Queen’s Trap: Limits of Deep Evolution in High-Frequency Trading ArXiv ID: 2512.15732 “View on arXiv” Authors: Yijia Chen Abstract The integration of Deep Reinforcement Learning (DRL) and Evolutionary Computation (EC) is frequently hypothesized to be the “Holy Grail” of algorithmic trading, promising systems that adapt autonomously to non-stationary market regimes. This paper presents a rigorous post-mortem analysis of “Galaxy Empire,” a hybrid framework coupling LSTM/Transformer-based perception with a genetic “Time-is-Life” survival mechanism. Deploying a population of 500 autonomous agents in a high-frequency cryptocurrency environment, we observed a catastrophic divergence between training metrics (Validation APY >300%) and live performance (Capital Decay >70%). We deconstruct this failure through a multi-disciplinary lens, identifying three critical failure modes: the overfitting of “Aleatoric Uncertainty” in low-entropy time-series, the “Survivor Bias” inherent in evolutionary selection under high variance, and the mathematical impossibility of overcoming microstructure friction without order-flow data. Our findings provide empirical evidence that increasing model complexity in the absence of information asymmetry exacerbates systemic fragility. ...
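
The friction argument is easy to make concrete. A back-of-the-envelope sketch, with hypothetical fee and trade-count numbers rather than the paper's, shows how round-trip costs alone can consume any plausible gross edge at high trade frequency:

```python
# Back-of-the-envelope friction math (hypothetical numbers, not from the paper):
# a high-frequency strategy must clear round-trip costs on every trade.
taker_fee = 0.0005   # 5 bps per side, a typical crypto taker fee
slippage  = 0.0002   # 2 bps average adverse fill
round_trip_cost = 2 * taker_fee + slippage   # 12 bps per round trip

trades_per_day = 200
required_daily_edge = trades_per_day * round_trip_cost
print(f"Gross edge needed just to break even: {required_daily_edge:.1%} per day")
# -> 24.0% of gross alpha per day consumed by friction alone, which is how
#    a >300% validation APY can coexist with >70% live capital decay.
```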

December 5, 2025 · 2 min · Research Team

Adaptive Dueling Double Deep Q-networks in Uniswap V3 Replication and Extension with Mamba

Adaptive Dueling Double Deep Q-networks in Uniswap V3 Replication and Extension with Mamba ArXiv ID: 2511.22101 “View on arXiv” Authors: Zhaofeng Zhang Abstract The report goes through the main steps of replicating and improving the article “Adaptive Liquidity Provision in Uniswap V3 with Deep Reinforcement Learning.” The replication part includes how to obtain data from the Uniswap Subgraph, details of the implementation, and comments on the results. After the replication, I propose a new structure based on the original model, which combines Mamba with DDQN and a new reward function. In this new structure, I clean the data again and introduce two new baselines for comparison. As a result, although the model has not yet been applied to all datasets, it shows stronger theoretical support than the original model and performs better in some tests. ...
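
The replication's core learner is a Dueling Double DQN. Below is a minimal sketch of the double Q-learning target, the piece that distinguishes DDQN from vanilla DQN; the dueling value/advantage heads and the proposed Mamba encoder are omitted, and PyTorch-style networks are assumed:

```python
import torch

def ddqn_target(online_net, target_net, rewards, next_states, gamma=0.99):
    """Double DQN target: the online network picks the next action, the
    target network evaluates it (decoupling selection from evaluation,
    which reduces the Q-value overestimation of vanilla DQN)."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * next_q
```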

November 27, 2025 · 2 min · Research Team

DeepAries: Adaptive Rebalancing Interval Selection for Enhanced Portfolio Selection

DeepAries: Adaptive Rebalancing Interval Selection for Enhanced Portfolio Selection ArXiv ID: 2510.14985 “View on arXiv” Authors: Jinkyu Kim, Hyunjung Yi, Mogan Gim, Donghee Choi, Jaewoo Kang Abstract We propose DeepAries, a novel deep reinforcement learning framework for dynamic portfolio management that jointly optimizes the timing and allocation of rebalancing decisions. Unlike prior reinforcement learning methods that employ fixed rebalancing intervals regardless of market conditions, DeepAries adaptively selects optimal rebalancing intervals along with portfolio weights to reduce unnecessary transaction costs and maximize risk-adjusted returns. Our framework integrates a Transformer-based state encoder, which effectively captures complex long-term market dependencies, with Proximal Policy Optimization (PPO) to generate simultaneous discrete (rebalancing intervals) and continuous (asset allocations) actions. Extensive experiments on multiple real-world financial markets demonstrate that DeepAries significantly outperforms traditional fixed-frequency and full-rebalancing strategies in terms of risk-adjusted returns, transaction costs, and drawdowns. Additionally, we provide a live demo of DeepAries at https://deep-aries.github.io/, along with the source code and dataset at https://github.com/dmis-lab/DeepAries, illustrating DeepAries’ capability to produce interpretable rebalancing and allocation decisions aligned with shifting market regimes. Overall, DeepAries introduces an innovative paradigm for adaptive and practical portfolio management by integrating both timing and allocation into a unified decision-making process. ...
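
The distinctive piece is the joint discrete/continuous action space. A minimal sketch of such a hybrid policy head, with a categorical distribution over candidate rebalancing intervals and softmax portfolio weights, follows; the sizes and interval candidates here are assumptions, and the actual DeepAries encoder is Transformer-based and lives in the linked repository:

```python
import torch
import torch.nn as nn

class HybridPolicyHead(nn.Module):
    """Sketch of a joint action head: a categorical distribution over
    candidate rebalancing intervals plus continuous portfolio weights.
    (Illustrative only; not the DeepAries implementation.)"""
    def __init__(self, hidden_dim=64, n_assets=10, intervals=(1, 5, 20)):
        super().__init__()
        self.interval_logits = nn.Linear(hidden_dim, len(intervals))
        self.weight_logits = nn.Linear(hidden_dim, n_assets)
        self.intervals = intervals

    def forward(self, h):
        # Discrete action: how long to hold before the next rebalance.
        interval_dist = torch.distributions.Categorical(
            logits=self.interval_logits(h))
        k = interval_dist.sample()
        # Continuous action: long-only weights that sum to one.
        weights = torch.softmax(self.weight_logits(h), dim=-1)
        return self.intervals[k.item()], weights
```

Under PPO, both the categorical log-probability and the continuous-action log-probability enter the clipped surrogate objective, which is what lets a single policy gradient update the timing and allocation decisions together.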

September 11, 2025 · 2 min · Research Team

Can Artificial Intelligence Trade the Stock Market?

Can Artificial Intelligence Trade the Stock Market? ArXiv ID: 2506.04658 “View on arXiv” Authors: Jędrzej Maskiewicz, Paweł Sakowski Abstract The paper explores the use of Deep Reinforcement Learning (DRL) in stock market trading, focusing on two algorithms, Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO), and compares them with a Buy and Hold benchmark. It evaluates these algorithms across three currency pairs, the S&P 500 index, and Bitcoin, on daily data over the period 2019-2023. The results demonstrate DRL’s effectiveness in trading and its ability to manage risk by strategically avoiding trades in unfavorable conditions, providing a substantial edge, in terms of risk-adjusted returns, over classical approaches based on supervised learning. ...
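
The headline comparison is on risk-adjusted returns. A small sketch of the usual annualized Sharpe computation on daily returns, with synthetic data standing in for the paper's 2019-2023 series, illustrates why sidestepping unfavorable days can lift the Sharpe ratio even at similar raw returns:

```python
import numpy as np

def annualized_sharpe(daily_returns, rf_daily=0.0, periods=252):
    """Annualized Sharpe ratio on daily returns, the usual yardstick for a
    risk-adjusted comparison against a Buy and Hold benchmark."""
    excess = np.asarray(daily_returns) - rf_daily
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

# Hypothetical illustration: an agent that stays flat on the worst days
# can win on Sharpe even with similar raw returns.
rng = np.random.default_rng(0)
buy_hold = rng.normal(0.0004, 0.012, 1260)             # ~5 years of daily returns
strategy = np.where(buy_hold < -0.02, 0.0, buy_hold)   # avoid unfavorable days
print(annualized_sharpe(buy_hold), annualized_sharpe(strategy))
```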

June 5, 2025 · 2 min · Research Team

Deep Learning for Continuous-time Stochastic Control with Jumps

Deep Learning for Continuous-time Stochastic Control with Jumps ArXiv ID: 2505.15602 “View on arXiv” Authors: Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut Abstract In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks. ...
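
For orientation, the finite-horizon HJB equation with jumps that such training objectives are built from can be written in its standard generic form as follows; the paper's exact dynamics, notation, and compensator convention may differ:

```latex
% Generic finite-horizon HJB equation with jumps (standard form, not the
% paper's exact statement). V is the value function, u the control, and
% \nu a finite-activity jump measure.
\partial_t V(t,x)
  + \sup_{u \in U} \Big\{
      b(x,u)^\top \nabla_x V(t,x)
      + \tfrac{1}{2}\,\mathrm{Tr}\!\big(\sigma\sigma^\top(x,u)\,\nabla_x^2 V(t,x)\big)
      + \int_{\mathbb{R}^d} \big[ V\big(t, x + \gamma(x,u,z)\big) - V(t,x) \big]\,\nu(\mathrm{d}z)
      + f(x,u)
    \Big\} = 0,
\qquad V(T,x) = g(x).
```

The two networks then play the roles visible in this equation: one parameterizes the maximizing control u, the other approximates V, and the training losses penalize violations of the equality above along simulated trajectories.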

May 21, 2025 · 2 min · Research Team

Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models

Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models ArXiv ID: 2504.16635 “View on arXiv” Authors: Fredy Pokou, Jules Sadefo Kamdem, François Benhmad Abstract In an environment of increasingly volatile financial markets, the accurate estimation of risk remains a major challenge. Traditional econometric models, such as GARCH and its variants, are based on assumptions that are often too rigid to adapt to the complexity of current market dynamics. To overcome these limitations, we propose a hybrid framework for Value-at-Risk (VaR) estimation, combining GARCH volatility models with deep reinforcement learning. Our approach incorporates directional market forecasting using the Double Deep Q-Network (DDQN) model, treating the task as an imbalanced classification problem. This architecture enables the dynamic adjustment of risk-level forecasts according to market conditions. Empirical validation on daily Eurostoxx 50 data covering periods of crisis and high volatility shows a significant improvement in the accuracy of VaR estimates, as well as a reduction in both the number of breaches and capital requirements, while respecting regulatory risk thresholds. The ability of the model to adjust risk levels in real time reinforces its relevance to modern and proactive risk management. ...
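
A minimal sketch of the econometric half, a GARCH(1,1) variance recursion feeding a Gaussian one-day VaR, is shown below with hypothetical parameters; in the paper's hybrid, the DDQN's directional forecast then adjusts this baseline risk level dynamically:

```python
import numpy as np
from scipy.stats import norm

def garch11_next_var(returns, omega, alpha, beta):
    """One-step-ahead GARCH(1,1) variance forecast:
    sigma^2_{t+1} = omega + alpha * r_t^2 + beta * sigma^2_t."""
    sigma2 = np.var(returns)   # initialize at the sample variance
    for r in returns:
        sigma2 = omega + alpha * r**2 + beta * sigma2
    return sigma2

def parametric_var(sigma2, level=0.99):
    """Gaussian one-day VaR at the given confidence level (as a positive loss)."""
    return -norm.ppf(1 - level) * np.sqrt(sigma2)

# Hypothetical parameters and synthetic returns, for illustration only.
r = np.random.default_rng(1).normal(0, 0.01, 500)
print(parametric_var(garch11_next_var(r, omega=1e-6, alpha=0.08, beta=0.9)))
```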

April 23, 2025 · 2 min · Research Team

Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach

Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach ArXiv ID: 2505.03760 “View on arXiv” Authors: Unknown Abstract Portfolio optimization requires dynamic allocation of funds by balancing the risk and return tradeoff under dynamic market conditions. With the recent advancements in AI, Deep Reinforcement Learning (DRL) has gained prominence in providing adaptive and scalable strategies for portfolio optimization. However, the success of these strategies depends not only on their ability to adapt to market dynamics but also on the careful pre-selection of assets that influence overall portfolio performance. Incorporating the investor’s preference in pre-selecting assets for a portfolio is essential in refining their investment strategies. This study proposes a volatility-guided DRL-based portfolio optimization framework that dynamically constructs portfolios based on investors’ risk profiles. The Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model is utilized to forecast stock volatility and categorize stocks as aggressive, moderate, or conservative. The DRL agent is then employed to learn an optimal investment policy by interacting with historical market data. The efficacy of the proposed methodology is established using stocks from the Dow 30 index. The proposed investor-specific DRL-based portfolios outperformed the baseline strategies by generating consistent risk-adjusted returns. ...
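
The pre-selection step reduces to bucketing assets by forecast volatility. A minimal sketch is given below; the annualized-volatility thresholds and tickers are hypothetical, and the paper derives its forecasts from GARCH over Dow 30 constituents:

```python
def bucket_by_volatility(vol_forecasts, low=0.15, high=0.30):
    """Split tickers into risk buckets by forecast annualized volatility.
    Thresholds are hypothetical; the paper's forecasts come from GARCH."""
    buckets = {"conservative": [], "moderate": [], "aggressive": []}
    for ticker, vol in vol_forecasts.items():
        if vol < low:
            buckets["conservative"].append(ticker)
        elif vol < high:
            buckets["moderate"].append(ticker)
        else:
            buckets["aggressive"].append(ticker)
    return buckets

# Hypothetical forecasts: each investor profile then gets a DRL agent
# trained only on its matching bucket.
print(bucket_by_volatility({"PG": 0.12, "JPM": 0.22, "AAPL": 0.35}))
```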

April 20, 2025 · 2 min · Research Team

Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making

Event-Based Limit Order Book Simulation under a Neural Hawkes Process: Application in Market-Making ArXiv ID: 2502.17417 “View on arXiv” Authors: Unknown Abstract In this paper, we propose an event-driven Limit Order Book (LOB) model that captures twelve of the most observed LOB events in exchange-based financial markets. To model these events, we propose using the state-of-the-art Neural Hawkes process, a more robust alternative to traditional Hawkes process models. More specifically, this model captures the dynamic relationships between different event types, particularly their long- and short-term interactions, using a Long Short-Term Memory neural network. Using this framework, we construct a midprice process that captures the event-driven behavior of the LOB, simulating high-frequency dynamics as they appear in real financial markets. The empirical results show that our model captures many of the broader characteristics of the price fluctuations, particularly in terms of their overall volatility. We apply this LOB simulation model within a Deep Reinforcement Learning Market-Making framework, where the trading agent can now complete trade order fills in a manner that closely resembles real-market trade execution. Here, we also compare the simulated model’s results with those from real data, highlighting how closely the overall performance and the distribution of trade order fills align. ...
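
For intuition about the self-excitation being modeled, here is the classical exponential-kernel Hawkes intensity; the paper's Neural Hawkes model replaces this fixed kernel with an LSTM-driven intensity, so this sketch conveys only the mechanism, not the model:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.2):
    """Classical exponential-kernel Hawkes intensity:
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    Each past event excites the arrival rate of future events, which is
    why order-book event streams cluster in time. The Neural Hawkes
    variant learns these excitation dynamics with an LSTM instead."""
    past = np.asarray([ti for ti in event_times if ti < t])
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = [0.1, 0.3, 0.35, 0.9]   # hypothetical order-book event times
print(hawkes_intensity(1.0, events))
```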

February 24, 2025 · 2 min · Research Team

A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market

A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China’s Stock Market ArXiv ID: 2412.18563 “View on arXiv” Authors: Unknown Abstract Artificial intelligence is transforming financial investment decision-making frameworks, with deep reinforcement learning demonstrating substantial potential in robo-advisory applications. This paper addresses the limitations of traditional portfolio optimization methods in dynamic asset weight adjustment through the development of a deep reinforcement learning-based dynamic optimization model grounded in practical trading processes. The research advances two key innovations: first, the introduction of a novel Sharpe ratio reward function engineered for Actor-Critic deep reinforcement learning algorithms, which ensures stable convergence during training while consistently achieving positive average Sharpe ratios; second, the development of an innovative, comprehensive approach to portfolio optimization that significantly enhances optimization capability by integrating random sampling strategies during training, image-based deep neural network architectures for processing multi-dimensional financial time series, the average Sharpe ratio reward function, and deep reinforcement learning algorithms. The empirical analysis validates the model using randomly selected constituent stocks from the CSI 300 Index, benchmarking against established financial econometric optimization models. Backtesting results demonstrate the model’s efficacy in optimizing portfolio allocation and mitigating investment risk, yielding superior comprehensive performance metrics. ...
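
The first innovation, a Sharpe ratio reward, has a simple generic form. The sketch below uses the plain per-window Sharpe; the paper's exact windowing, scaling, and random-sampling scheme are not reproduced here:

```python
import numpy as np

def sharpe_reward(portfolio_returns, eps=1e-8):
    """Reward an Actor-Critic agent with the Sharpe ratio of its portfolio
    returns over a sampled training window. This is the generic form of
    such a reward, not the paper's exact formulation."""
    r = np.asarray(portfolio_returns)
    return r.mean() / (r.std(ddof=1) + eps)

# e.g. reward over one sampled episode window of daily portfolio returns
print(sharpe_reward([0.004, -0.002, 0.006, 0.001, -0.003]))
```

Rewarding the windowed Sharpe rather than raw returns penalizes volatility inside each episode, which is what the abstract credits for stable convergence toward consistently positive average Sharpe ratios.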

December 24, 2024 · 2 min · Research Team