
Reinforcement Learning in Financial Decision Making: A Systematic Review of Performance, Challenges, and Implementation Strategies

ArXiv ID: 2512.10913 · View on arXiv · Authors: Mohammad Rezoanul Hoque, Md Meftahul Ferdaus, M. Kabir Hassan · Abstract: Reinforcement learning (RL) is an innovative approach to financial decision making, offering specialized solutions to complex investment problems where traditional methods fail. This review analyzes 167 articles from 2017–2025, focusing on market making, portfolio optimization, and algorithmic trading. It identifies key performance issues and challenges in RL for finance. Generally, RL offers advantages over traditional methods, particularly in market making. This study proposes a unified framework to address common concerns such as explainability, robustness, and deployment feasibility. Empirical evidence with synthetic data suggests that implementation quality and domain knowledge often outweigh algorithmic complexity. The study highlights the need for interpretable RL architectures for regulatory compliance, enhanced robustness in nonstationary environments, and standardized benchmarking protocols. Organizations should focus less on algorithm sophistication and more on market microstructure, regulatory constraints, and risk management in decision-making. ...

December 11, 2025 · 2 min · Research Team

FX Market Making with Internal Liquidity

ArXiv ID: 2512.04603 · View on arXiv · Authors: Alexander Barzykin, Robert Boyce, Eyal Neuman · Abstract: As the FX markets continue to evolve, many institutions have started offering passive access to their internal liquidity pools. Market makers act as principal and have the opportunity to fill those orders as part of their risk management, or they may choose to adjust pricing to their external OTC franchise to facilitate the matching flow. It is, a priori, unclear how the strategies managing internal liquidity should depend on market conditions, the market maker’s risk appetite, and the placement algorithms deployed by participating clients. The market maker’s actions in the presence of passive orders are relevant not only for their own objectives, but also for those liquidity providers who have certain expectations of the execution speed. In this work, we investigate the optimal multi-objective strategy of a market maker with an option to take liquidity on an internal exchange, and draw important qualitative insights for real-world trading. ...
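The core trade-off described in the abstract (fill resting internal orders as part of risk management, or skew external quotes to attract offsetting flow) can be pictured with a toy inventory rule. The sketch below is not the paper’s multi-objective model; the function name, the linear skew rule, and all parameters are illustrative assumptions.

```python
import numpy as np

def manage_internal_liquidity(inventory, internal_bids, internal_asks,
                              risk_limit=100.0, skew_coeff=0.0001):
    """Toy rule: fill resting internal orders that reduce inventory,
    then skew external OTC quotes in proportion to the residual position.

    inventory      -- market maker's signed position (base currency units)
    internal_bids  -- passive client buy quantity resting in the internal pool
    internal_asks  -- passive client sell quantity resting in the internal pool
    risk_limit     -- position beyond which quotes are skewed at the maximum
    skew_coeff     -- price skew (in price units) per unit of residual inventory
    """
    # Step 1: take internal liquidity only when it reduces risk.
    if inventory > 0:
        filled = min(inventory, internal_bids)    # sell to resting internal buyers
        residual = inventory - filled
    else:
        filled = min(-inventory, internal_asks)   # buy from resting internal sellers
        residual = inventory + filled
    # Step 2: skew external quotes toward unloading the residual position.
    skew = -np.clip(residual, -risk_limit, risk_limit) * skew_coeff
    return filled, residual, skew

filled, residual, skew = manage_internal_liquidity(inventory=150.0,
                                                   internal_bids=60.0,
                                                   internal_asks=20.0)
print(f"internally matched: {filled}, residual: {residual}, quote skew: {skew:+.5f}")
```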

December 4, 2025 · 2 min · Research Team

Option market making with hedging-induced market impact

ArXiv ID: 2511.02518 · View on arXiv · Authors: Paulin Aubert, Etienne Chevalier, Vathana Ly Vath · Abstract: This paper develops a model for option market making in which the hedging activity of the market maker generates price impact on the underlying asset. The option order flow is modeled by Cox processes, with intensities depending on the state of the underlying and on the market maker’s quoted prices. The resulting dynamics combine stochastic option demand with both permanent and transient impact on the underlying, leading to a coupled evolution of inventory and price. We first study market manipulation and arbitrage phenomena that may arise from the feedback between option trading and underlying impact. We then establish the well-posedness of the mixed control problem, which involves continuous quoting decisions and impulsive hedging actions. Finally, we implement a numerical method based on policy optimization to approximate optimal strategies and illustrate the interplay between option market liquidity, inventory risk, and underlying impact. ...
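A minimal sketch of the feedback loop the abstract describes, assuming a quote-dependent exponential arrival intensity of the Avellaneda-Stoikov type and simple permanent/transient impact terms; none of these functional forms or parameter values are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not calibrated to the paper).
A, k = 5.0, 1.5                   # baseline intensity and quote sensitivity
dt, T = 0.01, 10.0                # time step and horizon
perm_impact = 0.002               # permanent impact per hedged unit
trans_impact, decay = 0.01, 2.0   # transient impact and its decay rate
half_spread = 0.5                 # option quote half-spread
delta = 0.5                       # option delta used for hedging

S, transient = 100.0, 0.0         # underlying price and transient displacement
inventory = 0.0                   # option inventory
path = []

for _ in range(int(T / dt)):
    lam = A * np.exp(-k * half_spread)        # quote-dependent arrival intensity
    buy = rng.random() < lam * dt             # client buys an option from us
    sell = rng.random() < lam * dt            # client sells an option to us
    trade = int(sell) - int(buy)              # our option inventory change
    inventory += trade
    hedge = -trade * delta                    # delta-hedge in the underlying
    S += perm_impact * hedge                  # permanent impact of the hedge trade
    transient += trans_impact * hedge         # transient impact builds up...
    transient *= np.exp(-decay * dt)          # ...and decays exponentially
    S += rng.normal(0.0, 0.2 * np.sqrt(dt))   # exogenous volatility
    path.append(S + transient)                # observed price includes transient part

print(f"final observed price: {path[-1]:.2f}, option inventory: {inventory}")
```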

November 4, 2025 · 2 min · Research Team

JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading

ArXiv ID: 2511.02136 · View on arXiv · Authors: Valentin Mohl, Sascha Frey, Reuben Leyland, Kang Li, George Nigmatulin, Mihai Cucuringu, Stefan Zohren, Jakob Foerster, Anisoara Calinescu · Abstract: Agent-based modelling (ABM) approaches for high-frequency financial markets are difficult to calibrate and validate, partly due to the large parameter space created by defining fixed agent policies. Multi-agent reinforcement learning (MARL) enables more realistic agent behaviour and reduces the number of free parameters, but the heavy computational cost has so far limited research efforts. To address this, we introduce JaxMARL-HFT (JAX-based Multi-Agent Reinforcement Learning for High-Frequency Trading), the first GPU-accelerated open-source multi-agent reinforcement learning environment for high-frequency trading (HFT) on market-by-order (MBO) data. Extending the JaxMARL framework and building on the JAX-LOB implementation, JaxMARL-HFT is designed to handle a heterogeneous set of agents, enabling diverse observation/action spaces and reward functions. It is designed flexibly, so it can also be used for single-agent RL, or extended to act as an ABM with fixed-policy agents. Leveraging JAX enables up to a 240x reduction in end-to-end training time, compared with state-of-the-art reference implementations on the same hardware. This significant speed-up makes it feasible to exploit the large, granular datasets available in high-frequency trading, and to perform the extensive hyperparameter sweeps required for robust and efficient MARL research in trading. We demonstrate the use of JaxMARL-HFT with independent Proximal Policy Optimization (IPPO) for a two-player environment, with an order execution and a market making agent, using one year of LOB data (400 million orders), and show that these agents learn to outperform standard benchmarks. The code for the JaxMARL-HFT framework is available on GitHub. ...
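The reported speed-up comes from expressing the environment as pure, jit-compilable functions so that thousands of copies can be stepped in parallel with jax.vmap. The toy example below shows that general pattern with a trivial stand-in environment; it does not use the actual JaxMARL-HFT API, and all state fields and names are assumptions.

```python
import jax
import jax.numpy as jnp

def env_step(state, action, key):
    """Toy stand-in for an LOB environment step, written as a pure function.
    The real JaxMARL-HFT step processes market-by-order messages; this is only
    a placeholder to illustrate the vmap/jit pattern."""
    noise = jax.random.normal(key)
    price = state["price"] + 0.01 * noise
    inventory = state["inventory"] + action        # action: signed trade size
    pnl = state["pnl"] + inventory * (price - state["price"])
    new_state = {"price": price, "inventory": inventory, "pnl": pnl}
    reward = pnl - state["pnl"]                    # per-step mark-to-market change
    return new_state, reward

@jax.jit
def batched_step(states, actions, keys):
    # vmap turns the single-environment step into a step over a whole batch,
    # which is where the GPU acceleration comes from.
    return jax.vmap(env_step)(states, actions, keys)

num_envs = 4096
states = {"price": jnp.full(num_envs, 100.0),
          "inventory": jnp.zeros(num_envs),
          "pnl": jnp.zeros(num_envs)}
actions = jnp.zeros(num_envs)
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
states, rewards = batched_step(states, actions, keys)
print(rewards.shape)  # (4096,)
```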

November 3, 2025 · 2 min · Research Team

When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making

ArXiv ID: 2510.27334 · View on arXiv · Authors: Ali Raza Jafree, Konark Jain, Nick Firoozye · Abstract: We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limit Order Book (LOB) model in order to replicate the behaviours of high-frequency market makers. In contrast to the classical models with exogenous price impact assumptions, the Hawkes model accounts for endogenous price impact and other key properties of the market (Jain et al. 2024a). Given the real-world impracticalities of the market maker updating strategies for every event in the LOB, we formulate the high-frequency market making agent via an impulse control reinforcement learning framework (Jain et al. 2025). The RL used in the simulation utilises Proximal Policy Optimisation (PPO) and self-imitation learning. To replicate the adverse selection phenomenon, we test the RL agent trading against a medium-frequency trader (MFT) executing a meta-order and demonstrate that, with training against the MFT meta-order execution agent, the RL market making agent learns to capitalise on the price drift induced by the meta-order. Recent empirical studies have shown that medium-frequency traders are increasingly subject to adverse selection by high-frequency trading agents. As high-frequency trading continues to proliferate across financial markets, the slippage costs incurred by medium-frequency traders are likely to increase over time. However, we do not observe that increased profits for the market making RL agent necessarily cause significantly increased slippages for the MFT agent. ...
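The slippage cost borne by the medium-frequency trader is typically measured as implementation shortfall against the arrival price; a generic computation is sketched below. This is the standard textbook definition, not the paper’s specific metric or its Hawkes-based setup, and all values in the example are hypothetical.

```python
import numpy as np

def implementation_shortfall(arrival_price, fill_prices, fill_sizes, side=+1):
    """Implementation shortfall of a meta-order, in price units per share.
    side = +1 for a buy meta-order, -1 for a sell.
    Positive values mean the child orders executed worse than the arrival price,
    e.g. because market makers skewed quotes into the detected drift."""
    fill_prices = np.asarray(fill_prices, dtype=float)
    fill_sizes = np.asarray(fill_sizes, dtype=float)
    vwap = np.average(fill_prices, weights=fill_sizes)   # size-weighted execution price
    return side * (vwap - arrival_price)

# Hypothetical child-order fills of a buy meta-order executed over time.
slippage = implementation_shortfall(arrival_price=100.00,
                                    fill_prices=[100.01, 100.03, 100.06, 100.08],
                                    fill_sizes=[500, 500, 500, 500])
print(f"slippage: {slippage:.4f} per share")
```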

October 31, 2025 · 2 min · Research Team

An Impulse Control Approach to Market Making in a Hawkes LOB Market

ArXiv ID: 2510.26438 · View on arXiv · Authors: Konark Jain, Nick Firoozye, Jonathan Kochems, Philip Treleaven · Abstract: We study the optimal Market Making problem in a Limit Order Book (LOB) market simulated using a high-fidelity, mutually exciting Hawkes process. Departing from traditional Brownian-driven mid-price models, our setup captures key microstructural properties such as queue dynamics, inter-arrival clustering, and endogenous price impact. Recognizing the realistic constraint that market makers cannot update strategies at every LOB event, we formulate the control problem within an impulse control framework, where interventions occur discretely via limit, cancel, or market orders. This leads to a high-dimensional, non-local Hamilton-Jacobi-Bellman Quasi-Variational Inequality (HJB-QVI), whose solution is analytically intractable and computationally expensive due to the curse of dimensionality. To address this, we propose a novel Reinforcement Learning (RL) approximation inspired by auxiliary control formulations. Using a two-network PPO-based architecture with self-imitation learning, we demonstrate strong empirical performance with limited training, achieving Sharpe ratios above 30 in a realistic simulated LOB. In addition, we solve the HJB-QVI using a deep learning method inspired by Sirignano and Spiliopoulos (2018) and compare the performance with the RL agent. Our findings highlight the promise of combining impulse control theory with modern deep RL to tackle optimal execution problems in jump-driven microstructural markets. ...
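The mutually exciting Hawkes arrivals at the heart of the model can be simulated with Ogata’s thinning algorithm. The sketch below handles the univariate case with an exponential kernel; the parameters are illustrative and not the paper’s calibration.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Simulate event times of a univariate Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    using Ogata's thinning algorithm (requires alpha < beta for stationarity)."""
    rng = np.random.default_rng(seed)
    events = []
    t, excitation = 0.0, 0.0                      # excitation = intensity above baseline
    while True:
        lam_bar = mu + excitation                 # intensity upper bound from time t onward
        w = rng.exponential(1.0 / lam_bar)        # candidate waiting time
        t += w
        if t > T:
            break
        excitation *= np.exp(-beta * w)           # decay excitation up to the candidate time
        if rng.random() <= (mu + excitation) / lam_bar:   # accept with prob lambda(t)/lam_bar
            events.append(t)
            excitation += alpha                   # each accepted event excites future arrivals
    return np.array(events)

# Clustered order arrivals: bursts of events rather than a flat Poisson stream.
arrivals = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=100.0)
print(f"{len(arrivals)} events, mean rate {len(arrivals) / 100.0:.2f}/s "
      f"(Poisson baseline would be 0.50/s)")
```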

October 30, 2025 · 2 min · Research Team

Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics

ArXiv ID: 2509.12456 · View on arXiv · Authors: Rafael Zimmer, Oswaldo Luiz do Valle Costa · Abstract: Reinforcement Learning has emerged as a promising framework for developing adaptive and data-driven strategies, enabling market makers to optimize decision-making policies based on interactions with the limit order book environment. This paper explores the integration of a reinforcement learning agent in a market-making context, where the underlying market dynamics have been explicitly modeled to capture observed stylized facts of real markets, including clustered order arrival times, non-stationary spreads and return drifts, stochastic order quantities, and price volatility. These mechanisms aim to enhance the stability of the resulting control agent, and serve to incorporate domain-specific knowledge into the agent policy learning process. Our contributions include a practical implementation of a market making agent based on the Proximal Policy Optimization (PPO) algorithm, alongside a comparative evaluation of the agent’s performance under varying market conditions via a simulator-based environment. As evidenced by our analysis of the financial return and risk metrics when compared to a closed-form optimal solution, our results suggest that the reinforcement learning agent can effectively be used under non-stationary market conditions, and that the proposed simulator-based environment can serve as a valuable tool for training and pre-training reinforcement learning agents in market-making scenarios. ...
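The closed-form benchmark mentioned in the comparison is presumably in the spirit of Avellaneda and Stoikov (2008), whose reservation price and optimal spread are reproduced below as a sketch; whether the paper uses exactly this benchmark, and with which parameters, is an assumption on our part.

```python
import numpy as np

def avellaneda_stoikov_quotes(mid, inventory, t, T, gamma=0.1, sigma=2.0, k=1.5):
    """Classical Avellaneda-Stoikov (2008) closed-form quotes.
    mid       -- current mid price
    inventory -- signed inventory q
    t, T      -- current time and terminal time
    gamma     -- risk aversion
    sigma     -- mid-price volatility
    k         -- decay parameter of the fill intensity A*exp(-k*delta)
    """
    tau = T - t
    reservation = mid - inventory * gamma * sigma**2 * tau
    spread = gamma * sigma**2 * tau + (2.0 / gamma) * np.log(1.0 + gamma / k)
    bid = reservation - spread / 2.0
    ask = reservation + spread / 2.0
    return bid, ask

# Long inventory shifts both quotes down, making it more likely the position is sold off.
print(avellaneda_stoikov_quotes(mid=100.0, inventory=5, t=0.0, T=1.0))
```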

September 15, 2025 · 2 min · Research Team

Competition and Incentives in a Shared Order Book

ArXiv ID: 2509.10094 · View on arXiv · Authors: René Aïd, Philippe Bergault, Mathieu Rosenbaum · Abstract: Recent regulation on intraday electricity markets has led to the development of shared order books with the intention to foster competition and increase market liquidity. In this paper, we address the question of the efficiency of such regulations by analysing the situation of two exchanges sharing a single limit order book, i.e. a quote by a market maker can be hit by a trade arriving on the other exchange. We develop a Principal-Agent model where each exchange acts as the Principal of her own market maker acting as her Agent. Exchanges and market makers all have CARA utility functions with potentially different risk-aversion parameters. In terms of mathematical results, we show existence and uniqueness of the resulting Nash equilibrium between exchanges, give the optimal incentive contracts, and provide a numerical solution to the PDE satisfied by the certainty equivalent of the exchanges. From an economic standpoint, our model demonstrates that incentive provision constitutes a public good. More precisely, it highlights the presence of a competitiveness spillover effect: when one exchange optimally incentivizes its market maker, the competing exchange also reaps indirect benefits. This interdependence gives rise to a free-rider problem. Given that providing incentives entails a cost, the strategic interaction between exchanges may lead to an equilibrium in which neither platform offers incentives, ultimately resulting in diminished overall competition. ...
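For reference, the CARA utility and the certainty equivalent mentioned in the abstract have the standard forms below; this is a reminder of the textbook definitions, not the paper’s specific notation.

```latex
% CARA (exponential) utility with risk-aversion parameter \gamma > 0:
U(x) = -e^{-\gamma x}
% The certainty equivalent of a random payoff X is the sure amount CE(X)
% satisfying U(CE(X)) = \mathbb{E}[U(X)], i.e.
\mathrm{CE}(X) = -\frac{1}{\gamma}\,\log \mathbb{E}\!\left[e^{-\gamma X}\right]
```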

September 12, 2025 · 2 min · Research Team

Optimal Quoting under Adverse Selection and Price Reading

ArXiv ID: 2508.20225 · View on arXiv · Authors: Alexander Barzykin, Philippe Bergault, Olivier Guéant, Malo Lemmel · Abstract: Over the past decade, many dealers have implemented algorithmic models to automatically respond to RFQs and manage flows originating from their electronic platforms. In parallel, building on the foundational work of Ho and Stoll, and later Avellaneda and Stoikov, the academic literature on market making has expanded to address trade size distributions, client tiering, complex price dynamics, alpha signals, and the internalization versus externalization dilemma in markets with dealer-to-client and interdealer-broker segments. In this paper, we tackle two critical dimensions: adverse selection, arising from the presence of informed traders, and price reading, whereby the market maker’s own quotes inadvertently reveal the direction of their inventory. These risks are well known to practitioners, who routinely face informed flows and algorithms capable of extracting signals from quoting behavior. Yet they have received limited attention in the quantitative finance literature, beyond stylized toy models with limited actionability. Extending the existing literature, we propose a tractable and implementable framework that enables market makers to adjust their quotes with greater awareness of informational risk. ...
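A toy way to picture the two risks: widen quotes when flow looks informed, and cap and jitter the inventory-driven skew so the quotes reveal less about the book. The sketch below is a heuristic for intuition only, not the paper’s framework; every name and parameter (toxicity, noise_bp, and so on) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

def rfq_quotes(mid, inventory, toxicity, base_spread_bp=2.0,
               skew_bp_per_unit=0.05, max_skew_bp=1.0, noise_bp=0.2):
    """Toy RFQ quoting rule.
    toxicity          -- estimated probability that the client flow is informed (0..1)
    base_spread_bp    -- half-spread in basis points for uninformed flow
    skew_bp_per_unit  -- quote skew per unit of inventory (to unload risk)
    max_skew_bp       -- cap on the skew, limiting how much quotes reveal inventory
    noise_bp          -- random jitter that further obscures the inventory signal
    """
    half_spread = mid * 1e-4 * base_spread_bp * (1.0 + 2.0 * toxicity)   # widen vs. informed flow
    skew_bp = -np.clip(inventory * skew_bp_per_unit, -max_skew_bp, max_skew_bp)
    skew = mid * 1e-4 * (skew_bp + rng.normal(0.0, noise_bp))            # cap + jitter vs. price reading
    bid = mid + skew - half_spread
    ask = mid + skew + half_spread
    return bid, ask

print(rfq_quotes(mid=1.1000, inventory=30.0, toxicity=0.7))
```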

August 27, 2025 · 2 min · Research Team

ARL-Based Multi-Action Market Making with Hawkes Processes and Variable Volatility

ArXiv ID: 2508.16589 · View on arXiv · Authors: Ziyi Wang, Carmine Ventre, Maria Polukarov · Abstract: We advance market-making strategies by integrating Adversarial Reinforcement Learning (ARL), Hawkes Processes, and variable volatility levels while also expanding the action space available to market makers (MMs). To enhance the adaptability and robustness of these strategies (which can quote always, quote only on one side of the market, or not quote at all), we shift from the commonly used Poisson process to the Hawkes process, which better captures real market dynamics and self-exciting behaviors. We then train and evaluate strategies under volatility levels of 2 and 200. Our findings show that the 4-action MM trained in a low-volatility environment effectively adapts to high-volatility conditions, maintaining stable performance and providing two-sided quotes at least 92% of the time. This indicates that incorporating flexible quoting mechanisms and realistic market simulations significantly enhances the effectiveness of market-making strategies. ...
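The expanded action space (quote both sides, quote one side only, or withdraw) can be encoded as a small discrete set. A minimal sketch of that mapping, with an illustrative fixed half-spread rather than anything taken from the paper, follows.

```python
from enum import IntEnum

class MMAction(IntEnum):
    QUOTE_BOTH = 0   # post bid and ask
    QUOTE_BID = 1    # post bid only
    QUOTE_ASK = 2    # post ask only
    NO_QUOTE = 3     # withdraw from the market

def quotes_for_action(action, mid, half_spread=0.05):
    """Map the 4-action space to (bid, ask) orders; None means no order on that side."""
    bid = mid - half_spread if action in (MMAction.QUOTE_BOTH, MMAction.QUOTE_BID) else None
    ask = mid + half_spread if action in (MMAction.QUOTE_BOTH, MMAction.QUOTE_ASK) else None
    return bid, ask

for a in MMAction:
    print(a.name, quotes_for_action(a, mid=100.0))
```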

August 7, 2025 · 2 min · Research Team