Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management

Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management ArXiv ID: 2309.16679 “View on arXiv” Authors: Unknown Abstract Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have consistently excelled at various tasks, and automated financial trading is one of the most complex among them. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. We also take into consideration sentiment information regarding the traded assets, and we discuss and demonstrate its usefulness through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice. ...
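Agents of this kind typically consume market features augmented with sentiment scores. As a minimal sketch (not from the paper; the feature choice, window length, and sentiment aggregation are illustrative assumptions), a state vector could concatenate recent log-returns with per-asset sentiment:

```python
import numpy as np

def build_state(prices: np.ndarray, sentiment: np.ndarray) -> np.ndarray:
    """Illustrative state construction: recent log-returns concatenated with
    per-asset sentiment scores aggregated from online sources.

    prices:    (window + 1, n_assets) closing prices
    sentiment: (n_assets,) aggregated sentiment scores in [-1, 1]
    """
    log_returns = np.diff(np.log(prices), axis=0)          # (window, n_assets)
    return np.concatenate([log_returns.ravel(), sentiment]).astype(np.float32)

# Example: 3 assets, 30-day window, random placeholder data
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(31, 3)), axis=0))
sentiment = rng.uniform(-1, 1, size=3)
print(build_state(prices, sentiment).shape)                # (93,) = 30*3 returns + 3 scores
```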

July 23, 2023 · 2 min · Research Team

An Adaptive Dual-level Reinforcement Learning Approach for Optimal Trade Execution

An Adaptive Dual-level Reinforcement Learning Approach for Optimal Trade Execution ArXiv ID: 2307.10649 “View on arXiv” Authors: Unknown Abstract The purpose of this research is to devise a tactic that can closely track the daily cumulative volume-weighted average price (VWAP) using reinforcement learning. Previous studies often choose a relatively short trading horizon to implement their models, making it difficult to accurately track the daily cumulative VWAP since the variations of financial data are often insignificant within the short trading horizon. In this paper, we aim to develop a strategy that can accurately track the daily cumulative VWAP while minimizing the deviation from the VWAP. We propose a method that leverages the U-shaped pattern of intraday stock trade volumes and uses Proximal Policy Optimization (PPO) as the learning algorithm. Our method follows a dual-level approach: a Transformer model that captures the overall (global) distribution of daily volumes in a U-shape, and an LSTM model that handles the distribution of orders within smaller (local) time intervals. The results from our experiments suggest that this dual-level architecture improves the accuracy of approximating the cumulative VWAP, when compared to previous reinforcement learning-based models. ...
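A rough sketch of what such a dual-level policy network might look like: a Transformer encoder proposes a global intraday volume curve and an LSTM allocates orders within a local time bucket. Layer sizes, feature counts, and the way the global curve conditions the local allocator are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DualLevelVWAPPolicy(nn.Module):
    """Hypothetical dual-level policy sketch for VWAP tracking."""

    def __init__(self, n_features: int = 8, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.global_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.global_head = nn.Linear(d_model, 1)            # per-bucket volume logit
        self.local_lstm = nn.LSTM(n_features + 1, d_model, batch_first=True)
        self.local_head = nn.Linear(d_model, 1)              # per-step order-fraction logit

    def forward(self, daily_x, local_x):
        # daily_x: (batch, n_buckets, n_features) bucket-level market features
        # local_x: (batch, n_steps, n_features) finer-grained features within one bucket
        h = self.global_encoder(self.embed(daily_x))
        global_curve = torch.softmax(self.global_head(h).squeeze(-1), dim=-1)
        # For illustration, condition the local allocator on the first bucket's target volume.
        bucket_target = global_curve[:, :1].unsqueeze(-1).expand(-1, local_x.size(1), -1)
        out, _ = self.local_lstm(torch.cat([local_x, bucket_target], dim=-1))
        local_alloc = torch.softmax(self.local_head(out).squeeze(-1), dim=-1)
        return global_curve, local_alloc
```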

July 20, 2023 · 2 min · Research Team

Reinforcement Learning for Credit Index Option Hedging

Reinforcement Learning for Credit Index Option Hedging ArXiv ID: 2307.09844 “View on arXiv” Authors: Unknown Abstract In this paper, we focus on finding the optimal hedging strategy of a credit index option using reinforcement learning. We take a practical approach focused on realism, i.e. discrete time and transaction costs, and we even test our policy on real market data. We apply a state-of-the-art algorithm, the Trust Region Volatility Optimization (TRVO) algorithm, and show that the derived hedging strategy outperforms the practitioner’s Black & Scholes delta hedge. ...
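For reference, the Black & Scholes delta hedge used as the baseline simply rebalances to the option's delta at each (discrete) hedging date. A minimal sketch for a generic European call follows; the paper's credit index option setting has its own pricing conventions, so this is only the textbook formula:

```python
from math import log, sqrt
from statistics import NormalDist

def black_scholes_delta(S, K, T, r, sigma, call=True):
    """Black & Scholes delta: the hedge ratio a practitioner rebalances to."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    delta = NormalDist().cdf(d1)
    return delta if call else delta - 1.0

# Example: at-the-money call, 6 months to expiry, 20% vol, 2% rate
print(black_scholes_delta(S=100, K=100, T=0.5, r=0.02, sigma=0.20))
```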

July 19, 2023 · 1 min · Research Team

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation ArXiv ID: 2307.07694 “View on arXiv” Authors: Unknown Abstract We evaluate benchmark deep reinforcement learning algorithms on the task of portfolio optimisation using simulated data. The simulator used to generate the data is based on correlated geometric Brownian motion with the Bertsimas-Lo market impact model. Using the Kelly criterion (log utility) as the objective, we can analytically derive the optimal policy without market impact as an upper bound to measure performance when including market impact. We find that the off-policy algorithms DDPG, TD3 and SAC are unable to learn the right $Q$-function due to the noisy rewards and therefore perform poorly. The on-policy algorithms PPO and A2C, with the use of generalised advantage estimation, are able to deal with the noise and derive a close-to-optimal policy. The clipping variant of PPO was found to be important in preventing the policy from deviating from the optimal policy once converged. In a more challenging environment where we have regime changes in the GBM parameters, we find that PPO, combined with a hidden Markov model to learn and predict the regime context, is able to learn different policies adapted to each regime. Overall, we find that the sample complexity of these algorithms is too high for applications using real data, requiring more than 2 million steps to learn a good policy in the simplest setting, which is equivalent to almost 8,000 years of daily prices. ...
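A stripped-down version of the data-generating process and reward described above, without the Bertsimas-Lo impact component; parameter values are placeholders, not the paper's calibration:

```python
import numpy as np

def simulate_correlated_gbm(mu, sigma, corr, n_steps, dt=1/252, s0=100.0, seed=0):
    """Minimal correlated-GBM price simulator (no market impact)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.asarray(mu), np.asarray(sigma)
    chol = np.linalg.cholesky(np.asarray(corr))
    z = rng.standard_normal((n_steps, len(mu))) @ chol.T
    log_paths = np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=0)
    return s0 * np.exp(np.vstack([np.zeros(len(mu)), log_paths]))

def log_utility_reward(weights, returns):
    """Kelly-criterion (log utility) reward for one rebalancing step."""
    return np.log(1.0 + weights @ returns)

prices = simulate_correlated_gbm(mu=[0.08, 0.05], sigma=[0.20, 0.15],
                                 corr=[[1.0, 0.3], [0.3, 1.0]], n_steps=252)
step_returns = prices[1] / prices[0] - 1.0
print(log_utility_reward(np.array([0.6, 0.4]), step_returns))
```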

July 15, 2023 · 2 min · Research Team

Action-State Dependent Dynamic Model Selection

Action-State Dependent Dynamic Model Selection ArXiv ID: 2307.04754 “View on arXiv” Authors: Unknown Abstract A model among many may only be best under certain states of the world. Switching from one model to another can also be costly. Finding a procedure to dynamically choose a model in these circumstances requires solving a complex estimation procedure and a dynamic programming problem. A reinforcement learning algorithm is used to approximate and estimate from the data the optimal solution to this dynamic programming problem. The algorithm is shown to consistently estimate the optimal policy that may choose different models based on a set of covariates. A typical example is switching between different portfolio models under rebalancing costs, using macroeconomic information. Using a set of macroeconomic variables and price data, an empirical application to the aforementioned portfolio problem shows superior performance to choosing the best portfolio model with hindsight. ...
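To make the dynamic programming structure concrete, here is a hypothetical tabular sketch: the state combines a covariate (e.g. a macro regime) with the model currently held, the action is which model to hold next period, and switching incurs a cost. This illustrates the problem setup, not the paper's estimator:

```python
import numpy as np

def q_learning_model_selection(returns_by_model, states, switch_cost=0.001,
                               alpha=0.1, gamma=0.95, eps=0.1, episodes=200, seed=0):
    """Tabular Q-learning over model-choice actions with switching costs.

    returns_by_model: (T, n_models) realised per-period returns of each candidate model
    states:           (T,) integer-coded covariate state (e.g. a macro regime)
    """
    rng = np.random.default_rng(seed)
    T, n_models = returns_by_model.shape
    Q = np.zeros((states.max() + 1, n_models, n_models))   # (covariate, current model, next model)
    for _ in range(episodes):
        model = rng.integers(n_models)
        for t in range(T - 1):
            s = states[t]
            a = rng.integers(n_models) if rng.random() < eps else int(Q[s, model].argmax())
            reward = returns_by_model[t, a] - switch_cost * (a != model)
            Q[s, model, a] += alpha * (reward + gamma * Q[states[t + 1], a].max() - Q[s, model, a])
            model = a
    return Q
```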

July 7, 2023 · 2 min · Research Team

Option Market Making via Reinforcement Learning

Option Market Making via Reinforcement Learning ArXiv ID: 2307.01814 “View on arXiv” Authors: Unknown Abstract Market making of options with different maturities and strikes is a challenging problem due to its high-dimensional nature. In this paper, we propose a novel approach that combines a stochastic policy and reinforcement learning-inspired techniques to determine the optimal policy for posting bid-ask spreads for an options market maker who trades options with different maturities and strikes. ...

July 4, 2023 · 1 min · Research Team

Over-the-Counter Market Making via Reinforcement Learning

Over-the-Counter Market Making via Reinforcement Learning ArXiv ID: 2307.01816 “View on arXiv” Authors: Unknown Abstract The over-the-counter (OTC) market is characterized by a unique feature that allows market makers to adjust bid-ask spreads based on order size. However, this flexibility introduces complexity, transforming the market-making problem into a high-dimensional stochastic control problem that presents significant challenges. To address this, this paper proposes an innovative solution utilizing reinforcement learning techniques to tackle the OTC market-making problem. By assuming a linear inverse relationship between market order arrival intensity and bid-ask spreads, we demonstrate that the optimal policy for bid-ask spreads follows a Gaussian distribution. We apply two reinforcement learning algorithms to conduct a numerical analysis, revealing the resulting return distribution and bid-ask spreads under different time and inventory levels. ...
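The two modelling ingredients highlighted in the abstract can be sketched directly; the intensity parameters and policy moments below are illustrative placeholders, not the paper's calibration:

```python
import numpy as np

def arrival_intensity(spread, a=1.5, b=4.0):
    """Assumed linear inverse relation between market order arrival intensity
    and the quoted half-spread: lambda(delta) = max(a - b * delta, 0)."""
    return np.maximum(a - b * spread, 0.0)

def gaussian_spread_policy(mean, std, rng):
    """Sample a half-spread from a Gaussian policy (clipped at zero), matching
    the Gaussian form of the optimal policy stated in the abstract."""
    return max(rng.normal(mean, std), 0.0)

rng = np.random.default_rng(0)
spread = gaussian_spread_policy(mean=0.15, std=0.05, rng=rng)
print(spread, arrival_intensity(spread))
```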

July 4, 2023 · 2 min · Research Team

Continuous-time q-learning for mean-field control problems

Continuous-time q-learning for mean-field control problems ArXiv ID: 2306.16208 “View on arXiv” Authors: Unknown Abstract This paper studies q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2023), for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent’s control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, for which we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$) as the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$) that is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised. In two examples, one within the LQ control framework and one beyond it, we can obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments. ...
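For readers unfamiliar with the single-agent setting this builds on, a brief hedged reminder: in the generic entropy-regularized formulation with temperature $\gamma$ (this is the standard Jia-Zhou single-agent form, not the paper's mean-field definitions), the objective and the Gibbs form of the optimal policy are

```latex
% Entropy-regularized objective over stochastic policies \pi (single-agent case):
J(\pi) = \mathbb{E}\!\left[ \int_0^T \Big( r(t, X_t, a_t) - \gamma \log \pi(a_t \mid t, X_t) \Big)\, dt + h(X_T) \right]
% and the optimal policy is the Gibbs measure of the q-function:
\pi^*(a \mid t, x) \;\propto\; \exp\!\big( q^*(t, x, a) / \gamma \big)
```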

June 28, 2023 · 2 min · Research Team

Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio

Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio ArXiv ID: 2309.03202 “View on arXiv” Authors: Unknown Abstract This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed. ...
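The practical distinction between SARSA and Q-learning referenced above lies in the update rule: SARSA bootstraps with the action the current policy actually takes next, while Q-learning bootstraps with the greedy action. A minimal tabular sketch (the state and action encodings are illustrative, not the paper's feature set):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy SARSA: bootstrap with the action actually taken next."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy Q-learning: bootstrap with the greedy action's value."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Toy example: 5 discretised market states (e.g. return buckets), 3 actions (buy/hold/sell)
Q = np.zeros((5, 3))
sarsa_update(Q, s=0, a=1, r=0.02, s_next=2, a_next=0)
q_learning_update(Q, s=2, a=0, r=-0.01, s_next=4)
print(Q)
```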

June 28, 2023 · 2 min · Research Team

Optimal Execution Using Reinforcement Learning

Optimal Execution Using Reinforcement Learning ArXiv ID: 2306.17178 “View on arXiv” Authors: Unknown Abstract This work is about optimal order execution, where a large order is split into several small orders to minimize the implementation shortfall. Based on the diversity of cryptocurrency exchanges, we attempt, for the first time, to extract cross-exchange signals by aligning data from multiple exchanges. Unlike most previous studies that focused on using single-exchange information, we discuss the impact of cross-exchange signals on the agent’s decision-making in the optimal execution problem. Experimental results show that cross-exchange signals can provide additional information that facilitates the optimal execution of cryptocurrency. ...
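One plausible way to build such cross-exchange signals is to resample best bid/ask snapshots from each venue onto a shared clock and feed inter-venue price gaps to the execution agent. The sketch below is an assumption about the preprocessing, not the paper's pipeline; the column names `timestamp`, `best_bid`, and `best_ask` are hypothetical:

```python
import pandas as pd

def align_cross_exchange(books: dict, freq: str = "1s") -> pd.DataFrame:
    """Align per-exchange quote snapshots on a common clock and derive simple
    cross-exchange features (mid-price gaps between venues).

    books: mapping of exchange name -> DataFrame with columns
           ['timestamp', 'best_bid', 'best_ask']
    """
    aligned = {}
    for name, book in books.items():
        book = book.set_index("timestamp").sort_index()
        mid = (book["best_bid"] + book["best_ask"]) / 2.0
        aligned[name] = mid.resample(freq).last().ffill()
    df = pd.DataFrame(aligned)
    ref = df.columns[0]
    for name in df.columns[1:]:
        df[f"{name}_minus_{ref}"] = df[name] - df[ref]    # cross-exchange price gap
    return df
```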

June 19, 2023 · 1 min · Research Team