Hierarchical Reinforcement Learning

Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution ArXiv ID: 2410.14927 “View on arXiv” Authors: Unknown Abstract Leveraging Deep Reinforcement Learning (DRL) in automated stock trading has shown promising results, yet its application faces significant challenges, including the curse of dimensionality, inertia in trading actions, and insufficient portfolio diversification. Addressing these challenges, we introduce the Hierarchical Reinforced Trader (HRT), a novel trading strategy employing a bi-level Hierarchical Reinforcement Learning framework. The HRT integrates a Proximal Policy Optimization (PPO)-based High-Level Controller (HLC) for strategic stock selection with a Deep Deterministic Policy Gradient (DDPG)-based Low-Level Controller (LLC) tasked with optimizing trade executions to enhance portfolio value. In our empirical analysis, comparing the HRT agent with standalone DRL models and the S&P 500 benchmark during both bullish and bearish market conditions, we achieve a positive and higher Sharpe ratio. This advancement not only underscores the efficacy of incorporating hierarchical structures into DRL strategies but also mitigates the aforementioned challenges, paving the way for designing more profitable and robust trading algorithms in complex markets. ...

EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading ArXiv ID: 2309.12891 “View on arXiv” Authors: Unknown Abstract High-frequency trading (HFT) uses computer algorithms to make trading decisions in short time scales (e.g., second-level), which is widely used in the Cryptocurrency (Crypto) market (e.g., Bitcoin). Reinforcement learning (RL) in financial research has shown stellar performance on many quantitative trading tasks. However, most methods focus on low-frequency trading, e.g., day-level, which cannot be directly applied to HFT because of two challenges. First, RL for HFT involves dealing with extremely long trajectories (e.g., 2.4 million steps per month), which is hard to optimize and evaluate. Second, the dramatic price fluctuations and market trend changes of Crypto make existing algorithms fail to maintain satisfactory performance. To tackle these challenges, we propose an Efficient hieArchical Reinforcement learNing method for High Frequency Trading (EarnHFT), a novel three-stage hierarchical RL framework for HFT. In stage I, we compute a Q-teacher, i.e., the optimal action value based on dynamic programming, for enhancing the performance and training efficiency of second-level RL agents. In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates, where hundreds of RL agents are trained with different preferences of return rates and only a tiny fraction of them will be selected into the pool based on their profitability. In stage III, we train a minute-level router which dynamically picks a second-level agent from the pool to achieve stable performance across different markets. Through extensive experiments in various market trends on Crypto markets in a high-fidelity simulation trading environment, we demonstrate that EarnHFT significantly outperforms 6 state-of-art baselines in 6 popular financial criteria, exceeding the runner-up by 30% in profitability. ...