
DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market Simulations

ArXiv ID: 2403.18831 · View on arXiv · Authors: Unknown

Abstract: In this paper, we introduce DeepTraderX (DTX), a simple Deep Learning-based trader, and present results that demonstrate its performance in a multi-threaded market simulation. In a total of about 500 simulated market days, DTX has learned solely by watching the prices that other strategies produce. By doing this, it has successfully created a mapping from market data to quotes, either bid or ask orders, to place for an asset. Trained on historical Level-2 market data, i.e., the Limit Order Book (LOB) for specific tradable assets, DTX processes the market state $S$ at each timestep $T$ to determine a price $P$ for market orders. The market data used in both training and testing was generated from unique market schedules based on real historical stock market data. DTX was tested extensively against the best strategies in the literature, with its results validated by statistical analysis. Our findings underscore DTX's capability to rival, and in many instances surpass, the performance of public-domain traders, including those that outclass human traders, emphasising that simple, efficient models are what is required to succeed in intricate multi-threaded simulations. This highlights the potential of leveraging "black-box" Deep Learning systems to create more efficient financial markets. ...
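
The abstract describes a mapping from the market state $S$ at each timestep $T$ to a quote price $P$. As a minimal sketch of one plausible realisation (not the authors' architecture), the snippet below maps a flattened Level-2 LOB snapshot to a single quote price with a small feed-forward network; the feature layout, layer sizes, and training target are assumptions.

```python
# Hypothetical sketch: flattened LOB snapshot -> scalar quote price.
import torch
import torch.nn as nn

class LOBQuoteNet(nn.Module):
    def __init__(self, depth: int = 10):
        super().__init__()
        # Per LOB level: bid price, bid size, ask price, ask size -> 4 * depth features.
        self.net = nn.Sequential(
            nn.Linear(4 * depth, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),              # scalar quote price P
        )

    def forward(self, lob_snapshot: torch.Tensor) -> torch.Tensor:
        return self.net(lob_snapshot)

model = LOBQuoteNet(depth=10)
state = torch.randn(1, 40)                 # placeholder market state S at timestep T
price = model(state)                       # predicted quote price P
loss = nn.MSELoss()(price, torch.tensor([[101.25]]))  # supervised target from observed quotes
loss.backward()
```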

February 6, 2024 · 2 min · Research Team

Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning

ArXiv ID: 2312.15385 · View on arXiv · Authors: Unknown

Abstract: This paper studies a discrete-time mean-variance model based on reinforcement learning. Compared with its continuous-time counterpart in \cite{zhou2020mv}, the discrete-time model makes more general assumptions about the asset's return distribution. Using entropy to measure the cost of exploration, we derive the optimal investment strategy, whose density function is also of Gaussian type. Additionally, we design the corresponding reinforcement learning algorithm. Both simulation experiments and empirical analysis indicate that our discrete-time model exhibits better applicability when analyzing real-world data than the continuous-time model. ...
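
To make the "entropy as exploration cost" idea concrete, here is an illustrative sketch (assumptions throughout, not the paper's algorithm): a Gaussian exploratory allocation policy whose mean-variance objective is augmented with the differential entropy of the policy.

```python
# Hypothetical entropy-regularised Gaussian exploration for a one-period allocation.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_entropy(sigma):
    # Differential entropy of N(mu, sigma^2); the "cost of exploration" term.
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

def exploratory_objective(mu, sigma, returns, lam=0.1, temp=0.01):
    # Sample allocations a ~ N(mu, sigma^2), compute one-period wealth 1 + a * r,
    # and form a mean-variance criterion plus an entropy bonus scaled by `temp`.
    a = rng.normal(mu, sigma, size=len(returns))
    wealth = 1.0 + a * returns
    return wealth.mean() - lam * wealth.var() + temp * gaussian_entropy(sigma)

sample_returns = rng.normal(0.01, 0.05, size=10_000)   # placeholder asset returns
print(exploratory_objective(mu=0.5, sigma=0.2, returns=sample_returns))
```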

December 24, 2023 · 2 min · Research Team

CVA Hedging by Risk-Averse Stochastic-Horizon Reinforcement Learning

ArXiv ID: 2312.14044 · View on arXiv · Authors: Unknown

Abstract: This work studies the dynamic risk management of the risk-neutral value of the potential credit losses on a portfolio of derivatives. Sensitivities-based hedging of such a liability is sub-optimal because of bid-ask costs, pricing models that cannot be completely realistic, and a discontinuity at default time. We leverage recent advances in risk-averse Reinforcement Learning developed specifically for option hedging, with an ad hoc practice-aligned objective function aware of pathwise volatility, generalizing them to stochastic horizons. We accurately formalize the evolution of the hedger's portfolio, stressing these aspects. We showcase the efficacy of our approach with a numerical study for a portfolio composed of a single FX forward contract. ...
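
The key ingredients named above are a pathwise-volatility-aware, risk-averse objective and a stochastic horizon (an episode may end at a random default time). The toy sketch below illustrates one possible reading of that combination; the objective form, default model, and P&L dynamics are all assumptions, not the paper's formulation.

```python
# Hypothetical risk-averse objective over a stochastic (default-truncated) horizon.
import numpy as np

rng = np.random.default_rng(1)

def episode_objective(pnl_increments, risk_aversion=1.0):
    # Penalise step-by-step P&L swings (pathwise volatility) in addition to the total.
    total = pnl_increments.sum()
    pathwise_vol = np.abs(pnl_increments).mean()
    return total - risk_aversion * pathwise_vol

def simulate_episode(max_steps=250, default_intensity=0.004):
    # Stochastic horizon: stop early if a default arrives (geometric approximation).
    horizon = min(max_steps, rng.geometric(default_intensity))
    pnl_increments = rng.normal(0.0, 0.01, size=horizon)   # placeholder hedged P&L steps
    return episode_objective(pnl_increments)

scores = [simulate_episode() for _ in range(1_000)]
print(np.mean(scores))
```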

December 21, 2023 · 2 min · Research Team

Market-Adaptive Ratio for Portfolio Management

ArXiv ID: 2312.13719 · View on arXiv · Authors: Unknown

Abstract: Traditional risk-adjusted returns, such as the Treynor, Sharpe, Sortino, and Information ratios, have been pivotal in portfolio asset allocation, focusing on minimizing risk while maximizing profit. Nevertheless, these metrics often fail to account for the distinct characteristics of bull and bear markets, leading to sub-optimal investment decisions. This paper introduces a novel approach called the Market-adaptive Ratio, designed to adjust risk preferences dynamically in response to market conditions. By integrating the $\rho$ parameter, which differentiates between bull and bear markets, this new ratio enables a more adaptive portfolio management strategy. The $\rho$ parameter is derived from historical data and implemented within a reinforcement learning framework, allowing the method to learn and optimize portfolio allocations based on prevailing market trends. Empirical analysis showed that the Market-adaptive Ratio outperformed the Sharpe Ratio by providing more robust risk-adjusted returns tailored to the specific market environment. This advance enhances portfolio performance by aligning investment strategies with the inherent dynamics of bull and bear markets, optimizing risk and return outcomes. ...
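
The abstract does not spell out the functional form of the Market-adaptive Ratio, so the snippet below is only an illustrative sketch: a risk-adjusted return in which a parameter $\rho$ shifts weight between total and downside risk depending on a crude bull/bear proxy. The regime rule and the formula are assumptions.

```python
# Hypothetical regime-dependent risk-adjusted ratio controlled by rho.
import numpy as np

def market_adaptive_ratio(returns, benchmark, rho=0.5, eps=1e-12):
    excess = returns - benchmark
    bull = benchmark.mean() > 0            # crude regime proxy: sign of the benchmark trend
    downside = np.minimum(excess, 0.0)
    # In bull markets weight total volatility more; in bear markets weight downside risk more.
    risk = (rho * excess.std() + (1 - rho) * downside.std()) if bull \
        else ((1 - rho) * excess.std() + rho * downside.std())
    return excess.mean() / (risk + eps)

rng = np.random.default_rng(2)
r = rng.normal(0.0008, 0.010, 2520)        # placeholder daily portfolio returns
b = rng.normal(0.0004, 0.009, 2520)        # placeholder benchmark returns
print(market_adaptive_ratio(r, b, rho=0.6))
```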

December 21, 2023 · 2 min · Research Team

Data-Driven Merton's Strategies via Policy Randomization

ArXiv ID: 2312.11797 · View on arXiv · Authors: Unknown

Abstract: We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. The agent under consideration is a price taker who has access only to the stock and factor value processes and the instantaneous volatility. We propose an auxiliary problem in which the agent can invoke policy randomization according to a specific class of Gaussian distributions, and prove that the mean of its optimal Gaussian policy solves the original Merton problem. With randomized policies, we are in the realm of continuous-time reinforcement learning (RL) recently developed in Wang et al. (2020) and Jia and Zhou (2022a, 2022b, 2023), enabling us to solve the auxiliary problem in a data-driven way without having to estimate the model primitives. Specifically, we establish a policy improvement theorem, based on which we design both online and offline actor-critic RL algorithms for learning Merton's strategies. A key insight from this study is that RL in general, and policy randomization in particular, are useful beyond the purpose of exploration: they can be employed as a technical tool to solve a problem that cannot otherwise be solved by mere deterministic policies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the decisive outperformance of the devised RL algorithms in comparison to the conventional model-based, plug-in method. ...
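
A minimal sketch of the randomisation idea, under stated assumptions (a linear-in-the-factor mean and a fixed exploration variance are placeholders, not the paper's parameterisation): actions are sampled from a Gaussian policy while learning, and only the policy mean is used at deployment, echoing the result that the mean of the optimal Gaussian policy solves the original Merton problem.

```python
# Hypothetical Gaussian randomised policy over the fraction of wealth in the stock.
import numpy as np

rng = np.random.default_rng(3)

class GaussianMertonPolicy:
    def __init__(self, theta=(0.5, -0.1), sigma=0.2):
        self.theta = np.array(theta)   # linear parameters on the observable factor
        self.sigma = sigma             # exploration noise (kept fixed in this sketch)

    def mean(self, factor):
        # Deployed (deterministic) allocation: linear in the factor value.
        return self.theta[0] + self.theta[1] * factor

    def sample(self, factor):
        # Exploratory allocation used while learning.
        return rng.normal(self.mean(factor), self.sigma)

policy = GaussianMertonPolicy()
factor = 0.3                            # placeholder factor observation
print("explore:", policy.sample(factor), "deploy:", policy.mean(factor))
```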

December 19, 2023 · 2 min · Research Team

Onflow: an online portfolio allocation algorithm

ArXiv ID: 2312.05169 · View on arXiv · Authors: Unknown

Abstract: We introduce Onflow, a reinforcement learning technique that enables online optimization of portfolio allocation policies based on gradient flows. We devise dynamic allocations of an investment portfolio to maximize its expected log return while taking into account transaction fees. The portfolio allocation is parameterized through a softmax function, and at each time step the gradient flow method leads to an ordinary differential equation whose solutions correspond to the updated allocations. This algorithm belongs to the large class of stochastic optimization procedures; we measure its efficiency by comparing our results to the theoretical values in a log-normal framework and to standard benchmarks from the 'old NYSE' dataset. For log-normal assets, the strategy learned by Onflow with zero transaction costs mimics Markowitz's optimal portfolio and thus the best possible asset allocation strategy. Numerical experiments on the 'old NYSE' dataset show that Onflow leads to dynamic asset allocation strategies whose performance is: a) comparable to benchmark strategies such as Cover's Universal Portfolio or Helmbold et al.'s "multiplicative updates" approach when transaction costs are zero, and b) better than previous procedures when transaction costs are high. Onflow remains efficient even in regimes where other dynamic allocation techniques no longer work. Therefore, as far as tested, Onflow appears to be a promising dynamic portfolio management strategy based on observed prices only, without any assumption on the distribution of the underlying assets' returns. In particular, it could avoid model risk when building a trading strategy. ...
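
The core mechanism described above is a softmax-parameterised allocation updated by a gradient flow on the fee-adjusted log return. The sketch below illustrates that idea with an Euler step and a finite-difference gradient; the cost model, learning rate, and discretisation are assumptions, not Onflow's exact update.

```python
# Hypothetical one-period Onflow-style update: softmax weights, gradient-flow step.
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def step(theta, price_relatives, prev_weights, fee=0.001, lr=0.05, h=1e-5):
    def objective(th):
        w = softmax(th)
        turnover = np.abs(w - prev_weights).sum()
        return np.log(w @ price_relatives) - fee * turnover   # fee-adjusted log return
    # Finite-difference gradient of the objective with respect to theta.
    grad = np.array([
        (objective(theta + h * e) - objective(theta - h * e)) / (2 * h)
        for e in np.eye(len(theta))
    ])
    return theta + lr * grad            # Euler discretisation of the gradient flow ODE

theta = np.zeros(3)
prev_w = softmax(theta)
x = np.array([1.01, 0.99, 1.02])        # placeholder price relatives for one period
theta = step(theta, x, prev_w)
print(softmax(theta))                   # updated allocation
```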

December 8, 2023 · 2 min · Research Team

Adaptive Agents and Data Quality in Agent-Based Financial Markets

ArXiv ID: 2311.15974 · View on arXiv · Authors: Unknown

Abstract: We present our Agent-Based Market Microstructure Simulation (ABMMS), an Agent-Based Financial Market (ABFM) that captures much of the complexity present in the US National Market System for equities (NMS). Agent-based models are a natural choice for understanding financial markets: markets feature a constrained action space that should simplify model creation and produce a wealth of data that should aid model validation, and a successful ABFM could strongly impact system design and policy development processes. Despite these advantages, ABFMs have largely remained an academic novelty. We hypothesize that two factors limit their usefulness. First, many ABFMs fail to capture relevant microstructure mechanisms, leading to differences in the mechanics of trading. Second, the simple agents that commonly populate ABFMs do not display the breadth of behaviors observed in human traders or the trading systems they create. We investigate these issues through the development of ABMMS, which features a fragmented market structure, communication infrastructure with propagation delays, realistic auction mechanisms, and more. As a baseline, we populate ABMMS with simple trading agents and investigate properties of the generated data. We then compare the baseline with experimental conditions that explore the impacts of market topology or meta-reinforcement learning agents. The combination of detailed market mechanisms and adaptive agents leads to models whose generated data more accurately reproduce stylized facts observed in actual markets. These improvements increase the utility of ABFMs as tools to inform design and policy decisions. ...
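
One microstructure detail the abstract singles out is communication infrastructure with propagation delays between fragmented venues. The toy sketch below shows a delayed message bus of the sort such a simulator might use; it is not ABMMS internals, and all names and latencies are hypothetical.

```python
# Hypothetical delayed message bus for a fragmented, latency-aware market simulation.
import heapq

class DelayedBus:
    def __init__(self, latency):
        self.latency = latency           # dict: (src, dst) -> delay in simulated time units
        self.queue = []                  # min-heap of (arrival_time, dst, message)

    def send(self, now, src, dst, message):
        heapq.heappush(self.queue, (now + self.latency[(src, dst)], dst, message))

    def deliver_until(self, now):
        out = []
        while self.queue and self.queue[0][0] <= now:
            out.append(heapq.heappop(self.queue))
        return out

bus = DelayedBus({("trader", "venue_A"): 3, ("trader", "venue_B"): 7})
bus.send(now=0, src="trader", dst="venue_A", message="BUY 100 @ 10.05")
bus.send(now=0, src="trader", dst="venue_B", message="BUY 100 @ 10.05")
print(bus.deliver_until(now=5))          # only venue_A has received the order so far
```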

November 27, 2023 · 2 min · Research Team

Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series

ArXiv ID: 2311.13326 · View on arXiv · Authors: Unknown

Abstract: Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on applying these ideas to control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction for improving control-task performance over complex time series. Our extensive out-of-sample experiments across many random seeds, together with ablation studies, are highly encouraging for curriculum learning in time-series control. These findings are especially encouraging because we tune all overlapping hyperparameters on the baseline, giving the baseline an advantage. On the other hand, we find that imitation learning should be used with caution. ...
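
"Curriculum learning via data augmentation" can be realised in many ways; the sketch below shows one plausible reading for time series (an assumption, not the paper's recipe): early stages train on heavily smoothed, lightly noised series, and later stages progressively approach the raw, highly stochastic data.

```python
# Hypothetical noise/smoothing curriculum over a price-like time series.
import numpy as np

rng = np.random.default_rng(4)

def augment(series, stage, n_stages):
    # Stage 0: strong smoothing and no extra noise; final stage: the raw series plus noise.
    frac = stage / max(n_stages - 1, 1)
    window = max(int((1 - frac) * 20), 1)
    kernel = np.ones(window) / window
    smoothed = np.convolve(series, kernel, mode="same")
    noise = rng.normal(0.0, 0.002 * frac, size=series.shape)
    return (1 - frac) * smoothed + frac * series + noise

raw = np.cumsum(rng.normal(0, 0.01, 1_000))            # placeholder price path
curriculum = [augment(raw, s, n_stages=5) for s in range(5)]
# Per-stage roughness of the series: difficulty grows through the curriculum.
print([round(float(np.std(np.diff(c))), 5) for c in curriculum])
```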

November 22, 2023 · 2 min · Research Team

Reinforcement Learning and Deep Stochastic Optimal Control for Final Quadratic Hedging

ArXiv ID: 2401.08600 · View on arXiv · Authors: Unknown

Abstract: We consider two data-driven approaches, Reinforcement Learning (RL) and Deep Trajectory-based Stochastic Optimal Control (DTSOC), for hedging a European call option with and without transaction costs according to a quadratic hedging P&L objective at maturity ("variance-optimal hedging" or "final quadratic hedging"). We study the performance of the two approaches under various market environments (modeled via the Black-Scholes and/or the log-normal SABR model) to understand their advantages and limitations. Without transaction costs and in the Black-Scholes model, both approaches match the performance of the variance-optimal Delta hedge. In the log-normal SABR model without transaction costs, they match the performance of the variance-optimal Bartlett's Delta hedge. Agents trained on Black-Scholes trajectories with matching initial volatility but used on SABR trajectories match the performance of Bartlett's Delta hedge in average cost, but show substantially wider variance. To apply RL approaches to these problems, the P&L at maturity is written as a sum of step-wise contributions, and variants of RL algorithms are implemented that minimize the expectation of the second moment of such sums. ...
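
To illustrate the final quadratic hedging objective described above, the sketch below writes the terminal P&L of a short call as a sum of step-wise hedging contributions on simulated Black-Scholes paths and evaluates its second moment; the path model, the omission of the option premium, and the placeholder hedge-ratio function are assumptions for illustration only.

```python
# Hypothetical evaluation of E[(terminal hedging P&L)^2] from step-wise contributions.
import numpy as np

rng = np.random.default_rng(5)

def quadratic_hedging_loss(hedge_fn, s0=100.0, strike=100.0, sigma=0.2, r=0.0,
                           T=0.25, n_steps=50, n_paths=2_000, cost=0.0):
    dt = T / n_steps
    S = np.full(n_paths, s0)
    pnl = np.zeros(n_paths)
    prev_delta = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        delta = hedge_fn(S, t)                        # position chosen from the current state
        pnl -= cost * np.abs(delta - prev_delta) * S  # proportional transaction cost
        z = rng.standard_normal(n_paths)
        S_next = S * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
        pnl += delta * (S_next - S)                   # step-wise hedging contribution
        S, prev_delta = S_next, delta
    pnl -= np.maximum(S - strike, 0.0)                # short call payoff at maturity
    return np.mean(pnl**2)                            # second moment of the terminal P&L

naive = lambda S, t: np.where(S > 100.0, 1.0, 0.0)    # placeholder hedge-ratio function
print(quadratic_hedging_loss(naive))
```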

November 20, 2023 · 2 min · Research Team

Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools

ArXiv ID: 2311.10801 · View on arXiv · Authors: Unknown

Abstract: Portfolio management (PM) is a fundamental financial trading task, which explores the optimal periodic reallocation of capital into different stocks to pursue long-term profits. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interaction with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demands. Specifically, the target stock pools of different investors vary dramatically because of differing views on market states, and individual investors may temporarily adjust the stocks they wish to trade (e.g., adding a popular stock), which leads to customizable stock pools (CSPs). Existing RL methods require retraining RL agents even after a tiny change to the stock pool, which leads to high computational cost and unstable performance. To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation that handles PM with CSPs through one-shot training in a global stock pool (GSP). Specifically, we first introduce a mechanism to mask out the representations of stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, a re-weighting mechanism is designed to make the portfolio concentrate on favorable stocks and neglect the stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines in terms of 6 popular financial metrics, with over 40% improvement in profit. ...
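
As a minimal sketch of the masking mechanism (assumed shapes and a toy scorer, not EarnMore's actual architecture): stocks outside an investor's customised pool are replaced by a learnable mask token, and their scores are suppressed so the resulting portfolio weights concentrate on the target pool.

```python
# Hypothetical maskable stock representation: global-pool embeddings with a mask token.
import torch
import torch.nn as nn

class MaskablePortfolio(nn.Module):
    def __init__(self, n_stocks=8, dim=16):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(n_stocks, dim))  # global-pool stock representations
        self.mask_token = nn.Parameter(torch.randn(dim))       # stands in for excluded stocks
        self.scorer = nn.Linear(dim, 1)

    def forward(self, in_pool: torch.Tensor) -> torch.Tensor:
        # Replace representations of stocks outside the target pool with the mask token.
        reps = torch.where(in_pool.unsqueeze(-1).bool(), self.embed, self.mask_token)
        scores = self.scorer(reps).squeeze(-1)
        scores = scores.masked_fill(~in_pool.bool(), float("-inf"))  # ignore excluded stocks
        return torch.softmax(scores, dim=-1)                   # weights over the customised pool

model = MaskablePortfolio()
custom_pool = torch.tensor([1, 1, 0, 1, 0, 0, 1, 0])            # investor's customised stock pool
print(model(custom_pool))                                       # masked stocks receive zero weight
```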

November 17, 2023 · 2 min · Research Team