
Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning

Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning ArXiv ID: 2405.13609 View on arXiv Authors: Unknown Abstract Markov decision processes (MDPs) are used to model a wide variety of applications, ranging from game playing and robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision process. However, a large class of problems does not fit straightforwardly into this framework: non-cumulative Markov decision processes (NCMDPs), where instead of the expected sum of rewards, the expected value of an arbitrary function of the rewards is maximized. Example functions include the maximum of the rewards or their mean divided by their standard deviation. In this work, we introduce a general mapping of NCMDPs to standard MDPs. This allows all techniques developed to find optimal policies for MDPs, such as reinforcement learning or dynamic programming, to be applied directly to the larger class of NCMDPs. Focusing on reinforcement learning, we show applications in a diverse set of tasks, including classical control, portfolio optimization in finance, and discrete optimization problems. With our approach, we can improve both final performance and training time compared to relying on standard MDPs. ...
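The reduction is easiest to see for the maximum objective: track the running maximum as part of the state and reward the agent only with its increments, so the cumulative sum of the new rewards equals the maximum of the original ones. Below is a minimal sketch of this idea as a Gymnasium wrapper; the wrapper name and the exact state augmentation are illustrative, not the paper's construction.

```python
import numpy as np
import gymnasium as gym


class MaxRewardWrapper(gym.Wrapper):
    """Illustrative NCMDP-to-MDP mapping for f = max (a sketch, not the
    paper's exact construction). Emits increments of the running maximum,
    so summing the new rewards reproduces the max of the old ones; the
    running maximum is appended to the observation to keep the process
    Markovian (observation_space is not updated here)."""

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.running_max = None
        return np.append(obs, 0.0), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.running_max is None:
            shaped, self.running_max = reward, reward
        else:
            new_max = max(self.running_max, reward)
            shaped = new_max - self.running_max
            self.running_max = new_max
        return np.append(obs, self.running_max), shaped, terminated, truncated, info
```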

May 22, 2024 · 2 min · Research Team

NIFTY Financial News Headlines Dataset

NIFTY Financial News Headlines Dataset ArXiv ID: 2405.09747 View on arXiv Authors: Unknown Abstract We introduce and make publicly available the NIFTY Financial News Headlines dataset, designed to facilitate and advance research in financial market forecasting using large language models (LLMs). This dataset comprises two distinct versions tailored for different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically for alignment methods (like reinforcement learning from human feedback (RLHF)) to align LLMs via rejection sampling and reward modeling. Each dataset version provides curated, high-quality data incorporating comprehensive metadata, market indices, and deduplicated financial news headlines, systematically filtered and ranked to suit modern LLM frameworks. We also include experiments demonstrating some applications of the dataset, such as stock price movement prediction and the role of LLM embeddings in information acquisition/richness. The NIFTY dataset, along with utilities (like systematically truncating a prompt’s context length), is available on Hugging Face at https://huggingface.co/datasets/raeidsaqur/NIFTY. ...
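Since the dataset lives on the Hub, loading it is a one-liner; a minimal sketch follows. The repo id comes from the abstract's URL, but the config and split names are assumptions, so inspect the printed structure (or the dataset card) before relying on them.

```python
from datasets import load_dataset

# Repo id taken from the abstract's URL; configs/splits are assumptions.
nifty = load_dataset("raeidsaqur/NIFTY")
print(nifty)                      # shows available splits and columns
first_split = next(iter(nifty))
print(nifty[first_split][0])      # peek at one example
```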

May 16, 2024 · 2 min · Research Team

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management ArXiv ID: 2405.05449 View on arXiv Authors: Unknown Abstract Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz’s portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: a supervised stage and a reinforcement learning stage. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model’s superiority. It notably achieves the highest yield and a Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios. ...
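The supervised stage distills from a classical teacher; as a rough illustration, here is a plain Markowitz-style target an agent could be pretrained against. The closed-form unconstrained weights and the normalization convention are assumptions for this sketch, not the paper's exact distillation targets.

```python
import numpy as np

def markowitz_weights(returns: np.ndarray, risk_aversion: float = 1.0) -> np.ndarray:
    """Unconstrained mean-variance weights w ∝ Σ^{-1} μ, scaled to unit
    gross exposure. `returns` is a (T, n_assets) array of historical
    returns; the small ridge term keeps Σ invertible for short windows."""
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(sigma + 1e-6 * np.eye(len(mu)), mu) / risk_aversion
    return raw / np.abs(raw).sum()

# Supervised stage (sketch): regress the actor's output onto these weights
# over rolling windows before switching to DDPG-style RL fine-tuning.
```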

May 8, 2024 · 2 min · Research Team

$ε$-Policy Gradient for Online Pricing

$ε$-Policy Gradient for Online Pricing ArXiv ID: 2405.03624 View on arXiv Authors: Unknown Abstract Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $ε$-policy gradient algorithm for the online pricing learning task. The algorithm extends the $ε$-greedy algorithm by replacing greedy exploitation with a gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the exploration cost in terms of the exploration probability $ε$ and the exploitation cost in terms of the gradient descent optimization and gradient estimation errors. The algorithm achieves an expected regret of order $\mathcal{O}(\sqrt{T})$ (up to a logarithmic factor) over $T$ trials. ...
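A minimal sketch of the ε-policy gradient idea for pricing, under assumptions not taken from the paper: a hypothetical linear demand model, least-squares model inference, and a gradient step on the estimated revenue in place of greedy exploitation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear demand d(p) = a - b*p + noise; a and b are unknown
# to the algorithm and only used here to simulate observations.
a_true, b_true = 10.0, 1.5

def observe_demand(price):
    return a_true - b_true * price + rng.normal(scale=0.5)

eps, lr = 0.1, 0.05
price, prices, demands = 2.0, [], []

for t in range(2000):
    if rng.random() < eps or len(prices) < 2:
        price = rng.uniform(0.5, 6.0)            # exploration: random price
    else:
        # Model inference: fit demand ≈ a_hat - b_hat * p by least squares,
        # then ascend the estimated revenue R(p) = p * (a_hat - b_hat * p),
        # whose gradient is a_hat - 2 * b_hat * p.
        A = np.column_stack([np.ones(len(prices)), prices])
        a_hat, slope = np.linalg.lstsq(A, np.array(demands), rcond=None)[0]
        b_hat = -slope
        price = float(np.clip(price + lr * (a_hat - 2 * b_hat * price), 0.5, 6.0))
    prices.append(price)
    demands.append(observe_demand(price))

print(f"final price ≈ {price:.2f}; revenue-optimal price = {a_true / (2 * b_true):.2f}")
```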

May 6, 2024 · 2 min · Research Team

Portfolio Management using Deep Reinforcement Learning

Portfolio Management using Deep Reinforcement Learning ArXiv ID: 2405.01604 View on arXiv Authors: Unknown Abstract Algorithmic trading systems, or financial robots, have been conquering the stock markets with their ability to execute complex statistical trading strategies. With the recent development of deep learning technologies, however, these classical strategies are becoming ineffective. DQN and A2C models have previously outperformed eminent humans in game playing and robotics. In our work, we propose a reinforced portfolio manager that assists in allocating weights to assets. The environment gives the manager the freedom to go long and even short on the assets. The weight allocation advisements are restricted to the chosen portfolio assets and tested empirically against benchmark indices. The manager performs financial transactions in a postulated liquid market without any transaction charges. This work concludes that the proposed portfolio manager, with actions centered on weight allocations, can surpass the risk-adjusted returns of conventional portfolio managers. ...
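Allowing short positions means the action-to-weights mapping must admit negative weights. One common convention, sketched below under the assumption of unit gross exposure, is to normalize the raw action vector by its absolute sum; the paper's actual mapping may differ.

```python
import numpy as np

def to_long_short_weights(action: np.ndarray) -> np.ndarray:
    """Map an unbounded action vector to portfolio weights that may be
    negative (short positions), normalized so that sum(|w|) = 1."""
    gross = np.abs(action).sum()
    if gross == 0:
        return np.zeros_like(action)
    return action / gross

w = to_long_short_weights(np.array([0.8, -0.3, 0.1]))
print(w, np.abs(w).sum())  # gross exposure is exactly 1
```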

May 1, 2024 · 2 min · Research Team

Reinforcement Learning in Agent-Based Market Simulation: Unveiling Realistic Stylized Facts and Behavior

Reinforcement Learning in Agent-Based Market Simulation: Unveiling Realistic Stylized Facts and Behavior ArXiv ID: 2403.19781 View on arXiv Authors: Unknown Abstract Investors and regulators can greatly benefit from a realistic market simulator that enables them to anticipate the consequences of their decisions in real markets. However, traditional rule-based market simulators often fall short in accurately capturing the dynamic behavior of market participants, particularly in response to external market impact events or changes in the behavior of other participants. In this study, we explore an agent-based simulation framework employing reinforcement learning (RL) agents. We present the implementation details of these RL agents and demonstrate that the simulated market exhibits realistic stylized facts observed in real-world markets. Furthermore, we investigate the behavior of RL agents when confronted with external market impacts, such as a flash crash. Our findings shed light on the effectiveness and adaptability of RL-based agents within the simulation, offering insights into their response to significant market events. ...
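Two of the standard stylized facts such a simulator is judged on are fat-tailed returns and volatility clustering. A minimal diagnostic sketch for checking them on a simulated price series follows; the paper may evaluate a broader set.

```python
import numpy as np

def stylized_fact_diagnostics(prices: np.ndarray, max_lag: int = 20):
    """Excess kurtosis of log returns (fat tails) and autocorrelation of
    absolute returns (volatility clustering). Real markets show positive
    excess kurtosis and slowly decaying positive autocorrelations."""
    r = np.diff(np.log(prices))
    excess_kurtosis = ((r - r.mean()) ** 4).mean() / r.var() ** 2 - 3.0
    a = np.abs(r) - np.abs(r).mean()
    acf = [(a[:-k] * a[k:]).mean() / a.var() for k in range(1, max_lag + 1)]
    return excess_kurtosis, acf
```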

March 28, 2024 · 2 min · Research Team

Advanced Statistical Arbitrage with Reinforcement Learning

Advanced Statistical Arbitrage with Reinforcement Learning ArXiv ID: 2403.12180 View on arXiv Authors: Unknown Abstract Statistical arbitrage is a prevalent trading strategy that takes advantage of the mean-reverting property of the spread between paired stocks. Studies of this strategy often rely heavily on model assumptions. In this study, we introduce an innovative model-free, reinforcement learning based framework for statistical arbitrage. For the construction of mean reversion spreads, we establish an empirical reversion time metric and optimize asset coefficients by minimizing this empirical mean reversion time. In the trading phase, we employ a reinforcement learning framework to identify the optimal mean reversion strategy. Diverging from traditional mean reversion strategies that primarily focus on price deviations from a long-term mean, our methodology creatively constructs the state space to encapsulate the recent trends in price movements. Additionally, the reward function is carefully tailored to reflect the unique characteristics of mean reversion trading. ...
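The paper does not spell out its reversion-time metric here, but a simple empirical proxy is the average number of steps between successive crossings of the spread's mean; a sketch under that assumption:

```python
import numpy as np

def empirical_reversion_time(spread: np.ndarray) -> float:
    """Average gap (in steps) between successive mean crossings of the
    spread: a simple empirical reversion-time proxy, not necessarily the
    paper's exact metric."""
    centered = spread - spread.mean()
    signs = np.sign(centered)
    crossings = np.where(signs[:-1] * signs[1:] < 0)[0]
    if len(crossings) < 2:
        return float(len(spread))  # no reversion observed in the sample
    return float(np.diff(crossings).mean())

# Pair coefficient chosen to minimize the metric, e.g. on a grid, where
# the spread is x - beta * y for price series x and y:
# best_beta = min(betas, key=lambda b: empirical_reversion_time(x - b * y))
```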

March 18, 2024 · 2 min · Research Team

A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist

A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist ArXiv ID: 2402.18485 View on arXiv Authors: Unknown Abstract Financial trading is a crucial component of the markets, informed by a multimodal information landscape encompassing news, prices, and Kline charts, and encompasses diverse tasks such as quantitative trading and high-frequency trading with various assets. While advanced AI techniques like deep learning and reinforcement learning are extensively utilized in finance, their application in financial trading tasks often faces challenges due to inadequate handling of multimodal data and limited generalizability across various tasks. To address these challenges, we present FinAgent, a multimodal foundation agent with tool augmentation for financial trading. FinAgent’s market intelligence module processes a diverse range of data (numerical, textual, and visual) to accurately analyze the financial market. Its unique dual-level reflection module not only enables rapid adaptation to market dynamics but also incorporates a diversified memory retrieval system, enhancing the agent’s ability to learn from historical data and improve decision-making processes. The agent’s emphasis on reasoning for actions fosters trust in its financial decisions. Moreover, FinAgent integrates established trading strategies and expert insights, ensuring that its trading approaches are both data-driven and rooted in sound financial principles. With comprehensive experiments on 6 financial datasets, including stocks and crypto, FinAgent significantly outperforms 9 state-of-the-art baselines in terms of 6 financial metrics, with over 36% average improvement on profit. Specifically, a 92.27% return (an 84.39% relative improvement) is achieved on one dataset. Notably, FinAgent is the first advanced multimodal foundation agent designed for financial trading tasks. ...
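The two headline figures are mutually consistent: a 92.27% return that represents an 84.39% relative improvement implies the best baseline earned roughly 50% on that dataset.

```python
# Consistency check on the quoted figures: a 92.27% return at an 84.39%
# relative improvement implies a best-baseline return of about
# 92.27 / (1 + 0.8439) ≈ 50.04%.
print(92.27 / 1.8439)  # ≈ 50.04
```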

February 28, 2024 · 2 min · Research Team

Reinforcement Learning for Optimal Execution when Liquidity is Time-Varying

Reinforcement Learning for Optimal Execution when Liquidity is Time-Varying ArXiv ID: 2402.12049 View on arXiv Authors: Unknown Abstract Optimal execution is an important problem faced by any trader. Most solutions are based on the assumption of constant market impact, while liquidity is known to be dynamic. Moreover, models with time-varying liquidity typically assume that it is observable, despite the fact that, in reality, it is latent and hard to measure in real time. In this paper we show that Double Deep Q-learning, a form of Reinforcement Learning based on neural networks, is able to learn optimal trading policies when liquidity is time-varying. Specifically, we consider an Almgren-Chriss framework with temporary and permanent impact parameters following several deterministic and stochastic dynamics. Using extensive numerical experiments, we show that the trained algorithm learns the optimal policy when the analytical solution is available, and outperforms benchmarks and approximated solutions when it is not. ...
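For orientation, here is a minimal Almgren-Chriss-style execution environment with linear temporary and permanent impact; parameter names and values are illustrative. In the paper's setting the impact parameters eta and gamma would additionally follow deterministic or stochastic dynamics (time-varying liquidity).

```python
import numpy as np

class AlmgrenChrissEnv:
    """Sketch of a liquidation environment with linear market impact."""

    def __init__(self, inventory=1000.0, steps=50, price=100.0,
                 eta=0.05, gamma=0.01, sigma=0.5, seed=0):
        self.q, self.T, self.p = inventory, steps, price
        self.eta, self.gamma, self.sigma = eta, gamma, sigma
        self.t, self.rng = 0, np.random.default_rng(seed)

    def step(self, shares: float):
        """Sell `shares`; return (state, reward, done). Reward is the cash
        received for this slice, net of temporary impact."""
        shares = float(np.clip(shares, 0.0, self.q))
        exec_price = self.p - self.eta * shares              # temporary impact
        reward = exec_price * shares
        self.p += -self.gamma * shares + self.sigma * self.rng.normal()  # permanent impact + noise
        self.q -= shares
        self.t += 1
        done = self.t >= self.T or self.q <= 0
        return (self.t, self.q, self.p), reward, done
```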

February 19, 2024 · 2 min · Research Team

RiskMiner: Discovering Formulaic Alphas via Risk Seeking Monte Carlo Tree Search

RiskMiner: Discovering Formulaic Alphas via Risk Seeking Monte Carlo Tree Search ArXiv ID: 2402.07080 View on arXiv Authors: Unknown Abstract Formulaic alphas are mathematical formulas that transform raw stock data into trading signals. In industry, a collection of formulaic alphas is combined to enhance modeling accuracy. Existing alpha mining only employs neural network agents, which cannot utilize the structural information of the solution space. Moreover, these approaches do not consider the correlation between alphas in the collection, which limits their synergistic performance. To address these problems, we propose a novel alpha mining framework, which formulates the alpha mining problem as a reward-dense Markov Decision Process (MDP) and solves the MDP with risk-seeking Monte Carlo Tree Search (MCTS). The MCTS-based agent fully exploits the structural information of the discrete solution space, and the risk-seeking policy explicitly optimizes best-case performance rather than average outcomes. Comprehensive experiments demonstrate the efficiency of our framework. Our method outperforms all state-of-the-art benchmarks on two real-world stock sets under various metrics. Backtest experiments show that our alphas achieve the most profitable results under a realistic trading setting. ...
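A risk-seeking backup replaces the mean of simulation returns with a best-case measure when valuing nodes. A minimal sketch using an empirical quantile follows; the node statistic and the quantile level are assumptions, not RiskMiner's exact objective.

```python
import numpy as np

class RiskSeekingNode:
    """MCTS node statistic that values a node by a high quantile of its
    backed-up returns instead of their mean, so search is steered toward
    best-case outcomes."""

    def __init__(self, q: float = 0.85):
        self.returns, self.q = [], q

    def backup(self, G: float):
        self.returns.append(G)

    def value(self) -> float:
        return float(np.quantile(self.returns, self.q)) if self.returns else 0.0

def uct_score(node: RiskSeekingNode, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT exploration bonus on top of the risk-seeking value."""
    n = len(node.returns)
    if n == 0:
        return float("inf")
    return node.value() + c * np.sqrt(np.log(parent_visits) / n)
```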

February 11, 2024 · 2 min · Research Team