
End-to-End Policy Learning of a Statistical Arbitrage Autoencoder Architecture

End-to-End Policy Learning of a Statistical Arbitrage Autoencoder Architecture ArXiv ID: 2402.08233 Authors: Unknown Abstract In Statistical Arbitrage (StatArb), classical mean reversion trading strategies typically hinge on asset-pricing or PCA-based models to identify the mean of a synthetic asset. Once such a (linear) model is identified, a separate mean reversion strategy is then devised to generate a trading signal. With a view to generalising such an approach and making it truly data-driven, we study the utility of Autoencoder architectures in StatArb. As a first approach, we employ a standard Autoencoder trained on US stock returns to derive trading strategies based on the Ornstein-Uhlenbeck (OU) process. To further enhance this model, we take a policy-learning approach and embed the Autoencoder network into a neural network representation of a space of portfolio trading policies. This integration outputs portfolio allocations directly and is end-to-end trainable by backpropagation of the risk-adjusted returns of the neural policy. Our findings demonstrate that this end-to-end policy learning approach not only simplifies the strategy development process, but also yields superior gross returns over its competitors, illustrating the potential of end-to-end training over classical two-stage approaches. ...
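The OU leg of the classical two-stage baseline can be sketched in plain Python: fit an AR(1) regression on a residual (spread) series, map the coefficients to OU parameters, and trade the z-score. The autoencoder that would produce the residual is omitted; `fit_ou`, `ou_signal`, the entry threshold, and the simulated spread are illustrative assumptions, not the paper's implementation.

```python
import math
import random
import statistics

def fit_ou(spread, dt=1.0):
    # AR(1) regression x[t+1] = a + b*x[t] + eps, mapped to OU parameters
    # via b = exp(-theta*dt), mu = a/(1-b).
    x, y = spread[:-1], spread[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    theta = -math.log(b) / dt                               # mean-reversion speed
    mu = a / (1.0 - b)                                      # long-run mean
    sigma_eq = statistics.stdev(resid) / math.sqrt(1.0 - b * b)  # stationary std
    return theta, mu, sigma_eq

def ou_signal(x, mu, sigma_eq, entry=1.0):
    # short a stretched spread, buy a depressed one, stay flat otherwise
    z = (x - mu) / sigma_eq
    if z > entry:
        return -1
    if z < -entry:
        return 1
    return 0

# simulate a mean-reverting residual series to fit against
random.seed(0)
x, xs = 0.0, []
for _ in range(2000):
    x += 0.5 * (0.0 - x) + 0.1 * random.gauss(0, 1)
    xs.append(x)
theta, mu, sigma_eq = fit_ou(xs)
```

In a full two-stage pipeline, `xs` would be the reconstruction residual of the autoencoder rather than a simulated series.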

February 13, 2024 · 2 min · Research Team

Do Weibo platform experts perform better at predicting stock market?

Do Weibo platform experts perform better at predicting stock market? ArXiv ID: 2403.00772 Authors: Unknown Abstract Sentiment analysis can be used for stock market prediction. However, existing research has not studied the impact of a user’s financial background on sentiment-based forecasting of the stock market using artificial neural networks. In this work, a novel combination of neural networks is used for the assessment of sentiment-based stock market prediction, based on the financial background of the population that generated the sentiment. The state-of-the-art language processing model Bidirectional Encoder Representations from Transformers (BERT) is used to classify the sentiment, and a Long Short-Term Memory (LSTM) model is used for time-series-based stock market prediction. For evaluation, the Weibo social networking platform is used as a sentiment data collection source. Weibo users (and their comments, respectively) are divided into Authorized Financial Advisor (AFA) and Unauthorized Financial Advisor (UFA) groups according to their background information, as collected by Weibo. The Hong Kong Hang Seng index is used to extract historical stock market change data. The results indicate that stock market prediction learned from the AFA group users is 39.67% more precise than that learned from the UFA group users and shows the highest accuracy (87%) when compared to existing approaches. ...
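The AFA/UFA evaluation idea (compare how well each group's aggregated daily sentiment tracks market moves) can be sketched without the BERT/LSTM stack. The toy scores, group labels, and correlation-based comparison below are stand-ins for the paper's models, not its method.

```python
import statistics

def daily_sentiment(comments):
    # comments: list of (day, group, score in [-1, 1]); average per (day, group)
    buckets = {}
    for day, group, score in comments:
        buckets.setdefault((day, group), []).append(score)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

def corr(xs, ys):
    # Pearson correlation between two equal-length series
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den

# toy data: AFA sentiment tracks the index move, UFA sentiment is inverted
moves = [0.4, -0.2, 0.1, -0.5, 0.3, 0.2, -0.1, -0.3]
comments = []
for d, m in enumerate(moves):
    comments.append((d, "AFA", m + 0.05 * (-1) ** d))  # close to the move
    comments.append((d, "UFA", -m + 0.2))              # roughly opposite
sent = daily_sentiment(comments)
afa = [sent[(d, "AFA")] for d in range(len(moves))]
ufa = [sent[(d, "UFA")] for d in range(len(moves))]
```

In the paper, BERT produces the per-comment scores and an LSTM consumes the aggregated series; here the comparison is reduced to a correlation against the moves.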

February 12, 2024 · 2 min · Research Team

RiskMiner: Discovering Formulaic Alphas via Risk Seeking Monte Carlo Tree Search

RiskMiner: Discovering Formulaic Alphas via Risk Seeking Monte Carlo Tree Search ArXiv ID: 2402.07080 Authors: Unknown Abstract Formulaic alphas are mathematical formulas that transform raw stock data into indicative signals. In the industry, a collection of formulaic alphas is combined to enhance modeling accuracy. Existing alpha-mining approaches employ only neural-network agents and cannot exploit the structural information of the solution space. Moreover, they do not consider the correlation between alphas in the collection, which limits their synergistic performance. To address these problems, we propose a novel alpha mining framework, which formulates the alpha mining problem as a reward-dense Markov Decision Process (MDP) and solves the MDP by risk-seeking Monte Carlo Tree Search (MCTS). The MCTS-based agent fully exploits the structural information of the discrete solution space, and the risk-seeking policy explicitly optimizes for best-case performance rather than average outcomes. Comprehensive experiments are conducted to demonstrate the effectiveness of our framework. Our method outperforms all state-of-the-art benchmarks on two real-world stock sets under various metrics. Backtest experiments show that our alphas achieve the most profitable results under a realistic trading setting. ...
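The risk-seeking ingredient (optimizing best-case rather than average outcomes) can be isolated in a minimal stdlib example: score candidates by an upper quantile of their sampled rewards instead of their mean. This is a two-armed bandit illustration of the policy's objective, not the paper's MCTS; all reward distributions below are invented.

```python
import random
import statistics

def quantile(xs, q):
    # empirical q-quantile of a sample (simple order-statistic version)
    ys = sorted(xs)
    return ys[min(len(ys) - 1, int(q * len(ys)))]

random.seed(1)
# candidate "A": steady small reward; candidate "B": usually 0, occasionally 0.35
rewards = {
    "A": [0.1 + 0.01 * random.gauss(0, 1) for _ in range(1000)],
    "B": [0.35 if random.random() < 0.2 else 0.0 for _ in range(1000)],
}

# mean-seeking selection prefers the steady candidate...
mean_choice = max(rewards, key=lambda k: statistics.mean(rewards[k]))
# ...while a risk-seeking (85th-percentile) objective prefers the one
# with the better best-case outcome
risk_choice = max(rewards, key=lambda k: quantile(rewards[k], 0.85))
```

In RiskMiner's setting the "candidates" are partial formula trees explored by MCTS and the rewards are backtest scores; the quantile objective is what makes the search chase exceptional alphas rather than average ones.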

February 11, 2024 · 2 min · Research Team

FNSPID: A Comprehensive Financial News Dataset in Time Series

FNSPID: A Comprehensive Financial News Dataset in Time Series ArXiv ID: 2402.06698 Authors: Unknown Abstract Financial market predictions utilize historical data to anticipate future stock prices and market trends. Traditionally, these predictions have focused on the statistical analysis of quantitative factors, such as stock prices, trading volumes, inflation rates, and changes in industrial production. Recent advancements in large language models motivate the integrated financial analysis of both sentiment data, particularly market news, and numerical factors. Nonetheless, this methodology frequently encounters constraints due to the paucity of extensive datasets that amalgamate both quantitative and qualitative sentiment analyses. To address this challenge, we introduce a large-scale financial dataset, namely, the Financial News and Stock Price Integration Dataset (FNSPID). It comprises 29.7 million stock prices and 15.7 million time-aligned financial news records for 4,775 S&P500 companies, covering the period from 1999 to 2023, sourced from 4 stock market news websites. We demonstrate that FNSPID surpasses existing stock market datasets in scale and diversity while uniquely incorporating sentiment information. Through financial analysis experiments on FNSPID, we show that (1) the dataset’s size and quality significantly boost market prediction accuracy and (2) adding sentiment scores modestly enhances the performance of the transformer-based model; we also provide (3) a reproducible procedure for updating the dataset. Completed work, code, documentation, and examples are available at github.com/Zdong104/FNSPID. FNSPID offers unprecedented opportunities for the financial research community to advance predictive modeling and analysis. ...
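A core operation behind any such dataset is time-aligning news to trading days. The as-of alignment below is a generic sketch, not FNSPID's actual construction pipeline (which lives in the linked repository); the dates and headlines are invented.

```python
import bisect
from datetime import date

def align_news_to_prices(news, trading_days):
    # map each news timestamp to the first trading day on or after it;
    # news after the last trading day is dropped
    days = sorted(trading_days)
    aligned = {}
    for ts, headline in news:
        i = bisect.bisect_left(days, ts)
        if i < len(days):
            aligned.setdefault(days[i], []).append(headline)
    return aligned

days = [date(2024, 2, d) for d in (5, 6, 7, 8, 9)]   # Mon-Fri trading week
news = [
    (date(2024, 2, 3), "weekend earnings leak"),     # Saturday -> Monday 2/5
    (date(2024, 2, 6), "guidance raised"),           # Tuesday -> Tuesday 2/6
    (date(2024, 2, 10), "late filing"),              # after the week -> dropped
]
aligned = align_news_to_prices(news, days)
```

Sentiment scores computed per headline can then be aggregated per trading day and joined against the price series.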

February 9, 2024 · 2 min · Research Team

Coarse graining correlation matrices according to macrostructures: Financial markets as a paradigm

Coarse graining correlation matrices according to macrostructures: Financial markets as a paradigm ArXiv ID: 2402.05364 Authors: Unknown Abstract We analyze correlation structures in financial markets by coarse graining the Pearson correlation matrices according to market sectors to obtain Guhr matrices, using the correlation method of P. Rinn et al., Europhysics Letters 110, 68003 (2015). We compare the results for the evolution of market states and the corresponding transition matrices with those obtained using Pearson correlation matrices. The behavior of market states is found to be similar for both the coarse grained and Pearson matrices. However, the number of relevant variables is reduced by orders of magnitude. ...
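At its simplest, coarse graining a correlation matrix by sectors amounts to block-averaging its entries. A minimal sketch, with diagonal self-correlations excluded from intra-sector averages (the Guhr method in the cited reference may differ in detail):

```python
def coarse_grain(corr, sectors):
    # corr: full NxN Pearson matrix; sectors: length-N list of sector labels.
    # Returns the sector labels and the block-averaged (coarse-grained) matrix.
    labels = sorted(set(sectors))
    idx = {s: [i for i, t in enumerate(sectors) if t == s] for s in labels}
    out = [[0.0] * len(labels) for _ in labels]
    for a, sa in enumerate(labels):
        for b, sb in enumerate(labels):
            vals = [corr[i][j] for i in idx[sa] for j in idx[sb] if i != j]
            out[a][b] = sum(vals) / len(vals) if vals else 1.0
    return labels, out

# 4 stocks, 2 sectors: strong intra-sector, weak cross-sector correlation
corr = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
sectors = ["tech", "tech", "bank", "bank"]
labels, cg = coarse_grain(corr, sectors)
```

The coarse-grained matrix has one row and column per sector, which is what reduces the number of relevant variables by orders of magnitude for large universes.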

February 8, 2024 · 2 min · Research Team

Cyber risk and the cross-section of stock returns

Cyber risk and the cross-section of stock returns ArXiv ID: 2402.04775 Authors: Unknown Abstract We extract firms’ cyber risk with a machine learning algorithm measuring the proximity between their disclosures and a dedicated cyber corpus. Our approach outperforms dictionary methods, uses the full disclosure rather than only dedicated sections, and generates a cyber risk measure uncorrelated with other firm characteristics. We find that a portfolio of US-listed stocks in the high cyber risk quantile generates an excess return of 18.72% p.a. Moreover, a long-short cyber risk portfolio has a significant and positive risk premium of 6.93% p.a., robust to all factor benchmarks. Finally, using a Bayesian asset pricing method, we show that our cyber risk factor is the essential feature that allows any multi-factor model to price the cross-section of stock returns. ...
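The proximity idea can be illustrated with a bag-of-words cosine similarity between a disclosure and a cyber corpus. The paper's machine-learning representation is replaced here by raw token counts, so this is only a crude proxy for the measure; the corpus and disclosures are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between two sparse term-count vectors
    num = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def cyber_exposure(disclosure, cyber_corpus):
    # proximity of a firm's disclosure text to the dedicated cyber corpus
    return cosine(Counter(disclosure.lower().split()),
                  Counter(cyber_corpus.lower().split()))

corpus = "ransomware breach phishing malware data breach intrusion"
high = cyber_exposure("we faced a ransomware breach and a phishing intrusion", corpus)
low = cyber_exposure("revenue grew due to strong retail demand", corpus)
```

Sorting firms by this score and forming quantile portfolios mirrors the shape of the paper's cross-sectional exercise, though their measure is built from learned embeddings rather than token overlap.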

February 7, 2024 · 2 min · Research Team

Downside Risk Reduction Using Regime-Switching Signals: A Statistical Jump Model Approach

Downside Risk Reduction Using Regime-Switching Signals: A Statistical Jump Model Approach ArXiv ID: 2402.05272 Authors: Unknown Abstract This article investigates a regime-switching investment strategy aimed at mitigating downside risk by reducing market exposure during anticipated unfavorable market regimes. We highlight the statistical jump model (JM) for market regime identification, a recently developed robust model that distinguishes itself from traditional Markov-switching models by enhancing regime persistence through a jump penalty applied at each state transition. Our JM utilizes a feature set comprising risk and return measures derived solely from the return series, with the optimal jump penalty selected through a time-series cross-validation method that directly optimizes strategy performance. Our empirical analysis evaluates the realistic out-of-sample performance of various strategies on major equity indices from the US, Germany, and Japan from 1990 to 2023, in the presence of transaction costs and trading delays. The results demonstrate the consistent outperformance of the JM-guided strategy in reducing risk metrics such as volatility and maximum drawdown, and in enhancing risk-adjusted returns such as the Sharpe ratio, compared to both a hidden Markov model-guided strategy and the buy-and-hold strategy. These findings underline the enhanced persistence, practicality, and versatility of strategies utilizing JMs for regime-switching signals. ...
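The jump model's key mechanic, penalized state assignment, can be sketched as a small coordinate-descent loop: a dynamic program assigns each observation to a state centroid with a fixed cost per state switch, then centroids are refit. The 1-D feature, quantile initialization, and penalty value below are illustrative assumptions, not the paper's specification.

```python
def fit_jump_model(x, jump_penalty=0.1, iters=10):
    # two-state jump model on a 1-D feature series x:
    # minimize sum of squared errors + jump_penalty * (number of switches)
    k = 2
    xs = sorted(x)
    centers = [xs[len(xs) // 4], xs[3 * len(xs) // 4]]  # low/high quantile init
    states = [0] * len(x)
    for _ in range(iters):
        # dynamic program over state sequences
        n = len(x)
        cost = [[(x[0] - c) ** 2 for c in centers]]
        back = []
        for t in range(1, n):
            row, brow = [], []
            for s in range(k):
                prev = min(range(k),
                           key=lambda p: cost[-1][p] + (jump_penalty if p != s else 0.0))
                row.append(cost[-1][prev] + (jump_penalty if prev != s else 0.0)
                           + (x[t] - centers[s]) ** 2)
                brow.append(prev)
            cost.append(row)
            back.append(brow)
        # backtrack the optimal state sequence
        s = min(range(k), key=lambda i: cost[-1][i])
        states = [0] * n
        states[-1] = s
        for t in range(n - 2, -1, -1):
            s = back[t][s]
            states[t] = s
        # refit centroids to their assigned points
        for j in range(k):
            pts = [xi for xi, si in zip(x, states) if si == j]
            if pts:
                centers[j] = sum(pts) / len(pts)
    return states, centers

# noisy two-regime series: low level for 20 steps, then high for 20
x = [0.0, 0.1, 0.0, 0.05] * 5 + [1.0, 0.9, 1.0, 0.95] * 5
states, centers = fit_jump_model(x)
```

A regime-switching strategy would then scale market exposure by the inferred state, with the penalty (chosen by cross-validation in the paper) controlling how persistent the regimes are.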

February 7, 2024 · 2 min · Research Team

DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market Simulations

DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market Simulations ArXiv ID: 2403.18831 Authors: Unknown Abstract In this paper, we introduce DeepTraderX (DTX), a simple Deep Learning-based trader, and present results that demonstrate its performance in a multi-threaded market simulation. In a total of about 500 simulated market days, DTX has learned solely by watching the prices that other strategies produce. By doing this, it has successfully created a mapping from market data to quotes, either bid or ask orders, to place for an asset. Trained on historical Level-2 market data, i.e., the Limit Order Book (LOB) for specific tradable assets, DTX processes the market state $S$ at each timestep $T$ to determine a price $P$ for market orders. The market data used in both training and testing was generated from unique market schedules based on real historic stock market data. DTX was tested extensively against the best strategies in the literature, with its results validated by statistical analysis. Our findings underscore DTX’s capability to rival, and in many instances surpass, the performance of public-domain traders, including those that outclass human traders, emphasising the efficiency of simple models, which is required to succeed in intricate multi-threaded simulations. This highlights the potential of leveraging “black-box” Deep Learning systems to create more efficient financial markets. ...
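A common hand-crafted baseline for the kind of state-to-price mapping DTX learns is the microprice, which skews a quote toward the side of the book with more resting size. This is a reference point computed from an L2 snapshot, not DTX's learned network; the book levels below are invented.

```python
def microprice(bids, asks):
    # bids/asks: lists of (price, size) with the best level first.
    # Weight each best price by the opposite side's size, so an imbalanced
    # book skews the quote toward the side with more resting interest.
    bid_price, bid_size = bids[0]
    ask_price, ask_size = asks[0]
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)

# heavy bid side -> quote skews toward the ask (buy pressure)
bids = [(99.0, 300), (98.5, 500)]
asks = [(99.4, 100), (99.9, 400)]
q = microprice(bids, asks)
```

A learned trader like DTX replaces this fixed formula with a network that maps many LOB features at each timestep to the quote price.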

February 6, 2024 · 2 min · Research Team

Sparse Portfolio Selection via Topological Data Analysis based Clustering

Sparse Portfolio Selection via Topological Data Analysis based Clustering ArXiv ID: 2401.16920 Authors: Unknown Abstract This paper uses topological data analysis (TDA) tools and introduces a data-driven clustering-based stock selection strategy tailored for sparse portfolio construction. Our asset selection strategy exploits the topological features of stock price movements to select a subset of topologically similar (different) assets for a sparse index tracking (Markowitz) portfolio. We introduce new distance measures, which serve as an input to the clustering algorithm, on the space of persistence diagrams and landscapes that consider the time component of a time series. We conduct an empirical analysis on the S&P index from 2009 to 2022, including a study on the COVID-19 data to validate the robustness of our methodology. Our strategy to integrate TDA with the clustering algorithm significantly enhanced the performance of sparse portfolios across various performance measures in diverse market scenarios. ...
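The shape of the selection pipeline can be sketched with a plain correlation distance standing in for the paper's persistence-diagram and landscape distances: compute pairwise dissimilarities, then greedily pick mutually dissimilar assets as a sparse universe. The TDA substitution is deliberate; this is not the paper's distance measure, and the toy returns are invented.

```python
def corr_distance(a, b):
    # standard correlation distance d = sqrt(2 * (1 - rho))
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return (2 * (1 - num / den)) ** 0.5

def select_sparse(returns, k):
    # greedy farthest-point selection: pick k mutually dissimilar assets
    names = list(returns)
    chosen = [names[0]]
    while len(chosen) < k:
        best = max((n for n in names if n not in chosen),
                   key=lambda n: min(corr_distance(returns[n], returns[c])
                                     for c in chosen))
        chosen.append(best)
    return chosen

# A and B move together; C moves opposite, so {A, C} spans more behavior
returns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.1, 2.1, 2.9, 4.2],
    "C": [4.0, 3.0, 2.0, 1.0],
}
picked = select_sparse(returns, 2)
```

In the paper, the dissimilarities come from topological summaries of the price paths and the clustering is more principled, but the output is the same kind of object: a small, diverse asset subset fed to a sparse portfolio optimizer.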

January 30, 2024 · 2 min · Research Team

ESG driven pairs algorithm for sustainable trading: Analysis from the Indian market

ESG driven pairs algorithm for sustainable trading: Analysis from the Indian market ArXiv ID: 2401.14761 Authors: Unknown Abstract This paper proposes an algorithmic trading framework integrating Environmental, Social, and Governance (ESG) ratings with a pairs trading strategy. It addresses the demand for socially responsible investment solutions by developing a unique algorithm blending ESG data with methods for identifying co-integrated stocks. This allows selecting profitable pairs adhering to ESG principles. Further, it incorporates technical indicators for optimal trade execution within this sustainability framework. Extensive back-testing provides evidence of the model’s effectiveness, consistently generating positive returns exceeding conventional pairs trading strategies, while upholding ESG principles. This paves the way for a transformative approach to algorithmic trading, offering insights for investors, policymakers, and academics. ...
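The pair-level logic can be sketched in a few lines: gate candidate pairs on an ESG threshold, estimate a hedge ratio by least squares, and trade the spread's z-score. The threshold, entry level, and toy prices are illustrative assumptions; the paper's cointegration tests and technical-indicator overlay are omitted.

```python
import statistics

def hedge_ratio(y, x):
    # ordinary least squares slope of y on x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def pairs_signal(y, x, esg, min_esg=70, entry=1.5):
    # trade only pairs whose ESG scores both clear the threshold
    if min(esg) < min_esg:
        return 0
    beta = hedge_ratio(y, x)
    spread = [b - beta * a for a, b in zip(x, y)]
    z = (spread[-1] - statistics.mean(spread)) / statistics.stdev(spread)
    if z > entry:
        return -1   # spread stretched: short y, long x
    if z < -entry:
        return 1    # spread depressed: long y, short x
    return 0

# toy cointegrated pair whose spread blows out on the last observation
x = [10, 11] * 5
y = [20, 22] * 5
y[-1] = 24
```

In a full implementation the candidate pairs would first pass a cointegration test, with the ESG filter restricting the universe before pair formation.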

January 26, 2024 · 2 min · Research Team