Equities

Microstructure-Empowered Stock Factor Extraction and Utilization

ArXiv ID: 2308.08135
Authors: Unknown
Abstract: High-frequency quantitative investment is a crucial aspect of stock investment. Notably, order flow data plays a critical role as it provides the most detailed level of information among high-frequency trading data, including comprehensive data from the order book and transaction records at the tick level. Order flow data is extremely valuable for market analysis because it equips traders with essential insights for making informed decisions. However, extracting and effectively utilizing order flow data presents challenges due to the large volume of data involved and the limitations of traditional factor mining techniques, which are primarily designed for coarser-level stock data. To address these challenges, we propose a novel framework that aims to effectively extract essential factors from order flow data for diverse downstream tasks across different granularities and scenarios. Our method consists of a Context Encoder and a Factor Extractor. The Context Encoder learns an embedding for the current order flow data segment’s context by considering both the expected and actual market state. The Factor Extractor uses unsupervised learning methods to select the important signals that are most distinct from the majority within the given context. The extracted factors are then utilized for downstream tasks. In empirical studies, our proposed framework efficiently handles an entire year of stock order flow data across diverse scenarios, offering a broader range of applications than existing tick-level approaches, which are limited to only a few days of stock data. We demonstrate that our method extracts superior factors from order flow data, enabling significant improvements in stock trend prediction and order execution tasks at the second and minute levels. ...

Company Similarity using Large Language Models

ArXiv ID: 2308.08031
Authors: Unknown
Abstract: Identifying companies with similar profiles is a core task in finance with a wide range of applications in portfolio construction, asset pricing and risk attribution. When a rigorous definition of similarity is lacking, financial analysts usually resort to ‘traditional’ industry classifications such as the Global Industry Classification Standard (GICS), which assigns a unique category to each company at different levels of granularity. Due to their discrete nature, though, GICS classifications do not allow companies to be ranked by similarity. In this paper, we explore the ability of pre-trained and fine-tuned large language models (LLMs) to learn company embeddings based on the business descriptions reported in SEC filings. We show that we can reproduce GICS classifications using the embeddings as features. We also benchmark these embeddings on various machine learning and financial metrics and conclude that companies that are similar according to the embeddings are also similar in terms of financial performance metrics, including return correlation. ...
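Because embeddings are continuous vectors, they support the ranking that discrete GICS codes cannot. The sketch below ranks companies by cosine similarity, assuming the embeddings have already been computed. The company names and 3-dimensional vectors are hypothetical toy data, and cosine similarity is our choice of measure, not necessarily the one used in the paper.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the row vectors of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T

def rank_similar(embeddings, names, query):
    """Return (name, similarity) pairs for all other companies, most similar first."""
    sims = cosine_similarity_matrix(embeddings)
    i = names.index(query)
    order = np.argsort(-sims[i])
    return [(names[j], float(sims[i, j])) for j in order if j != i]

# Hypothetical toy embeddings; real LLM embeddings have hundreds of dimensions.
names = ["BankA", "BankB", "ChipCo", "Refiner"]
emb = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.1, 0.9],
])
ranking = rank_similar(emb, names, "BankA")  # BankB ranks first
```

Unlike a shared GICS code, which only says two firms are in the same bucket, the ranking also orders the firms that fall outside that bucket.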

Online Universal Dirichlet Factor Portfolios

ArXiv ID: 2308.07763
Authors: Unknown
Abstract: We revisit the online portfolio allocation problem and propose universal portfolios that use factor weighting to produce portfolios that outperform uniform Dirichlet allocation schemes. We present several analytical results on lower bounds of portfolio growth when returns are known to follow a factor model. We also show analytically that factor-weighted, Dirichlet-sampled portfolios dominate the wealth generated by uniformly sampled Dirichlet portfolios. We corroborate our analytical results with empirical studies on equity markets that are known to be driven by factors. ...
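For reference, the uniform Dirichlet baseline that the paper improves on can be sketched as a Monte Carlo approximation of Cover's universal portfolio. The sketch samples candidate constant-rebalanced portfolios from a Dirichlet distribution and, each period, holds their average weighted by the wealth each candidate has accumulated so far. The paper's factor weighting is not reproduced here. The `alpha` parameter only changes the Dirichlet prior (all ones gives the uniform scheme), and the sample count and toy price relatives are hypothetical.

```python
import numpy as np

def universal_portfolio_weights(price_relatives, n_samples=5000, alpha=None, seed=0):
    """Monte Carlo approximation of Cover's universal portfolio.

    price_relatives: (T, m) array with x[t, i] = price_i(t + 1) / price_i(t).
    Returns a (T, m) array whose row t is the allocation held during period t,
    computed only from periods before t.
    """
    T, m = price_relatives.shape
    alpha = np.ones(m) if alpha is None else np.asarray(alpha, dtype=float)
    rng = np.random.default_rng(seed)
    # Candidate constant-rebalanced portfolios drawn from a Dirichlet prior.
    candidates = rng.dirichlet(alpha, size=n_samples)  # shape (n_samples, m)
    log_wealth = np.zeros(n_samples)  # every candidate starts with wealth 1
    weights = np.empty((T, m))
    for t in range(T):
        # Wealth-weighted average of the candidates; subtracting the max avoids overflow.
        w = np.exp(log_wealth - log_wealth.max())
        weights[t] = w @ candidates / w.sum()
        # Grow each candidate's wealth by its return over period t.
        log_wealth += np.log(candidates @ price_relatives[t])
    return weights

# Toy data: asset 0 gains 10% every period, asset 1 is flat.
x = np.array([[1.1, 1.0]] * 20)
b = universal_portfolio_weights(x)
```

Candidates tilted toward asset 0 accumulate more wealth, so the allocation to asset 0 drifts upward from roughly 0.5 as the periods go by.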

Portfolio Selection via Topological Data Analysis

ArXiv ID: 2308.07944
Authors: Unknown
Abstract: Portfolio management is an essential part of investment decision-making. However, traditional methods often fail to deliver reasonable performance. This problem stems from the inability of these methods to account for the unique characteristics of multivariate time series data from stock markets. We present a two-stage method for constructing an investment portfolio of common stocks: time series representations are first generated and then clustered. Our approach uses features based on Topological Data Analysis (TDA) to generate the representations, allowing us to elucidate the topological structure within the data. Experimental results show that our proposed system outperforms other methods. This superior performance is consistent over different time frames, suggesting the viability of TDA as a powerful tool for portfolio selection. ...

Correlation-diversified portfolio construction by finding maximum independent set in large-scale market graph

ArXiv ID: 2308.04769
Authors: Unknown
Abstract: Correlation-diversified portfolios can be constructed by finding the maximum independent sets (MISs) in market graphs whose edges correspond to correlations between pairs of stocks. The computational complexity of finding the MIS increases exponentially with the size of the market graph, making MIS selection in a large-scale market graph difficult. Here we construct a diversified portfolio by solving the MIS problem for a large-scale market graph with a combinatorial optimization solver (an Ising machine) based on a quantum-inspired algorithm called simulated bifurcation (SB), and we investigate the investment performance of the constructed portfolio using long-term historical market data. Comparisons using stock universes of various sizes (TOPIX 100, Nikkei 225, TOPIX 1000, and TOPIX, which includes approximately 2,000 constituents) show that the SB-based solver outperforms conventional MIS solvers in terms of computation time and solution accuracy. Using the SB-based solver, we optimized the parameters of an MIS portfolio strategy through iterated backcast simulations that calculate the performance of the MIS portfolio strategy on a large-scale universe covering more than 1,700 Japanese stocks over a period of 10 years. The best MIS portfolio strategy (Sharpe ratio = 1.16, annualized return/risk = 16.3%/14.0%) outperforms major indices such as TOPIX (0.66, 10.0%/15.2%) and the MSCI Japan Minimum Volatility Index (0.64, 7.7%/12.1%) over the period from 2013 to 2023. ...
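To make the market-graph construction concrete, the sketch below links two stocks when their return correlation exceeds a threshold and then extracts an independent set with a simple min-degree greedy heuristic. This heuristic is a stand-in, not the simulated-bifurcation solver used in the paper, and the set it returns is independent but not guaranteed to be maximum. The 0.5 threshold and the two-factor synthetic returns are hypothetical.

```python
import numpy as np

def market_graph(returns, threshold):
    """Adjacency matrix with an edge (i, j) when the return correlation exceeds `threshold`."""
    corr = np.corrcoef(returns, rowvar=False)  # columns are stocks
    adj = corr > threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def greedy_independent_set(adj):
    """Min-degree greedy heuristic: returns an independent set, not necessarily a maximum one."""
    remaining = set(range(adj.shape[0]))
    chosen = []
    while remaining:
        # Pick the remaining stock with the fewest remaining neighbours.
        node = min(remaining, key=lambda v: sum(adj[v, u] for u in remaining))
        chosen.append(node)
        # Drop the chosen stock and every stock correlated with it.
        remaining -= {node} | {u for u in remaining if adj[node, u]}
    return sorted(chosen)

# Synthetic returns: stocks 0-2 load on factor 0, stocks 3-5 on factor 1.
rng = np.random.default_rng(1)
factors = rng.normal(size=(500, 2))
loadings = np.repeat(np.eye(2), 3, axis=0)
returns = factors @ loadings.T + 0.1 * rng.normal(size=(500, 6))
adj = market_graph(returns, threshold=0.5)
mis = greedy_independent_set(adj)
```

With two tightly correlated groups the graph is two triangles, so any maximal independent set picks one stock from each group. The resulting portfolio holds no pair of stocks whose correlation exceeds the threshold.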

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

ArXiv ID: 2308.04947
Authors: Unknown
Abstract: Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain. ...

Reinforcement Learning for Financial Index Tracking

ArXiv ID: 2308.02820
Authors: Unknown
Abstract: We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, and allowing effective use of data over a long time period, among other advantages. The formulation also allows novel decision variables for cash injection or withdrawal. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows transaction costs specified as nonlinear functions of trading volumes, as they are in practice, to be calculated accurately. We propose an extension of deep reinforcement learning (RL) methods to solve the dynamic formulation. Our RL method uses a novel training scheme to resolve the data limitation that arises because only a single sample path of financial data is available. A comprehensive empirical study based on a 17-year-long testing set demonstrates that the proposed method outperforms a benchmark method in terms of tracking accuracy and has the potential to earn extra profit through a cash withdrawal strategy. ...
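The rebalancing equation is circular: the post-trade wealth determines the trades, and the trades determine the costs that reduce that wealth. A minimal sketch, assuming a hypothetical cost model (a proportional fee plus a power-law impact term) and no cash injection or withdrawal, solves W = V - cost(W * w - h) by Banach fixed-point iteration, where h is the vector of current dollar holdings, V their sum, and w the target weights. The iteration converges when the right-hand side is a contraction in W, which holds here because the marginal cost per dollar traded is far below 1. This illustrates the idea and is not the paper's exact equation.

```python
import numpy as np

def transaction_cost(trades, linear=0.001, impact=0.0005):
    """Hypothetical cost model: proportional fee plus a power-law market-impact term."""
    a = np.abs(trades)
    return float(np.sum(linear * a + impact * a ** 1.5))

def post_trade_wealth(holdings, target_weights, tol=1e-12, max_iter=100):
    """Solve W = V - cost(W * target_weights - holdings) by fixed-point iteration."""
    holdings = np.asarray(holdings, dtype=float)
    target_weights = np.asarray(target_weights, dtype=float)
    V = holdings.sum()  # pre-trade portfolio value
    W = V  # initial guess: no costs paid
    for _ in range(max_iter):
        W_next = V - transaction_cost(W * target_weights - holdings)
        if abs(W_next - W) < tol:
            return W_next
        W = W_next
    raise RuntimeError("fixed-point iteration did not converge")

h = np.array([60.0, 40.0])  # current dollar holdings
w = np.array([0.5, 0.5])  # target weights
W = post_trade_wealth(h, w)  # slightly below 100 after paying costs
```

Because the cost depends on trade size through a nonlinear term, there is no closed form for W in general, which is why an iterative solve is convenient.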

Recurrent Neural Networks with more flexible memory: better predictions than rough volatility

ArXiv ID: 2308.08550
Authors: Unknown
Abstract: We extend recurrent neural networks to include several flexible timescales for each dimension of their output, which mechanically improves their ability to account for processes with long memory or with highly disparate timescales. We compare the ability of vanilla and extended long short-term memory networks (LSTMs) to predict asset price volatility, which is known to have a long memory. Generally, the number of epochs needed to train extended LSTMs is halved, while the variation of validation and test losses among models with the same hyperparameters is much smaller. We also show that the model with the smallest validation loss systematically outperforms rough volatility predictions by about 20% when trained and tested on a dataset with multiple time series. ...

Portfolio Optimization in a Market with Hidden Gaussian Drift and Randomly Arriving Expert Opinions: Modeling and Theoretical Results

ArXiv ID: 2308.02049
Authors: Unknown
Abstract: This paper investigates the optimal selection of portfolios for power-utility-maximizing investors in a financial market where stock returns depend on a hidden Gaussian mean-reverting drift process. Information on the drift is obtained from returns and from expert opinions in the form of noisy signals about the current state of the drift, arriving randomly over time. The arrival dates are modeled as the jump times of a homogeneous Poisson process. Applying Kalman filter techniques, we derive estimates of the hidden drift, described by the conditional mean and covariance of the drift given the observations. The utility maximization problem is solved with dynamic programming methods. We derive the associated dynamic programming equation and study regularization arguments for a rigorous mathematical justification. ...
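The paper works in continuous time. The sketch below is a discrete-time (Euler) approximation that assumes an Ornstein-Uhlenbeck drift with mean-reversion speed `kappa` and long-run level `mbar`, returns observed as mu_t * dt plus noise, and expert opinions observed as mu_t plus noise at steps where a Bernoulli draw with probability `lam * dt` approximates a Poisson arrival. All parameter names and values are hypothetical. Each expert opinion triggers an extra Kalman update, so the conditional variance drops at arrival times.

```python
import numpy as np

def drift_filter(returns, opinions, dt, kappa, mbar, sig_mu, sig_r, sig_z, m0, q0):
    """Scalar Kalman filter for a hidden Ornstein-Uhlenbeck drift mu_t.

    returns[t]:  observed return over step t, modelled as mu_t * dt + noise
                 with variance sig_r**2 * dt.
    opinions[t]: expert signal mu_t + noise with variance sig_z**2, or np.nan
                 when no expert opinion arrives at step t.
    Returns the filtered conditional means and variances of mu_t.
    """
    m, q = m0, q0
    means, variances = [], []
    for r, z in zip(returns, opinions):
        # Update with the return: observation coefficient dt, noise variance sig_r**2 * dt.
        k = q * dt / (q * dt**2 + sig_r**2 * dt)
        m = m + k * (r - m * dt)
        q = (1 - k * dt) * q
        # Extra update when an expert opinion arrives: coefficient 1, noise variance sig_z**2.
        if not np.isnan(z):
            k = q / (q + sig_z**2)
            m = m + k * (z - m)
            q = (1 - k) * q
        means.append(m)
        variances.append(q)
        # Predict mu at the next step with an Euler step of the OU dynamics.
        m = m + kappa * (mbar - m) * dt
        q = (1 - kappa * dt) ** 2 * q + sig_mu**2 * dt
    return np.array(means), np.array(variances)

# Hypothetical annualised parameters with daily steps, and simulated data.
rng = np.random.default_rng(0)
T, dt = 1000, 1 / 252
kappa, mbar, sig_mu, sig_r, sig_z, lam = 3.0, 0.05, 0.3, 0.2, 0.1, 20.0
mu = np.empty(T)
mu[0] = mbar
for t in range(1, T):
    mu[t] = mu[t - 1] + kappa * (mbar - mu[t - 1]) * dt + sig_mu * np.sqrt(dt) * rng.normal()
r = mu * dt + sig_r * np.sqrt(dt) * rng.normal(size=T)
arrive = rng.random(T) < lam * dt  # Bernoulli approximation of Poisson arrivals
z = np.where(arrive, mu + sig_z * rng.normal(size=T), np.nan)
q0 = sig_mu**2 / (2 * kappa)  # stationary variance of the OU drift
m_hat, q_hat = drift_filter(r, z, dt, kappa, mbar, sig_mu, sig_r, sig_z, mbar, q0)
```

Rerunning the filter with every opinion set to `np.nan` gives a variance path that is never lower than `q_hat`, and is strictly higher at each arrival step. The outputs `m_hat` and `q_hat` are discrete analogues of the conditional mean and covariance described in the abstract.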

Effects of Daily News Sentiment on Stock Price Forecasting

ArXiv ID: 2308.08549
Authors: Unknown
Abstract: Predicting the future prices of a stock is an arduous task. However, incorporating additional information, rather than relying solely on a stock’s historical price data, can significantly improve forecasts of its future price. Studies have demonstrated that investor sentiment, which is affected by daily news about the company, can have a significant impact on stock price swings. There are numerous sources of this information, but they are cluttered with noise, making it difficult to extract sentiment from them accurately. Hence, the focus of our research is to design an efficient system to capture the sentiment of news about NIFTY 50 stocks and to investigate how much the financial news sentiment of these stocks affects their prices over time. This paper presents a robust data collection and preprocessing framework to create a news database spanning around 3.7 years and consisting of almost half a million news articles. We also capture the stock price information for this timeline and create multiple time series that include sentiment scores from various sections of the articles, calculated using different sentiment libraries. Based on this, we fit several LSTM models to forecast stock prices, with and without the sentiment scores as features, and compare their performance. ...