
Risk Analysis of Passive Portfolios

Risk Analysis of Passive Portfolios ArXiv ID: 2407.08332 “View on arXiv” Authors: Unknown Abstract In this work, we present an alternative passive investment strategy. The passive investment philosophy comes from the Efficient Market Hypothesis (EMH), and its adoption is widespread. If EMH is true, one cannot outperform the market through active portfolio management over the long run. Passive investing also requires little to no intervention: one can buy an exchange-traded fund (ETF) with a long-term perspective, and as the economy grows over time, one expects the ETF to grow. For example, in India, one can invest in NETF, which is supposed to mimic the Nifty50 return. However, the weights of the Nifty 50 index are based on market capitalisation, and these weights are not necessarily optimal for the investor. In this work, we show that the volatility risk and extreme risk measures of the Nifty50 portfolio are uniformly larger than those of Markowitz’s optimal portfolio. However, retail investors typically cannot construct an optimised portfolio, so we propose an alternative passive strategy: an equal-weight portfolio. We show that if one pushes the maximum weight of the portfolio towards equal weight, the idiosyncratic risk of the portfolio is minimised. The empirical evidence indicates that the risk profile of an equal-weight portfolio is similar to that of Markowitz’s optimal portfolio. Hence, instead of buying Nifty50 ETFs, one should invest equally in the Nifty50 stocks to achieve a uniformly better risk profile than the Nifty 50 ETF portfolio. We also present an analysis of how the portfolios respond to idiosyncratic events such as the Russian invasion of Ukraine. We found that the equal-weight portfolio has a uniformly lower risk than the Nifty 50 portfolio both before and during the Russia-Ukraine war. All codes are available on GitHub (\url{https://github.com/sourish-cmi/quant/tree/main/Chap_Risk_Anal_of_Passive_Portfolio}). ...
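The comparison at the heart of the abstract is a portfolio-variance calculation: for weights $w$ and covariance matrix $\Sigma$, portfolio volatility is $\sqrt{w'\Sigma w}$, and a concentrated cap-weighted vector can carry more risk than an equal-weight one. A minimal sketch, using a made-up 4-asset covariance matrix and weights rather than the paper's Nifty50 data:

```python
import numpy as np

# Hypothetical annualised covariance matrix for 4 assets (not the paper's data).
Sigma = np.array([
    [0.10, 0.03, 0.02, 0.01],
    [0.03, 0.08, 0.02, 0.01],
    [0.02, 0.02, 0.06, 0.01],
    [0.01, 0.01, 0.01, 0.05],
])

# A cap-weighted portfolio concentrates in the largest (here also most volatile) asset.
w_cap = np.array([0.55, 0.25, 0.15, 0.05])
# The equal-weight portfolio spreads exposure evenly across all assets.
w_eq = np.full(4, 0.25)

def port_vol(w, Sigma):
    """Portfolio volatility sqrt(w' Sigma w)."""
    return float(np.sqrt(w @ Sigma @ w))

vol_cap = port_vol(w_cap, Sigma)
vol_eq = port_vol(w_eq, Sigma)
print(vol_cap, vol_eq)  # equal weight comes out lower for this matrix
```

Whether equal weight actually beats cap weight depends on the covariance structure; the paper's claim is an empirical one about Nifty50 constituents, which this toy matrix only illustrates.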

July 11, 2024 · 3 min · Research Team

Block-diagonal idiosyncratic covariance estimation in high-dimensional factor models for financial time series

Block-diagonal idiosyncratic covariance estimation in high-dimensional factor models for financial time series ArXiv ID: 2407.03781 “View on arXiv” Authors: Unknown Abstract Estimation of high-dimensional covariance matrices in latent factor models is an important topic in many fields, especially in finance. Since the number of financial assets grows while the estimation window remains of limited length, the often-used sample estimator yields noisy estimates that are not even positive definite. Under the assumption of a latent factor model, the covariance matrix is decomposed into a common low-rank component and a full-rank idiosyncratic component. In this paper we focus on the estimation of the idiosyncratic component, under the assumption of a grouped structure of the time series, which may arise due to specific factors such as industries, asset classes or countries. We propose a generalized methodology for estimation of the block-diagonal idiosyncratic component by clustering the residual series and applying shrinkage to the obtained blocks in order to ensure positive definiteness. We derive two different estimators based on different clustering methods and test their performance using simulation and historical data. The proposed methods are shown to provide reliable estimates and outperform other state-of-the-art estimators based on thresholding methods. ...
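The positive-definiteness problem the abstract describes, and the shrinkage fix, can be sketched in a few lines. This is not the paper's estimator: it uses a simple linear shrinkage toward the diagonal with a hypothetical fixed intensity `delta`, applied to one "block" of residual series where observations are fewer than assets:

```python
import numpy as np

def shrink_block(S, delta=0.5):
    """Shrink a sample covariance block toward its diagonal.

    Any delta > 0 guarantees positive definiteness when the diagonal
    (sample variances) is strictly positive, since the result is a sum of a
    PSD matrix and a positive-definite diagonal matrix.
    """
    target = np.diag(np.diag(S))
    return (1.0 - delta) * S + delta * target

rng = np.random.default_rng(0)
T, N = 20, 30                        # fewer observations than series
X = rng.standard_normal((T, N))
S = np.cov(X, rowvar=False)          # sample covariance: rank-deficient, not PD
S_shrunk = shrink_block(S, delta=0.5)

print(np.linalg.eigvalsh(S).min(), np.linalg.eigvalsh(S_shrunk).min())
```

The sample matrix has (numerically) zero minimum eigenvalue because its rank is at most T - 1 < N, while the shrunk block is strictly positive definite and hence invertible, which is what downstream portfolio optimisation needs.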

July 4, 2024 · 2 min · Research Team

The Structure of Financial Equity Research Reports -- Identification of the Most Frequently Asked Questions in Financial Analyst Reports to Automate Equity Research Using Llama 3 and GPT-4

The Structure of Financial Equity Research Reports – Identification of the Most Frequently Asked Questions in Financial Analyst Reports to Automate Equity Research Using Llama 3 and GPT-4 ArXiv ID: 2407.18327 “View on arXiv” Authors: Unknown Abstract This research dissects financial equity research reports (ERRs) by mapping their content into categories. There is insufficient empirical analysis of the questions answered in ERRs. In particular, it is not understood how frequently certain information appears, what information is considered essential, and what information requires human judgment to distill into an ERR. The study analyzes 72 ERRs sentence-by-sentence, classifying their 4940 sentences into 169 unique question archetypes. We did not predefine the questions but derived them solely from the statements in the ERRs. This approach provides an unbiased view of the content of the observed ERRs. Subsequently, we used public corporate reports to classify the questions’ potential for automation. Answers were labeled “text-extractable” if the answers to the question were accessible in corporate reports. 78.7% of the questions in ERRs can be automated. Those automatable questions consist of 48.2% text-extractable questions (suited to processing by large language models, LLMs) and 30.5% database-extractable questions. Only 21.3% of questions require human judgment to answer. We empirically validate using Llama-3-70B and GPT-4-turbo-2024-04-09 that recent advances in language generation and information extraction enable the automation of approximately 80% of the statements in ERRs. Surprisingly, the models complement each other’s strengths and weaknesses well. The research confirms that the current writing process of ERRs can likely benefit from additional automation, improving quality and efficiency. The research thus allows us to quantify the potential impacts of introducing large language models in the ERR writing process.
The full question list, including the archetypes and their frequency, will be made available online after peer review. ...

July 4, 2024 · 3 min · Research Team

When can weak latent factors be statistically inferred?

When can weak latent factors be statistically inferred? ArXiv ID: 2407.03616 “View on arXiv” Authors: Unknown Abstract This article establishes a new and comprehensive estimation and inference theory for principal component analysis (PCA) under the weak factor model, allowing for cross-sectionally dependent idiosyncratic components under nearly minimal factor strength relative to the noise level, i.e., the signal-to-noise ratio. Our theory is applicable regardless of the relative growth rate between the cross-sectional dimension $N$ and the temporal dimension $T$. This more realistic assumption and notable result require a completely new technical device, as the commonly used leave-one-out trick is no longer applicable in the presence of cross-sectional dependence. Another notable advancement of our theory concerns PCA inference: for example, under the regime where $N \asymp T$, we show that asymptotic normality of the PCA-based estimator holds as long as the signal-to-noise ratio (SNR) grows faster than a polynomial rate of $\log N$. This finding significantly surpasses prior work that required a polynomial rate of $N$. Our theory is entirely non-asymptotic, offering finite-sample characterizations of both the estimation error and the uncertainty level of statistical inference. A notable technical innovation is our closed-form first-order approximation of the PCA-based estimator, which paves the way for various statistical tests. Furthermore, we apply our theory to design easy-to-implement statistics for validating whether given factors fall in the linear span of unknown latent factors, testing structural breaks in the factor loadings of an individual unit, checking whether two units have the same risk exposures, and constructing confidence intervals for systematic risks. Our empirical studies uncover insightful correlations between our test results and economic cycles. ...
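The basic PCA estimator the theory studies can be illustrated on a toy one-factor panel: the loading direction is estimated by the leading eigenvector of the sample covariance, and recovery improves with the signal-to-noise ratio. Sizes, noise level, and the single-factor setup below are illustrative choices, not the paper's regime:

```python
import numpy as np

# Toy one-factor panel X = f lam' + e with strong signal (high SNR).
rng = np.random.default_rng(1)
T, N = 200, 100
f = rng.standard_normal((T, 1))                     # latent factor, T x 1
lam = rng.standard_normal((N, 1))                   # loadings, N x 1
X = f @ lam.T + 0.5 * rng.standard_normal((T, N))   # observed panel, T x N

# PCA: the leading eigenvector of the N x N sample covariance estimates
# the loading direction (up to sign and scale).
S = X.T @ X / T
evals, evecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
lam_hat = evecs[:, -1]             # top eigenvector, unit norm

# Cosine similarity with the true loading direction, up to sign.
corr = abs(lam_hat @ lam.ravel()) / np.linalg.norm(lam)
print(corr)
```

At a weak factor strength, the same estimator degrades; the paper's contribution is characterising exactly how weak the factor can be, relative to the noise, while inference on `lam_hat` remains valid.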

July 4, 2024 · 2 min · Research Team

AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction

AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction ArXiv ID: 2407.18324 “View on arXiv” Authors: Unknown Abstract Stock volatility prediction is an important task in the financial industry. Recent advancements in multimodal methodologies, which integrate both textual and auditory data such as earnings calls (earnings calls are publicly available and typically involve the management team of a public company and interested parties discussing the company’s earnings), have demonstrated significant improvements in this domain. However, these multimodal methods have faced two drawbacks. First, they often fail to yield reliable models and overfit the data due to their absorption of stochastic information from the stock market. Moreover, using multimodal models to predict stock volatility suffers from gender bias, and there is no efficient way to eliminate such bias. To address these problems, we use adversarial training to generate perturbations that simulate the inherent stochasticity and bias, creating areas around the input space that are resistant to random information, thereby improving model robustness and fairness. Our comprehensive experiments on two real-world financial audio datasets reveal that this method exceeds the performance of current state-of-the-art solutions. This confirms the value of adversarial training in reducing stochasticity and bias for stock volatility prediction tasks. ...

July 3, 2024 · 2 min · Research Team

CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications

CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications ArXiv ID: 2407.01953 “View on arXiv” Authors: Unknown Abstract The integration of Large Language Models (LLMs) into financial analysis has garnered significant attention in the NLP community. This paper presents our solution to the IJCAI-2024 FinLLM challenge, investigating the capabilities of LLMs within three critical areas of financial tasks: financial classification, financial text summarization, and single stock trading. We adopted Llama3-8B and Mistral-7B as base models, fine-tuning them through Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). To enhance model performance, we combine the datasets from task 1 and task 2 for data fusion. Our approach aims to tackle these diverse tasks in a comprehensive and integrated manner, showcasing LLMs’ capacity to address diverse and complex financial tasks with improved accuracy and decision-making capabilities. ...

July 2, 2024 · 2 min · Research Team

Indian Stock Market Prediction using Augmented Financial Intelligence ML

Indian Stock Market Prediction using Augmented Financial Intelligence ML ArXiv ID: 2407.02236 “View on arXiv” Authors: Unknown Abstract This paper presents price prediction models using Machine Learning algorithms augmented with Superforecasters’ predictions, aimed at enhancing investment decisions. Predicting the price of any commodity is a significant task, but predicting the price of a stock in the stock market involves much more uncertainty. Recognising the limited knowledge of and exposure to stocks among certain investors, five Machine Learning models are built: Bidirectional LSTM, ARIMA, a combination of CNN and LSTM, GRU, and a hybrid of LSTM and GRU. The models are evaluated using the Mean Absolute Error (MAE) to determine which predicts with the highest accuracy. Additionally, the paper suggests incorporating human intelligence by identifying Superforecasters and tracking their predictions to anticipate unpredictable shifts or changes in stock prices. When combined with Machine Learning and Natural Language Processing techniques, the predictions made by these users can further enhance the accuracy of stock price predictions. By leveraging the combined power of Machine Learning and human intelligence, predictive accuracy can be significantly increased. ...
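The evaluation metric the paper uses to rank its five models, Mean Absolute Error, is simple to state: the average of the absolute differences between actual and predicted prices. A minimal sketch with made-up prices and two hypothetical model forecasts (not the paper's models or data):

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error: mean of |actual - predicted|."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

actual  = [100.0, 102.0, 101.0, 105.0]
model_a = [101.0, 101.0, 102.0, 104.0]   # hypothetical forecast A
model_b = [ 98.0, 104.0,  99.0, 107.0]   # hypothetical forecast B

print(mae(actual, model_a))  # 1.0
print(mae(actual, model_b))  # 2.0 -> model A ranks higher under MAE
```

Because MAE weights all errors linearly, it is less sensitive to occasional large misses than squared-error metrics, which is one reason it is a common choice for price-level forecasts.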

July 2, 2024 · 2 min · Research Team

Predicting public market behavior from private equity deals

Predicting public market behavior from private equity deals ArXiv ID: 2407.01818 “View on arXiv” Authors: Unknown Abstract We process private equity transactions to predict public market behavior with a logit model. Specifically, we estimate our model to predict quarterly returns for both the broad market and for individual sectors. Our hypothesis is that private equity investments (in aggregate) carry predictive signal about publicly traded securities. The key source of such predictive signal is the fact that, during their diligence process, private equity fund managers are privy to valuable company information that may not yet be reflected in the public markets at the time of their investment. Thus, we posit that we can discover investors’ collective near-term insight via detailed analysis of the timing and nature of the deals they execute. We evaluate the accuracy of the estimated model by applying it to test data where we know the correct output value. Remarkably, our model performs consistently better than a null model simply based on return statistics, while showing a predictive accuracy of up to 70% in sectors such as Consumer Services, Communications, and Non Energy Minerals. ...
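The model class named in the abstract is a logit (logistic regression): aggregate deal features are mapped through a sigmoid to the probability of a positive quarterly return, and the quarter is classified "up" if that probability exceeds 0.5. The features and coefficients below are invented for illustration, not estimated from the paper's transaction data:

```python
import numpy as np

def logit_prob(x, beta):
    """P(return > 0 | x) = 1 / (1 + exp(-beta' x))."""
    return float(1.0 / (1.0 + np.exp(-(x @ beta))))

# Hypothetical features: [intercept, deal-count z-score, median-deal-size z-score]
beta = np.array([0.2, 0.8, -0.3])          # illustrative coefficients
quarter = np.array([1.0, 1.5, 0.4])        # a busy deal quarter

p_up = logit_prob(quarter, beta)
predict_up = p_up > 0.5                    # classify as "up" if p > 0.5
print(p_up, predict_up)
```

The paper's accuracy figures (up to 70% in some sectors) then come from comparing such classifications against realised returns on held-out test quarters.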

July 1, 2024 · 2 min · Research Team

What Teaches Robots to Walk, Teaches Them to Trade too -- Regime Adaptive Execution using Informed Data and LLMs

What Teaches Robots to Walk, Teaches Them to Trade too – Regime Adaptive Execution using Informed Data and LLMs ArXiv ID: 2406.15508 “View on arXiv” Authors: Unknown Abstract Machine learning techniques applied to financial market forecasting struggle with dynamic regime switching, i.e., shifts in the underlying correlations and covariances of the true (hidden) market variables. Drawing inspiration from the success of reinforcement learning in robotics, particularly in the agile locomotion adaptation of quadruped robots to unseen terrains, we introduce an approach that leverages the world knowledge of pretrained LLMs (akin to ‘privileged information’ in robotics) and dynamically adapts them using intrinsic, natural market rewards via an LLM alignment technique we dub “Reinforcement Learning from Market Feedback” (RLMF). Strong empirical results demonstrate the efficacy of our method in adapting to regime shifts in financial markets, a challenge that has long plagued predictive models in this domain. The proposed algorithmic framework outperforms the best-performing state-of-the-art LLMs on the existing FLARE benchmark stock-movement (SM) tasks by more than 15% in accuracy. On the recently proposed NIFTY SM task, our adaptive policy outperforms the best-performing state-of-the-art trillion-parameter models such as GPT-4. The paper details the dual-phase, teacher-student architecture and implementation of our model, the empirical results obtained, and an analysis of the role of language embeddings in terms of information gain. ...

June 20, 2024 · 2 min · Research Team

Stock Volume Forecasting with Advanced Information by Conditional Variational Auto-Encoder

Stock Volume Forecasting with Advanced Information by Conditional Variational Auto-Encoder ArXiv ID: 2406.19414 “View on arXiv” Authors: Unknown Abstract We demonstrate the use of a Conditional Variational Auto-Encoder (CVAE) to improve forecasts of daily stock volume time series in both short- and long-term forecasting tasks, with the use of advanced information on input variables such as rebalancing dates. The CVAE generates non-linear time series as out-of-sample forecasts, which have better accuracy and a closer correlation fit to the actual data than traditional linear models. These generative forecasts can also be used for scenario generation, which aids interpretation. We further discuss correlations in non-stationary time series and other potential extensions of the CVAE forecasts. ...

June 19, 2024 · 2 min · Research Team