Mixture Models

Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction

ArXiv ID: 2510.15691
Authors: Tian Guo, Emmanuel Hauptmann

Abstract: In quantitative investing, return prediction supports various tasks, including stock selection, portfolio optimization, and risk management. Quantitative factors, such as valuation, quality, and growth, capture various characteristics of stocks. Unstructured data, like news and transcripts, has attracted growing attention, driven by recent advances in large language models (LLMs). This paper examines effective methods for leveraging multimodal factors and newsflow in return prediction and stock selection. First, we introduce a fusion learning framework to learn a unified representation from factors and newsflow representations generated by an LLM. Within this framework, we compare three methods of different architectural complexities: representation combination, representation summation, and attentive representations. Next, building on the limitation of fusion learning observed in empirical comparison, we explore the mixture model that adaptively combines predictions made by single modalities and their fusion. To mitigate the training instability of the mixture model, we introduce a decoupled training approach with theoretical insights. Finally, our experiments on real investment universes yield several insights into effective multimodal modeling of factors and news for stock return prediction and selection. ...
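To make the idea of "adaptively combining predictions made by single modalities and their fusion" concrete, here is a minimal sketch, not the authors' implementation. It assumes three hypothetical predictors (factor-only, news-only, fusion) whose return forecasts are blended with softmax gating weights. The function names, the gate scores, and the example numbers are all illustrative assumptions.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_prediction(expert_preds, gate_scores):
    # Blend the per-model return forecasts with gating weights that sum to 1.
    weights = softmax(gate_scores)
    combined = sum(w * p for w, p in zip(weights, expert_preds))
    return combined, weights

# Hypothetical forecasts for one stock: factor-only, news-only, fusion.
preds = [0.010, -0.004, 0.006]
# Equal gate scores (assumed) give equal weights, i.e. a plain average.
combined, weights = mixture_prediction(preds, [0.0, 0.0, 0.0])
```

In a trained model the gate scores would themselves be produced from the inputs. Here they are fixed by hand only to show how the weighting works.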

Modelling financial returns with mixtures of generalized normal distributions

ArXiv ID: 2411.11847
Authors: Unknown

Abstract: This PhD Thesis presents an investigation into the analysis of financial returns using mixture models, focusing on mixtures of generalized normal distributions (MGND) and their extensions. The study addresses several critical issues encountered in the estimation process and proposes innovative solutions to enhance accuracy and efficiency. In Chapter 2, the focus lies on the MGND model and its estimation via expectation conditional maximization (ECM) and generalized expectation maximization (GEM) algorithms. A thorough exploration reveals a degeneracy issue when estimating the shape parameter. Several algorithms are proposed to overcome this critical issue. Chapter 3 extends the theoretical perspective by applying the MGND model on several stock market indices. A two-step approach is proposed for identifying turmoil days and estimating returns and volatility. Chapter 4 introduces constrained mixture of generalized normal distributions (CMGND), enhancing interpretability and efficiency by imposing constraints on parameters. Simulation results highlight the benefits of constrained parameter estimation. Finally, Chapter 5 introduces generalized normal distribution-hidden Markov models (GND-HMMs) able to capture the dynamic nature of financial returns. This manuscript contributes to the statistical modelling of financial returns by offering flexible, parsimonious, and interpretable frameworks. The proposed mixture models capture complex patterns in financial data, thereby facilitating more informed decision-making in financial analysis and risk management. ...
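As a rough illustration of what an MGND density looks like, the sketch below evaluates a mixture of generalized normal components and the posterior component probabilities that an EM-type E-step would compute. It assumes the common parameterization f(x) = nu / (2 sigma Gamma(1/nu)) exp(-(|x - mu| / sigma)^nu), which may differ from the thesis. The two-component "calm" and "turmoil" parameters are made-up values, not estimates from the work.

```python
import math

def gnd_pdf(x, mu, sigma, nu):
    # Generalized normal density: nu = 2 is Gaussian-shaped, nu = 1 is Laplace.
    norm = nu / (2.0 * sigma * math.gamma(1.0 / nu))
    return norm * math.exp(-((abs(x - mu) / sigma) ** nu))

def mgnd_pdf(x, weights, params):
    # Mixture density: weighted sum of the component densities.
    return sum(w * gnd_pdf(x, *p) for w, p in zip(weights, params))

def responsibilities(x, weights, params):
    # Posterior probability that x came from each component (E-step quantity).
    joint = [w * gnd_pdf(x, *p) for w, p in zip(weights, params)]
    total = sum(joint)
    return [j / total for j in joint]

# Assumed example: a narrow "calm" component and a heavy-tailed "turmoil" one.
weights = [0.9, 0.1]
params = [(0.0, 0.01, 2.0), (0.0, 0.03, 1.0)]  # (mu, sigma, nu) per component
post = responsibilities(0.08, weights, params)  # a large daily return
```

Under these assumed parameters, a large return is attributed almost entirely to the heavy-tailed component. That is the intuition behind using component membership to flag unusual days.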

Estimation of tail risk measures in finance: Approaches to extreme value mixture modeling

ArXiv ID: 2407.05933
Authors: Unknown

Abstract: This thesis evaluates most of the extreme mixture models and methods that have appeared in the literature and implements them in the context of finance and insurance. The paper also reviews and studies extreme value theory, time series, volatility clustering, and risk measurement methods in detail. Comparing the performance of extreme mixture models and methods on different simulated distributions shows that the method based on kernel density estimation does not have an absolute superior or close to the best performance, especially for the estimation of the extreme upper or lower tail of the distribution. Preprocessing time series data using a generalized autoregressive conditional heteroskedasticity model (GARCH) and applying extreme value mixture models on extracted residuals from GARCH can improve the goodness of fit and the estimation of the tail distribution. ...
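The preprocessing step described in the abstract, filtering returns through GARCH and then modeling the tails of the residuals, can be sketched as follows. This is a simplified illustration, not the thesis's code. It assumes zero-mean returns, a GARCH(1,1) recursion with parameters supplied by hand rather than fitted, and alpha + beta < 1. The function names and the threshold are hypothetical.

```python
import math

def garch11_standardized_residuals(returns, omega, alpha, beta):
    # Conditional variance recursion:
    #   var_t = omega + alpha * r_{t-1}^2 + beta * var_{t-1}
    # started at the unconditional variance (requires alpha + beta < 1).
    var = omega / (1.0 - alpha - beta)
    residuals = []
    for r in returns:
        residuals.append(r / math.sqrt(var))  # standardize by current volatility
        var = omega + alpha * r * r + beta * var
    return residuals

def exceedances(values, threshold):
    # Amounts by which values exceed the threshold: the input to a tail model.
    return [v - threshold for v in values if v > threshold]

# Assumed toy inputs; real parameters would come from fitting the model.
returns = [0.02, 0.02, -0.01]
z = garch11_standardized_residuals(returns, omega=1e-4, alpha=0.5, beta=0.0)
upper_tail = exceedances(z, threshold=1.0)
```

A tail distribution, such as one of the extreme value mixture models the thesis compares, would then be fitted to the residuals or their exceedances instead of the raw returns.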