
Is All the Information in the Price? LLM Embeddings versus the EMH in Stock Clustering

ArXiv ID: 2509.01590
Authors: Bingyang Wang, Grant Johnson, Maria Hybinette, Tucker Balch

Abstract: This paper investigates whether artificial intelligence can enhance stock clustering compared to traditional methods. We consider this in the context of the semi-strong Efficient Markets Hypothesis (EMH), which posits that prices fully reflect all public information and, accordingly, that clusters based on price information cannot be improved upon. We benchmark three clustering approaches: (i) price-based clusters derived from historical return correlations, (ii) human-informed clusters defined by the Global Industry Classification Standard (GICS), and (iii) AI-driven clusters constructed from large language model (LLM) embeddings of stock-related news headlines. At each date, each method provides a classification in which each stock is assigned to a cluster. To evaluate a clustering, we transform it into a synthetic factor model following the Arbitrage Pricing Theory (APT) framework. This enables consistent evaluation of predictive performance in a roll-forward, out-of-sample test. Using S&P 500 constituents from 2022 through 2024, we find that price-based clustering consistently outperforms both rule-based and AI-based methods, reducing root mean squared error (RMSE) by 15.9% relative to GICS and 14.7% relative to LLM embeddings. Our contributions are threefold: (i) a generalizable methodology that converts any equity grouping (manual, machine, or market-driven) into a real-time factor model for evaluation; (ii) the first direct comparison of price-based, human rule-based, and AI-based clustering under identical conditions; and (iii) empirical evidence reinforcing that short-horizon return information is largely contained in prices. These results support the EMH while offering practitioners a practical diagnostic for monitoring evolving sector structures and providing academics with a framework for testing alternative hypotheses about how quickly markets absorb information. ...
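To make the evaluation idea concrete, here is a minimal sketch of how a clustering might be turned into a synthetic factor model and scored by out-of-sample RMSE. It is not the authors' implementation: the function name `cluster_factor_rmse`, the equal-weighted peer factor, the single train/test split (instead of a full roll-forward), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def cluster_factor_rmse(returns, labels, train_len):
    """Score a clustering by out-of-sample RMSE (hypothetical helper).

    returns:   (T, N) array of asset returns
    labels:    length-N array of cluster ids
    train_len: number of leading rows used to fit each regression
    """
    returns = np.asarray(returns, dtype=float)
    labels = np.asarray(labels)
    train, test = returns[:train_len], returns[train_len:]
    sq_errors = []
    for i in range(returns.shape[1]):
        # Synthetic factor (assumed form): equal-weighted return of cluster peers.
        peers = np.flatnonzero(labels == labels[i])
        peers = peers[peers != i]
        if peers.size == 0:
            continue  # singleton cluster: no peer factor available
        f_train = train[:, peers].mean(axis=1)
        f_test = test[:, peers].mean(axis=1)
        # Fit r_i = alpha + beta * f on the training window.
        X = np.column_stack([np.ones(train_len), f_train])
        (alpha, beta), *_ = np.linalg.lstsq(X, train[:, i], rcond=None)
        # Evaluate the fitted loadings on the held-out window.
        sq_errors.append((test[:, i] - (alpha + beta * f_test)) ** 2)
    if not sq_errors:
        raise ValueError("every cluster is a singleton; nothing to score")
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))

# Synthetic demo: six assets driven by two latent group factors.
rng = np.random.default_rng(0)
T = 300
true_labels = np.array([0, 0, 0, 1, 1, 1])
latent = rng.normal(size=(T, 2))
returns = latent[:, true_labels] + 0.3 * rng.normal(size=(T, 6))
mixed_labels = np.array([0, 1, 0, 1, 0, 1])  # deliberately wrong grouping

rmse_true = cluster_factor_rmse(returns, true_labels, train_len=200)
rmse_mixed = cluster_factor_rmse(returns, mixed_labels, train_len=200)
print(rmse_true, rmse_mixed)
```

In this toy setup the grouping that matches the data-generating structure should score a lower RMSE than the mixed one, which mirrors how competing clusterings (price-based, GICS, LLM-based) could be compared on equal footing.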

September 1, 2025 · 3 min · Research Team

Econometric Model Using Arbitrage Pricing Theory and Quantile Regression to Estimate the Risk Factors Driving Crude Oil Returns

ArXiv ID: 2309.13096
Authors: Unknown

Abstract: This work adopts a novel approach to determine the risk and return of crude oil stocks by employing Arbitrage Pricing Theory (APT) and Quantile Regression (QR). The APT identifies the underlying risk factors likely to impact crude oil returns. Subsequently, QR estimates the relationship between the factors and the returns across different quantiles of the distribution. The West Texas Intermediate (WTI) crude oil price is used in this study as a benchmark for crude oil prices. WTI price fluctuations can have a significant impact on the performance of crude oil stocks and, subsequently, the global economy. To determine the proposed model's stability, various statistical measures are used in this study. The results show that changes in WTI returns can have varying effects depending on market conditions and levels of volatility. The study highlights the impact of structural discontinuities on returns, which can be caused by changes in the global economy and the demand for crude oil. The inclusion of pandemic, geopolitical, and inflation-related explanatory variables adds uniqueness to this study as it considers current global events that can affect crude oil returns. Findings show that the key factors that pose major risks to returns are industrial production, inflation, the global price of energy, the shape of the yield curve, and global economic policy uncertainty. This implies that while making investment decisions in WTI futures, investors should pay particular attention to these elements ...
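As a rough illustration of why quantile regression can reveal effects that vary across the return distribution, the sketch below fits a line at several quantiles on synthetic data. The paper's estimator and data are not reproduced here: the function `quantile_regression_irls` is a hypothetical helper that approximates quantile regression with iteratively reweighted least squares, and the single regressor `x` is an invented stand-in for a risk factor.

```python
import numpy as np

def quantile_regression_irls(X, y, q, n_iter=100, eps=1e-6):
    """Approximate the q-th quantile regression fit via IRLS (illustrative only).

    Returns coefficients [intercept, slope(s)].
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        # Pinball-loss weights: q above the line, 1 - q below, scaled by 1/|resid|.
        w = np.where(resid > 0, q, 1.0 - q) / np.maximum(np.abs(resid), eps)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic data whose noise grows with x, so the slope differs by quantile.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
y = 1.0 + 0.5 * x + (0.2 + 0.5 * x) * rng.normal(size=n)

slopes = {q: quantile_regression_irls(x, y, q)[1] for q in (0.1, 0.5, 0.9)}
print(slopes)
```

Because the noise scale increases with `x`, the fitted slope should rise from the lower to the upper quantile, which is the kind of state-dependent sensitivity the abstract describes for WTI returns.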

September 22, 2023 · 2 min · Research Team

On the Time-Varying Structure of the Arbitrage Pricing Theory using the Japanese Sector Indices

ArXiv ID: 2305.05998
Authors: Unknown

Abstract: This paper is the first study to examine the time instability of the APT in the Japanese stock market. In particular, we measure how changes in each risk factor affect the stock risk premiums to investigate the validity of the APT over time, applying the rolling window method to Fama and MacBeth’s (1973) two-step regression and Kamstra and Shi’s (2023) generalized GRS test. We summarize our empirical results as follows: (1) the changes in monetary policy by major central banks greatly affect the validity of the APT in Japan, and (2) the time-varying estimates of the risk premiums for each factor are also unstable over time, and they are affected by the business cycle and economic crises. Therefore, we conclude that the validity of the APT as an appropriate model to explain the Japanese sector index is not stable over time. ...
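The rolling-window two-step regression mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the helper names `fama_macbeth` and `rolling_premia`, the full-window first-pass betas, and the simulated one-factor data are all hypothetical, and the generalized GRS test is not implemented.

```python
import numpy as np

def fama_macbeth(returns, factors):
    """Two-step estimate of factor risk premia (hypothetical helper).

    returns: (T, N) asset returns; factors: (T, K) factor realizations.
    """
    T, N = returns.shape
    # Step 1: time-series regressions give each asset's factor loadings.
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (K + 1, N)
    betas = coef[1:].T                                   # shape (N, K)
    # Step 2: one cross-sectional regression per period on those loadings.
    Z = np.column_stack([np.ones(N), betas])
    lam_t, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)  # shape (K + 1, T)
    # The premium estimate is the time average of the period-by-period slopes.
    return lam_t[1:].mean(axis=1)

def rolling_premia(returns, factors, window):
    """Repeat the two-step estimate on each rolling window of length `window`."""
    n_windows = returns.shape[0] - window + 1
    return np.array([
        fama_macbeth(returns[s:s + window], factors[s:s + window])
        for s in range(n_windows)
    ])

# Simulated one-factor market: 20 assets, 240 periods.
rng = np.random.default_rng(2)
T, N = 240, 20
factors = 1.0 + rng.normal(size=(T, 1))
true_betas = rng.uniform(0.5, 1.5, size=N)
returns = factors * true_betas + 0.1 * rng.normal(size=(T, N))

full_sample = fama_macbeth(returns, factors)
premia_path = rolling_premia(returns, factors, window=60)
print(full_sample, premia_path.shape)
```

Plotting `premia_path` against the window start date would give a simple view of how stable the estimated premium is over time, which is the question the paper studies for Japanese sector indices.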

May 10, 2023 · 2 min · Research Team