false

Unveiling Hedge Funds: Topic Modeling and Sentiment Correlation with Fund Performance

Unveiling Hedge Funds: Topic Modeling and Sentiment Correlation with Fund Performance ArXiv ID: 2512.06620 “View on arXiv” Authors: Chang Liu Abstract The hedge fund industry presents significant challenges for investors due to its opacity and limited disclosure requirements. This pioneering study introduces two major innovations in financial text analysis. First, we apply topic modeling to hedge fund documents-an unexplored domain for automated text analysis-using a unique dataset of over 35,000 documents from 1,125 hedge fund managers. We compared three state-of-the-art methods: Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic. Our findings reveal that LDA with 20 topics produces the most interpretable results for human users and demonstrates higher robustness in topic assignments when the number of topics varies, while Top2Vec shows superior classification performance. Second, we establish a novel quantitative framework linking document sentiment to fund performance, transforming qualitative information traditionally requiring expert interpretation into systematic investment signals. In sentiment analysis, contrary to expectations, the general-purpose DistilBERT outperforms the finance-specific FinBERT in generating sentiment scores, demonstrating superior adaptability to diverse linguistic patterns found in hedge fund documents that extend beyond specialized financial news text. Furthermore, sentiment scores derived using DistilBERT in combination with Top2Vec show stronger correlations with subsequent fund performance compared to other model combinations. These results demonstrate that automated topic modeling and sentiment analysis can effectively process hedge fund documents, providing investors with new data-driven decision support tools. ...

December 7, 2025 · 2 min · Research Team

Understanding the Impact of News Articles on the Movement of Market Index: A Case on Nifty 50

Understanding the Impact of News Articles on the Movement of Market Index: A Case on Nifty 50 ArXiv ID: 2412.06794 “View on arXiv” Authors: Unknown Abstract In the recent past, there were several works on the prediction of stock price using different methods. Sentiment analysis of news and tweets and relating them to the movement of stock prices have already been explored. But, when we talk about the news, there can be several topics such as politics, markets, sports etc. It was observed that most of the prior analyses dealt with news or comments associated with particular stock prices only or the researchers dealt with overall sentiment scores only. However, it is quite possible that different topics having different levels of impact on the movement of the stock price or an index. The current study focused on bridging this gap by analysing the movement of Nifty 50 index with respect to the sentiments associated with news items related to various different topic such as sports, politics, markets etc. The study established that sentiment scores of news items of different other topics also have a significant impact on the movement of the index. ...

November 22, 2024 · 2 min · Research Team

Enhancing literature review with LLM and NLP methods. Algorithmic trading case

Enhancing literature review with LLM and NLP methods. Algorithmic trading case ArXiv ID: 2411.05013 “View on arXiv” Authors: Unknown Abstract This study utilizes machine learning algorithms to analyze and organize knowledge in the field of algorithmic trading. By filtering a dataset of 136 million research papers, we identified 14,342 relevant articles published between 1956 and Q1 2020. We compare traditional practices-such as keyword-based algorithms and embedding techniques-with state-of-the-art topic modeling methods that employ dimensionality reduction and clustering. This comparison allows us to assess the popularity and evolution of different approaches and themes within algorithmic trading. We demonstrate the usefulness of Natural Language Processing (NLP) in the automatic extraction of knowledge, highlighting the new possibilities created by the latest iterations of Large Language Models (LLMs) like ChatGPT. The rationale for focusing on this topic stems from our analysis, which reveals that research articles on algorithmic trading are increasing at a faster rate than the overall number of publications. While stocks and main indices comprise more than half of all assets considered, certain asset classes, such as cryptocurrencies, exhibit a much stronger growth trend. Machine learning models have become the most popular methods in recent years. The study demonstrates the efficacy of LLMs in refining datasets and addressing intricate questions about the analyzed articles, such as comparing the efficiency of different models. Our research shows that by decomposing tasks into smaller components and incorporating reasoning steps, we can effectively tackle complex questions supported by case analyses. This approach contributes to a deeper understanding of algorithmic trading methodologies and underscores the potential of advanced NLP techniques in literature reviews. ...

October 23, 2024 · 2 min · Research Team

Multi-Industry Simplex 2.0 : Temporally-Evolving Probabilistic Industry Classification

Multi-Industry Simplex 2.0 : Temporally-Evolving Probabilistic Industry Classification ArXiv ID: 2407.16437 “View on arXiv” Authors: Unknown Abstract Accurate industry classification is critical for many areas of portfolio management, yet the traditional single-industry framework of the Global Industry Classification Standard (GICS) struggles to comprehensively represent risk for highly diversified multi-sector conglomerates like Amazon. Previously, we introduced the Multi-Industry Simplex (MIS), a probabilistic extension of GICS that utilizes topic modeling, a natural language processing approach. Although our initial version, MIS-1, was able to improve upon GICS by providing multi-industry representations, it relied on an overly simple architecture that required prior knowledge about the number of industries and relied on the unrealistic assumption that industries are uncorrelated and independent over time. We improve upon this model with MIS-2, which addresses three key limitations of MIS-1 : we utilize Bayesian Non-Parametrics to automatically infer the number of industries from data, we employ Markov Updating to account for industries that change over time, and we adjust for correlated and hierarchical industries allowing for both broad and niche industries (similar to GICS). Further, we provide an out-of-sample test directly comparing MIS-2 and GICS on the basis of future correlation prediction, where we find evidence that MIS-2 provides a measurable improvement over GICS. MIS-2 provides portfolio managers with a more robust tool for industry classification, empowering them to more effectively identify and manage risk, particularly around multi-sector conglomerates in a rapidly evolving market in which new industries periodically emerge. ...

July 23, 2024 · 2 min · Research Team

Multi-Label Topic Model for Financial Textual Data

Multi-Label Topic Model for Financial Textual Data ArXiv ID: 2311.07598 “View on arXiv” Authors: Unknown Abstract This paper presents a multi-label topic model for financial texts like ad-hoc announcements, 8-K filings, finance related news or annual reports. I train the model on a new financial multi-label database consisting of 3,044 German ad-hoc announcements that are labeled manually using 20 predefined, economically motivated topics. The best model achieves a macro F1 score of more than 85%. Translating the data results in an English version of the model with similar performance. As application of the model, I investigate differences in stock market reactions across topics. I find evidence for strong positive or negative market reactions for some topics, like announcements of new Large Scale Projects or Bankruptcy Filings, while I do not observe significant price effects for some other topics. Furthermore, in contrast to previous studies, the multi-label structure of the model allows to analyze the effects of co-occurring topics on stock market reactions. For many cases, the reaction to a specific topic depends heavily on the co-occurrence with other topics. For example, if allocated capital from a Seasoned Equity Offering (SEO) is used for restructuring a company in the course of a Bankruptcy Proceeding, the market reacts positively on average. However, if that capital is used for covering unexpected, additional costs from the development of new drugs, the SEO implies negative reactions on average. ...

November 10, 2023 · 2 min · Research Team

Multi-Industry Simplex : A Probabilistic Extension of GICS

Multi-Industry Simplex : A Probabilistic Extension of GICS ArXiv ID: 2310.04280 “View on arXiv” Authors: Unknown Abstract Accurate industry classification is a critical tool for many asset management applications. While the current industry gold-standard GICS (Global Industry Classification Standard) has proven to be reliable and robust in many settings, it has limitations that cannot be ignored. Fundamentally, GICS is a single-industry model, in which every firm is assigned to exactly one group - regardless of how diversified that firm may be. This approach breaks down for large conglomerates like Amazon, which have risk exposure spread out across multiple sectors. We attempt to overcome these limitations by developing MIS (Multi-Industry Simplex), a probabilistic model that can flexibly assign a firm to as many industries as can be supported by the data. In particular, we utilize topic modeling, an natural language processing approach that utilizes business descriptions to extract and identify corresponding industries. Each identified industry comes with a relevance probability, allowing for high interpretability and easy auditing, circumventing the black-box nature of alternative machine learning approaches. We describe this model in detail and provide two use-cases that are relevant to asset management - thematic portfolios and nearest neighbor identification. While our approach has limitations of its own, we demonstrate the viability of probabilistic industry classification and hope to inspire future research in this field. ...

October 6, 2023 · 2 min · Research Team