Can an unsupervised clustering algorithm reproduce a categorization system?

ArXiv ID: 2408.10340 · Authors: Unknown

Abstract: Peer analysis is a critical component of investment management, often relying on expert-provided categorization systems. These systems’ consistency is questioned when they do not align with cohorts from unsupervised clustering algorithms optimized for various metrics. We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset, showing that success depends on feature selection and the chosen distance metric. Using toy datasets and fund categorization as real-world examples we demonstrate that accurately reproducing ground truth classes is challenging. We also highlight the limitations of standard clustering evaluation metrics in identifying the optimal number of clusters relative to the ground truth classes. We then show that if appropriate features are available in the dataset, and a proper distance metric is known (e.g., using a supervised Random Forest-based distance metric learning method), then an unsupervised clustering can indeed reproduce the ground truth classes as distinct clusters. ...
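The dependence on the distance metric can be illustrated with a small toy sketch. This is not the paper's method: the data points are invented for illustration, the helpers `adjusted_rand_index` and `average_linkage` are hypothetical names, and a hand-picked "informative feature only" distance stands in for a learned (e.g., Random Forest-based) metric. Agreement with the ground truth classes is scored with the adjusted Rand index.

```python
from itertools import combinations
from math import comb, dist


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    cells, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        cells[(a, b)] = cells.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(len(labels_a), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def average_linkage(points, k, distance):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(pair):
        a, b = pair
        total = sum(distance(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Invented toy data: feature 0 separates the classes, feature 1 is
# large-scale noise unrelated to the class labels.
points = [(0.0, 0), (0.1, 10), (0.2, 0), (0.15, 10),
          (1.0, 0), (1.1, 10), (0.9, 0), (1.05, 10)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

# Plain Euclidean distance over all features.
euclidean_labels = average_linkage(points, 2, dist)
# Stand-in for a "proper" metric: distance on the informative feature only.
informed_labels = average_linkage(points, 2, lambda p, q: abs(p[0] - q[0]))

ari_euclidean = adjusted_rand_index(truth, euclidean_labels)
ari_informed = adjusted_rand_index(truth, informed_labels)
print(ari_euclidean, ari_informed)
```

In this sketch the Euclidean clustering groups points by the noise feature and scores below zero, while the informed distance recovers the two classes exactly. The clustering algorithm is identical in both runs; only the distance function changes.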

August 19, 2024 · 2 min · Research Team

Quantifying Outlierness of Funds from their Categories using Supervised Similarity

ArXiv ID: 2308.06882 · Authors: Unknown

Abstract: Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, a (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds utilizing a machine learning based approach. We formulate the problem of miscategorization of funds as a distance-based outlier detection problem, where the outliers are the data-points that are far from the rest of the data-points in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data-point to identify outliers in the data. We test our implementation on various publicly available data sets, and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings. ...
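A rough sketch of a class-wise outlier measure computed from Random Forest proximities follows. It uses the definition commonly associated with Breiman's random forests (raw score = number of samples divided by the sum of squared proximities to same-class samples, then centered and scaled within each class by the median and the median absolute deviation). Whether the paper uses exactly this normalization is an assumption, as is the choice to exclude each sample's proximity to itself. The leaf table is hand-written for illustration; in practice it could come from a fitted forest, for example the leaf indices returned by scikit-learn's `RandomForestClassifier.apply`. The function names are hypothetical.

```python
from statistics import median


def proximity_matrix(leaves):
    """Proximity = fraction of trees in which two samples share a leaf.

    `leaves[i][t]` is the leaf index of sample i in tree t.
    """
    n, n_trees = len(leaves), len(leaves[0])
    return [[sum(a == b for a, b in zip(leaves[i], leaves[j])) / n_trees
             for j in range(n)] for i in range(n)]


def classwise_outlier_measure(prox, labels):
    """Outlier score of each sample relative to its own labeled class."""
    n = len(labels)
    raw = []
    for i in range(n):
        # Sum of squared proximities to the other members of the same class.
        s = sum(prox[i][k] ** 2 for k in range(n)
                if k != i and labels[k] == labels[i])
        raw.append(n / max(s, 1e-12))  # guard against zero proximity

    scores = [0.0] * n
    for cls in set(labels):
        members = [i for i in range(n) if labels[i] == cls]
        med = median(raw[i] for i in members)
        mad = median(abs(raw[i] - med) for i in members)
        for i in members:
            # With zero spread in a class, no member stands out.
            scores[i] = (raw[i] - med) / mad if mad > 0 else 0.0
    return scores


# Invented leaf assignments for 8 samples in a 4-tree forest.
# Sample 3 is labeled "A" but mostly lands in the same leaves as class "B".
leaves = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 1],
    [5, 5, 5, 1],
    [5, 5, 5, 5],
    [5, 5, 5, 6],
    [5, 5, 6, 5],
    [5, 6, 5, 5],
]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

prox = proximity_matrix(leaves)
scores = classwise_outlier_measure(prox, labels)
most_outlying = max(range(len(scores)), key=scores.__getitem__)
print(most_outlying, scores)
```

Here the sample that shares few leaves with its labeled class receives by far the largest score, which is the behavior a miscategorized fund would be expected to show under this kind of measure.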

August 14, 2023 · 2 min · Research Team