Outlier Detection

A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books

ArXiv ID: 2507.14960
Authors: Ivan Letteri

Abstract

The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours. An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark. These findings underscore the effectiveness of outlier-driven strategies and provide insights into the trade-offs between model complexity, trade frequency, and performance. This study contributes to the growing corpus of research on cryptocurrency market microstructure by furnishing a rigorous benchmark of anomaly detection models and highlighting their potential for augmenting algorithmic trading and risk management. ...
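
The abstract names Empirical Covariance (EC) as the top-performing model but does not say how it scores individual records. One common reading is to flag records whose squared Mahalanobis distance from the sample mean, computed with the plain empirical covariance matrix, exceeds a chi-square cutoff. The sketch below illustrates that idea for two features only. The feature pair (spread, order imbalance), the function names, and the 5.99 threshold are assumptions made for illustration and are not taken from the paper.

```python
def mahalanobis_scores(rows):
    """Squared Mahalanobis distance of each 2-D row from the sample mean,
    using the plain (non-robust) empirical covariance matrix."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    # Empirical covariance in maximum-likelihood form (divides by n).
    sxx = sum((r[0] - mx) ** 2 for r in rows) / n
    syy = sum((r[1] - my) ** 2 for r in rows) / n
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / n
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    scores = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
        scores.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


def flag_outliers(rows, threshold=5.99):
    """Flag rows whose score exceeds the cutoff.

    5.99 is roughly the 95% chi-square quantile with 2 degrees of freedom;
    using it as the cutoff is an assumption, not the paper's setting.
    """
    return [score > threshold for score in mahalanobis_scores(rows)]


if __name__ == "__main__":
    # Hypothetical (spread, order imbalance) snapshots; the last one is extreme.
    snapshots = [(1.0, 0.1), (1.1, -0.1), (0.9, 0.0), (1.0, -0.2), (5.0, 0.9)]
    print(flag_outliers(snapshots, threshold=3.0))
```

With very few records the score of any single point is bounded by n - 1, so a fixed chi-square cutoff only becomes meaningful once the sample is reasonably large.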

Quantifying Outlierness of Funds from their Categories using Supervised Similarity

ArXiv ID: 2308.06882
Authors: Unknown

Abstract

Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, an (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds utilizing a machine learning based approach. We formulate the problem of miscategorization of funds as a distance-based outlier detection problem, where the outliers are the data-points that are far from the rest of the data-points in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data-point to identify outliers in the data. We test our implementation on various publicly available data sets, and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings. ...
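
The class-wise outlier measure mentioned in the abstract can be sketched as follows, assuming the Breiman-style definition that is usually paired with random forest proximities: within each class, the raw score of a point is the sample count divided by the sum of its squared proximities to members of the same class, and raw scores are then centred by the class median and scaled by the class median absolute deviation. The proximity matrix is taken as given here (for example, the fraction of trees in which two points share a leaf). Whether the paper uses exactly this normalization is an assumption, and the function name is hypothetical.

```python
import statistics


def classwise_outlier_measure(prox, labels):
    """Breiman-style class-wise outlier scores from a proximity matrix.

    prox[i][k] is the proximity between samples i and k (1.0 on the diagonal).
    labels[i] is the class (e.g. fund category) of sample i.
    """
    n = len(labels)
    scores = [0.0] * n
    for cls in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == cls]
        # Raw score: large when a point is far from its own class.
        raw = []
        for i in members:
            total = sum(prox[i][k] ** 2 for k in members)
            raw.append(n / total if total > 0 else float(n))
        med = statistics.median(raw)
        mad = statistics.median([abs(r - med) for r in raw])
        for i, r in zip(members, raw):
            # A zero MAD means the class has no spread to scale by; report 0.
            scores[i] = (r - med) / mad if mad > 0 else 0.0
    return scores
```

A point with a large positive score sits far from the rest of its assigned class in the learned similarity, which is the sense in which a fund could be read as a candidate for miscategorization.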

Non-parametric cumulants approach for outlier detection of multivariate financial data

ArXiv ID: 2305.10911
Authors: Unknown

Abstract

In this paper, we propose an outlier detection algorithm for multivariate data based on their projections on the directions that maximize the Cumulant Generating Function (CGF). We prove that CGF is a convex function, and we characterize the CGF maximization problem on the unit n-circle as a concave minimization problem. Then, we show that the CGF maximization approach can be interpreted as an extension of the standard principal component technique. Therefore, for validation and testing, we provide a thorough comparison of our methodology with two other projection-based approaches both on artificial and real-world financial data. Finally, we apply our method as an early detector for financial crises. ...
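
To make the projection idea concrete, here is a minimal two-dimensional sketch. It estimates the empirical CGF, K(u) = log mean(exp(u . x)), on centred data, finds the unit direction that maximizes it by a plain grid search over angles, and returns each point's projection on that direction. The grid search stands in for the paper's optimization procedure, which the abstract does not detail. Centring the data and reading large projections as outlier candidates are assumptions, and all function names are hypothetical.

```python
import math


def empirical_cgf(rows, u):
    """log of the sample mean of exp(u . x), via log-sum-exp for stability."""
    dots = [u[0] * x + u[1] * y for x, y in rows]
    top = max(dots)
    return top + math.log(sum(math.exp(d - top) for d in dots) / len(dots))


def max_cgf_direction(rows, steps=360):
    """Grid search over the unit circle for the direction maximizing the CGF."""
    best_u, best_val = None, -math.inf
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        u = (math.cos(theta), math.sin(theta))
        val = empirical_cgf(rows, u)
        if val > best_val:
            best_u, best_val = u, val
    return best_u, best_val


def projection_scores(rows):
    """Centre the data, then project every point on the max-CGF direction."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    centred = [(x - mx, y - my) for x, y in rows]
    u, _ = max_cgf_direction(centred)
    return u, [u[0] * x + u[1] * y for x, y in centred]
```

Because the exponential weights large projections heavily, the maximizing direction tends to point toward the most extreme observations, unlike a variance-maximizing principal component, which treats both tails symmetrically.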