Classifying and Clustering Trading Agents

ArXiv ID: 2505.21662

Authors: Mateusz Wilinski, Anubha Goel, Alexandros Iosifidis, Juho Kanniainen

Abstract

The rapid development of sophisticated machine learning methods, together with the increased availability of financial data, has the potential to transform financial research, but also poses a challenge in terms of validation and interpretation. A good case study is the task of classifying financial investors based on their behavioral patterns. Not only do we have access to both classification and clustering tools for high-dimensional data, but also data identifying individual investors is finally available. The problem, however, is that we do not have access to ground truth when working with real-world data. This, together with often limited interpretability of modern machine learning methods, makes it difficult to fully utilize the available research potential. In order to deal with this challenge we propose to use a realistic agent-based model as a way to generate synthetic data. This way one has access to ground truth, large replicable data, and limitless research scenarios. Using this approach we show how, even when classifying trading agents in a supervised manner is relatively easy, a more realistic task of unsupervised clustering may give incorrect or even misleading results. We complete the results with investigating the details of how supervised techniques were able to successfully distinguish between different trading behaviors.

Keywords: Agent-Based Modeling, Unsupervised Learning, Market Microstructure, Synthetic Data, Behavioral Finance, Multi-Asset

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Street Traders
  • Why: The paper uses advanced machine learning methods but focuses heavily on empirical validation via a synthetic agent-based limit order book environment, featuring reproducible code and detailed implementation of trading simulations.

flowchart TD
  A["Research Goal<br>Classify & Cluster<br>Trading Agents"] --> B["Methodology<br>Agent-Based Model<br>for Synthetic Data"]
  B --> C["Data Generation<br>Ground Truth Labels<br>with Behavioral Patterns"]
  C --> D["Computational Process<br>Supervised Classification<br>e.g., Random Forest, SVM"]
  C --> E["Computational Process<br>Unsupervised Clustering<br>e.g., K-Means, GMM"]
  D --> F["Key Findings<br>Supervised: Highly Accurate"]
  E --> G["Key Findings<br>Unsupervised: Incorrect or Misleading"]
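
The contrast between the supervised and unsupervised branches can be illustrated with a small toy sketch. It does not use the paper's agent-based model or its features: the two agent types, the `signal` and `activity` features, and all numeric values are hypothetical. A single-split decision stump stands in for the tree-based and SVM classifiers named above, and k-means is written by hand so the example needs only the standard library. The point is the mechanism: when a high-variance feature is unrelated to the agent type, clustering may lock onto it, while a classifier with access to labels finds the informative feature.

```python
import random

def make_agents(n_per_type, seed=0):
    """Toy stand-in for simulator output: ((signal, activity), type_label) pairs."""
    rng = random.Random(seed)
    data = []
    for label, mean_signal in ((0, -0.5), (1, 0.5)):  # two hypothetical agent types
        for i in range(n_per_type):
            # Informative feature: how order direction relates to past returns.
            signal = rng.gauss(mean_signal, 0.2)
            # Nuisance feature: order rate, bimodal but identical across types.
            activity = rng.gauss(10.0 if i % 2 == 0 else 100.0, 5.0)
            data.append(((signal, activity), label))
    rng.shuffle(data)
    return data

def fit_stump(train):
    """One-split decision tree: (feature, threshold, label_above) with best training accuracy."""
    best = None
    for d in range(len(train[0][0])):
        for thr in sorted({x[d] for x, _ in train}):
            for above in (0, 1):
                hits = sum((above if x[d] > thr else 1 - above) == y for x, y in train)
                if best is None or hits > best[0]:
                    best = (hits, d, thr, above)
    return best[1:]

def predict_stump(stump, x):
    d, thr, above = stump
    return above if x[d] > thr else 1 - above

def centroid(points):
    return tuple(sum(p[d] for p in points) / len(points) for d in range(len(points[0])))

def nearest(point, centers):
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centers[k])))

def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd iterations on the raw (unscaled) features."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = centroid(members)
    return [nearest(p, centers) for p in points]

def agreement(labels, clusters):
    """Best accuracy over the two possible cluster-to-type mappings."""
    same = sum(l == c for l, c in zip(labels, clusters)) / len(labels)
    return max(same, 1 - same)

data = make_agents(n_per_type=300)
train, test = data[:400], data[400:]

# Supervised: labels reveal which feature matters.
stump = fit_stump(train)
accuracy = sum(predict_stump(stump, x) == y for x, y in test) / len(test)

# Unsupervised: clusters follow the dominant variance, not the agent type.
points = [x for x, _ in data]
labels = [y for _, y in data]
cluster_agreement = agreement(labels, kmeans(points))

print(f"stump splits on feature {stump[0]}, held-out accuracy {accuracy:.2f}")
print(f"k-means agreement with true types: {cluster_agreement:.2f}")
```

In this setup the clusters are internally tight and well separated, so a cluster-quality metric alone would not reveal that they track activity level rather than agent type. Only the ground-truth labels, which synthetic data provides, expose the mismatch.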