Sparse Portfolio Selection via Topological Data Analysis based Clustering

ArXiv ID: 2401.16920

Authors: Unknown

Abstract

This paper uses topological data analysis (TDA) tools to introduce a data-driven, clustering-based stock selection strategy tailored for sparse portfolio construction. Our asset selection strategy exploits the topological features of stock price movements in two ways: it selects a subset of topologically similar assets for a sparse index tracking portfolio, and a subset of topologically different assets for a sparse Markowitz portfolio. We introduce new distance measures on the space of persistence diagrams and landscapes that take the time component of a time series into account; these distances serve as the input to the clustering algorithm. We conduct an empirical analysis on the S&P index from 2009 to 2022, including a study on COVID-19 data to validate the robustness of our methodology. Integrating TDA with the clustering algorithm significantly enhances the performance of sparse portfolios across various performance measures in diverse market scenarios.

Keywords: topological data analysis (TDA), persistence diagrams, sparse portfolio construction, index tracking, clustering algorithms, Equities

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced topological data analysis (TDA) concepts like persistence diagrams and landscapes, which are mathematically dense, and backs this with a specific empirical study on S&P data from 2009–2022 including COVID-19 validation, demonstrating data-heavy implementation.

  flowchart TD
    A["Research Goal"] --> B["Data: S&P Stocks 2009-2022"]
    B --> C["Topological Data Analysis<br>Persistence Diagrams & Landscapes"]
    C --> D["New Topological Distance Measures<br>incorporating time"]
    D --> E["Clustering & Asset Selection<br>Topologically Similar/Different Assets"]
    E --> F["Portfolio Construction<br>Sparse Index Tracking / Markowitz"]
    F --> G["Key Findings"]
    G --> G1["Enhanced Performance<br>across diverse markets"]
    G --> G2["Robustness<br>validated via COVID-19 study"]
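
The pipeline in the flowchart above (topological features, then a distance, then clustering, then asset selection) can be illustrated with a minimal, dependency-free sketch. This is not the paper's method: the summary does not specify how the persistence diagrams are built or how the new time-aware distances are defined. The sketch swaps in 0-dimensional sublevel-set persistence of a one-dimensional series, a crude distance that compares sorted lifetimes, and plain average-linkage clustering. All function names, the tickers, and the price values are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def sublevel_persistence(series):
    """0-dim (birth, death) pairs of the sublevel-set filtration of a 1-D series.
    The essential class born at the global minimum is omitted."""
    n = len(series)
    parent = [None] * n          # None marks a point not yet added to the filtration
    birth = {}                   # component root -> value at which it was born

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda t: series[t]):
        parent[i] = i
        birth[i] = series[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later birth dies at series[i].
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if series[i] > birth[young]:        # skip zero-persistence pairs
                    pairs.append((birth[young], series[i]))
                parent[young] = old
    return sorted(pairs)

def lifetime_distance(diag_a, diag_b):
    """Assumed stand-in distance: L1 gap between sorted, zero-padded lifetimes.
    It ignores birth times, unlike the time-aware measures the paper proposes."""
    la = sorted((d - b for b, d in diag_a), reverse=True)
    lb = sorted((d - b for b, d in diag_b), reverse=True)
    m = max(len(la), len(lb))
    la += [0.0] * (m - len(la))
    lb += [0.0] * (m - len(lb))
    return sum(abs(x - y) for x, y in zip(la, lb))

def agglomerative(dist, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)            # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

def medoid(cluster, dist):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))

# Toy, made-up series (hypothetical tickers; not data from the paper).
prices = {
    "AAA": [10, 12, 11, 13, 10.5, 14],
    "BBB": [10, 12.1, 11, 13.1, 10.5, 14],
    "CCC": [10, 11, 12, 13, 14, 15],
    "DDD": [20, 21, 22, 23, 24, 25],
}
names = list(prices)
diagrams = [sublevel_persistence(prices[name]) for name in names]
dist = [[lifetime_distance(a, b) for b in diagrams] for a in diagrams]
clusters = agglomerative(dist, k=2)
selected = [names[medoid(c, dist)] for c in clusters]   # one asset per cluster
print(clusters, selected)
```

On this toy input the two oscillating series end up in one cluster and the two monotone series in the other. Taking one medoid per cluster corresponds to picking topologically different assets, as for the Markowitz case; drawing several assets from a single cluster would be the analogous choice for topologically similar assets in the index tracking case. A real study would replace each stand-in with the constructions defined in the paper.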