Automated regime detection in multidimensional time series data using sliced Wasserstein k-means clustering
ArXiv ID: 2310.01285
Authors: Unknown
Abstract
Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to identify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We study the dynamics of the algorithm and investigate how varying different hyperparameters impacts the performance of the clustering algorithm for different random initialisations. We compute simple metrics that we find are useful in identifying high-quality clusterings. Then, we extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call ‘sliced Wasserstein k-means (sWk-means) clustering’. We apply the sWk-means clustering method to the problem of automated regime detection in multidimensional time series data, using synthetic data to demonstrate the validity of the approach. Finally, we show that the sWk-means method is effective in identifying distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
Keywords: Wasserstein Distance, Regime Detection, Time Series Clustering, Sliced Wasserstein Distance, Market Regimes, Foreign Exchange
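The abstract's central step, replacing the multidimensional Wasserstein distance with a sliced approximation, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice p = 2, the number of random projections, and the requirement that both samples have the same size are all assumptions made here for illustration. The idea is to project both point clouds onto random unit directions, compute the one-dimensional Wasserstein distance along each direction by comparing sorted samples, and average.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1D empirical samples.

    Assumption of this sketch: both samples have the same length, so the
    optimal coupling simply matches sorted values one to one.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    if x_sorted.shape != y_sorted.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p))

def sliced_wasserstein(X, Y, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    X, Y: arrays of shape (n, d) holding n samples of d-dimensional data.
    n_projections and seed are illustrative defaults, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    directions = rng.normal(size=(n_projections, X.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    total = 0.0
    for theta in directions:
        # Project both samples onto theta and compare them in one dimension.
        total += wasserstein_1d(X @ theta, Y @ theta, p=p) ** p
    return float((total / n_projections) ** (1.0 / p))
```

Each projection costs only a sort, which is what makes the sliced distance attractive when the full multidimensional optimal transport problem is too expensive to solve for every pair of distributions.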
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced mathematics including Wasserstein distances, barycenters, and sliced approximations, while also demonstrating the method on synthetic and real financial data with hyperparameter tuning and performance metrics.
flowchart TD
A["Research Goal:<br>Automated Regime Detection in<br>Multidimensional Time Series"] --> B
subgraph B ["Methodology Development"]
B1["Wk-means Clustering<br>on 1D Synthetic Data"] --> B2{"Performance Analysis"}
B2 --> B3["Identify Metrics for<br>High-Quality Clusterings"]
B3 --> B4["Extend to sWk-means<br>for Multidimensional Data"]
end
B4 --> C["Validation"]
subgraph C ["Data Application"]
C1["Synthetic Multidimensional Data"] --> C2["sWk-means Clustering"]
C3["Real FX Market Data"] --> C2
end
C2 --> D["Key Findings"]
subgraph D ["Outcomes"]
D1["Validated sWk-means on<br>Synthetic Data"]
D2["Identified Distinct<br>Market Regimes in FX Data"]
D3["Established Workflow for<br>Automated Regime Detection"]
end
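The workflow in the flowchart can be sketched end to end as a small Lloyd-style loop. This is a hedged illustration rather than the paper's implementation: the rolling-window segmentation, the hyperparameter defaults, the squared sliced 2-Wasserstein distance, and the centroid update (averaging sorted projections within each cluster, which is the 2-Wasserstein barycenter along each individual projection) are assumptions chosen to keep the example short. The names `make_windows`, `projected_quantiles`, and `swk_means` are hypothetical.

```python
import numpy as np

def make_windows(series, window, step):
    """Split a (T, d) series into windows of shape (window, d)."""
    starts = np.arange(0, len(series) - window + 1, step)
    return np.stack([series[s:s + window] for s in starts]), starts

def projected_quantiles(windows, directions):
    """Sorted 1D projections, shape (n_windows, n_projections, window)."""
    proj = np.einsum("nwd,pd->npw", windows, directions)
    return np.sort(proj, axis=-1)

def swk_means(series, k=2, window=50, step=10, n_projections=32,
              n_iter=20, seed=0):
    """Illustrative sliced Wasserstein k-means on a (T, d) array (e.g. returns)."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    windows, starts = make_windows(series, window, step)
    # Fixed set of random unit directions shared by all windows.
    directions = rng.normal(size=(n_projections, series.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    Q = projected_quantiles(windows, directions)
    # Initialise centroids from k distinct, randomly chosen windows.
    centroids = Q[rng.choice(len(Q), size=k, replace=False)]
    labels = np.full(len(Q), -1)
    for _ in range(n_iter):
        # Squared sliced W2 between every window and every centroid.
        dist = ((Q[:, None] - centroids[None]) ** 2).mean(axis=(2, 3))
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = Q[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, starts
```

Calling `swk_means(series, k=2, window=50, step=50)` on a `(T, d)` array returns one cluster label per window together with each window's start index, so the labels can be mapped back onto the time axis. As the abstract notes for the original algorithm, the result depends on the random initialisation, so in practice one would rerun with several seeds and compare the clusterings.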