Needles in a haystack: using forensic network science to uncover insider trading

ArXiv ID: 2512.18918

Authors: Gian Jaeger, Wang Ngai Yeung, Renaud Lambiotte

Abstract

Although the automation and digitisation of anti-financial crime investigation has made significant progress in recent years, detecting insider trading remains a unique challenge, partly due to the limited availability of labelled data. To address this challenge, we propose using a data-driven networks approach that flags groups of corporate insiders who report coordinated transactions that are indicative of insider trading. Specifically, we leverage data on 2.9 million trades reported to the U.S. Securities and Exchange Commission (SEC) by company insiders (C-suite executives, board members and major shareholders) between 2014 and 2024. Our proposed algorithm constructs weighted edges between insiders based on the temporal similarity of their trades over the 10-year timeframe. Within this network we then uncover trends that indicate insider trading by focusing on central nodes and anomalous subgraphs. To highlight the validity of our approach we evaluate our findings with reference to two null models, generated by running our algorithm on synthetic empirically calibrated and shuffled datasets. The results indicate that our approach can be used to detect pairs or clusters of insiders whose behaviour suggests insider trading and/or market manipulation.
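
The edge-construction and central-node steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so this example assumes a simple stand-in: the Jaccard overlap of the sets of days on which two insiders traded, with weighted degree (strength) as the centrality score. The toy records and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical toy records: (insider_id, trade_date). Not data from the paper.
TRADES = [
    ("A", date(2020, 1, 6)), ("A", date(2020, 3, 2)), ("A", date(2020, 6, 15)),
    ("B", date(2020, 1, 6)), ("B", date(2020, 3, 2)), ("B", date(2020, 9, 1)),
    ("C", date(2020, 2, 10)), ("C", date(2020, 6, 15)),
    ("D", date(2020, 11, 20)),
]


def trade_days(trades):
    """Map each insider to the set of days on which they reported a trade."""
    days = defaultdict(set)
    for insider, day in trades:
        days[insider].add(day)
    return dict(days)


def jaccard(a, b):
    """Assumed similarity measure: shared trading days over all trading days."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(trades, min_weight=0.0):
    """Return {(u, v): weight} for insider pairs with similarity above min_weight."""
    days = trade_days(trades)
    edges = {}
    for u, v in combinations(sorted(days), 2):
        weight = jaccard(days[u], days[v])
        if weight > min_weight:
            edges[(u, v)] = weight
    return edges


def strength(edges):
    """Weighted degree of each node: the sum of its incident edge weights."""
    totals = defaultdict(float)
    for (u, v), weight in edges.items():
        totals[u] += weight
        totals[v] += weight
    return dict(totals)


EDGES = build_edges(TRADES)
STRENGTH = strength(EDGES)
print(EDGES)     # weighted edges between insiders with overlapping trade days
print(STRENGTH)  # insiders ranked by this score are candidates for review
```

In this toy data, insider A shares trading days with both B and C, so A receives the highest strength, while D never trades on the same day as anyone else and does not appear in the network at all. A real pipeline would likely need a tolerance window rather than exact-day matches, which is a design choice the abstract leaves open.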

Keywords: Network Analysis, Insider Trading Detection, Temporal Similarity, Graph Theory, SEC Filings, Equities

Complexity vs Empirical Score

  • Math Complexity: 6.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper applies network science methods (weighted edges built from the temporal similarity of trades, plus analysis of central nodes and anomalous subgraphs) and pairs them with strong empirical work: a 10-year dataset of 2.9 million insider trades reported to the SEC, evaluated against two null models generated from synthetic and shuffled datasets.
  flowchart TD
    A["Research Goal:<br>Detect Insider Trading<br>in Labelled-Data-Scarce Environments"] --> B["Input Data:<br>2.9M SEC Insider Trades<br>(2014-2024)"]
    B --> C["Core Methodology:<br>Construct Weighted Network<br>based on Temporal Trade Similarity"]
    C --> D["Analysis & Detection:<br>Identify Central Nodes<br>& Anomalous Subgraphs"]
    D --> E["Validation:<br>Compare against Null Models<br>(Synthetic & Shuffled Data)"]
    E --> F["Outcome:<br>Detection of Coordinated Insider Groups<br>Indicative of Insider Trading/Manipulation"]
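
The validation step in the flow above, comparing results against a shuffled null model, can be sketched as a permutation test. This is only an illustration under stated assumptions: the shuffle reassigns trading days across insiders while keeping each insider's trade count fixed, the test statistic is the largest Jaccard edge weight in the network, and the data, function names and parameters (`n_shuffles`, `seed`) are hypothetical. The paper's own shuffling scheme and its synthetic, empirically calibrated null model are not reproduced here.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (insider_id, trading_day_index). Not from the paper.
# Insiders A and B share four of their five trading days.
TRADES = (
    [("A", d) for d in (1, 2, 3, 4, 5)]
    + [("B", d) for d in (1, 2, 3, 4, 6)]
    + [("C", d) for d in range(10, 15)]
    + [("D", d) for d in range(20, 25)]
    + [("E", d) for d in range(30, 35)]
    + [("F", d) for d in range(40, 45)]
)


def max_edge_weight(trades):
    """Largest Jaccard overlap of trading days over all insider pairs."""
    days = defaultdict(set)
    for insider, day in trades:
        days[insider].add(day)
    best = 0.0
    for u, v in combinations(days, 2):
        overlap = len(days[u] & days[v]) / len(days[u] | days[v])
        best = max(best, overlap)
    return best


def shuffled(trades, rng):
    """Assumed null model: permute days across insiders, keeping trade counts."""
    insiders = [insider for insider, _ in trades]
    days = [day for _, day in trades]
    rng.shuffle(days)
    return list(zip(insiders, days))


def empirical_p_value(trades, n_shuffles=200, seed=0):
    """Share of shuffled datasets whose strongest edge matches the observed one."""
    rng = random.Random(seed)
    observed = max_edge_weight(trades)
    hits = sum(
        max_edge_weight(shuffled(trades, rng)) >= observed
        for _ in range(n_shuffles)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (1 + hits) / (1 + n_shuffles)


OBSERVED, P_VALUE = empirical_p_value(TRADES)
print(f"observed max weight = {OBSERVED:.3f}, empirical p = {P_VALUE:.4f}")
```

Using the maximum edge weight as the statistic guards against the multiple-comparisons problem that arises when every insider pair is tested at once; a per-pair test would need a separate correction.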