A Random Forest approach to detect and identify Unlawful Insider Trading

ArXiv ID: 2411.13564

Authors: Unknown

Abstract

According to the Exchange Act of 1934, unlawful insider trading is the abuse of access to privileged corporate information. While the line between “routine” and “opportunistic” insider trading is blurred, detecting the strategies that insiders devise to maneuver fair market prices to their advantage is an uphill battle for hand-engineered approaches. Working with detailed, high-dimensional financial and trade data built from multiple covariates, in this study we independently implement automated, end-to-end, state-of-the-art methods and compare them in detail with an existing study (Deng et al. (2019)). We integrate principal component analysis with a random forest (PCA-RF), followed by a standalone random forest (RF), on 320 and 3984 randomly selected, semi-manually labeled, and normalized transactions from multiple industries. These settings successfully uncover latent structures and detect unlawful insider trading. Across the scenarios, our best-performing model accurately classified 96.43 percent of transactions. It identified 95.47 percent of lawful transactions as lawful and 98.00 percent of unlawful transactions as unlawful. The model also makes very few mistakes in classifying lawful as unlawful, missing only 2.00 percent. In addition to the classification task, the model generates a feature ranking based on Gini impurity, and our analysis of permutation values shows that ownership- and governance-related features play important roles. In summary, a simple yet powerful automated end-to-end method relieves labor-intensive activities, so that resources can be redirected to enhancing rule-making and to tracking unlawful insider trading transactions that have gone uncaptured. We emphasize that the developed financial and trading features are capable of uncovering fraudulent behaviors.
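The per-class rates quoted in the abstract can be read off a binary confusion matrix. The sketch below is a minimal illustration of that arithmetic, assuming unlawful trades are treated as the positive class. The `ConfusionMatrix` class and the counts are hypothetical, chosen only to show the calculation, and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'unlawful' is the positive class (an assumption)."""

    tp: int  # unlawful classified as unlawful
    fn: int  # unlawful classified as lawful (missed cases)
    tn: int  # lawful classified as lawful
    fp: int  # lawful classified as unlawful (false alarms)

    @property
    def accuracy(self) -> float:
        """Share of all transactions classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        """Share of unlawful transactions classified as unlawful."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """Share of lawful transactions classified as lawful."""
        return self.tn / (self.tn + self.fp)

    @property
    def false_negative_rate(self) -> float:
        """Share of unlawful transactions classified as lawful."""
        return self.fn / (self.tp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        """Share of lawful transactions classified as unlawful."""
        return self.fp / (self.tn + self.fp)


# Hypothetical counts for illustration only (not the paper's results).
cm = ConfusionMatrix(tp=49, fn=1, tn=95, fp=5)

print(f"accuracy:            {cm.accuracy:.2%}")
print(f"unlawful as unlawful: {cm.sensitivity:.2%}")
print(f"lawful as lawful:     {cm.specificity:.2%}")
print(f"unlawful as lawful:   {cm.false_negative_rate:.2%}")
print(f"lawful as unlawful:   {cm.false_positive_rate:.2%}")
```

Under this convention, each detection rate and its error rate sum to one: a 98.00 percent rate of unlawful classified as unlawful implies that 2.00 percent of unlawful transactions are classified as lawful.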

Keywords: Insider Trading Detection, Principal Component Analysis (PCA), Random Forest (RF), Gini Impurity, Fraud Detection, Equities (Multi-Industry)

Complexity vs Empirical Score

  • Math Complexity: 4.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper employs established machine learning techniques (PCA, Random Forest) with relatively accessible mathematical foundations, resulting in a moderate math score. However, it demonstrates high empirical rigor through the use of a real-world financial dataset (320 and 3,984 transactions), explicit cross-validation, detailed performance metrics (96.43% accuracy, confusion matrix), and a focus on feature importance analysis, making it highly backtest-ready.
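
The two settings named above, PCA-RF and a standalone RF, can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification` stands in for the labeled transactions), and the number of features, principal components, trees, and folds are all placeholder choices that the source does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the labeled transactions (NOT the paper's data).
X, y = make_classification(
    n_samples=320, n_features=20, n_informative=8, random_state=0
)

# Standalone RF: trees split directly on the original features.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# PCA-RF: standardize, project onto leading principal components, classify.
# n_components=10 is an arbitrary illustrative choice.
pca_rf = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 5-fold cross-validated accuracy for both settings.
cv_scores = {
    "RF": cross_val_score(rf, X, y, cv=5, scoring="accuracy"),
    "PCA-RF": cross_val_score(pca_rf, X, y, cv=5, scoring="accuracy"),
}
for name, scores in cv_scores.items():
    print(f"{name}: mean accuracy {scores.mean():.3f}")

# Two feature rankings for the standalone RF: impurity-based (Gini)
# importances from training, and permutation importances on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
rf.fit(X_train, y_train)
gini_ranking = np.argsort(rf.feature_importances_)[::-1]
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
perm_ranking = np.argsort(perm.importances_mean)[::-1]

print("Top 5 features by Gini importance:       ", gini_ranking[:5])
print("Top 5 features by permutation importance:", perm_ranking[:5])
```

Feature rankings are computed only for the standalone RF here, because after PCA the forest splits on components rather than on the original features, which makes per-feature importances harder to read directly.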
  flowchart TD
    A["Research Goal:<br>Detect unlawful insider trading"] --> B["Data Acquisition & Processing"]
    
    subgraph B ["Data/Inputs"]
        B1["320 & 3984 Transactions<br>Multi-industry data"]
        B2["Labeling & Normalization"]
    end
    
    B --> C["Key Methodology"]
    
    subgraph C ["Methodology"]
        C1["PCA-RF Model"]
        C2["Standalone RF Model"]
    end
    
    C --> D["Computational Process"]
    
    subgraph D ["Process"]
        D1["PCA Dimensionality Reduction"]
        D2["Random Forest Classification"]
        D3["Gini Impurity Feature Ranking"]
    end
    
    D --> E["Key Findings/Outcomes"]
    
    subgraph E ["Results"]
        E1["96.43% Classification Accuracy"]
        E2["95.47% Lawful, 98% Unlawful<br>Detection Rates"]
        E3["2% False Positive Rate<br>Ownership/Governance Features<br>Most Predictive"]
    end
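
The Gini impurity feature ranking step in the flowchart rests on a simple quantity: for a node with class proportions p_k, the Gini impurity is 1 minus the sum of p_k squared, and a split is scored by how much it lowers the size-weighted impurity of the child nodes. The sketch below works through one split on a hypothetical toy node (1 = unlawful, 0 = lawful); the function names and labels are illustrative assumptions, not material from the paper.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Return 1 - sum_k p_k^2 for the class proportions p_k in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(
    parent: Sequence[int], left: Sequence[int], right: Sequence[int]
) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical toy node: 1 = unlawful, 0 = lawful.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]   # transactions sent left by some candidate split
right = [0, 1, 1, 1]  # transactions sent right by the same split

print("parent impurity:  ", gini_impurity(parent))
print("left impurity:    ", gini_impurity(left))
print("right impurity:   ", gini_impurity(right))
print("impurity decrease:", impurity_decrease(parent, left, right))
```

An impurity-based ranking of the kind a random forest reports accumulates these decreases per feature across the splits and trees that use it, so features whose splits separate the classes more cleanly rank higher.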