An eXtreme Gradient Boosting (XGBoost) Trees Approach to Detect and Identify Unlawful Insider Trading (UIT) Transactions
ArXiv ID: 2511.08306
Authors: Krishna Neupane, Igor Griva
Abstract
Corporate insiders have access to material non-public information (MNPI). Occasionally, insiders strategically bypass legal and regulatory safeguards to exploit MNPI when executing securities trades. Due to the large volume of transactions, detecting unlawful insider trading becomes an arduous task for humans, who must examine the data and identify the underlying patterns in insiders' behavior. On the other hand, modern machine learning architectures have shown promising results for analyzing large-scale, complex data with hidden patterns. One such popular technique is eXtreme Gradient Boosting (XGBoost), a state-of-the-art supervised classifier. We therefore apply XGBoost to alleviate the challenges of identifying and detecting unlawful activities. The results demonstrate that XGBoost can identify unlawful transactions with a high accuracy of 97 percent and can rank the features that play the most important role in detecting fraudulent activities.
Keywords: Fraud Detection, Insider Trading, XGBoost, Market Surveillance, Classification
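The pipeline the abstract describes, training a gradient-boosted tree classifier on labeled transactions, measuring accuracy, and ranking feature importances, can be sketched as follows. This is a minimal illustration, not the paper's code: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the data are synthetic (the study's insider-transaction features and labels are not reproduced here, and the feature indices below are hypothetical).

```python
# Hedged sketch of the paper's workflow on synthetic data:
# gradient-boosted trees + feature-importance ranking.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled transaction records
# (1 = unlawful, 0 = lawful), with a class imbalance typical of fraud data.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                   learning_rate=0.1, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.3f}")

# Rank features by importance, mirroring the paper's ranking step.
ranking = sorted(enumerate(model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for idx, imp in ranking[:3]:
    print(f"feature_{idx}: importance {imp:.3f}")
```

Swapping in `xgboost.XGBClassifier` (with the same `fit`/`predict`/`feature_importances_` interface) would follow the paper's method more closely; the scikit-learn estimator is used here only to keep the sketch dependency-light.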
Complexity vs Empirical Score
- Math Complexity: 2.5/10
- Empirical Rigor: 4.0/10
- Quadrant: Philosophers
- Why: The paper primarily applies a known machine learning classifier (XGBoost) without developing novel mathematics, resulting in low math complexity; while it reports high accuracy (97%) on a large dataset, the absence of full backtesting details, transaction costs, or live implementation details places it in a conceptual rather than ready-for-deployment space.
```mermaid
flowchart TD
A["Research Goal:<br>Detect Unlawful Insider Trading"] --> B["Input: Financial &<br>Insider Transaction Data"]
B --> C["Method: XGBoost<br>Supervised Learning Model"]
C --> D["Computational Process:<br>Training & Pattern Recognition"]
D --> E{"Model Validation<br>& Testing"}
E -- Iterative Tuning --> C
E --> F["Key Findings:<br>97% Accuracy in Detection"]
F --> G["Outcome:<br>Ranked Feature Importance<br>for Fraud Identification"]
```