Comparative Evaluation of Anomaly Detection Methods for Fraud Detection in Online Credit Card Payments

ArXiv ID: 2312.13896

Authors: Unknown

Abstract

This study explores the application of anomaly detection (AD) methods to imbalanced learning tasks, focusing on fraud detection in real online credit card payment data. We assess the performance of several recent AD methods and compare their effectiveness against standard supervised learning methods. We also provide evidence of distribution shift within our dataset and analyze its impact on the tested models' performance. Our findings show that LightGBM performs significantly better than the other methods across all evaluated metrics but suffers more from distribution shift than the AD methods do. Furthermore, LightGBM also captures the majority of the frauds detected by the AD methods. This observation calls into question the potential benefit of ensembles that combine supervised and AD approaches to enhance performance. In summary, this research provides practical insights into the utility of these techniques in real-world scenarios, showing LightGBM's superiority in fraud detection while highlighting the challenges posed by distribution shift.
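
The overlap finding (LightGBM catches most of the frauds that the AD methods catch) amounts to a set comparison between the detectors' true positives. The sketch below is a hypothetical illustration, not the authors' code: it assumes each detector raises alerts on its top-k transactions by score (a fixed alert budget), and the helper names `top_k_alerts` and `fraud_overlap` and the toy data are invented here. A `coverage_of_ad_by_supervised` value above 0.5 corresponds to the supervised model capturing the majority of AD-detected frauds. A small `ad_only` count leaves little room for an ensemble to add detections.

```python
from typing import Dict, Set


def top_k_alerts(scores: Dict[str, float], k: int) -> Set[str]:
    """Flag the k highest-scoring transactions (a fixed alert budget)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def fraud_overlap(
    true_frauds: Set[str],
    supervised_alerts: Set[str],
    ad_alerts: Set[str],
) -> Dict[str, float]:
    """Count which true frauds each detector catches and how much they overlap."""
    caught_by_sup = true_frauds & supervised_alerts
    caught_by_ad = true_frauds & ad_alerts
    shared = caught_by_sup & caught_by_ad
    ad_only = caught_by_ad - caught_by_sup  # frauds only the AD method finds
    # Share of AD-detected frauds that the supervised model also flags.
    coverage = len(shared) / len(caught_by_ad) if caught_by_ad else 0.0
    return {
        "caught_by_supervised": len(caught_by_sup),
        "caught_by_ad": len(caught_by_ad),
        "shared": len(shared),
        "ad_only": len(ad_only),
        "coverage_of_ad_by_supervised": coverage,
    }


if __name__ == "__main__":
    # Toy data: transaction IDs and scores are invented for illustration.
    frauds = {"t1", "t2", "t3", "t4"}
    sup_scores = {"t1": 0.9, "t2": 0.8, "t3": 0.7, "t5": 0.6, "t4": 0.1, "t6": 0.05}
    ad_scores = {"t2": 5.0, "t4": 4.0, "t6": 3.0, "t1": 1.0, "t3": 0.5, "t5": 0.2}
    result = fraud_overlap(frauds, top_k_alerts(sup_scores, 3), top_k_alerts(ad_scores, 3))
    print(result["coverage_of_ad_by_supervised"])  # 0.5
```

Here the supervised model flags t1, t2 and t3, and the AD model flags t2, t4 and t6. Only t2 is shared among the frauds the AD model finds, so coverage is 0.5. In a real evaluation the alert budget k would be set by operational constraints.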

Keywords: Fraud detection, Anomaly detection, Imbalanced learning, LightGBM, Distribution shift, Credit / Consumer Finance

Complexity vs Empirical Score

  • Math Complexity: 5.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Holy Grail
  • Why: The paper presents a rigorous empirical evaluation on real-world credit card transaction data, analyzing distribution shifts and comparing multiple models with robust metrics. While it introduces advanced concepts in anomaly detection and imbalanced learning, its mathematics is largely conceptual and algorithmic rather than built on dense theoretical derivations.

```mermaid
flowchart TD
    A["Research Goal: Compare AD Methods vs Supervised Learning<br>for Fraud Detection"] --> B["Dataset: Real Online Credit Card Data<br>Imbalanced & Distribution Shift Exists"]
    B --> C["Methodology: Evaluate Anomaly Detection<br>vs Standard Supervised Learning"]
    C --> D{"Computational Process"}
    D --> E["LightGBM Supervised Model"]
    D --> F["Anomaly Detection Methods"]
    E & F --> G["Key Outcomes"]
    G --> H["LightGBM Superior Performance"]
    G --> I["AD Methods More Robust<br>to Distribution Shift"]
    G --> J["LightGBM Captures<br>Most AD-Detected Frauds"]
```
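
This summary reports that the dataset exhibits distribution shift but does not say how the shift was measured. One common, model-agnostic check is the Population Stability Index (PSI). It is used here purely as an illustration and is not necessarily the authors' method. PSI compares the binned distribution of a feature or model score in a reference period with a later period: PSI = Σ_i (a_i − e_i) ln(a_i / e_i), where e_i and a_i are the shares of reference-period and later-period observations in bin i. The function names, the quantile binning, the `eps` floor and the toy data below are assumptions of this sketch. A commonly quoted heuristic reads values above roughly 0.25 as a major shift. Note that PSI measures input drift only. Comparing how robust LightGBM and the AD methods are would additionally require evaluating each model's metrics separately per period.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior bin edges at (nearest-rank) quantiles of the reference sample."""
    if not reference or n_bins < 2:
        raise ValueError("need a non-empty reference sample and n_bins >= 2")
    ordered = sorted(reference)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // n_bins)] for i in range(1, n_bins)]


def bin_shares(values: Sequence[float], edges: List[float]) -> List[float]:
    """Fraction of values falling into each bin defined by the sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` relative to `reference`."""
    edges = quantile_edges(reference, n_bins)
    expected = bin_shares(reference, edges)
    actual = bin_shares(current, edges)
    total = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins at eps so the logarithm stays finite.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    # Toy feature values for an earlier and a later period (invented).
    earlier = [i / 100 for i in range(100)]
    later = [0.5 + i / 200 for i in range(100)]
    print(psi(earlier, earlier))       # 0.0
    print(psi(earlier, later) > 0.25)  # True
```

Bins are fixed from the reference period so both periods are compared on the same grid. Every term (a_i − e_i) ln(a_i / e_i) is non-negative, so PSI is zero only when the two binned distributions match.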