An Enhanced Focal Loss Function to Mitigate Class Imbalance in Auto Insurance Fraud Detection with Explainable AI

ArXiv ID: 2508.02283

Authors: Francis Boabang, Samuel Asante Gyamerah

Abstract

In insurance fraud prediction, handling class imbalance remains a critical challenge. This paper presents a novel multistage focal loss function designed to improve the performance of machine learning models in such imbalanced settings by helping them escape local minima and converge to a good solution. Building on the standard focal loss, the proposed approach introduces a dynamic, multi-stage convex and nonconvex mechanism that progressively adjusts the focus on hard-to-classify samples across training epochs. This refinement facilitates more stable learning and improved discrimination between fraudulent and legitimate cases. In extensive experiments on a real-world auto insurance dataset, the method outperformed the traditional focal loss as measured by accuracy, precision, recall, F1-score, and Area Under the Curve (AUC). These results demonstrate the efficacy of the multistage focal loss in boosting model robustness and predictive accuracy in highly skewed classification tasks, with significant implications for fraud detection systems in the insurance industry. An explainable model is included to interpret the results.
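To make the idea of a staged focal loss concrete, the following is a minimal sketch and not the authors' formulation, which this summary does not reproduce. It uses the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and assumes one plausible reading of "multi-stage convex and nonconvex": the focusing parameter gamma starts at 0 (plain weighted cross-entropy, which is convex in the logit) and is raised in later epochs. The names `stage_gamma` and `multistage_focal_loss`, the epoch boundaries, and the gamma and alpha values are all hypothetical.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for a single sample.

    p: predicted probability of the positive (fraud) class
    y: true label, 1 for fraud and 0 for legitimate
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def stage_gamma(epoch, boundaries=(5, 10), gammas=(0.0, 1.0, 2.0)):
    """Hypothetical schedule (an assumption, not from the paper).

    gamma = 0 in the first stage gives weighted cross-entropy;
    later stages increase gamma to focus on hard samples.
    """
    for boundary, gamma in zip(boundaries, gammas):
        if epoch < boundary:
            return gamma
    return gammas[-1]

def multistage_focal_loss(probs, labels, epoch, alpha=0.25):
    """Mean focal loss over a batch, with gamma chosen by training stage."""
    gamma = stage_gamma(epoch)
    losses = [focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

# Illustrative values only: one easy fraud case, one hard fraud case, one easy legitimate case.
probs = [0.9, 0.2, 0.1]
labels = [1, 1, 0]
early = multistage_focal_loss(probs, labels, epoch=0)
late = multistage_focal_loss(probs, labels, epoch=12)
```

In this sketch, raising gamma shrinks the loss contribution of well-classified samples much more than that of misclassified ones, so later stages concentrate the gradient on hard cases.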

Keywords: Fraud Prediction, Class Imbalance, Focal Loss, Machine Learning, Risk Assessment

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Street Traders
  • Why: The paper focuses on a modified loss function (multistage focal loss) which involves moderate mathematical formulation but primarily adapts existing concepts, while the empirical section details extensive experimentation on a real-world insurance dataset with standard ML metrics (accuracy, F1, AUC) and mentions preprocessing like SMOTE/ADASYN.

  flowchart TD
    A["Research Goal:<br>Improve Fraud Detection<br>in Class Imbalanced Data"] --> B{"Data Input"}
    B --> C["Real-World Auto Insurance Dataset<br>Skewed: Legitimate vs Fraud"]
    C --> D["Key Methodology:<br>Multistage Focal Loss Function"]
    D --> E["Computational Process:<br>Training ML Models with<br>Dynamic Stage Adjustments"]
    E --> F["Performance Metrics:<br>F1, Precision, Recall, AUC"]
    F --> G["Key Outcome:<br>Outperforms Standard Focal Loss<br>Explainable AI Included"]
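The metrics named in the flowchart can all be derived from confusion-matrix counts. The sketch below shows the arithmetic; the function name `classification_metrics` and the counts are invented for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fn count fraud cases that were caught/missed;
    fp/tn count legitimate cases that were flagged/cleared.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up counts for a skewed dataset: 50 fraud cases among 1000 claims.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=940)
```

With these made-up counts, accuracy is 0.97 while recall is only 0.60, which shows why accuracy alone can be misleading on skewed fraud data and why precision, recall, and F1 are reported alongside it.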