Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals
ArXiv ID: 2408.06679
Authors: Unknown
Abstract
The explainability of black-box machine learning algorithms, commonly known as Explainable Artificial Intelligence (XAI), has become crucial for financial and other regulated industrial applications due to regulatory requirements and the need for transparency in business practices. Among the various paradigms of XAI, Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples from the data used to train or test the model. Despite its potential, XCBR has, until recently, been relatively underexplored for many algorithms, such as tree-based models. We start by observing that most XCBR methods are defined based on the distance metric learned by the algorithm. By utilizing a recently proposed technique to extract the distance metric learned by Random Forests (RFs), which is both geometry- and accuracy-preserving, we investigate various XCBR methods. These methods amount to identifying special points from the training datasets, such as prototypes, critics, counter-factuals, and semi-factuals, to explain the predictions for a given query of the RF. We evaluate these special points using various evaluation metrics to assess their explanatory power and effectiveness.
Keywords: Explainable AI (XAI), Random Forests, Case-Based Reasoning, Prototypes, Counterfactuals, General Financial Modeling
Complexity vs Empirical Score
- Math Complexity: 7.5/10
- Empirical Rigor: 6.0/10
- Quadrant: Holy Grail
- Why: The paper introduces a novel, mathematically dense methodology based on geometric distance metrics (RF-GAP) and defines specialized loss functions for XCBR, while also testing the framework on both public and proprietary financial data with specific evaluation metrics.
flowchart TD
A["Research Goal:<br>Explain Random Forests using XCBR"] --> B["Core Methodology<br>Extract RF Distance Metric"]
B --> C["Identify Explanatory Points<br>Prototypes, Critics, Counter-factuals, Semi-factuals"]
C --> D["Evaluation & Outcomes<br>Assess Explanatory Power & Effectiveness"]
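
The pipeline in the flowchart can be illustrated with a minimal Python sketch. It is not the paper's method: the geometry- and accuracy-preserving distance is replaced here by the classic shared-leaf proximity (one minus the fraction of trees in which two samples fall in the same leaf), and the four point definitions are simplified assumptions chosen for illustration. The leaf-index rows are toy values of the kind that scikit-learn's `RandomForestClassifier.apply` would return; all function names are hypothetical.

```python
from typing import List, Sequence


def proximity_distance(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Stand-in RF distance: 1 - fraction of trees where both samples share a leaf."""
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return 1.0 - shared / len(leaves_a)


def counterfactual(query, query_label, train_leaves, train_labels) -> int:
    """Assumed definition: nearest training point with a different label."""
    others = [i for i, y in enumerate(train_labels) if y != query_label]
    return min(others, key=lambda i: proximity_distance(query, train_leaves[i]))


def semi_factual(query, query_label, train_leaves, train_labels) -> int:
    """Assumed definition: farthest training point that keeps the same label."""
    same = [i for i, y in enumerate(train_labels) if y == query_label]
    return max(same, key=lambda i: proximity_distance(query, train_leaves[i]))


def prototype(label, train_leaves, train_labels) -> int:
    """Assumed definition: class medoid (smallest total distance to its class)."""
    members = [i for i, y in enumerate(train_labels) if y == label]
    return min(
        members,
        key=lambda i: sum(
            proximity_distance(train_leaves[i], train_leaves[j]) for j in members
        ),
    )


def critic(label, train_leaves, train_labels) -> int:
    """Assumed definition: class member least well represented by the prototype."""
    proto = prototype(label, train_leaves, train_labels)
    members = [i for i, y in enumerate(train_labels) if y == label]
    return max(
        members,
        key=lambda i: proximity_distance(train_leaves[i], train_leaves[proto]),
    )


# Toy data: one row per training sample, one leaf index per tree (4 trees).
train_leaves: List[List[int]] = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 2, 3, 2],
    [5, 2, 3, 4],
    [5, 6, 7, 4],
]
train_labels = [0, 0, 0, 1, 1]
query_leaves, query_label = [1, 1, 3, 2], 0

cf_idx = counterfactual(query_leaves, query_label, train_leaves, train_labels)
sf_idx = semi_factual(query_leaves, query_label, train_leaves, train_labels)
proto_idx = prototype(0, train_leaves, train_labels)
critic_idx = critic(0, train_leaves, train_labels)
```

Each function returns an index into the training set, so the explanation shown to a user is always an actual training case rather than a synthetic point. Swapping in a different learned distance only requires replacing `proximity_distance`.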