Detecting Fraud in Financial Networks: A Semi-Supervised GNN Approach with Granger-Causal Explanations
ArXiv ID: 2507.01980
Authors: Linh Nguyen, Marcel Boersma, Erman Acar
Abstract
Fraudulent activity in the financial industry costs billions annually. Detecting fraud is therefore an essential yet technically challenging task that requires careful analysis of large volumes of data. While machine learning (ML) approaches seem like a viable solution, applying them successfully is difficult because of two main challenges: (1) sparsely labeled data, which makes training such approaches hard (and labeling carries inherent costs), and (2) the opacity of ML models, which leaves flagged items without the explanations that business regulations often require. This article proposes SAGE-FIN, a semi-supervised graph neural network (GNN) based approach with Granger-causal explanations for Financial Interaction Networks. SAGE-FIN learns to flag fraudulent items from weakly labeled (or unlabeled) data points. To adhere to regulatory requirements, the flagged items are explained by highlighting related items in the network using Granger causality. We empirically validate the favorable performance of SAGE-FIN on a real-world dataset, Bipartite Edge-And-Node Attributed financial network (Elliptic++), with Granger-causal explanations for the identified fraudulent items without any prior assumption on the network structure.
Keywords: fraud detection, graph neural networks, semi-supervised learning, Granger causality, financial networks
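
The abstract names sparse labels as the first challenge. One generic way to train a node classifier under sparse labels is to compute the loss only over the nodes that have labels, while the unlabeled nodes still take part in the graph. The sketch below illustrates that pattern only; it is not taken from the paper, and the function name `masked_cross_entropy`, the `-1` convention for unlabeled nodes, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def masked_cross_entropy(probs, labels, labeled_mask):
    """Binary cross-entropy averaged over labeled nodes only (hypothetical helper)."""
    # Clip to avoid log(0); unlabeled nodes are excluded by the mask.
    p = np.clip(probs[labeled_mask], 1e-12, 1 - 1e-12)
    y = labels[labeled_mask]
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy data (assumed): four nodes, only the first two carry a label.
probs = np.array([0.9, 0.2, 0.6, 0.5])      # predicted fraud probabilities
labels = np.array([1.0, 0.0, -1.0, -1.0])   # -1 marks an unlabeled node
labeled_mask = labels >= 0

loss = masked_cross_entropy(probs, labels, labeled_mask)
```

Because the mask removes unlabeled nodes from the average, changing the predictions for nodes 2 and 3 leaves `loss` unchanged; in a GNN those nodes would still influence the labeled ones through message passing.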
Complexity vs Empirical Score
- Math Complexity: 7.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced GNN architectures with formal message-passing equations and Granger-causality for explainability, indicating high math complexity. It is backed by empirical validation on a real-world financial dataset (Elliptic++) and addresses implementation-heavy challenges like semi-supervised learning on bipartite graphs.
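
To make the idea of Granger-causal explanations on a graph more concrete, here is a minimal sketch under stated assumptions. It treats a neighbor as Granger-causing a prediction if removing that neighbor's edge increases the prediction loss at the target node. The one-step mean-aggregation model, the function names (`mean_aggregate`, `fraud_score`, `granger_effects`), and the toy graph are all hypothetical stand-ins, not SAGE-FIN's actual architecture or the authors' code.

```python
import numpy as np

def mean_aggregate(adj, feats):
    """One round of mean-aggregation message passing with self-loops."""
    a = adj + np.eye(adj.shape[0])
    deg = a.sum(axis=1, keepdims=True)
    return (a / deg) @ feats

def fraud_score(adj, feats, weights):
    """Toy model (assumed): one message-passing step, then a logistic readout."""
    hidden = mean_aggregate(adj, feats)
    return 1.0 / (1.0 + np.exp(-(hidden @ weights)))

def granger_effects(adj, feats, weights, target, label=1.0):
    """Loss increase at `target` when each neighbor's edge is removed."""
    def loss(a):
        p = fraud_score(a, feats, weights)[target]
        return -(label * np.log(p) + (1 - label) * np.log(1 - p))

    base = loss(adj)
    effects = {}
    for nbr in np.flatnonzero(adj[target]):
        ablated = adj.copy()
        ablated[target, nbr] = ablated[nbr, target] = 0.0
        # Positive value: the neighbor helped the prediction (Granger-causal).
        effects[int(nbr)] = float(loss(ablated) - base)
    return effects

# Toy graph (assumed): edges 0-1, 0-2, 2-3; node 1 has a suspicious feature.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.array([[0.0], [4.0], [-1.0], [0.0]])
weights = np.array([1.0])

effects = granger_effects(adj, feats, weights, target=0)
```

In this toy setup, removing node 1 raises the loss at node 0 (a positive effect), while removing node 2 lowers it (a negative effect), so an explanation would highlight node 1 as the related item behind the fraud flag.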
flowchart TD
A["Research Goal<br>Detecting Fraud in Financial Networks<br>with Low Labels & High Explainability"] --> B["Data Input<br>Elliptic++ Dataset<br>Bipartite Financial Network"]
B --> C["Methodology<br>Semi-Supervised GNN"]
subgraph C ["SAGE-FIN Model"]
direction LR
C1["Graph Structure Learning"] --> C2["Feature Extraction"]
C2 --> C3["Label Propagation"]
end
C --> D["Computational Process<br>Granger Causality Analysis"]
D --> E["Explainable Output<br>Causal Links to Fraudulent Nodes"]
E --> F{"Outcomes"}
F --> G["High Fraud Detection Accuracy"]
F --> H["Regulatory Compliance<br>via Causal Explanations"]