Evaluating utility in synthetic banking microdata applications
ArXiv ID: 2410.22519
Authors: Unknown
Abstract
Financial regulators such as central banks collect vast amounts of data, but access to the resulting fine-grained banking microdata is severely restricted by banking secrecy laws. Recent developments have resulted in mechanisms that generate faithful synthetic data, but current evaluation frameworks lack a focus on the specific challenges of banking institutions and microdata. We develop a framework that considers the utility and privacy requirements of regulators, and apply it to financial usage indices, term deposit yield curves, and credit card transition matrices. Using the Central Bank of Paraguay’s data, we provide the first implementation of synthetic banking microdata using a central bank’s collected information, with the resulting synthetic datasets for all three domain applications being publicly available and featuring information not yet released in statistical disclosure. We find that applications based on frequency tables, which are less susceptible to post-processing information loss, are particularly suited to this approach, and that marginal-based inference mechanisms outperform generative adversarial network models for these applications. Our results demonstrate that synthetic data generation is a promising privacy-enhancing technology for financial regulators seeking to complement their statistical disclosure, while highlighting the crucial role of evaluating such endeavors in terms of utility and privacy requirements.
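To make the idea of a frequency-table application concrete, the following is a minimal sketch of how utility might be checked for a credit card transition matrix: build the matrix from observed state pairs in the real and the synthetic data, then compare the two. The state names, the toy records, and the mean absolute error metric are all assumptions chosen for illustration; the abstract does not specify the paper's actual states or utility measures.

```python
from collections import Counter

# Hypothetical account states; the paper's actual categories are not given here.
STATES = ["current", "late", "default"]


def transition_matrix(pairs, states=STATES):
    """Row-normalised transition matrix from (from_state, to_state) pairs."""
    counts = Counter(pairs)  # frequency table of observed transitions
    matrix = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        # Rows with no observations are left as zeros.
        matrix.append(
            [counts[(src, dst)] / row_total if row_total else 0.0 for dst in states]
        )
    return matrix


def mean_abs_error(a, b):
    """Average absolute difference between matching cells of two matrices."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Toy records standing in for real and synthetic microdata (illustrative only).
real = (
    [("current", "current")] * 8 + [("current", "late")] * 2
    + [("late", "current")] * 1 + [("late", "late")] * 2 + [("late", "default")] * 1
    + [("default", "default")] * 4
)
synthetic = (
    [("current", "current")] * 7 + [("current", "late")] * 3
    + [("late", "current")] * 1 + [("late", "late")] * 2 + [("late", "default")] * 1
    + [("default", "default")] * 4
)

real_matrix = transition_matrix(real)
synthetic_matrix = transition_matrix(synthetic)
utility_gap = mean_abs_error(real_matrix, synthetic_matrix)
print(f"Mean absolute error between matrices: {utility_gap:.4f}")
```

A smaller gap suggests the synthetic data preserves the transition structure more closely. Because the matrix depends only on counts of state pairs, a synthesizer that reproduces those counts well should score well on this kind of check.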
Keywords: synthetic data generation, privacy-enhancing technologies, generative adversarial networks, statistical disclosure, Banking & Regulation
Complexity vs Empirical Score
- Math Complexity: 4.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Street Traders
- Why: The paper applies established synthetic data generation methods (marginal-based inference vs. GANs) to real-world banking microdata from the Central Bank of Paraguay, with a public dataset release and specific domain applications such as yield curves and transition matrices, which indicates strong empirical implementation. The math remains accessible, without heavy theoretical derivations.
flowchart TD
A["Research Goal<br>Assess utility of synthetic banking microdata<br>for regulatory applications"] --> B["Methodology Framework"]
B --> C["Data Input<br>Central Bank of Paraguay<br>Banking Microdata"]
C --> D["Computational Processes<br>Marginal-based Inference vs GAN Models<br>Utility & Privacy Evaluation"]
D --> E["Application Scenarios<br>Financial Usage Indices<br>Term Deposit Yield Curves<br>Credit Card Transition Matrices"]
E --> F["Key Findings/Outcomes"]
F --> G["Frequency-based applications<br>show highest utility"]
F --> H["Marginal inference<br>outperforms GAN models"]
F --> I["PETs complement<br>statistical disclosure<br>with public datasets available"]