Financial Data Analysis with Robust Federated Logistic Regression

ArXiv ID: 2504.20250

Authors: Kun Yang, Nikhil Krishnan, Sanjeev R. Kulkarni

Abstract

In this study, we focus on the analysis of financial data in a federated setting, wherein data is distributed across multiple clients or locations and the raw data never leaves the local devices. Our primary focus is not only on developing efficient learning frameworks in the field of federated learning (to protect user data privacy) but also on designing models that are easier to interpret. In addition, we care about the robustness of the framework to outliers. To achieve these goals, we propose a robust federated logistic regression-based framework that strives to strike a balance among them. To verify the feasibility of the proposed framework, we carefully evaluate its performance not only on independent and identically distributed (IID) data but also on non-IID data, especially in scenarios involving outliers. Extensive numerical results on multiple public datasets demonstrate that our proposed method can achieve performance comparable to that of classical centralized algorithms, such as Logistic Regression, Decision Tree, and K-Nearest Neighbors, in both binary and multi-class classification tasks.
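In the setting the abstract describes, each client trains on its own data and shares only model parameters with a server. A minimal sketch of that loop for binary logistic regression, assuming full-batch gradient descent on each client and plain sample-size-weighted averaging (FedAvg) on the server. The function names (`local_update`, `fedavg`, `federated_train`, `predict`) and the hyperparameters are hypothetical, and the plain mean used here is not the paper's robust aggregation:

```python
import math
from typing import List, Tuple

# A sample is (feature vector, binary label in {0, 1}).
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: List[float]) -> int:
    # The last entry of w is the bias term.
    z = sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


def local_update(weights: List[float], data: List[Sample],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Full-batch gradient descent on the mean log-loss of one client's data.

    Only the updated weights leave the client; the samples never do.
    """
    w = list(weights)
    d = len(w) - 1
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w[:d], x)) + w[d])
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j in range(d):
                grad[j] += err * x[j]
            grad[d] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def fedavg(client_weights: List[List[float]], sizes: List[int]) -> List[float]:
    # Plain sample-size-weighted average (not robust to outliers).
    total = sum(sizes)
    dim = len(client_weights[0])
    return [sum(s * w[j] for w, s in zip(client_weights, sizes)) / total
            for j in range(dim)]


def federated_train(clients: List[List[Sample]], dim: int,
                    rounds: int = 20) -> List[float]:
    global_w = [0.0] * (dim + 1)  # dim feature weights plus one bias
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = fedavg(updates, [len(data) for data in clients])
    return global_w


# Toy usage: two clients holding a 1-D problem where the label is 1 iff x > 0.
clients = [
    [([-2.0], 0), ([-1.0], 0), ([1.5], 1)],
    [([-1.5], 0), ([1.0], 1), ([2.0], 1)],
]
w = federated_train(clients, dim=1)
print([predict(w, x) for data in clients for x, _ in data])
```

On this toy data the two clients' label mixes mirror each other, so averaging keeps the bias near zero while the weight on `x` grows positive, and the global model separates every training point even though no client ever sent its samples.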

Keywords: federated learning, logistic regression, privacy preservation, non-IID data, outlier robustness, General/Methodology (Financial Classification)

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Street Traders
  • Why: The paper applies standard logistic regression with robust aggregation techniques (trimmed mean, median) in a federated learning framework, involving moderate statistical modeling but no advanced mathematics; however, it provides rigorous empirical validation on multiple public datasets with metrics like AUC across IID and non-IID scenarios, including outlier robustness, making it practically grounded for financial applications.
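The rating above attributes the framework's robustness to trimmed-mean and median aggregation. The sketch below shows generic coordinate-wise versions of both rules, compared with a plain mean when one client sends an outlying update; the exact aggregation rules and trimming level used in the paper may differ, and the function names and example numbers are hypothetical:

```python
from statistics import median
from typing import List


def plain_mean(updates: List[List[float]]) -> List[float]:
    # Ordinary coordinate-wise mean: a single extreme client can dominate it.
    return [sum(col) / len(col) for col in zip(*updates)]


def coordinate_median(updates: List[List[float]]) -> List[float]:
    # Median of each parameter across clients.
    return [median(col) for col in zip(*updates)]


def trimmed_mean(updates: List[List[float]], trim: int) -> List[float]:
    """Coordinate-wise mean after discarding the `trim` largest and `trim`
    smallest client values of every parameter."""
    if trim < 0 or 2 * trim >= len(updates):
        raise ValueError("need 0 <= trim < len(updates) / 2")
    result = []
    for col in zip(*updates):
        kept = sorted(col)[trim:len(col) - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four honest clients send similar weight vectors; a fifth is an outlier.
updates = [[1.0, -0.5], [1.2, -0.4], [0.9, -0.6], [1.1, -0.5], [50.0, 40.0]]
print(plain_mean(updates))            # pulled to roughly [10.84, 7.6]
print(coordinate_median(updates))     # [1.1, -0.5]
print(trimmed_mean(updates, trim=1))  # roughly [1.1, -0.467]
```

Both robust rules stay close to the honest clients' values because, coordinate by coordinate, the outlier is either ignored (median) or trimmed away before averaging; the trade-off is that trimming also discards some honest information.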
  flowchart TD
    A["Research Goal:<br>Federated Financial Analysis<br>with Privacy, Interpretability, & Robustness"] --> B["Methodology:<br>Robust Federated Logistic Regression"]
    B --> C{"Data Types Evaluated"}
    C --> D["Independently<br>Identically Distributed Data"]
    C --> E["Non-IID Data<br>with Outliers"]
    D & E --> F["Computation:<br>Decentralized Model Training<br>(Local Updates + Global Aggregation)"]
    F --> G["Key Findings:<br>Comparable to Centralized Algorithms<br>e.g., LR, Decision Trees, KNN"]
    G --> H["Outcomes:<br>Validated Framework for<br>Binary & Multi-class Classification"]
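The evaluation contrasts IID clients with non-IID clients that contain outliers. This summary does not state how the paper partitions data or generates outliers, so the following is an assumption: a uniform random (IID) split, a label-sorted split in which each client holds mostly one class (a common way to simulate non-IID data), and a hypothetical `inject_outliers` helper that scales features and flips labels for a fraction of one client's samples:

```python
import math
import random
from typing import List, Tuple

# A sample is (feature vector, class label).
Sample = Tuple[List[float], int]


def iid_split(data: List[Sample], n_clients: int,
              seed: int = 0) -> List[List[Sample]]:
    # Shuffle, then deal samples round-robin so every client sees roughly
    # the same label mix.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_clients] for i in range(n_clients)]


def label_skew_split(data: List[Sample], n_clients: int) -> List[List[Sample]]:
    # Sort by label and cut into contiguous blocks, so each client holds
    # mostly (often only) one class.
    ordered = sorted(data, key=lambda s: s[1])
    size = math.ceil(len(ordered) / n_clients)
    return [ordered[i * size:(i + 1) * size] for i in range(n_clients)]


def inject_outliers(client: List[Sample], fraction: float,
                    scale: float = 100.0, seed: int = 0) -> List[Sample]:
    # Return a copy in which a fraction of binary-labelled samples have
    # features multiplied by `scale` and labels flipped (hypothetical scheme).
    rng = random.Random(seed)
    out = list(client)
    k = int(round(fraction * len(out)))
    for i in rng.sample(range(len(out)), k):
        x, y = out[i]
        out[i] = ([scale * v for v in x], 1 - y)
    return out


data = [([float(i)], i % 2) for i in range(20)]
iid_clients = iid_split(data, n_clients=4)
skewed_clients = label_skew_split(data, n_clients=4)
noisy_client = inject_outliers(skewed_clients[0], fraction=0.4)
print([sorted({y for _, y in c}) for c in skewed_clients])
```

Under the label-sorted split the first two clients see only class 0 and the last two only class 1, which is the kind of heterogeneity that makes naive averaging harder than in the IID case.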