A Unified AI System For Data Quality Control and DataOps Management in Regulated Environments

ArXiv ID: 2512.05559

Authors: Devender Saini, Bhavika Jain, Nitish Ujjwal, Philip Sommer, Dan Romuald Mbanga, Dhagash Mehta

Abstract

In regulated domains such as finance, the integrity and governance of data pipelines are critical, yet existing systems treat data quality control (QC) as an isolated preprocessing step rather than a first-class system component. We present a unified AI-driven Data QC and DataOps Management framework that embeds rule-based, statistical, and AI-based QC methods into a continuous, governed layer spanning ingestion, model pipelines, and downstream applications. Our architecture integrates open-source tools with custom modules for profiling, audit logging, breach handling, configuration-driven policies, and dynamic remediation. We demonstrate deployment in a production-grade financial setup: handling streaming and tabular data across multiple asset classes and transaction streams, with configurable thresholds, cloud-native storage interfaces, and automated alerts. We show empirical gains in anomaly detection recall, reduction of manual remediation effort, and improved auditability and traceability in high-throughput data workflows. By treating QC as a system concern rather than an afterthought, our framework provides a foundation for trustworthy, scalable, and compliant AI pipelines in regulated environments.

Keywords: Data Quality Control (QC), DataOps, AI Governance, Automated Remediation, Data Pipeline Integrity, Multi-Asset
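
The abstract describes a QC layer that combines rule-based and statistical checks under configurable thresholds. The sketch below illustrates that general idea only; it is not the authors' implementation. The names `QCPolicy`, `rule_check`, `statistical_check`, and `run_qc`, the breach labels, and the choice of a z-score test are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional


@dataclass(frozen=True)
class QCPolicy:
    """Hypothetical per-field policy; field names and thresholds are illustrative."""
    min_value: float
    max_value: float
    z_threshold: float


def rule_check(value: Optional[float], policy: QCPolicy) -> Optional[str]:
    """Rule-based check: flag missing values and values outside the allowed range."""
    if value is None:
        return "missing_value"
    if not (policy.min_value <= value <= policy.max_value):
        return "out_of_range"
    return None


def statistical_check(value: float, history: List[float], policy: QCPolicy) -> Optional[str]:
    """Statistical check: flag values whose z-score against recent history is too large."""
    if len(history) < 2:
        return None  # not enough history to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        # Constant history: any deviation at all is treated as an outlier.
        return None if value == history[0] else "zscore_outlier"
    z = abs(value - mean(history)) / sigma
    return "zscore_outlier" if z > policy.z_threshold else None


def run_qc(value: Optional[float], history: List[float], policy: QCPolicy) -> List[str]:
    """Run both checks and return the list of breach labels (empty if the value is clean)."""
    breaches = []
    rule_breach = rule_check(value, policy)
    if rule_breach:
        breaches.append(rule_breach)
    if value is not None:  # the statistical check needs an actual number
        stat_breach = statistical_check(value, history, policy)
        if stat_breach:
            breaches.append(stat_breach)
    return breaches


policy = QCPolicy(min_value=0.0, max_value=1000.0, z_threshold=3.0)
history = [100.0, 101.0, 99.0, 100.5, 99.5]
print(run_qc(100.2, history, policy))
print(run_qc(150.0, history, policy))
```

Keeping thresholds in a policy object rather than in the check functions mirrors the configuration-driven approach the abstract mentions: the same checks can be reused across fields or asset classes with different limits.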

Complexity vs Empirical Score

Math Complexity: 1.0/10
Empirical Rigor: 8.5/10
Quadrant: Street Traders
Why: The paper focuses on system architecture and operational implementation with minimal advanced mathematics, emphasizing practical deployment, data handling, and empirical performance gains in a production environment.

  flowchart TD
    Start["Research Goal:
    Unified AI Framework for Data QC & DataOps in Regulated Environments"] --> Input["Data Inputs:
    Streaming & Tabular Data
    (Multi-Asset)"]
    
    Input --> Method["Methodology:
    AI-Driven QC Layer
    (Rule-Based + Statistical + AI Methods)"]
    
    Method --> Proc["Computational Processes:
    1. Profiling & Audit Logging
    2. Breach Handling & Dynamic Remediation
    3. Configurable Policy Enforcement"]
    
    Proc --> Outcome["Key Outcomes:
    • Higher Anomaly Detection Recall
    • Reduced Manual Remediation
    • Improved Auditability & Traceability"]
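
The "Computational Processes" stage in the flowchart (profiling, audit logging, breach handling, remediation, policy enforcement) can be sketched as a single batch-processing function. This is a minimal illustration under stated assumptions, not the paper's design: the function names, the `max_missing_ratio` policy parameter, the forward-fill remediation, and the JSON audit record layout are all hypothetical, since the summary does not specify them.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def profile(values: List[Optional[float]]) -> dict:
    """Profiling: basic summary statistics for one batch."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Remediation stand-in: carry the last seen value forward over gaps.

    Leading gaps have nothing to carry forward and stay None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def process_batch(
    batch_id: str,
    values: List[Optional[float]],
    max_missing_ratio: float,  # hypothetical policy threshold
    audit_log: List[str],
) -> Tuple[str, List[Optional[float]]]:
    """Profile a batch, pick an action from the policy, and append an audit record."""
    stats = profile(values)
    ratio = stats["missing"] / stats["count"] if stats["count"] else 0.0
    if stats["missing"] == 0:
        action, output = "pass", list(values)
    elif ratio <= max_missing_ratio:
        action, output = "forward_fill", forward_fill(values)
    else:
        # Breach too severe to repair automatically: hold the batch back.
        action, output = "quarantine", []
    record = {
        "batch_id": batch_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "profile": stats,
        "missing_ratio": ratio,
        "action": action,
    }
    audit_log.append(json.dumps(record, sort_keys=True))
    return action, output


audit_log: List[str] = []
print(process_batch("b1", [10.0, None, 10.4, 10.5], 0.3, audit_log))
print(process_batch("b2", [None, None, 10.0, None], 0.3, audit_log))
```

Writing one audit record per batch, whatever the outcome, is what makes every decision traceable after the fact; a production system would write these records to durable storage rather than an in-memory list.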
