A Unified AI System For Data Quality Control and DataOps Management in Regulated Environments
ArXiv ID: 2512.05559
Authors: Devender Saini, Bhavika Jain, Nitish Ujjwal, Philip Sommer, Dan Romuald Mbanga, Dhagash Mehta
Abstract
In regulated domains such as finance, the integrity and governance of data pipelines are critical, yet existing systems treat data quality control (QC) as an isolated preprocessing step rather than a first-class system component. We present a unified AI-driven Data QC and DataOps Management framework that embeds rule-based, statistical, and AI-based QC methods into a continuous, governed layer spanning ingestion, model pipelines, and downstream applications. Our architecture integrates open-source tools with custom modules for profiling, audit logging, breach handling, configuration-driven policies, and dynamic remediation. We demonstrate deployment in a production-grade financial setup: handling streaming and tabular data across multiple asset classes and transaction streams, with configurable thresholds, cloud-native storage interfaces, and automated alerts. We show empirical gains in anomaly detection recall, reduction of manual remediation effort, and improved auditability and traceability in high-throughput data workflows. By treating QC as a system concern rather than an afterthought, our framework provides a foundation for trustworthy, scalable, and compliant AI pipelines in regulated environments.
Keywords: Data Quality Control (QC), DataOps, AI Governance, Automated Remediation, Data Pipeline Integrity, Multi-Asset
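To make the abstract's idea of combining rule-based and statistical QC under configurable thresholds more concrete, here is a minimal sketch. It is not the authors' implementation: the names `QCConfig`, `rule_check`, `z_score_check`, and `run_qc`, the price-floor rule, and the z-score test are all illustrative assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class QCConfig:
    # Hypothetical policy values; a real deployment would load these
    # from a governed configuration source rather than hard-code them.
    min_price: float = 0.0
    z_threshold: float = 3.0


def rule_check(value: float, cfg: QCConfig) -> bool:
    """Rule-based check: the value must be strictly above the configured floor."""
    return value > cfg.min_price


def z_score_check(history: list[float], value: float, cfg: QCConfig) -> bool:
    """Statistical check: pass if the value lies within z_threshold sample
    standard deviations of the historical mean."""
    if len(history) < 2:
        return True  # not enough history to estimate spread; do not flag
    sigma = stdev(history)
    if sigma == 0:
        return value == history[0]
    return abs(value - mean(history)) / sigma <= cfg.z_threshold


def run_qc(history: list[float], value: float, cfg: QCConfig = QCConfig()) -> list[str]:
    """Run both checks and return the names of any breached checks."""
    breaches = []
    if not rule_check(value, cfg):
        breaches.append("rule:min_price")
    if not z_score_check(history, value, cfg):
        breaches.append("stat:z_score")
    return breaches


if __name__ == "__main__":
    prices = [100.0, 101.0, 99.0, 100.5, 99.5]
    print(run_qc(prices, 100.2))  # within range, no breaches expected
    print(run_qc(prices, 150.0))  # far from the historical mean
```

Keeping thresholds in a single config object mirrors the configuration-driven policies the abstract mentions: tightening or loosening a check becomes a policy change rather than a code change.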
Complexity vs Empirical Score
- Math Complexity: 1.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Street Traders
- Why: The paper focuses on system architecture and operational implementation with minimal advanced mathematics, emphasizing practical deployment, data handling, and empirical performance gains in a production environment.
flowchart TD
    Start["Research Goal:<br/>Unified AI Framework for Data QC & DataOps in Regulated Environments"] --> Input["Data Inputs:<br/>Streaming & Tabular Data<br/>(Multi-Asset)"]
    Input --> Method["Methodology:<br/>AI-Driven QC Layer<br/>(Rule-Based + Statistical + AI Methods)"]
    Method --> Proc["Computational Processes:<br/>1. Profiling & Audit Logging<br/>2. Breach Handling & Dynamic Remediation<br/>3. Configurable Policy Enforcement"]
    Proc --> Outcome["Key Outcomes:<br/>• Higher Anomaly Detection Recall<br/>• Reduced Manual Remediation<br/>• Improved Auditability & Traceability"]
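The processing stages in the flowchart (audit logging, breach handling, remediation) can be sketched roughly as follows. This is an illustrative assumption, not the paper's pipeline: the function `process_stream`, the `max_jump` threshold, the record fields `id` and `price`, and the use of quarantine as a stand-in for "dynamic remediation" are all hypothetical.

```python
import json
from datetime import datetime, timezone


def process_stream(records, max_jump=0.2):
    """Check each record, route breaches to quarantine, and log every decision.

    max_jump is a hypothetical threshold on the relative change against
    the last accepted price.
    """
    accepted, quarantined, audit_log = [], [], []
    last_good = None

    for rec in records:
        price = rec.get("price")

        # Decide whether the record breaches a policy, and why.
        if price is None or price <= 0:
            reason = "invalid_price"
        elif last_good is not None and abs(price - last_good) / last_good > max_jump:
            reason = "jump_exceeds_threshold"
        else:
            reason = None

        # Breach handling: quarantine instead of silently dropping,
        # so the record stays available for review or replay.
        if reason is None:
            accepted.append(rec)
            last_good = price
        else:
            quarantined.append(rec)

        # Audit logging: one entry per record, pass or breach.
        audit_log.append({
            "id": rec.get("id"),
            "status": "pass" if reason is None else "breach",
            "reason": reason,
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })

    return accepted, quarantined, audit_log


if __name__ == "__main__":
    ticks = [
        {"id": 1, "price": 100.0},
        {"id": 2, "price": 101.0},
        {"id": 3, "price": 150.0},
        {"id": 4, "price": None},
        {"id": 5, "price": 102.0},
    ]
    ok, held, log = process_stream(ticks)
    print(json.dumps(log, indent=2))
```

Logging passes as well as breaches is a deliberate choice in this sketch: an audit trail that records only failures cannot show that a given record was ever checked.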